When to use each data format in binary logistic regression

In binary logistic regression, you can enter data in two different formats: Binary Response/Frequency format and Event/Trial format. The format of the data for analysis should usually match the way that you collected the data.

Why use data in event/trial format?

Data are usually in Event/Trial format when you collect many trials at once. For example, an engineer produces a batch of 200 integrated circuits. All the circuits in a batch are made with the same process settings, so the 200 circuits are 200 trials that share the same predictor values. If the engineer collects data on another batch, the data for that batch go in a separate row, even if the batch uses the same settings.

Example of event/trial format

In Event/Trial format, the response variable uses two columns. One column contains the number of successes or events of interest. The other column contains the number of trials.

In this worksheet, Successes contains the number of events, which is how many circuits in the batch passed an electrical function test. Trials contains the number of trials, which is the total number of circuits produced in the batch. Temperature is a continuous predictor. Raw Material is a categorical predictor. The first row in the worksheet shows a batch of 200 circuits made at a temperature of 1500 with raw material from Supplier B. Of these circuits, 180 passed the electrical function test.
C1         C2      C3           C4
Successes  Trials  Temperature  Raw Material
180        200     1500         Supplier B
200        200     1400         Supplier A
196        200     1500         Supplier A
197        200     1400         Supplier B
190        200     1400         Supplier A
193        200     1400         Supplier B
198        200     1500         Supplier A
185        200     1500         Supplier B
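One way to see how the two formats relate is to split each batch into an event row and a nonevent row, which turns Event/Trial data into Binary Response/Frequency data with the same counts. The Python sketch below does this for the worksheet above. The EventTrialRow type, the to_binary_frequency function, and the Pass/Fail labels are hypothetical and are not part of any Minitab feature.

```python
from dataclasses import dataclass


@dataclass
class EventTrialRow:
    """One row of an Event/Trial worksheet (hypothetical type for illustration)."""
    successes: int
    trials: int
    temperature: int
    raw_material: str


def to_binary_frequency(rows, event="Pass", nonevent="Fail"):
    """Split each Event/Trial row into an event row and a nonevent row.

    The event row's frequency is the number of successes; the nonevent row's
    frequency is trials minus successes. Rows with a frequency of 0 are dropped.
    """
    out = []
    for r in rows:
        if not 0 <= r.successes <= r.trials:
            raise ValueError(f"successes must be between 0 and trials: {r}")
        for response, freq in ((event, r.successes), (nonevent, r.trials - r.successes)):
            if freq > 0:
                out.append((response, r.temperature, r.raw_material, freq))
    return out


# The circuit worksheet above, one EventTrialRow per batch.
worksheet = [
    EventTrialRow(180, 200, 1500, "Supplier B"),
    EventTrialRow(200, 200, 1400, "Supplier A"),
    EventTrialRow(196, 200, 1500, "Supplier A"),
    EventTrialRow(197, 200, 1400, "Supplier B"),
    EventTrialRow(190, 200, 1400, "Supplier A"),
    EventTrialRow(193, 200, 1400, "Supplier B"),
    EventTrialRow(198, 200, 1500, "Supplier A"),
    EventTrialRow(185, 200, 1500, "Supplier B"),
]

for row in to_binary_frequency(worksheet):
    print(*row)
```

The first batch becomes two rows, Pass 1500 Supplier B 180 and Fail 1500 Supplier B 20. The second batch, in which every circuit passed, becomes a single row.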

Note

If the data are in Event/Trial format in the worksheet but the number of trials per row is small, the trustworthiness and interpretation of the statistics change. For example, if every row has 1 trial, then the number of events per row is either 0 or 1. The analysis of these data is the same as the analysis of Binary Response/Frequency data without a frequency column.
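The last claim in the note can be checked numerically, assuming the model is fit by maximum likelihood. With 1 trial per row, the binomial log-likelihood for Event/Trial data equals the Bernoulli log-likelihood for the same 0/1 responses, so both formats lead to the same fitted coefficients. In this Python sketch, the functions, the data, and the event probabilities are all hypothetical.

```python
import math


def binomial_loglik(successes, trials, p):
    """Log-likelihood of Event/Trial data, given one event probability per row."""
    total = 0.0
    for y, n, pi in zip(successes, trials, p):
        total += math.log(math.comb(n, y)) + y * math.log(pi) + (n - y) * math.log(1 - pi)
    return total


def bernoulli_loglik(responses, p):
    """Log-likelihood of binary responses (1 = event, 0 = nonevent)."""
    return sum(math.log(pi) if y == 1 else math.log(1 - pi) for y, pi in zip(responses, p))


# Hypothetical data: every row has exactly 1 trial, so each event count is 0 or 1.
events = [1, 0, 1, 1, 0]
trials = [1, 1, 1, 1, 1]
probs = [0.8, 0.3, 0.6, 0.9, 0.4]  # hypothetical event probabilities from a model

print(math.isclose(binomial_loglik(events, trials, probs), bernoulli_loglik(events, probs)))  # True
```

With 1 trial per row, each binomial coefficient is 1, so its logarithm adds nothing. With more trials per row, the two log-likelihoods differ only by the sum of the log binomial coefficients, which does not depend on the event probabilities.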

Why use data in binary response/frequency format?

Data are usually in Binary Response/Frequency format when you record the outcome of each trial as it occurs. For example, a marketing consultant surveys consumers as they leave a supermarket and asks whether they bought a new brand of cereal. As each consumer answers, the consultant records that consumer's individual information.

Example of data in binary response/frequency format

In Binary Response/Frequency format, the response variable uses one column. The response column has only two values, one of which indicates the event and the other of which indicates the nonevent.

In this worksheet, Bought is the response and indicates whether a consumer purchased the new brand of cereal. The response event is Yes. Income, in thousands of dollars, is a continuous predictor and Children is a categorical predictor. The first row in the worksheet shows that the first consumer the consultant asked had children, had an income of $37,000, and bought the new brand of cereal.
C1      C2      C3
Bought  Income  Children
Yes     37      Yes
No      47      Yes
Yes     34      No
Yes     58      No
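Because the response column has exactly two values, analysis software has to know which value is the event. The Python sketch below checks that a response column has two values and codes the event as 1 and the nonevent as 0. The code_response function is hypothetical and only illustrates the idea; it is not how Minitab stores the response.

```python
def code_response(values, event):
    """Map a two-valued response column to 1 (event) and 0 (nonevent)."""
    levels = set(values)
    if len(levels) != 2:
        raise ValueError(f"response must have exactly two values, found {sorted(levels)}")
    if event not in levels:
        raise ValueError(f"event value {event!r} does not appear in the response column")
    return [1 if v == event else 0 for v in values]


# The Bought column from the worksheet above, with Yes as the response event.
bought = ["Yes", "No", "Yes", "Yes"]
coded = code_response(bought, event="Yes")
print(coded)                     # [1, 0, 1, 1]
print(sum(coded) / len(coded))   # 0.75, the observed proportion of events
```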

You can include a frequency column for data in Binary Response/Frequency format. For the clearest interpretation of the residuals versus order plot, combine only consecutive observations. Combining non-consecutive observations can create or hide patterns on the residuals versus order plot.

In this worksheet, the response and predictor variables are the same as in the previous example, but the data also include a frequency variable. Frequency contains the number of consumers that correspond to the combination of response and predictor values in each row. The first row in the worksheet shows that 2 consumers with children and with an income of $40,000 bought the new brand of cereal. If these were not the first two consumers in the survey, then the order of the data in the worksheet differs from the order of collection. For the reordered data, patterns on the residuals versus order plot can be hidden or meaningless.
C1      C2      C3        C4
Bought  Income  Children  Frequency
Yes     40      Yes       2
No      40      No        12
Yes     45      Yes       1
No      45      No        6
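The advice to combine only consecutive observations amounts to run-length grouping rather than grouping every identical row. A minimal Python sketch, assuming the survey records are listed in collection order; the combine_consecutive function and the records are hypothetical. itertools.groupby merges only adjacent equal items, so the fourth consumer below stays in its own row instead of being added to the first row's frequency.

```python
from itertools import groupby


def combine_consecutive(observations):
    """Collapse runs of identical consecutive rows into (row, frequency) pairs.

    groupby groups only adjacent equal items, so identical rows separated by a
    different row stay in separate frequency rows, and the output keeps the
    order in which the data were collected.
    """
    return [(row, sum(1 for _ in run)) for row, run in groupby(observations)]


# Hypothetical survey records in collection order: (Bought, Income, Children).
survey = [
    ("Yes", 40, "Yes"),
    ("Yes", 40, "Yes"),
    ("No", 40, "No"),
    ("Yes", 40, "Yes"),
]

for row, freq in combine_consecutive(survey):
    print(*row, freq)
```

This prints Yes 40 Yes 2, No 40 No 1, and Yes 40 Yes 1. Grouping all identical rows regardless of position, for example with collections.Counter, would give Yes 40 Yes a frequency of 3 and would lose the collection order that the residuals versus order plot relies on.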