In binary logistic regression, you can enter data in two different formats: Binary Response/Frequency format and Event/Trial format. The format of the data for analysis should usually match the way that you collected the data.
Event/Trial format is usually the natural choice when you collect many trials at once under the same conditions. For example, an engineer produces a batch of 200 integrated circuits. All the circuits in the batch have to use the same process settings, so the 200 circuits are 200 trials that share one row. If the engineer collects data on another batch, even one with the same settings, the data for that batch go in a separate row.
In Event/Trial format, the response variable uses two columns. One column contains the number of successes or events of interest. The other column contains the number of trials.
C1 | C2 | C3 | C4 |
---|---|---|---|
Successes | Trials | Temperature | Raw Material |
180 | 200 | 1500 | Supplier B |
200 | 200 | 1400 | Supplier A |
196 | 200 | 1500 | Supplier A |
197 | 200 | 1400 | Supplier B |
190 | 200 | 1400 | Supplier A |
193 | 200 | 1400 | Supplier B |
198 | 200 | 1500 | Supplier A |
185 | 200 | 1500 | Supplier B |
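As a rough illustration of how the two response columns work together, the following sketch computes the observed proportion of events for each row of the table above, then pools the rows by temperature. The variable names (`batches`, `row_proportions`, `pooled`) are illustrative assumptions, and these are descriptive proportions only, not the fitted logistic regression model.

```python
from collections import defaultdict

# Event/Trial worksheet from the table above: one row per batch.
# Tuple layout: (successes, trials, temperature, raw_material)
batches = [
    (180, 200, 1500, "Supplier B"),
    (200, 200, 1400, "Supplier A"),
    (196, 200, 1500, "Supplier A"),
    (197, 200, 1400, "Supplier B"),
    (190, 200, 1400, "Supplier A"),
    (193, 200, 1400, "Supplier B"),
    (198, 200, 1500, "Supplier A"),
    (185, 200, 1500, "Supplier B"),
]

# Observed proportion of events in each row (events / trials).
row_proportions = [successes / trials for successes, trials, _, _ in batches]

# Pool events and trials over the rows that share a temperature setting.
totals = defaultdict(lambda: [0, 0])  # temperature -> [events, trials]
for successes, trials, temperature, _ in batches:
    totals[temperature][0] += successes
    totals[temperature][1] += trials

pooled = {temp: events / n for temp, (events, n) in totals.items()}

for temp in sorted(pooled):
    print(f"Temperature {temp}: pooled event proportion {pooled[temp]:.5f}")
```

Because each row carries its own number of trials, rows with different batch sizes could be pooled the same way without any change to the code.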
If the data are in Event/Trial format in the worksheet, but the number of trials per row is small, the trustworthiness and interpretation of the statistics change. For example, if every row has 1 trial, then the number of events per row is either 0 or 1. The analysis of these data is the same as the analysis of data in Binary Response/Frequency format without a frequency column.
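The relationship between the two formats can be sketched in a few lines. The helper below is hypothetical (`to_binary_frequency` and the labels `"Event"` and `"Nonevent"` are assumptions made for illustration). It rewrites one Event/Trial row as Binary Response/Frequency rows, and shows that a row with a single trial reduces to a single binary observation.

```python
def to_binary_frequency(successes, trials, **predictors):
    """Rewrite one Event/Trial row as Binary Response/Frequency rows.

    Returns up to two rows: one for the events and one for the nonevents.
    A row is omitted when its frequency would be 0.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    rows = []
    if successes > 0:
        rows.append({"response": "Event", "frequency": successes, **predictors})
    if trials - successes > 0:
        rows.append(
            {"response": "Nonevent", "frequency": trials - successes, **predictors}
        )
    return rows


# A batch of 200 trials becomes two rows with frequencies 180 and 20.
batch_rows = to_binary_frequency(180, 200, temperature=1500)

# A row with 1 trial becomes a single binary observation with frequency 1.
single_event = to_binary_frequency(1, 1, temperature=1500)
single_nonevent = to_binary_frequency(0, 1, temperature=1500)
```

With one trial per row, every converted row has a frequency of 1, which is why the frequency column becomes unnecessary in that case.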
Binary Response/Frequency format is usually the natural choice when you record the outcome of each separate trial as the outcome occurs. For example, a marketing consultant surveys consumers as they leave a supermarket about whether they bought a new brand of cereal. As each consumer answers, the consultant records that consumer's individual information in its own row.
In Binary Response/Frequency format, the response variable uses one column. The response column has only two values, one of which indicates the event and the other of which indicates the nonevent.
C1 | C2 | C3 |
---|---|---|
Bought | Income | Children |
Yes | 37 | Yes |
No | 47 | Yes |
Yes | 34 | No |
Yes | 58 | No |
You can include a frequency column for data in Binary Response/Frequency format. The frequency column gives the number of observations that each row represents. For the clearest interpretation of the residuals versus order plot, combine only consecutive observations that have identical values into a single row. The combination of non-consecutive observations can create or hide patterns on the residuals versus order plot.
C1 | C2 | C3 | C4 |
---|---|---|---|
Bought | Income | Children | Frequency |
Yes | 40 | Yes | 2 |
No | 40 | No | 12 |
Yes | 45 | Yes | 1 |
No | 45 | No | 6 |
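One way to build a frequency column while combining only consecutive observations is sketched below with `itertools.groupby`, which groups adjacent equal items and leaves repeated values that are separated by other rows in their own groups. The observations are hypothetical values in collection order, and the function name `combine_consecutive` is an assumption made for illustration.

```python
from itertools import groupby


def combine_consecutive(observations):
    """Collapse runs of identical consecutive rows and append a frequency.

    Rows that are identical but not adjacent stay in separate rows, so the
    original collection order is preserved.
    """
    return [(*key, len(list(group))) for key, group in groupby(observations)]


# Hypothetical observations in collection order: (bought, income, children)
observations = [
    ("Yes", 40, "Yes"),
    ("Yes", 40, "Yes"),
    ("No", 40, "No"),
    ("No", 40, "No"),
    ("No", 40, "No"),
    ("Yes", 40, "Yes"),  # same values as the first run, but not consecutive
]

combined = combine_consecutive(observations)
for bought, income, children, frequency in combined:
    print(bought, income, children, frequency)
```

The last observation matches the first two, but it is kept as its own row with a frequency of 1 because other observations were recorded in between.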