In binary logistic regression, you can enter data in two different formats: Binary Response/Frequency format and Event/Trial format. The trustworthiness and interpretation of some statistics in the output depend on the format of the data. For more information on when to use each data format, go to When to use each data format in binary logistic regression.

For binary logistic regression, the format of the data affects how you interpret the deviance R^{2} and the adjusted deviance R^{2} values. In Event/Trial format, each observed value represents the probability of the event for all the trials in that row of the data. Usually, this probability is for many trials and is between 0 and 1. In contrast, each observation in Binary Response/Frequency format usually represents only 1 trial. The observed value for a single trial is either 1 or 0.

In general, the difference in the data formats makes the total deviance in the data different. For Event/Trial data, the deviance relates to the discrepancy between the predicted probabilities and the observed probabilities. For Binary Response/Frequency format, the deviance relates to the discrepancy between the predicted probabilities and the outcome of 0% or 100% for each trial. The deviance R^{2} and the adjusted deviance R^{2} are usually higher for data in Event/Trial format.

A graphical illustration clarifies the difference. In these plots, symbols represent the observations in the data and the curve represents the predicted values in the model. For Event/Trial data, the symbols fall close to the line. The deviance R^{2} value for the Event/Trial data is about 96%. The model predicts the average probabilities very well.

For the Binary Response/Frequency data, the observations are close to the predicted line only when the line is close to 0% or 100%. The deviance R^{2} value for the Binary Response/Frequency data is about 56%. The relationship between the predicted probabilities and the individual cases is not as strong.

For binary logistic regression, the format of the data affects whether the deviance goodness-of-fit tests is trustworthy. The p-value for the deviance goodness-of-fit test usually decreases as the number of trials per row decreases. Data in the Binary Response/Frequency format usually have few trials per row. Thus, the deviance goodness-of-fit test is likely to indicate a poor fit when the data are in Binary Response/Frequency format even when the fit is good. The deviance goodness-of-fit test is also likely to falsely indicate a poor fit when the data are in Event/Trial format but the number of trials per row is small.

The Hosmer-Lemeshow test does not depend on the format of the data. When the data have few trials per row, the Hosmer-Lemeshow test is a more trustworthy indicator of how well the model fits the data.

Compare these two sets of results for the same data in different formats. For these data, the form of the model is correct. The response information, coefficients, and the results of the Hosmer-Lemeshow test are the same. The conclusion for the deviance goodness-of-fit test depends on the data format.

In these results, the data are in the Binary Response/Frequency format with no frequency column. The analysis uses 500 rows of data. Each row represents 1 trial. At the 0.05 level of significance, the p-value for the deviance goodness-of-fit test indicates that the model fits poorly. This p-value leads to the incorrect conclusion that the model format is incorrect. If you collect data in the Binary Response/Frequency format, the deviance goodness-of-fit test is often not trustworthy.

Method
Link function Logit
Rows used 500

Response Information
Variable Value Count
Y Event 160 (Event)
Non-event 340
Total 500

Coefficients
Term Coef SE Coef VIF
Constant -2.107 0.282
X 0.0904 0.0121 11.97
X*X -0.000889 0.000115 11.97

Goodness-of-Fit Tests
Test DF Chi-Square P-Value
Deviance 497 552.03 0.044
Pearson 497 504.42 0.399
Hosmer-Lemeshow 3 3.76 0.288

In these results, the data are in the Event/Trial format. The analysis uses 5 rows of data. Each row of the data represents 100 trials. At the 0.05 level of significance, the p-value for the deviance goodness-of-fit test does not find evidence of a poorly-fitting model. If you collect data in Event/Trial format, the deviance goodness-of-fit test is usually trustworthy.

Method
Link function Logit
Rows used 5

Response Information
Event
Variable Value Count Name
Event Event 160 Event
Non-event 340
Trial Total 500

Coefficients
Term Coef SE Coef VIF
Constant -2.107 0.282
X 0.0904 0.0121 11.97
X*X -0.000889 0.000115 11.97

Goodness-of-Fit Tests
Test DF Chi-Square P-Value
Deviance 2 3.78 0.151
Pearson 2 3.76 0.152
Hosmer-Lemeshow 3 3.76 0.288

For binary logistic regression, the format of the data affects whether the Pearson goodness-of-fit tests is trustworthy. The approximation to the chi-square distribution that the Pearson test uses is inaccurate when the expected number of events per row is small. Data in the Binary Response/Frequency format usually have few trials per row. Thus, the Pearson goodness-of-fit test is likely to be inaccurate when the data are in Binary Response/Frequency format.

The Hosmer-Lemeshow test does not depend on the format of the data. When the data have few trials per row, the Hosmer-Lemeshow test is a more trustworthy indicator of how well the model fits the data.

Compare these two sets of results for the same data in different formats. For these data, the form of the model is incorrect. The true model contains the interaction between X1 and X2. The response information, coefficients, and the results of the Hosmer-Lemeshow test are the same. The conclusion for the Pearson goodness-of-fit test depends on the data format.

In these results, the data are in the Binary Response/Frequency format with a frequency column. The analysis uses 18 rows of data. Each row represents 250 Bernoulli trials. At the 0.05 level of significance, the p-value for the Pearson goodness-of-fit test indicates that the model fits the data. This p-value leads to the incorrect conclusion that the model is adequate. If you collect data in the Binary Response/Frequency format, the Pearson goodness-of-fit test is not trustworthy.

Method
Link function Logit
Frequency Frequency
Rows used 18

Response Information
Variable Value Count
Y Yes 2645 (Event)
No 1855
Total 4500

Coefficients
Term Coef SE Coef VIF
Constant -0.612 0.110
X1 0.1585 0.0375 1.00
X2 0.3290 0.0377 1.00

Goodness-of-Fit Tests
Test DF Chi-Square P-Value
Deviance 4497 6004.05 0.000
Pearson 4497 4504.84 0.464
Hosmer-Lemeshow 7 66.06 0.000

In these results, the data are in the Event/Trial format. The analysis uses 9 rows of data. Each row of the data represents 500 trials. At the 0.05 level of significance, the p-value for the Pearson goodness-of-fit test does indicates that the model does not fit the data. If you collect data in Event/Trial format, the Pearson goodness-of-fit test is usually trustworthy.

Method
Link function Logit
Rows used 9

Response Information
Event
Variable Value Count Name
Y Event 2645 Yes
Non-event 1855
Trials Total 4500

Coefficients
Term Coef SE Coef VIF
Constant -0.612 0.110
X1 0.1585 0.0375 1.00
X2 0.3290 0.0377 1.00

Goodness-of-Fit Tests
Test DF Chi-Square P-Value
Deviance 6 67.48 0.000
Pearson 6 66.06 0.000
Hosmer-Lemeshow 7 66.06 0.000