Interpret the key results for Binary Logistic Regression

Complete the following steps to interpret a regression analysis. Key output includes the p-value, the odds ratio, R2, and the goodness-of-fit tests.

Step 1: Determine whether the association between the response and the term is statistically significant

To determine whether the association between the response and each term in the model is statistically significant, compare the p-value for the term to your significance level to assess the null hypothesis. The null hypothesis is that the term's coefficient is equal to zero, which indicates that there is no association between the term and the response. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that an association exists when there is no actual association.
P-value ≤ α: The association is statistically significant
If the p-value is less than or equal to the significance level, you can conclude that there is a statistically significant association between the response variable and the term.
P-value > α: The association is not statistically significant
If the p-value is greater than the significance level, you cannot conclude that there is a statistically significant association between the response variable and the term. You may want to refit the model without the term.
If there are multiple predictors without a statistically significant association with the response, you can reduce the model by removing terms one at a time. For more information on removing terms from the model, go to Model reduction.
If a model term is statistically significant, the interpretation depends on the type of term. The interpretations are as follows:
  • If a continuous predictor is significant, you can conclude that the coefficient for the predictor does not equal zero.
  • If a categorical predictor is significant, you can conclude that the probability of the event is not the same at all levels of the predictor.
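The comparison of a p-value to α can be sketched in a few lines. This is a minimal illustration, assuming that the Z-Value in the Coefficients table is the coefficient divided by its standard error and that the p-value is the two-sided tail area of the standard normal distribution. The function name and the numeric inputs are hypothetical and are not taken from the output discussed here.

```python
import math

def wald_p_value(coef, se_coef):
    """Return (z, two-sided p-value) for the null hypothesis coefficient = 0."""
    z = coef / se_coef
    # Two-sided tail area under the standard normal: P(|Z| >= |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical coefficient and standard error, chosen only for illustration.
z, p = wald_p_value(coef=1.2, se_coef=0.5)

alpha = 0.05
significant = p <= alpha  # the decision rule described in Step 1
print(f"z = {z:.2f}, p = {p:.4f}, significant at alpha = {alpha}: {significant}")
```

With these made-up inputs the p-value falls below 0.05, so the term would be treated as statistically significant.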
Coefficients
Term    Coef    SE Coef    Z-Value    P-Value
Key Result: P-Value

In these results, the dosage is statistically significant at the significance level of 0.05. You can conclude that changes in the dosage are associated with changes in the probability that the event occurs.

Assess the coefficient to determine whether a change in a predictor variable makes the event more likely or less likely. The relationship between the coefficient and the probability depends on several aspects of the analysis, including the link function. Generally, positive coefficients indicate that the event becomes more likely as the predictor increases. Negative coefficients indicate that the event becomes less likely as the predictor increases. For more information, go to Coefficients and Regression equation.

The coefficient for Dose is 3.63, which suggests that higher dosages are associated with higher probabilities that the event will occur.
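To see why a positive coefficient corresponds to a higher event probability, the following sketch evaluates the fitted probability at several dosages. It assumes the logit link. The coefficient 3.63 comes from the text, but the intercept is not reported, so the value used here is hypothetical, as are the dosage values and the function name.

```python
import math

def event_probability(intercept, slope, x):
    """Inverse logit: convert the linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

slope = 3.63       # coefficient for Dose reported in the text
intercept = -5.0   # hypothetical: the intercept is not given in the text

# Hypothetical dosages; the probability rises as the dosage rises.
for dose in (0.5, 1.0, 1.5, 2.0):
    print(dose, round(event_probability(intercept, slope, dose), 3))
```

A negative slope would reverse the pattern: the printed probabilities would fall as the dosage increases.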

Step 2: Understand the effects of the predictors

Use the odds ratio to understand the effect of a predictor.
Odds Ratios for Continuous Predictors
Odds ratios that are greater than 1 indicate that the event is more likely to occur as the predictor increases. Odds ratios that are less than 1 indicate that the event is less likely to occur as the predictor increases.
Odds Ratios for Continuous Predictor
Odds Ratio
Key Result: Odds Ratio

In these results, the model uses the dosage level of a medicine to predict the presence or absence of bacteria in adults. The odds ratio indicates that for every 1 mg increase in the dosage level, the odds that no bacteria are present increase by a factor of approximately 38.
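Under the logit link, the odds ratio for a continuous predictor is the exponential of its coefficient. The short sketch below checks that the Dose coefficient of 3.63 reported earlier is consistent with an odds ratio of about 38. The function name and the half-unit example are illustrative assumptions.

```python
import math

def odds_ratio_for_change(coef, delta=1.0):
    """Multiplicative change in the odds for an increase of `delta` in the predictor."""
    return math.exp(coef * delta)

coef_dose = 3.63  # coefficient for Dose reported in the text

odds_ratio = odds_ratio_for_change(coef_dose)          # per 1 mg increase
half_step = odds_ratio_for_change(coef_dose, delta=0.5)  # per 0.5 mg increase

print(round(odds_ratio, 1))  # about 37.7, which rounds to 38
print(round(half_step, 2))
```

Because the effect is multiplicative on the odds, a 0.5 mg increase multiplies the odds by the square root of the 1 mg odds ratio, not by half of it.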

Odds Ratios for Categorical Predictors
For categorical predictors, the odds ratio compares the odds of the event occurring at 2 different levels of the predictor. Minitab sets up the comparison by listing the levels in 2 columns, Level A and Level B. Odds ratios that are greater than 1 indicate that the event is more likely at level A. Odds ratios that are less than 1 indicate that the event is less likely at level A.
Odds Ratios for Categorical Predictor
Level A    Level B    Odds Ratio
Yes
Note: Odds ratio for level A relative to level B
Key Result: Odds Ratio

In these results, the response indicates whether a consumer bought a cereal and the categorical predictor indicates whether the consumer saw an advertisement about that cereal. The odds ratio is 3.06, which indicates that the odds that a consumer buys the cereal are about 3 times as high for consumers who viewed the advertisement as for consumers who didn't view the advertisement.
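An odds ratio multiplies odds, not probabilities, so an odds ratio of 3.06 does not mean that the purchase probability triples. The sketch below converts a baseline probability to odds, applies the odds ratio, and converts back. The baseline probability of 0.20 is hypothetical, since the text does not report purchase rates, and the function name is illustrative.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Return the probability at level A, given the probability at level B."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

p_no_ad = 0.20  # hypothetical purchase probability without the advertisement
p_ad = apply_odds_ratio(p_no_ad, 3.06)  # odds ratio reported in the text

print(round(p_ad, 3))            # purchase probability with the advertisement
print(round(p_ad / p_no_ad, 2))  # ratio of probabilities, smaller than 3.06
```

The gap between the odds ratio and the ratio of probabilities widens as the baseline probability grows, which is why the two should not be used interchangeably.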

Step 3: Determine how well the model fits your data

To determine how well the model fits your data, examine the statistics in the Model Summary table. For binary logistic regression, the data format affects the deviance R2 statistics but not the AIC. For more information, go to How data formats affect goodness-of-fit in binary logistic regression.

Deviance R-sq

The higher the deviance R2, the better the model fits your data. Deviance R2 is always between 0% and 100%.

Deviance R2 never decreases when you add predictors to a model. For example, the best 5-predictor model will always have an R2 that is at least as high as the best 4-predictor model. Therefore, deviance R2 is most useful when you compare models of the same size.

For binary logistic regression, the format of the data affects the deviance R2 value. The deviance R2 is usually higher for data in Event/Trial format. Deviance R2 values are comparable only between models that use the same data format.

Deviance R2 is just one measure of how well the model fits the data. Even when a model has a high R2, you should check the residual plots to assess how well the model fits the data.
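As a rough illustration of what deviance R2 measures, the sketch below uses one common definition: 1 minus the ratio of the model deviance to the deviance of an intercept-only model. It assumes data in Binary Response/Frequency format, where the deviance of 0/1 responses reduces to -2 times the log-likelihood. The responses and fitted probabilities are hypothetical stand-ins, not output from a real fit, and the exact formula that a given software package uses may differ.

```python
import math

def bernoulli_deviance(y, p):
    """Deviance for 0/1 responses y and fitted probabilities p (-2 * log-likelihood)."""
    return -2.0 * sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )

# Hypothetical responses and fitted probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
fitted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]

# The intercept-only model predicts the overall event proportion for every row.
null_prob = sum(y) / len(y)
d_null = bernoulli_deviance(y, [null_prob] * len(y))
d_model = bernoulli_deviance(y, fitted)

deviance_r2 = 1.0 - d_model / d_null
print(f"Deviance R-sq = {100 * deviance_r2:.1f}%")
```

Regrouping the same observations into Event/Trial rows changes both deviances, which is why values are comparable only between models that use the same data format.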

Deviance R-sq (adj)

Use adjusted deviance R2 to compare models that have different numbers of predictors. Deviance R2 never decreases when you add a predictor to the model. The adjusted deviance R2 value incorporates the number of predictors in the model to help you choose the correct model.

AIC
Use AIC to compare different models. The smaller the AIC, the better the model fits the data. However, the model with the smallest AIC does not necessarily fit the data well. Also use the residual plots to assess how well the model fits the data.
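The AIC comparison can be sketched with the standard definition, AIC = -2 × log-likelihood + 2 × (number of estimated parameters). The model names, log-likelihoods, and parameter counts below are hypothetical and serve only to show how the penalty for extra parameters can outweigh a small gain in fit.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {
    "dose only": (-12.4, 2),
    "dose + age": (-12.1, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("smallest AIC:", best)
```

Here the larger model has a slightly higher log-likelihood, but the extra parameter costs more than the improvement is worth, so the smaller model has the smaller AIC.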
Model Summary
Deviance R-sq    Deviance R-sq(adj)    AIC
Key Results: Deviance R-Sq, Deviance R-Sq (adj), AIC

In these results, the model explains 96.04% of the deviance in the response variable. For these data, the Deviance R2 value indicates the model provides a good fit to the data. If additional models are fit with different predictors, use the adjusted Deviance R2 value and the AIC value to compare how well the models fit the data.

Step 4: Determine whether the model does not fit the data

Use the goodness-of-fit tests to determine whether the predicted probabilities deviate from the observed probabilities in a way that the binomial distribution does not predict. If the p-value for the goodness-of-fit test is lower than your chosen significance level, the predicted probabilities deviate from the observed probabilities in a way that the binomial distribution does not predict. This list provides common reasons for the deviation:
  • Incorrect link function
  • Omitted higher-order term for variables in the model
  • Omitted predictor that is not in the model
  • Overdispersion

If the deviation is statistically significant, you can try a different link function or change the terms in the model. If you need to use a different link function, use Fit Binary Logistic Model in Minitab Statistical Software.

For binary logistic regression, the format of the data affects the p-value because it changes the number of trials per row.
  • Deviance: The p-value for the deviance test tends to be lower for data that are in the Binary Response/Frequency format compared to data in the Event/Trial format. For data in Binary Response/Frequency format, the Hosmer-Lemeshow results are more trustworthy.
  • Pearson: The approximation to the chi-square distribution that the Pearson test uses is inaccurate when the expected number of events per row in the data is small. Thus, the Pearson goodness-of-fit test is inaccurate when the data are in Binary Response/Frequency format.
  • Hosmer-Lemeshow: The Hosmer-Lemeshow test does not depend on the number of trials per row in the data as the other goodness-of-fit tests do. When the data have few trials per row, the Hosmer-Lemeshow test is a more trustworthy indicator of how well the model fits the data.
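To make the dependence on trials per row concrete, the sketch below computes the Pearson and deviance chi-square statistics for data in Event/Trial format using their standard binomial formulas. The event counts, trial counts, and fitted probabilities are hypothetical, and the function names are illustrative. Turning either statistic into a p-value would additionally require a chi-square distribution with the appropriate degrees of freedom, which is omitted here.

```python
import math

def pearson_chi_square(events, trials, probs):
    """Sum of squared Pearson residuals over the rows."""
    return sum(
        (y - n * p) ** 2 / (n * p * (1.0 - p))
        for y, n, p in zip(events, trials, probs)
    )

def _term(observed, expected):
    # Contribution observed * ln(observed / expected), with 0 * ln(0) taken as 0.
    return observed * math.log(observed / expected) if observed > 0 else 0.0

def deviance_statistic(events, trials, probs):
    """Binomial deviance: twice the log-likelihood gap to the saturated model."""
    return 2.0 * sum(
        _term(y, n * p) + _term(n - y, n * (1.0 - p))
        for y, n, p in zip(events, trials, probs)
    )

# Hypothetical Event/Trial data with fitted probabilities for each row.
events = [1, 4, 9]
trials = [10, 10, 10]
probs = [0.10, 0.45, 0.85]

pearson = pearson_chi_square(events, trials, probs)
deviance = deviance_statistic(events, trials, probs)
print(f"Pearson = {pearson:.3f}, Deviance = {deviance:.3f}")
```

Each row's denominator in the Pearson statistic depends on the expected counts n × p and n × (1 − p), so rows with only one trial each give very small expected counts and a poor chi-square approximation.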
Goodness-of-Fit Tests
Test    DF    Chi-Square    P-Value
Key Results: Deviance Test, Pearson Test, Hosmer-Lemeshow Test

In these results, the p-values for the goodness-of-fit tests are all greater than the significance level of 0.05, which indicates that there is not enough evidence to conclude that the model does not fit the data.
