Wald Test

Source | DF | Chi-Square | P-Value |
---|---|---|---|
Regression | 1 | 7.83 | 0.005 |
Dose (mg) | 1 | 7.83 | 0.005 |
Term | Coef | SE Coef | Z-Value | P-Value | VIF |
---|---|---|---|---|---|
Constant | -5.25 | 1.99 | -2.64 | 0.008 | |
Dose (mg) | 3.63 | 1.30 | 2.80 | 0.005 | 1.00 |
In these results, the dosage is statistically significant at the significance level of 0.05. You can conclude that changes in the dosage are associated with changes in the probability that the event occurs.
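The Z-value, chi-square, and p-value in the tables above are linked by simple arithmetic. The following is a minimal sketch, assuming the usual Wald construction (z = coefficient / standard error, chi-square = z squared for a 1-DF term, two-sided normal p-value). The function name `wald_test` is hypothetical, and because the inputs are the rounded table values, the results can differ from the table in the second decimal place.

```python
import math

def wald_test(coef, se):
    """Return (z, chi_square, two_sided_p) for a single coefficient."""
    z = coef / se
    chi_square = z ** 2  # for a 1-DF term, the Wald chi-square is the squared z-value
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, chi_square, p_value

# Rounded values for Dose (mg) taken from the coefficients table.
z, chi_square, p_value = wald_test(coef=3.63, se=1.30)
print(f"z = {z:.2f}, chi-square = {chi_square:.2f}, p = {p_value:.3f}")
```

Because the p-value is below 0.05, the sketch reaches the same conclusion as the table: the dosage term is statistically significant at that level.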
Assess the coefficient to determine whether a change in a predictor variable makes the event more likely or less likely. The relationship between the coefficient and the probability depends on several aspects of the analysis, including the link function. Generally, positive coefficients indicate that the event becomes more likely as the predictor increases. Negative coefficients indicate that the event becomes less likely as the predictor increases. For more information, go to Coefficients and regression equation for Fit Binary Logistic Model.
The coefficient for Dose is 3.63, which suggests that higher dosages are associated with higher probabilities that the event will occur.
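To see how a positive coefficient translates into higher probabilities, the sketch below converts the linear predictor to a probability. It assumes a logit link, which this text does not state explicitly, and the dose values are hypothetical inputs chosen only for illustration.

```python
import math

# Coefficients from the table above. The logit link is an assumption.
CONSTANT = -5.25
DOSE_COEF = 3.63

def event_probability(dose_mg):
    """Inverse logit of the linear predictor: p = 1 / (1 + exp(-(b0 + b1 * dose)))."""
    linear_predictor = CONSTANT + DOSE_COEF * dose_mg
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical dose values, not taken from the source data.
doses = [0.5, 1.0, 1.5, 2.0]
probabilities = [event_probability(d) for d in doses]
for d, p in zip(doses, probabilities):
    print(f"dose = {d:.1f} mg -> P(event) = {p:.3f}")
```

Under this assumption the predicted probability rises with every increase in dose, which is what a positive coefficient implies.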
If an interaction term is statistically significant, the relationship between a predictor and the response differs by the level of the other predictor. In this case, you should not interpret the main effects without considering the interaction effect. To obtain a better understanding of the main effects, interaction effects, and curvature in your model, go to Factorial Plots and Response Optimizer.
For continuous predictors, odds ratios that are greater than 1 indicate that the event is more likely to occur as the predictor increases. Odds ratios that are less than 1 indicate that the event is less likely to occur as the predictor increases.
Term | Unit of Change | Odds Ratio | 95% CI |
---|---|---|---|
Dose (mg) | 0.5 | 6.1279 | (1.7218, 21.8087) |
In these results, the model uses the dosage level of a medicine to predict the presence or absence of bacteria in adults. In this example, the absence of bacteria is the Event. Each pill contains a 0.5 mg dose, so the researchers use a unit change of 0.5 mg. The odds ratio is approximately 6. For each additional pill that an adult takes, the odds that the adult does not have the bacteria are multiplied by approximately 6.
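The odds ratio for a 0.5 mg unit of change can be approximately reproduced from the coefficient and its standard error. This is a sketch that assumes a logit link and a Wald (normal-approximation) confidence interval. The function name is hypothetical, and the inputs are the rounded table values, so the results match the reported 6.1279 and (1.7218, 21.8087) only approximately.

```python
import math
from statistics import NormalDist

def odds_ratio_for_unit_change(coef, se, unit, confidence=0.95):
    """Odds ratio and Wald interval for a change of `unit` in the predictor."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    odds_ratio = math.exp(coef * unit)
    lower = math.exp((coef - z_crit * se) * unit)
    upper = math.exp((coef + z_crit * se) * unit)
    return odds_ratio, lower, upper

# Rounded coefficient and standard error for Dose (mg); one pill is 0.5 mg.
odds_ratio, lower, upper = odds_ratio_for_unit_change(coef=3.63, se=1.30, unit=0.5)
print(f"odds ratio = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Scaling the coefficient by the unit of change before exponentiating is what makes the odds ratio describe one pill rather than one milligram.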
For categorical predictors, the odds ratio compares the odds of the event occurring at 2 different levels of the predictor. Minitab sets up the comparison by listing the levels in 2 columns, Level A and Level B. Level B is the reference level for the factor. Odds ratios that are greater than 1 indicate that the event is more likely at level A. Odds ratios that are less than 1 indicate that the event is less likely at level A. For information on coding categorical predictors, go to Coding schemes for categorical predictors.
Level A | Level B | Odds Ratio | 95% CI |
---|---|---|---|
Month | |||
2 | 1 | 1.1250 | (0.0600, 21.0834) |
3 | 1 | 3.3750 | (0.2897, 39.3165) |
4 | 1 | 7.7143 | (0.7461, 79.7592) |
5 | 1 | 2.2500 | (0.1107, 45.7172) |
6 | 1 | 6.0000 | (0.5322, 67.6397) |
3 | 2 | 3.0000 | (0.2547, 35.3325) |
4 | 2 | 6.8571 | (0.6556, 71.7169) |
5 | 2 | 2.0000 | (0.0976, 41.0019) |
6 | 2 | 5.3333 | (0.4679, 60.7946) |
4 | 3 | 2.2857 | (0.4103, 12.7323) |
5 | 3 | 0.6667 | (0.0514, 8.6389) |
6 | 3 | 1.7778 | (0.2842, 11.1200) |
5 | 4 | 0.2917 | (0.0252, 3.3719) |
6 | 4 | 0.7778 | (0.1464, 4.1326) |
6 | 5 | 2.6667 | (0.2124, 33.4861) |
In these results, the categorical predictor is the month from the start of a hotel's busy season. The response is whether or not a guest cancels a reservation. In this example, a cancellation is the Event. The largest odds ratio is approximately 7.71, when level A is month 4 and level B is month 1. This indicates that the odds that a guest cancels a reservation in month 4 are approximately 8 times the odds that a guest cancels a reservation in month 1.
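The pairwise rows of the table are consistent with one another: the odds ratio for level A versus level B equals the ratio of each level's odds ratio against a common reference level. The sketch below, which assumes a logit link, uses only the month-versus-month-1 rows to recover the other comparisons. The helper name is hypothetical.

```python
# Odds ratios of each month versus month 1, copied from the table above.
# Month 1 versus itself is 1 by definition.
or_vs_month1 = {1: 1.0, 2: 1.1250, 3: 3.3750, 4: 7.7143, 5: 2.2500, 6: 6.0000}

def pairwise_odds_ratio(level_a, level_b):
    """Odds ratio of level A versus level B, derived from a shared reference level."""
    return or_vs_month1[level_a] / or_vs_month1[level_b]

for level_a, level_b in [(4, 3), (5, 4), (6, 5)]:
    ratio = pairwise_odds_ratio(level_a, level_b)
    print(f"month {level_a} vs month {level_b}: {ratio:.4f}")
```

Swapping the two levels gives the reciprocal odds ratio, which is why the table lists each pair only once.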
For more information, go to Odds Ratios for Fit Binary Logistic Model.
To determine how well the model fits your data, examine the statistics in the Model Summary table.
Many of the model summary and goodness-of-fit statistics are affected by how the data are arranged in the worksheet and whether there is one trial per row or multiple trials per row. The Hosmer-Lemeshow test is unaffected by the data format and is comparable between formats. For more information, go to How data formats affect goodness-of-fit in binary logistic regression.
The higher the deviance R^{2}, the better the model fits your data. Deviance R^{2} is always between 0% and 100%.
Deviance R^{2} never decreases when you add predictors to a model. For example, the best 5-predictor model will always have an R^{2} that is at least as high as the best 4-predictor model. Therefore, deviance R^{2} is most useful when you compare models of the same size.
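As a rough illustration of why a larger model cannot score lower, the sketch below uses a common definition of deviance R^{2}: one minus the ratio of the model deviance to the deviance of the intercept-only model. This text does not state the exact formula that the software uses, so treat the definition as an assumption. All deviance values are hypothetical and invented for illustration.

```python
def deviance_r_squared(model_deviance, null_deviance):
    """Proportion of the null (intercept-only) deviance explained by the model."""
    return 1.0 - model_deviance / null_deviance

# Hypothetical deviances, not taken from the source.
NULL_DEVIANCE = 40.0
r2_four_predictors = deviance_r_squared(12.0, NULL_DEVIANCE)
# Adding a predictor cannot raise the deviance of the best model,
# so its R-squared is at least as high.
r2_five_predictors = deviance_r_squared(11.5, NULL_DEVIANCE)
print(f"4 predictors: {r2_four_predictors:.2%}, 5 predictors: {r2_five_predictors:.2%}")
```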
For binary logistic regression, the format of the data affects the deviance R^{2} value. The deviance R^{2} is usually higher for data in Event/Trial format. Deviance R^{2} values are comparable only between models that use the same data format.
Goodness-of-fit statistics are just one measure of how well the model fits the data. Even when a statistic has a desirable value, you should check the residual plots and goodness-of-fit tests to assess how well a model fits the data.
Use adjusted deviance R^{2} to compare models that have different numbers of predictors. Deviance R^{2} never decreases when you add a predictor to the model. The adjusted deviance R^{2} value incorporates the number of predictors in the model to help you choose the correct model.
Use AIC, AICc, and BIC to compare different models. For each statistic, smaller values are desirable. However, the model with the smallest value for a set of predictors does not necessarily fit the data well. Also use goodness-of-fit tests and residual plots to assess how well a model fits the data.
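The three criteria differ only in how they penalize model size. The sketch below uses the standard textbook definitions, which may not match the software's conventions exactly. The inputs (log-likelihood of -3.315, 2 parameters, 6 observations) are hypothetical values back-calculated so that the results line up with the AIC of 10.63, AICc of 14.63, and BIC of 10.22 reported for this example. They are not stated in the source.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, AICc, BIC) from a log-likelihood, parameter count, and sample size."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    # AICc adds a small-sample correction that shrinks as n_obs grows.
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = -2.0 * log_likelihood + n_params * math.log(n_obs)
    return aic, aicc, bic

# Back-calculated, hypothetical inputs (assumptions, not source data).
aic, aicc, bic = information_criteria(log_likelihood=-3.315, n_params=2, n_obs=6)
print(f"AIC = {aic:.2f}, AICc = {aicc:.2f}, BIC = {bic:.2f}")
```

With so few observations the AICc correction is large, which illustrates why AICc is preferred over AIC for small samples.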
Values of the area under the ROC curve range from 0.5 to 1. When the binary model can perfectly separate the classes, the area under the curve is 1. When the binary model cannot separate the classes better than a random assignment, the area under the curve is 0.5.
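One way to read the area under the ROC curve is as the fraction of (event, non-event) pairs in which the event receives the higher predicted probability, with ties counted as one half. The sketch below computes the area that way. It is an illustration of that interpretation, not necessarily the algorithm the software uses, and the predicted probabilities are hypothetical.

```python
def roc_auc(event_scores, non_event_scores):
    """Fraction of (event, non-event) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(non_event_scores))

# Hypothetical predicted probabilities.
perfect = roc_auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # classes fully separated
uninformative = roc_auc([0.5, 0.5], [0.5, 0.5])      # every pair is a tie
print(perfect, uninformative)
```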
Deviance R-Sq | Deviance R-Sq(adj) | AIC | AICc | BIC | Area Under ROC Curve |
---|---|---|---|---|---|
96.04% | 91.81% | 10.63 | 14.63 | 10.22 | 0.9398 |
In these results, the model explains 96.04% of the total deviance in the response variable. For these data, the Deviance R^{2} value indicates the model provides a good fit to the data. The area under the ROC curve is 0.9398. This value indicates that the model classifies much of the data correctly. If additional models are fit with different predictors, use the adjusted Deviance R^{2} value, the AIC value, the AICc value, the BIC value, and the area under the ROC curve to compare how well the models fit the data.
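If a goodness-of-fit test shows that the deviation between the predicted probabilities and the observed probabilities is statistically significant, you can try a different link function or change the terms in the model.
For binary logistic regression, the format of the data affects the p-values of the deviance and Pearson tests because it changes the number of trials per row, and therefore the degrees of freedom of those tests.
Variable | Value | Count | Event Name |
---|---|---|---|
Event | Event | 160 | Event |
 | Non-event | 340 | |
Trial | Total | 500 | |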
Test | DF | Chi-Square | P-Value |
---|---|---|---|
Deviance | 2 | 3.78 | 0.151 |
Pearson | 2 | 3.76 | 0.152 |
Hosmer-Lemeshow | 3 | 3.76 | 0.288 |
In these results, the Response Information table shows Event and Trial in the Variable column. These labels indicate that the data are in Event/Trial format. All of the goodness-of-fit tests have p-values higher than the usual significance level of 0.05. The tests do not provide evidence that the predicted probabilities deviate from the observed probabilities in a way that the binomial distribution does not predict.
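The p-values in the goodness-of-fit table follow from the chi-square statistics and their degrees of freedom. The sketch below uses the closed-form upper-tail probabilities of the chi-square distribution for 2 and 3 degrees of freedom, so it needs only the standard library. The function names are hypothetical, and because the statistics are rounded in the table, the results agree with the reported p-values only to about three decimals.

```python
import math

def chi2_sf_df2(x):
    """P(chi-square with 2 DF > x), exact closed form."""
    return math.exp(-x / 2.0)

def chi2_sf_df3(x):
    """P(chi-square with 3 DF > x), exact closed form."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

ALPHA = 0.05
p_deviance = chi2_sf_df2(3.78)
p_pearson = chi2_sf_df2(3.76)
p_hosmer_lemeshow = chi2_sf_df3(3.76)

for name, p in [("Deviance", p_deviance), ("Pearson", p_pearson),
                ("Hosmer-Lemeshow", p_hosmer_lemeshow)]:
    verdict = "evidence of lack of fit" if p < ALPHA else "no evidence of lack of fit"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```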
Variable | Value | Count | |
---|---|---|---|
Y | Event | 160 | (Event) |
 | Non-event | 340 | |
 | Total | 500 | |
Test | DF | Chi-Square | P-Value |
---|---|---|---|
Deviance | 497 | 552.03 | 0.044 |
Pearson | 497 | 504.42 | 0.399 |
Hosmer-Lemeshow | 3 | 3.76 | 0.288 |
In these results for the same data, the Response Information table shows Y in the Variable column. This label indicates that the data are in Binary Response/Frequency format. The deviance test has a p-value less than the usual significance level of 0.05, but the Hosmer-Lemeshow test is the most trustworthy test because it is unaffected by the data format. The Hosmer-Lemeshow test does not provide evidence that the predicted probabilities deviate from the observed probabilities in a way that the binomial distribution does not predict.
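With 497 degrees of freedom, the deviance and Pearson p-values can be checked with a normal approximation. The sketch below uses the Wilson-Hilferty cube-root approximation to the chi-square distribution. This is a stand-in chosen to keep the example within the standard library, not the exact calculation the software performs, and the function name is hypothetical.

```python
from statistics import NormalDist

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(chi-square with `df` DF > x) via the Wilson-Hilferty transform."""
    variance_term = 2.0 / (9.0 * df)
    # (X / df) ** (1/3) is approximately normal with mean 1 - 2/(9 df) and variance 2/(9 df).
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - variance_term)) / variance_term ** 0.5
    return 1.0 - NormalDist().cdf(z)

# Chi-square statistics and DF from the Binary Response/Frequency table.
p_deviance = chi2_sf_wilson_hilferty(552.03, 497)
p_pearson = chi2_sf_wilson_hilferty(504.42, 497)
print(f"Deviance p = {p_deviance:.3f}, Pearson p = {p_pearson:.3f}")
```

The same data in Event/Trial format give 2 degrees of freedom for these two tests, which is why their p-values change between formats while the Hosmer-Lemeshow result does not.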