The two methods produce minor differences in the results. For example, if you store the results with either method, the prediction statistics are in the worksheet but the version with Discover Best Model (Binary Response) also displays the regression equation in the output pane. The version with Fit Binary Logistic Model can include the standard error of the fit and the confidence interval for the fit. With either method, the results in the output pane include the regression equation, the settings for the predictors, and the Prediction table.
For binary logistic regression, Minitab shows two types of regression equations. The first equation relates the probability of the event to the transformed response. The form of the first equation depends on the link function.
The second equation relates the predictors to the transformed response. If the model contains both continuous and categorical predictors, the second equation can be separated for each combination of categories.
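As a sketch of the general form, assuming the logit link function, the two equations can be written as follows. The symbols $Y'$ (transformed response), $\beta_j$ (coefficients), and $x_j$ (predictors) are illustrative notation, not taken from a specific output; other link functions change the first equation.

```latex
% First equation: probability of the event as a function of the transformed response (logit link)
P(1) = \frac{\exp(Y')}{1 + \exp(Y')}

% Second equation: transformed response as a linear function of the predictors
Y' = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```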
Use the equations to examine the relationship between the response and the predictor variables.
The first equation shows the relationship between the probability and the transformed response. The equation has this form because the model uses the logit link function.
The second equations show how income and whether a customer has children relate to the transformed response. When the customer does not have children, the coefficient for income is about 0.04. When the customer has children, the coefficient is about 0.02. For these equations, the more income a customer has, the more likely they are to buy the product. However, income has a stronger effect on whether the customer buys the product when the customer does not have children.
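The interpretation above can be checked numerically with a minimal sketch. The income coefficients (0.04 and 0.02) come from the text; the intercepts and the income values are hypothetical, chosen only for illustration, and the logit link is assumed.

```python
import math

# Hypothetical intercepts: the text gives only the approximate income slopes.
INTERCEPT = {"no_children": -3.0, "children": -1.5}
# Approximate income coefficients quoted in the text.
INCOME_SLOPE = {"no_children": 0.04, "children": 0.02}


def transformed_response(income, group):
    """Second equation: linear predictor Y' for one category of the children predictor."""
    return INTERCEPT[group] + INCOME_SLOPE[group] * income


def event_probability(y_prime):
    """First equation under the logit link: P(1) = exp(Y') / (1 + exp(Y'))."""
    return math.exp(y_prime) / (1.0 + math.exp(y_prime))


# Compare two hypothetical income values within each category.
for group in ("no_children", "children"):
    for income in (40, 60):
        p = event_probability(transformed_response(income, group))
        print(f"{group:12s} income={income}: P(buy) = {p:.3f}")
```

Within each category the probability rises with income, and the same change in income moves the transformed response twice as far when the customer has no children.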
If your model is nonhierarchical and you standardized the continuous predictors, then the regression equation is in coded units. For more information, see the section on Coded Coefficients. For more information about hierarchy, go to What are hierarchical models?
Minitab uses the regression equation and the variable settings to calculate the fit. If you create the model with Fit Binary Logistic Model and the variable settings are unusual compared to the data that was used to estimate the model, a warning is displayed below the prediction.
Use the variable settings table to verify that you performed the analysis as you intended.
When you create the model with Discover Best Model (Binary Response), the Prediction table shows an observation number, the predicted class, and the probability for membership in each class. When you create the model with Fit Binary Logistic Model, the Prediction table includes the Fitted Probability.
The event probability is the chance that a specific outcome or event occurs. The event probability estimates the likelihood of an event occurring, such as drawing an ace from a deck of cards or manufacturing a non-conforming part. The probability of an event ranges from 0 (impossible) to 1 (certain).
In binary logistic regression, a response variable has only two possible values, such as the presence or absence of a particular disease. The event probability is the likelihood that the response for a given factor or covariate pattern is the event, which is coded as 1 (for example, the likelihood that a woman over 50 will develop type-2 diabetes).
Each performance in an experiment is called a trial. For example, if you flip a coin 10 times and record the number of heads, you perform 10 trials of the experiment. If the trials are independent and the event is equally likely on every trial, you can estimate the event probability by dividing the number of events by the total number of trials. For example, if you get 6 heads in 10 coin tosses, the estimated probability of the event (flipping heads) is:
Number of events ÷ Number of trials = 6 ÷ 10 = 0.6
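The same calculation can be sketched in a few lines of Python. The function name and the sequence of tosses are hypothetical; only the counts (6 heads in 10 tosses) come from the example.

```python
def estimate_event_probability(outcomes, event):
    """Estimate the event probability as number of events / number of trials."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    n_events = sum(1 for outcome in outcomes if outcome == event)
    return n_events / len(outcomes)


# Ten hypothetical coin tosses that contain 6 heads ("H") and 4 tails ("T").
tosses = ["H", "T", "H", "H", "T", "H", "T", "H", "H", "T"]
print(estimate_event_probability(tosses, "H"))  # 0.6
```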
The SE Fit is in the prediction table when you create the model with Fit Binary Logistic Model. The standard error of the fit (SE fit) estimates the variation in the estimated mean response for the specified variable settings. The calculation of the confidence interval for the mean response uses the standard error of the fit. Standard errors are always non-negative.
Use the standard error of the fit to measure the precision of the estimate of the mean response. The smaller the standard error, the more precise the predicted mean response. For example, an analyst develops a model to predict delivery time. For one set of variable settings, the model predicts a mean delivery time of 3.80 days. The standard error of the fit for these settings is 0.08 days. For a second set of variable settings, the model produces the same mean delivery time with a standard error of the fit of 0.02 days. The analyst can be more confident that the mean delivery time for the second set of variable settings is close to 3.80 days.
With the fitted value, you can use the standard error of the fit to create a confidence interval for the mean response. For example, depending on the number of degrees of freedom, a 95% confidence interval extends approximately two standard errors above and below the predicted mean. For the delivery times, the 95% confidence interval for the predicted mean of 3.80 days when the standard error is 0.08 is (3.64, 3.96) days. You can be 95% confident that the population mean is within this range. When the standard error is 0.02, the 95% confidence interval is (3.76, 3.84) days. The confidence interval for the second set of variable settings is narrower because the standard error is smaller.
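The arithmetic in the delivery-time example can be reproduced with a short sketch. It uses the rough "two standard errors" rule from the text rather than an exact critical value, and the function name is hypothetical.

```python
def approx_95_ci(fit, se_fit, multiplier=2.0):
    """Approximate 95% CI: fit plus or minus about two standard errors of the fit."""
    half_width = multiplier * se_fit
    return fit - half_width, fit + half_width


# Predicted mean delivery time of 3.80 days with the two standard errors from the text.
for se in (0.08, 0.02):
    lower, upper = approx_95_ci(3.80, se)
    print(f"SE fit = {se:.2f}: ({lower:.2f}, {upper:.2f}) days")
# SE fit = 0.08: (3.64, 3.96) days
# SE fit = 0.02: (3.76, 3.84) days
```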
The confidence interval for the fit is in the prediction table when you create the model with Fit Binary Logistic Model. These confidence intervals (CI) are ranges of values that are likely to contain the event probability for the population that has the observed values of the predictor variables that are in the model.
Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you sample many times, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.
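A small simulation can illustrate this repeated-sampling idea. The sketch below is not specific to logistic regression: it assumes a normal population with a known standard deviation and builds a 95% interval for the mean from each sample. All parameter values are hypothetical.

```python
import random
import statistics


def coverage(true_mean=10.0, sigma=2.0, n=30, repetitions=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true population mean."""
    rng = random.Random(seed)
    # Known-sigma interval: sample mean +/- 1.96 * sigma / sqrt(n).
    half_width = 1.96 * sigma / n ** 0.5
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / repetitions


# The printed fraction should be close to the 95% confidence level.
print(f"Intervals that contain the true mean: {coverage():.1%}")
```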
Use the confidence interval to assess the estimate of the fitted value for the observed values of the variables.
For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the event probability for the specified values of the variables in the model. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.
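One common way to obtain a confidence interval for an event probability is to build the interval on the scale of the transformed response and then convert its endpoints to probabilities. The sketch below shows that approach for the logit link. It is an illustration under stated assumptions, not a statement of how Minitab computes the interval: the input values are hypothetical, and the standard error here is on the transformed scale, which is not necessarily the SE Fit shown in the prediction table.

```python
import math


def logistic(y_prime):
    """Inverse of the logit link: converts a transformed response to a probability."""
    return 1.0 / (1.0 + math.exp(-y_prime))


def ci_for_event_probability(y_prime, se_y_prime, z=1.96):
    """Return (fitted probability, lower bound, upper bound).

    The interval is formed on the transformed scale, then each endpoint
    is converted to a probability, so the bounds stay between 0 and 1.
    """
    lower = logistic(y_prime - z * se_y_prime)
    upper = logistic(y_prime + z * se_y_prime)
    return logistic(y_prime), lower, upper


# Hypothetical transformed response and standard error for one set of variable settings.
fit, lower, upper = ci_for_event_probability(y_prime=0.8, se_y_prime=0.3)
print(f"Fitted probability = {fit:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the endpoints are transformed separately, the interval is usually not symmetric around the fitted probability.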