# Prediction

The way that you predict with the model depends on how you created the model.
• If you create the model with Fit Binary Logistic Model, choose Stat > Regression > Binary Logistic Regression > Predict.
• If you create the model with Discover Best Model (Binary Response), click Predict in the results.

The two methods produce minor differences in the results. For example, if you store the results with either method, the prediction statistics are in the worksheet, but the version with Discover Best Model (Binary Response) also displays the regression equation in the output pane. The version with Fit Binary Logistic Model can include the standard error of the fit and the confidence interval for the fit. With either method, the results in the output pane include the regression equation, the settings for the predictors, and the Prediction table.

## Regression equation

For binary logistic regression, Minitab shows two types of regression equations. The first equation relates the probability of the event to the transformed response. The form of the first equation depends on the link function.

The second equation relates the predictors to the transformed response. If the model contains both continuous and categorical predictors, the second equation can be separated for each combination of categories.

### Interpretation

Use the equations to examine the relationship between the response and the predictor variables.

For example, a model to predict whether a customer buys a product has these terms:
• Customer's income
• Whether a customer has children
• Interaction between the two predictors

The first equation shows the relationship between the probability of the event and the transformed response. The equation has this form because the model uses the logit link function.

The second set of equations shows how income and whether a customer has children relate to the transformed response. When the customer does not have children, the coefficient for income is about 0.04. When the customer has children, the coefficient is about 0.02. In both equations the coefficient is positive, so the more income a customer has, the more likely they are to buy the product. However, income has a stronger effect on whether the customer buys the product when the customer does not have children.

### Binary Logistic Regression: Bought versus Income, Children

Regression Equation in Uncoded Units

P(1) = exp(Y')/(1 + exp(Y'))

| Children | Equation |
|---|---|
| No | Y' = -3.549 + 0.04296 Income |
| Yes | Y' = -1.076 + 0.01565 Income |
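To make the two-step calculation concrete, the following minimal Python sketch evaluates the example equations for one customer profile. The coefficients come from the example output; the income value of 50 and the function names are hypothetical and serve only to illustrate the arithmetic.

```python
import math

# Intercept and income coefficient for each level of Children,
# copied from the example output.
EQUATIONS = {
    "No": (-3.549, 0.04296),
    "Yes": (-1.076, 0.01565),
}

def transformed_response(income, children):
    """Second equation: Y' as a linear function of income."""
    intercept, slope = EQUATIONS[children]
    return intercept + slope * income

def event_probability(income, children):
    """First equation (logit link): P(1) = exp(Y') / (1 + exp(Y'))."""
    y = transformed_response(income, children)
    return math.exp(y) / (1 + math.exp(y))

# Hypothetical income value, in whatever units the worksheet uses.
for children in ("No", "Yes"):
    p = event_probability(50, children)
    print(f"Children={children}: P(1) = {p:.3f}")
```

Because the income coefficient is larger for customers without children, the same increase in income changes Y' more for that group.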

If your model is nonhierarchical and you standardized the continuous predictors, then the regression equation is in coded units. For more information, see the section on Coded Coefficients. For more information about hierarchy, go to What are hierarchical models?

## Variable settings

Minitab uses the regression equation and the variable settings to calculate the fit. If you create the model with Fit Binary Logistic Model and the variable settings are unusual compared to the data that was used to estimate the model, a warning is displayed below the prediction.

Use the variable settings table to verify that you performed the analysis as you intended.

## Fitted Probability or Class Probabilities

When you create the model with Discover Best Model (Binary Response), the Prediction table shows an observation number, the predicted class, and the probability for membership in each class. When you create the model with Fit Binary Logistic Model, the Prediction table includes the Fitted Probability.

The event probability is the chance that a specific outcome or event occurs. The event probability estimates the likelihood of an event occurring, such as drawing an ace from a deck of cards or manufacturing a non-conforming part. The probability of an event ranges from 0 (impossible) to 1 (certain).

### Interpretation

In binary logistic regression, a response variable has only two possible values, such as the presence or absence of a particular disease. The event probability is the likelihood that the response for a given factor or covariate pattern is 1 for an event (for example, the likelihood that a woman over 50 will develop type-2 diabetes).

Each performance in an experiment is called a trial. For example, if you flip a coin 10 times and record the number of heads, you perform 10 trials of the experiment. If the trials are independent and the event is equally likely on each trial, you can estimate the event probability by dividing the number of events by the total number of trials. For example, if you flip 6 heads out of 10 coin tosses, the estimated probability of the event (flipping heads) is:

Number of events ÷ Number of trials = 6 ÷ 10 = 0.6
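The same calculation can be written as a short Python sketch. The function name is hypothetical, and the input checks are an added assumption to keep the estimate between 0 and 1.

```python
def estimate_event_probability(events, trials):
    """Estimate the event probability as events divided by trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= events <= trials:
        raise ValueError("events must be between 0 and trials")
    return events / trials

# 6 heads in 10 coin tosses
print(estimate_event_probability(6, 10))  # prints 0.6
```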

## SE Fit

The SE Fit is in the prediction table when you create the model with Fit Binary Logistic Model. The standard error of the fit (SE fit) estimates the variation in the estimated mean response for the specified variable settings. The calculation of the confidence interval for the mean response uses the standard error of the fit. Standard errors are always non-negative.

### Interpretation

Use the standard error of the fit to measure the precision of the estimate of the mean response. The smaller the standard error, the more precise the predicted mean response. For example, an analyst develops a model to predict delivery time. For one set of variable settings, the model predicts a mean delivery time of 3.80 days. The standard error of the fit for these settings is 0.08 days. For a second set of variable settings, the model produces the same mean delivery time with a standard error of the fit of 0.02 days. The analyst can be more confident that the mean delivery time for the second set of variable settings is close to 3.80 days.

With the fitted value, you can use the standard error of the fit to create a confidence interval for the mean response. For example, depending on the number of degrees of freedom, a 95% confidence interval extends approximately two standard errors above and below the predicted mean. For the delivery times, the 95% confidence interval for the predicted mean of 3.80 days when the standard error is 0.08 is (3.64, 3.96) days. You can be 95% confident that the population mean is within this range. When the standard error is 0.02, the 95% confidence interval is (3.76, 3.84) days. The confidence interval for the second set of variable settings is narrower because the standard error is smaller.
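The delivery-time intervals can be reproduced with a minimal Python sketch of the "two standard errors" approximation. The function name is hypothetical, and the multiplier of 2 is the rough value from the text, not an exact critical value.

```python
def approximate_ci(fit, se_fit, multiplier=2.0):
    """Approximate 95% CI: fit plus or minus about two standard errors."""
    margin = multiplier * se_fit
    return fit - margin, fit + margin

for se in (0.08, 0.02):
    low, high = approximate_ci(3.80, se)
    print(f"SE fit = {se}: ({low:.2f}, {high:.2f})")
# prints:
# SE fit = 0.08: (3.64, 3.96)
# SE fit = 0.02: (3.76, 3.84)
```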

## Confidence interval for fit (95% CI)

The confidence interval for the fit is in the prediction table when you create the model with Fit Binary Logistic Model. These confidence intervals (CI) are ranges of values that are likely to contain the event probability for the population that has the observed values of the predictor variables that are in the model.

Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. But, if you sample many times, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.
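The repeated-sampling idea can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not Minitab's method: it uses a simple proportion with a normal-approximation (Wald) interval instead of a logistic regression fit, and the true probability, sample size, and number of repetitions are all hypothetical.

```python
import math
import random

def wald_interval(events, trials, z=1.96):
    """Normal-approximation 95% interval for a proportion."""
    p_hat = events / trials
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin

def coverage(true_p, trials, n_samples, seed=1):
    """Fraction of simulated intervals that contain the true probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample: count the events in `trials` independent trials.
        events = sum(rng.random() < true_p for _ in range(trials))
        low, high = wald_interval(events, trials)
        if low <= true_p <= high:
            hits += 1
    return hits / n_samples

# Hypothetical settings: true event probability 0.3, 200 trials per sample.
print(coverage(true_p=0.3, trials=200, n_samples=2000))
```

The printed fraction should be close to, though not exactly, the nominal 95% confidence level, because the interval is an approximation and the simulation uses a finite number of samples.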

The confidence interval is composed of the following two parts:
• Point estimate: The point estimate is the estimate of the parameter that is calculated from the sample data.
• Margin of error: The margin of error defines the width of the confidence interval and is affected by the range of the event probabilities, the sample size, and the confidence level.
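As an illustration of how a point estimate and a margin of error combine for an event probability, the sketch below uses one common approach for logistic models: build a symmetric interval on the transformed (logit) scale, then convert the endpoints to probabilities. This approach is an assumption for illustration; the text here does not state the formula that Minitab uses. The transformed response of -1.401 and its standard error of 0.35 are hypothetical values.

```python
import math

def inverse_logit(y):
    """Convert a transformed response Y' to a probability."""
    return 1 / (1 + math.exp(-y))

def probability_interval(linear_fit, se_linear, z=1.96):
    """Point estimate and approximate 95% interval for the event probability.

    The interval is symmetric on the logit scale, so it stays inside
    (0, 1) and is generally asymmetric on the probability scale.
    """
    low = inverse_logit(linear_fit - z * se_linear)
    high = inverse_logit(linear_fit + z * se_linear)
    return inverse_logit(linear_fit), low, high

# Hypothetical inputs: Y' = -1.401 with a standard error of 0.35.
point, low, high = probability_interval(linear_fit=-1.401, se_linear=0.35)
print(f"point estimate = {point:.3f}, interval = ({low:.3f}, {high:.3f})")
```

A larger standard error or a higher confidence level (a larger `z`) widens the interval, which matches the description of the margin of error.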

### Interpretation

Use the confidence interval to assess the estimate of the fitted value for the observed values of the variables.

For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the event probability for the specified values of the variables in the model. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.