Coefficients table for Analyze Binary Response for Definitive Screening Design

Coef

A regression coefficient describes the size and direction of the relationship between a predictor and the response variable. Coefficients are the numbers by which the values of the term are multiplied in a regression equation.

Interpretation

Use the coefficient to determine whether a change in a predictor variable makes the event more likely or less likely. The coefficient for a term represents the change in the link function associated with an increase of one coded unit in that term, while the other terms are held constant.

The size of the effect is usually a good way to assess the practical significance of the effect that a term has on the response variable. The size of the effect does not indicate whether a term is statistically significant because the calculations for significance also consider the variation in the response data. To determine statistical significance, examine the p-value for the term.

The relationship between the coefficient and the probability depends on several aspects of the analysis, including the link function, the reference event for the response, and the reference levels for categorical predictors that are in the model. Generally, positive coefficients make the event more likely and negative coefficients make the event less likely. An estimated coefficient near 0 implies that the effect of the predictor is small.
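For instance, with the default logit link, the effect of a coefficient on the probability can be sketched by inverting the link function. The intercept and coefficient below are hypothetical values, not output from any particular analysis:

```python
import math

def logistic(eta):
    """Inverse of the logit link: converts a link-function value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical model: intercept -0.5, coefficient +1.2 per coded unit of a factor.
intercept, coef = -0.5, 1.2

p_low = logistic(intercept + coef * 0)   # factor at its center point (coded 0)
p_high = logistic(intercept + coef * 1)  # one coded unit higher

print(round(p_low, 3), round(p_high, 3))
```

Because the inverse link is an increasing function, a positive coefficient always moves the probability toward 1, consistent with the general rule that positive coefficients make the event more likely.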

Terms that are not factors, such as the block, do not have high and low levels.
Covariates
The coefficient for a covariate is in the same units as the covariate. The coefficient represents the change in the link function for a one unit increase in the covariate. If the coefficient is negative, as the covariate increases, the probability decreases. If the coefficient is positive, as the covariate increases, the probability increases. Because covariates are not coded and are not usually orthogonal to the factors, the presence of covariates usually increases VIF values. For more information, go to the section on VIF.
Blocks
Blocks are categorical variables with a (−1, 0, +1) coding scheme. Each coefficient represents the difference between the link function for the block and the average value.

Interpretation for the logit link function

The logit link provides the most natural interpretation of the estimated coefficients and is therefore the default link in Minitab. The interpretation uses the fact that the odds of a reference event are P(event)/P(not event) and assumes that the other predictors remain constant. The greater the log odds, the more likely the reference event is. Therefore, positive coefficients indicate that the event becomes more likely and negative coefficients indicate that the event becomes less likely. A summary of interpretations for different types of factors follows.

Continuous factors
The coefficient of a continuous factor is the estimated change in the natural log of the odds for the reference event for each increase of one coded unit in the factor. For example, if each coded unit of a time factor represents a change of 30 seconds, and the coefficient for time is 1.4, then the natural log of the odds increases by 1.4 if you increase the time by 30 seconds.
Estimated coefficients can also be used to calculate the odds ratio, or the ratio between two odds.
Categorical factors
The coefficient of a categorical factor is the estimated change in the natural log of the odds of the event for a change of one coded unit. The difference between the low and high levels of a categorical factor is 2 coded units. For example, a categorical variable has the levels Fast and Slow. Slow is the low level, coded as -1. Fast is the high level, coded as +1. If the coefficient for the variable is 1.3, then a change from Slow to Fast increases the natural log of the odds of the event by 2.6.
Estimated coefficients can also be used to calculate the odds ratio, or the ratio between two odds.
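Using the numbers from the two examples above, the odds ratios follow by exponentiating the change in the log odds. A sketch with Python's standard library:

```python
import math

# Continuous factor from the example above: coefficient 1.4 per coded unit,
# so the odds multiply by exp(1.4) for each coded-unit increase.
or_continuous = math.exp(1.4)

# Categorical factor from the example above: coefficient 1.3, and the change
# from Slow (-1) to Fast (+1) is 2 coded units, so the odds multiply by exp(2.6).
or_slow_to_fast = math.exp(2 * 1.3)

print(round(or_continuous, 2), round(or_slow_to_fast, 2))
```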

SE Coef

The standard error of the coefficient estimates the variability between coefficient estimates that you would obtain if you took samples from the same population again and again. The calculation assumes that the sample size and the coefficients to estimate would remain the same if you sampled again and again.

Interpretation

Use the standard error of the coefficient to measure the precision of the estimate of the coefficient. The smaller the standard error, the more precise the estimate.

Confidence Interval for coefficient (95% CI)

These confidence intervals (CI) are ranges of values that are likely to contain the true value of the coefficient for each term in the model.

Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you take many random samples, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.

The confidence interval is composed of the following two parts:
Point estimate
This single value estimates a population parameter by using your sample data.
Margin of error
The margin of error defines the width of the confidence interval and is affected by the range of the event probabilities, the sample size, and the confidence level.
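As a sketch with hypothetical numbers, a large-sample 95% interval combines these two parts as point estimate ± (normal critical value × standard error). The exact interval reported in the output may differ slightly from this normal approximation:

```python
# Hypothetical term: point estimate 1.30 with standard error 0.45.
coef, se_coef = 1.30, 0.45

z_crit = 1.959964             # standard normal critical value for 95% confidence
margin = z_crit * se_coef     # margin of error
ci = (coef - margin, coef + margin)

print(round(ci[0], 3), round(ci[1], 3))
```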

Interpretation

Use the confidence interval to assess the estimate of the population coefficient for each term in the model.

For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the value of the coefficient for the population. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.

Z-Value

The Z-value is a test statistic for Wald tests that measures the ratio between the coefficient and its standard error.

Interpretation

Minitab uses the Z-value to calculate the p-value, which you use to make a decision about the statistical significance of the terms and the model. The Wald test is accurate when the sample size is large enough that the distribution of the sample coefficients follows a normal distribution.

A Z-value that is sufficiently far from 0 indicates that the coefficient estimate is both large and precise enough to be statistically different from 0. Conversely, a Z-value that is close to 0 indicates that the coefficient estimate is too small or too imprecise to be certain that the term has an effect on the response.
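A minimal sketch of the Wald calculation with hypothetical numbers, using the standard normal distribution to convert the Z-value into a two-sided p-value:

```python
import math

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution (Wald test)."""
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

coef, se_coef = 1.30, 0.45   # hypothetical estimate and standard error
z = coef / se_coef           # Wald Z-value: coefficient over its standard error
print(round(z, 2), round(two_sided_p(z), 4))
```

Here the Z-value is far from 0, so the resulting p-value is small and the hypothetical term would be statistically significant at the usual 0.05 level.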

The tests in the Deviance table are likelihood ratio tests. The tests in the expanded display of the Coefficients table are Wald approximation tests. The likelihood ratio tests are more accurate for small samples than the Wald approximation tests.

P-Value

The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.

The tests in the Deviance table are likelihood ratio tests. The tests in the expanded display of the Coefficients table are Wald approximation tests. The likelihood ratio tests are more accurate for small samples than the Wald approximation tests.

Interpretation

To determine whether a coefficient is statistically different from 0, compare the p-value for the term to your significance level to assess the null hypothesis. The null hypothesis is that the coefficient equals 0, which implies that there is no association between the term and the response.

Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the coefficient is not 0 when it is.

P-value ≤ α: The association is statistically significant
If the p-value is less than or equal to the significance level, you can conclude that there is a statistically significant association between the response variable and the term.
P-value > α: The association is not statistically significant
If the p-value is greater than the significance level, you cannot conclude that there is a statistically significant association between the response variable and the term. You may want to refit the model without the term.
If there are multiple predictors without a statistically significant association with the response, you can reduce the model by removing terms one at a time. For more information on removing terms from the model, go to Model reduction.
If a coefficient is statistically significant, the interpretation depends on the type of term. The interpretations are as follows:
Factors
If the coefficient for a factor is significant, you can conclude that the probability of the event is not the same for all levels of the factor.
Interactions among factors
If a coefficient for an interaction term is significant, the relationship between a factor and the response depends on the other factors in the term. In this case, you should not interpret the main effects without considering the interaction effect.
Squared terms
If a coefficient for a squared term is significant, you can conclude that the relationship between the factor and the response follows a curved line.
Covariates
If the coefficient for a covariate is statistically significant, you can conclude that the association between the response and the covariate is statistically significant.
Blocks
If the coefficient for a block is statistically significant, you can conclude that the link function for the block is different from the average value.

VIF

The variance inflation factor (VIF) indicates how much the variance of a coefficient is inflated due to correlations among the predictors in the model.

Interpretation

Use the VIF to describe how much multicollinearity (which is correlation between predictors) exists in a model. In most factorial designs, all the VIF values are 1, which indicates that the predictors have no multicollinearity. The absence of multicollinearity simplifies the determination of statistical significance. The inclusion of covariates in the model and the occurrence of botched runs during data collection are two common ways that VIF values increase, which complicates the interpretation of statistical significance. Also, for binary responses, the VIF values are often greater than 1.

Use the following guidelines to interpret the VIF:
VIF            Status of predictor
VIF = 1        Not correlated
1 < VIF < 5    Moderately correlated
VIF > 5        Highly correlated
Highly correlated predictors are problematic because the multicollinearity can increase the variance of the regression coefficients. The following are some of the consequences of unstable coefficients:
  • Coefficients can seem to be not statistically significant even when an important relationship exists between the predictor and the response.
  • Coefficients for highly correlated predictors will vary widely from sample to sample.
  • Removing any highly correlated terms from the model will greatly affect the estimated coefficients of the other highly correlated terms. Coefficients of the highly correlated terms can even change direction of the effect.

Be cautious when you use statistical significance to choose terms to remove from a model in the presence of multicollinearity. Add and remove only one term at a time from the model. Monitor changes in the model summary statistics, as well as the tests of statistical significance, as you change the model.
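The guidelines above can be illustrated with a small sketch. One common way to compute VIFs is as the diagonal of the inverse of the predictors' correlation matrix; the design matrix below is hypothetical, and Minitab's own calculation for a specific design may differ:

```python
import numpy as np

def vif_from_design(X):
    """VIF for each column of a design matrix: the diagonal of the
    inverse of the predictors' correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Orthogonal 2^2 factorial columns (factors A and B): every VIF is exactly 1.
X_orth = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
print(vif_from_design(X_orth))

# Add a covariate that is correlated with factor A: the VIFs rise above 1,
# mirroring how covariates and botched runs inflate VIFs in practice.
covariate = X_orth[:, 0] + np.array([0.1, -0.2, 0.15, 0.05])
X_cov = np.column_stack([X_orth, covariate])
print(vif_from_design(X_cov))
```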