A regression coefficient describes the size and direction of the relationship between a predictor and the response variable. Coefficients are the numbers by which the values of the term are multiplied in a regression equation.
Use the coefficient to determine whether a change in a predictor variable makes the event more likely or less likely. The coefficient for a term represents the change in the link function associated with an increase of one coded unit in that term, while the other terms are held constant.
The size of the effect is usually a good way to assess the practical significance of the effect that a term has on the response variable. The size of the effect does not indicate whether a term is statistically significant because the calculations for significance also consider the variation in the response data. To determine statistical significance, examine the p-value for the term.
The relationship between the coefficient and the probability depends on several aspects of the analysis, including the link function, the reference event for the response, and the reference levels for categorical predictors that are in the model. Generally, positive coefficients make the event more likely and negative coefficients make the event less likely. An estimated coefficient near 0 implies that the effect of the predictor is small.
The logit link provides the most natural interpretation of the estimated coefficients and is therefore the default link in Minitab. The interpretation uses the fact that the odds of the reference event are P(event)/P(not event) and assumes that the other predictors remain constant. The greater the log odds, the more likely the reference event is. Therefore, positive coefficients indicate that the event becomes more likely and negative coefficients indicate that the event becomes less likely.
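Minitab computes these quantities through its menus, but the arithmetic behind the interpretation is easy to reproduce. The following is a minimal sketch in Python with statsmodels; the data and the `temperature` and `passed` variable names are invented for illustration. Exponentiating a coefficient on the logit scale gives the odds ratio for a one-unit increase in that predictor.

```python
# Minimal sketch: interpreting a logit-link coefficient as an odds ratio.
# The data and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
temperature = rng.normal(100, 10, size=200)                 # hypothetical predictor
true_log_odds = -20 + 0.2 * temperature
passed = rng.binomial(1, 1 / (1 + np.exp(-true_log_odds)))  # binary response

X = sm.add_constant(temperature)
fit = sm.Logit(passed, X).fit(disp=False)

coef = fit.params[1]
print(f"coefficient (log-odds scale): {coef:.3f}")
# exp(coef) is the multiplicative change in the odds of the event for a
# one-unit increase in the predictor, with the other terms held constant.
print(f"odds ratio: {np.exp(coef):.3f}")
```

A positive coefficient gives an odds ratio greater than 1 (the event becomes more likely); a negative coefficient gives an odds ratio less than 1.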
The standard error of the coefficient estimates the variability between coefficient estimates that you would obtain if you took samples from the same population again and again. The calculation assumes that the sample size and the coefficients to estimate would remain the same if you sampled again and again.
Use the standard error of the coefficient to measure the precision of the estimate of the coefficient. The smaller the standard error, the more precise the estimate.
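Continuing the hypothetical fit from the sketch above, statsmodels exposes the standard errors alongside the point estimates, so the precision of each coefficient is easy to inspect:

```python
# Continuing the sketch above: standard errors of the coefficients.
# A small standard error relative to the coefficient indicates a
# precise estimate.
for name, b, se in zip(["constant", "temperature"], fit.params, fit.bse):
    print(f"{name}: coef = {b:.3f}, SE = {se:.3f}")
```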
These confidence intervals (CI) are ranges of values that are likely to contain the true value of the coefficient for each term in the model.
Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you take many random samples, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.
Use the confidence interval to assess the estimate of the population coefficient for each term in the model.
For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the value of the coefficient for the population. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.
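Continuing the same hypothetical fit, a 95% confidence interval on the logit scale is the estimate plus or minus roughly 1.96 standard errors; the `conf_int` method returns the same interval:

```python
# Continuing the sketch: 95% confidence intervals for the coefficients,
# computed as estimate +/- z(0.975) * SE on the log-odds scale.
import numpy as np
from scipy.stats import norm

z = norm.ppf(0.975)                       # approximately 1.96
lower = fit.params - z * fit.bse
upper = fit.params + z * fit.bse
print(np.column_stack([lower, upper]))
print(fit.conf_int(alpha=0.05))           # built-in equivalent
```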
The Z-value is a test statistic for Wald tests that measures the ratio between the coefficient and its standard error.
Minitab uses the Z-value to calculate the p-value, which you use to make a decision about the statistical significance of the terms and the model. The Wald test is accurate when the sample size is large enough that the distribution of the sample coefficients follows a normal distribution.
A Z-value that is sufficiently far from 0 indicates that the coefficient estimate is both large and precise enough to be statistically different from 0. Conversely, a Z-value that is close to 0 indicates that the coefficient estimate is too small or too imprecise to be certain that the term has an effect on the response.
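As a sketch of the calculation, the Wald Z-value for the hypothetical `temperature` term above is simply the coefficient divided by its standard error, and the two-sided p-value comes from the standard normal distribution:

```python
# Continuing the sketch: Wald Z-value and its two-sided p-value.
from scipy.stats import norm

z_value = fit.params[1] / fit.bse[1]
p_value = 2 * norm.sf(abs(z_value))
print(f"Z = {z_value:.2f}, p = {p_value:.4g}")
# statsmodels reports the same numbers as fit.tvalues and fit.pvalues.
```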
The tests in the Deviance table are likelihood ratio tests. The tests in the expanded display of the Coefficients table are Wald approximation tests. The likelihood ratio tests are more accurate for small samples than the Wald approximation tests.
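For comparison, a likelihood ratio test of a single term refits the model without that term and compares the log-likelihoods. A minimal sketch, reusing the hypothetical data above:

```python
# Sketch: likelihood ratio test for the temperature term. The statistic
# 2 * (llf_full - llf_reduced) is compared to a chi-square distribution
# with degrees of freedom equal to the number of terms dropped (1 here).
from scipy.stats import chi2

full = sm.Logit(passed, sm.add_constant(temperature)).fit(disp=False)
reduced = sm.Logit(passed, np.ones_like(temperature)).fit(disp=False)  # intercept only

lr_stat = 2 * (full.llf - reduced.llf)
p_value = chi2.sf(lr_stat, df=1)
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.4g}")
```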
The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
To determine whether a coefficient is statistically different from 0, compare the p-value for the term to your significance level to assess the null hypothesis. The null hypothesis is that the coefficient equals 0, which implies that there is no association between the term and the response.
Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the coefficient is different from 0 when the coefficient actually is 0.
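In code, the decision rule is a single comparison. Continuing the hypothetical fit above:

```python
# Continuing the sketch: compare the term's p-value to the significance level.
alpha = 0.05
if fit.pvalues[1] < alpha:
    print("Reject the null hypothesis: the coefficient differs from 0.")
else:
    print("Fail to reject: no evidence of an association.")
```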
The variance inflation factor (VIF) indicates how much the variance of a coefficient is inflated due to correlations among the predictors in the model.
Use the VIF to describe how much multicollinearity (correlation between predictors) exists in a model. In most factorial designs, all the VIF values are 1, which indicates that the predictors have no multicollinearity. The absence of multicollinearity simplifies the determination of statistical significance. Two common ways that VIF values increase, which complicates the interpretation of statistical significance, are the inclusion of covariates in the model and the occurrence of botched runs during data collection. Also, for binary responses, the VIF values are often greater than 1. Use the following guidelines to interpret the VIF; a sketch of the calculation follows the table.
| VIF | Status of predictor |
| --- | --- |
| VIF = 1 | Not correlated |
| 1 < VIF < 5 | Moderately correlated |
| VIF > 5 | Highly correlated |
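As a sketch of how the VIF is computed, `variance_inflation_factor` in statsmodels regresses one column of the design matrix on the others and returns 1 / (1 - R²). The coded design matrix below is hypothetical; because the factor columns of a clean two-level factorial are orthogonal, each VIF is exactly 1.

```python
# Sketch: VIFs for a hypothetical coded 2x2 factorial design matrix.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = np.column_stack([
    np.ones(4),        # intercept
    [-1, -1, 1, 1],    # factor A (coded units)
    [-1, 1, -1, 1],    # factor B (coded units)
])
for j, name in ((1, "A"), (2, "B")):
    print(f"VIF for factor {name}: {variance_inflation_factor(X, j):.2f}")
```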
Be cautious when you use statistical significance to choose terms to remove from a model in the presence of multicollinearity. Add and remove only one term at a time from the model. Monitor changes in the model summary statistics, as well as the tests of statistical significance, as you change the model.