Coefficients table for Fit Regression Model and Linear Regression

Find definitions and interpretations for every statistic in the Coefficients table.

Coefficients

A regression coefficient describes the size and direction of the relationship between a predictor and the response variable. Coefficients are the numbers by which the values of the term are multiplied in a regression equation.

Interpretation

The coefficient for a term represents the change in the mean response associated with a change in that term, while the other terms in the model are held constant. The sign of the coefficient indicates the direction of the relationship between the term and the response. The size of the coefficient is usually a good way to assess the practical significance of the effect that a term has on the response variable. However, the size of the coefficient does not indicate whether a term is statistically significant because the calculations for significance also consider the variation in the response data. To determine statistical significance, examine the p-value for the term.

The interpretation of each coefficient depends on whether the term is a continuous variable or a categorical variable, as follows:
Continuous variable

The coefficient of the term represents the change in the mean response for one unit of change in that term. If the coefficient is negative, as the term increases, the mean value of the response decreases. If the coefficient is positive, as the term increases, the mean value of the response increases.

Categorical variable
A coefficient is listed for each level of the categorical variable except for one (unless you choose to show coefficients for all levels in the Results sub-dialog box). The coefficient for one level of the categorical variable must be set to zero so that the model can be fit. The interpretation of the coefficient for a categorical variable depends on the coding scheme that you choose for categorical variables. The coding scheme can be changed in the Coding sub-dialog box.
  • With the (0, 1) coding scheme, each coefficient represents the difference between each level mean and the reference level mean. The coefficient for the reference level is not displayed in the Coefficients table.
  • With the (−1, 0, +1) coding scheme, each coefficient represents the difference between each level mean and the overall mean.
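As a rough illustration of the two coding schemes, the following Python sketch derives coefficients from level means. The level names and means are hypothetical. The sketch assumes a model with a single categorical predictor, so the "overall mean" under (−1, 0, +1) coding is taken to be the unweighted mean of the level means.

```python
# Hypothetical level means for one categorical predictor (not from any real data set).
level_means = {"A": 50.0, "B": 56.0, "C": 62.0}
reference = "A"  # assumed reference level for (0, 1) coding


def coefs_01(means, ref):
    """(0, 1) coding: level mean minus the reference level mean.

    The reference level is omitted, as in the Coefficients table.
    """
    return {lvl: m - means[ref] for lvl, m in means.items() if lvl != ref}


def coefs_effect(means):
    """(-1, 0, +1) coding: level mean minus the overall mean.

    All levels are returned here for illustration; the values sum to zero.
    """
    overall = sum(means.values()) / len(means)
    return {lvl: m - overall for lvl, m in means.items()}


print(coefs_01(level_means, reference))
print(coefs_effect(level_means))
```

The same level means give different coefficients under each scheme because the comparison baseline changes, not because the fitted means change.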

For example, a manager determines that an employee's score on a job skills test can be predicted using the regression model y = 130 + 4.3x1 + 10.1x2. In the equation, x1 is the hours of in-house training (from 0 to 20). The variable x2 is a categorical variable that equals 1 if the employee has a mentor and 0 if the employee does not have a mentor. The response, y, is the test score. The coefficient for the continuous variable, training hours, is 4.3, which indicates that, for every additional hour of training, the mean test score increases by 4.3 points. Using the (0, 1) coding scheme, the coefficient for the categorical variable, mentoring, indicates that employees with mentors have scores that are an average of 10.1 points greater than employees without mentors.
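The equation in this example can be evaluated directly. The short Python sketch below uses a hypothetical function name, predicted_score, and shows that one more hour of training changes the predicted mean score by the training coefficient, and that having a mentor changes it by the mentoring coefficient.

```python
def predicted_score(training_hours, has_mentor):
    """Evaluate y = 130 + 4.3*x1 + 10.1*x2 from the example.

    x1 is hours of training (0 to 20); x2 is 1 with a mentor, 0 without.
    """
    x2 = 1 if has_mentor else 0
    return 130 + 4.3 * training_hours + 10.1 * x2


base = predicted_score(10, has_mentor=False)
one_more_hour = predicted_score(11, has_mentor=False)
with_mentor = predicted_score(10, has_mentor=True)

print(round(one_more_hour - base, 1))  # effect of one extra training hour
print(round(with_mentor - base, 1))    # effect of having a mentor
```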

Coded coefficients

Minitab can fit linear models using a variety of coding schemes for the continuous variables in the model. These coding schemes can improve the estimation process and the interpretation of the results. In addition, coded units can change the results of the statistical tests used to determine whether each term is a significant predictor of the response. When a model uses coded units, the analysis produces coded coefficients.

Interpretation

The coding method that Minitab uses affects both the estimation and the interpretation of the coded coefficients as follows:
Specify low and high levels to code as -1 and +1
This method both centers and scales the variables. Minitab uses this method in design of experiments (DOE). The coefficients represent the mean change in the response associated with the high and low values that you specified.
Subtract the mean, then divide by the standard deviation
This method both centers and scales the variables. Each coefficient represents the expected change in the response given a change of one standard deviation in the variable.
Subtract the mean
This method centers the variables. Each coefficient represents the expected change in the response given a one unit change in the variable, using the original measurement scale. When you subtract the mean, the constant coefficient is estimating the mean response when all the predictors are at their mean values.
Divide by the standard deviation
This method scales the variables. Each coefficient represents the expected change in the response given a change of one standard deviation in the variable.
Subtract a specified value, then divide by another
The effect and interpretation of this method depend on the values that you enter.
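The coding methods above can be sketched in a few lines of Python. This is an illustration only, not Minitab's implementation: the function names are hypothetical, the data values are made up, and the sketch assumes the sample standard deviation is the scaling factor.

```python
from statistics import mean, stdev


def code_low_high(xs, low, high):
    """Map the specified low and high levels to -1 and +1 (centers and scales)."""
    midpoint = (high + low) / 2
    half_range = (high - low) / 2
    return [(x - midpoint) / half_range for x in xs]


def standardize(xs):
    """Subtract the mean, then divide by the standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def center(xs):
    """Subtract the mean; the original measurement scale is kept."""
    m = mean(xs)
    return [x - m for x in xs]


def scale(xs):
    """Divide by the standard deviation; the variable is not centered."""
    s = stdev(xs)
    return [x / s for x in xs]


hours = [0, 5, 10, 15, 20]  # hypothetical predictor values
print(code_low_high(hours, low=0, high=20))
print(center(hours))
```

Because centering only shifts a variable, the slope keeps its original units. Dividing by the standard deviation changes the units of the slope to "per standard deviation".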

SE Coef

The standard error of the coefficient estimates the variability between coefficient estimates that you would obtain if you took samples from the same population again and again. The calculation assumes that the sample size and the coefficients to estimate would remain the same if you sampled again and again.

Interpretation

Use the standard error of the coefficient to measure the precision of the estimate of the coefficient. The smaller the standard error, the more precise the estimate. Dividing the coefficient by its standard error gives the t-value. If the p-value associated with this t-value is less than your significance level, you conclude that the coefficient is statistically significant.

For example, technicians estimate a model for insolation as part of a solar thermal energy test:

Regression Analysis: Insolation versus South, North, Time of Day

Coefficients

Term          Coef   SE Coef  T-Value  P-Value   VIF
Constant       809       377     2.14    0.042
South        20.81      8.65     2.41    0.024  2.24
North        -23.7      17.4    -1.36    0.186  2.17
Time of Day  -30.2      10.8    -2.79    0.010  3.86

In this model, North and South measure the position of a focal point in inches. The coefficients for North and South are similar in magnitude. The standard error of the coefficient for South is smaller than the standard error of the coefficient for North. Therefore, the model is able to estimate the coefficient for South with greater precision.

The standard error of the North coefficient is nearly as large as the value of the coefficient itself. The resulting p-value is greater than common significance levels, so you cannot conclude that the coefficient for North differs from 0.

While the coefficient for South is closer to 0 than the coefficient for North, the standard error of the coefficient for South is also smaller. The resulting p-value is smaller than common significance levels. Because the estimate of the coefficient for South is more precise, you can conclude that the coefficient for South differs from 0.
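To make the relationship between the columns concrete, the following Python sketch recomputes each t-value in the insolation example as the coefficient divided by its standard error. The inputs are the rounded values displayed in the table, so the recomputed ratios can differ slightly in the last digit from the reported t-values, which are presumably calculated from unrounded estimates.

```python
# (coefficient, standard error, reported t-value) as displayed in the example table.
rows = {
    "Constant": (809, 377, 2.14),
    "South": (20.81, 8.65, 2.41),
    "North": (-23.7, 17.4, -1.36),
    "Time of Day": (-30.2, 10.8, -2.79),
}


def t_value(coef, se):
    """t-value = coefficient / standard error of the coefficient."""
    return coef / se


for term, (coef, se, reported) in rows.items():
    print(f"{term:12s} t = {t_value(coef, se):6.2f}  (reported {reported:5.2f})")
```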

Statistical significance is one criterion you can use to reduce a model in multiple regression. For more information, go to Model reduction.

Confidence Interval for coefficient (95% CI)

These confidence intervals (CI) are ranges of values that are likely to contain the true value of the coefficient for each term in the model.

Because samples are random, two samples from a population are unlikely to yield identical confidence intervals. However, if you take many random samples, a certain percentage of the resulting confidence intervals contain the unknown population parameter. The percentage of these confidence intervals that contain the parameter is the confidence level of the interval.

The confidence interval is composed of the following two parts:
Point estimate
This single value estimates a population parameter by using your sample data. The confidence interval is centered around the point estimate.
Margin of error
The margin of error defines the width of the confidence interval and is determined by the observed variability in the sample, the sample size, and the confidence level. To calculate the upper limit of the confidence interval, the margin of error is added to the point estimate. To calculate the lower limit of the confidence interval, the margin of error is subtracted from the point estimate.
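A minimal Python sketch of the two parts follows. It assumes the usual form for a regression coefficient, in which the margin of error is a critical t-value multiplied by the standard error of the coefficient. The function name is hypothetical, and the critical value of 2.06 is an assumed placeholder, because the correct value depends on the confidence level and the error degrees of freedom, which the insolation example does not state.

```python
def confidence_interval(coef, se, t_crit):
    """Return (lower, upper) = point estimate -/+ margin of error.

    The margin of error is assumed to be t_crit * se.
    """
    margin = t_crit * se
    return coef - margin, coef + margin


# South coefficient and SE Coef from the insolation example.
# t_crit = 2.06 is a hypothetical critical value, not taken from the source.
lower, upper = confidence_interval(20.81, 8.65, t_crit=2.06)
print(f"({lower:.2f}, {upper:.2f})")
```

A larger standard error or a higher confidence level widens the interval. The point estimate stays at the center.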

Interpretation

Use the confidence interval to assess the estimate of the population coefficient for each term in the model.

For example, with a 95% confidence level, you can be 95% confident that the confidence interval contains the value of the coefficient for the population. The confidence interval helps you assess the practical significance of your results. Use your specialized knowledge to determine whether the confidence interval includes values that have practical significance for your situation. If the interval is too wide to be useful, consider increasing your sample size.

T-value

The t-value measures the ratio between the coefficient and its standard error.

Interpretation

Minitab uses the t-value to calculate the p-value, which you use to test whether the coefficient is significantly different from 0.

You can use the t-value to determine whether to reject the null hypothesis. However, the p-value is used more often because the threshold for the rejection of the null hypothesis does not depend on the degrees of freedom. For more information on using the t-value, go to Using the t-value to determine whether to reject the null hypothesis.

P-Value – Coefficient

The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.

Interpretation

To determine whether the association between the response and each term in the model is statistically significant, compare the p-value for the term to your significance level to assess the null hypothesis. The null hypothesis is that the term's coefficient is equal to zero, which implies that there is no association between the term and the response. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that an association exists when there is no actual association.
P-value ≤ α: The association is statistically significant
If the p-value is less than or equal to the significance level, you can conclude that there is a statistically significant association between the response variable and the term.
P-value > α: The association is not statistically significant
If the p-value is greater than the significance level, you cannot conclude that there is a statistically significant association between the response variable and the term. You may want to refit the model without the term.
If there are multiple predictors without a statistically significant association with the response, you can reduce the model by removing terms one at a time. For more information on removing terms from the model, go to Model reduction.
If a model term is statistically significant, the interpretation depends on the type of term. The interpretations are as follows:
  • If a coefficient for a continuous variable is significant, changes in the value of the variable are associated with changes in the mean response value.
  • If a coefficient for a categorical level is significant, the mean for that level is different from either the overall mean (-1, 0, +1 coding) or the mean for the reference level (0, 1 coding).
  • If a coefficient for an interaction term is significant, the relationship between a factor and the response depends on the other factors in the term. In this case, you should not interpret the main effects without considering the interaction effect.
  • If a coefficient for a polynomial term is significant, you can conclude that the data contain curvature.
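The comparison of each p-value to the significance level can be sketched in Python as follows. The p-values are those displayed in the insolation example, α = 0.05 is the common default mentioned above, and the function name is hypothetical.

```python
ALPHA = 0.05  # significance level

# P-values as displayed in the insolation example table.
p_values = {"South": 0.024, "North": 0.186, "Time of Day": 0.010}


def is_significant(p_value, alpha=ALPHA):
    """True when the p-value is less than or equal to the significance level."""
    return p_value <= alpha


for term, p in p_values.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{term:12s} p = {p:.3f}  {verdict} at alpha = {ALPHA}")
```

The sketch only flags terms. Whether to remove a term that is not significant remains a modeling decision.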

VIF

The variance inflation factor (VIF) indicates how much the variance of a coefficient is inflated due to the correlations among the predictors in the model.

Interpretation

Use the VIF to describe how much multicollinearity (which is correlation between predictors) exists in a regression analysis. Multicollinearity is problematic because it can increase the variance of the regression coefficients, making it difficult to evaluate the individual impact that each of the correlated predictors has on the response.

Use the following guidelines to interpret the VIF:
VIF            Status of predictor
VIF = 1        Not correlated
1 < VIF < 5    Moderately correlated
VIF > 5        Highly correlated
A VIF value greater than 5 suggests that the regression coefficient is poorly estimated due to severe multicollinearity.
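The following Python sketch shows one common way to express the VIF and apply the guidelines. It relies on the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the other predictors. The function names are hypothetical. The guidelines do not cover a VIF of exactly 5, so the sketch treats that boundary value as moderately correlated, which is an assumption.

```python
import math


def vif_from_r_squared(r_squared):
    """VIF for a predictor, given R-squared from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def vif_status(vif):
    """Classify a VIF using the guidelines in the table."""
    if math.isclose(vif, 1.0):
        return "Not correlated"
    if vif > 5:
        return "Highly correlated"
    # Covers 1 < VIF < 5; a VIF of exactly 5 also lands here (assumption).
    return "Moderately correlated"


# VIF values as displayed in the insolation example table.
for term, vif in {"South": 2.24, "North": 2.17, "Time of Day": 3.86}.items():
    print(f"{term:12s} VIF = {vif:.2f}  {vif_status(vif)}")
```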

For more information on multicollinearity and how to mitigate the effects of multicollinearity, see Multicollinearity in regression.