Multicollinearity in regression occurs when some predictor variables in the model are correlated with other predictor variables. Severe multicollinearity is problematic because it inflates the variance of the estimated regression coefficients, making them unstable. Unstable coefficients have the following consequences:

- Coefficients can seem to be insignificant even when a significant relationship exists between the predictor and the response.
- Coefficients for highly correlated predictors vary widely from sample to sample, as the sketch after this list illustrates.
- Removing one highly correlated term from the model greatly affects the estimated coefficients of the other highly correlated terms. The coefficients can even have the wrong sign.
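To make the instability concrete, here is a minimal sketch in Python with simulated data (an illustration, not Minitab output): ordinary least squares is refit on three independent samples that contain two nearly identical predictors. The individual coefficients swing wildly from sample to sample while R² barely moves, which also previews the point below that multicollinearity does not harm the goodness of fit.

```python
import numpy as np

rng = np.random.default_rng(0)
for sample in range(3):
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=0.02, size=50)   # nearly identical to x1
    y = x1 + x2 + rng.normal(size=50)           # true coefficients are 1 and 1
    X = np.column_stack([np.ones(50), x1, x2])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    print(f"sample {sample}: b1 = {beta[1]:7.1f}, b2 = {beta[2]:7.1f}, R^2 = {r2:.2f}")
```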

To measure multicollinearity, you can examine the correlation structure of the predictor variables. You can also examine the variance inflation factor (VIF), which measures how much the variance of an estimated regression coefficient increases when the predictors are correlated. If the VIF equals 1, there is no multicollinearity, but if the VIF is greater than 1, the predictors are correlated. When the VIF is greater than 5, the regression coefficients are poorly estimated. Usually, you should remove highly correlated predictors from the model. Because such predictors supply redundant information, removing them often does not drastically reduce R².
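As a rough illustration of the diagnostic (a Python sketch using statsmodels, not Minitab's implementation; the columns are made-up data), each predictor's VIF is computed from the design matrix:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 13.9, 16.2],  # roughly 2 * x1
    "x3": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 7.0, 2.5],
})

X = sm.add_constant(df)  # include the intercept when computing VIF
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.1f}")
```

The VIF values for x1 and x2 come out large because each is nearly a linear function of the other, while the VIF for the unrelated x3 stays close to 1.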

If the correlation of a predictor with other predictors is nearly perfect, Minitab displays a message that the term cannot be estimated. The VIF values for terms that cannot be estimated typically exceed one billion.

Multicollinearity does not affect the goodness of fit or the goodness of prediction.

Possible solutions to severe multicollinearity:

- If you are fitting a quadratic or cubic model in simple regression, subtract the mean of the predictor from the predictor values.
- Instead of multiple linear regression, use partial least squares regression or principal components analysis. These methods reduce the number of predictors to a smaller set of uncorrelated components. Minitab Statistical Software contains both methods; a sketch of the principal components approach appears after this list.
- In multiple linear regression, consider whether to remove highly correlated predictors from the model. When the predictors supply redundant information, R² does not decrease drastically when you remove correlated predictors. Consider using stepwise regression, best subsets regression, or specialized knowledge of the data set to remove these predictors.
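Here is a minimal sketch of the principal components idea using scikit-learn with simulated data (an illustration, not Minitab's implementation): three correlated predictors are replaced by two uncorrelated components before the regression is fit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 2 * x3 + rng.normal(size=100)

# Keep 2 uncorrelated components instead of 3 correlated predictors
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print(f"R^2 on the components: {pcr.score(X, y):.3f}")
```

Partial least squares works similarly, but it chooses components that also account for the variation in the response rather than only the variation in the predictors.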

For example, a toy manufacturer wants to predict customer satisfaction and includes "lack of breakage" as a predictor variable in the regression model. The investigator determines that the relationship of this variable to customer satisfaction is curved, so the investigator fits a cubic model. The VIF values for the terms in the cubic model all exceed 5,000, so the investigator worries that multicollinearity affects the results. The investigator follows these steps in Minitab Express to subtract the mean of the predictor from the predictor values:

- Open the Standardize dialog box.
  - Mac:
  - PC:

- In Standardize the following columns, enter lack of breakage.
- In Method, select Subtract the mean.
- Click OK.

After subtracting the mean, the investigator repeats the analysis with the centered predictor. The VIF values fall below 10. Although these values are still large, the investigator is more confident in the results now that the multicollinearity is lower.
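For readers who want to reproduce the idea outside Minitab, here is a minimal Python sketch with simulated ratings ("breakage" is a hypothetical stand-in for the lack-of-breakage scores): the cubic terms are built from the raw predictor and again from the mean-centered predictor, and the VIF values are compared. Because the ratings cluster far from zero, x, x², and x³ are nearly linear functions of one another until the mean is subtracted.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
breakage = rng.uniform(60, 100, size=80)  # ratings far from zero inflate the VIFs

def cubic_vifs(x):
    """VIF values for x, x^2, and x^3 in a cubic model."""
    X = sm.add_constant(np.column_stack([x, x**2, x**3]))
    return [variance_inflation_factor(X, i) for i in range(1, 4)]

print("raw predictor VIFs:     ", [f"{v:.0f}" for v in cubic_vifs(breakage)])
print("centered predictor VIFs:", [f"{v:.1f}" for v in cubic_vifs(breakage - breakage.mean())])
```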