Validate model assumptions in regression or ANOVA

Regression and ANOVA do not stop when the model is fit. You should examine residual plots and other diagnostic statistics to determine whether your model is adequate and the assumptions of regression are met. If your model is not adequate, it will incorrectly represent your data. For example:
  • The standard errors of the coefficients might be biased, leading to incorrect t- and p-values.
  • Coefficients can have the wrong sign.
  • The model can be unduly influenced by one or two points.
Use the following table to determine whether your model is adequate.
Characteristics of an adequate regression model | Check using | Possible solutions
Functional form accurately models any curvature that is present. | Lack-of-fit test; Residuals vs variables plot | Add higher-order term to model; Transform variables; Nonlinear regression
Residuals have constant variance. | Residuals vs fits plot | Transform variables; Weighted least squares
Residuals are independent of (not correlated with) each other. | Durbin-Watson statistic; Residuals vs order plot | Add new predictor; Use time series analysis; Add lag variable
Residuals are normally distributed. | Histogram of residuals; Normal plot of residuals; Residuals vs fits plot; Normality test | Transform variables; Check for outliers
No unusual observations or outliers. | Residual plots; Leverages; Cook's distance; DFITS | Transform variables; Remove outlying observation
Data are not ill-conditioned. | Variance inflation factor (VIF); Correlation matrix of predictors | Remove predictor; Partial least squares regression; Transform variables
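
Several of the checks in the table can be computed directly from the residuals. The following is a minimal sketch, assuming a simple regression with one predictor and using only the Python standard library. The function names and the data are illustrative assumptions, not output or procedures from any particular statistics package. It computes the Durbin-Watson statistic, the leverages, and Cook's distance.

```python
from math import fsum


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = fsum(x) / n
    y_bar = fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed response minus fitted value, in data order."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares.

    Values near 2 suggest little first-order autocorrelation.
    """
    num = fsum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / fsum(e * e for e in resid)


def leverages(x):
    """Leverage of each observation in a one-predictor model."""
    n = len(x)
    x_bar = fsum(x) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def cooks_distances(resid, lev, n_params=2):
    """Cook's distance; n_params counts the intercept and the slope."""
    mse = fsum(e * e for e in resid) / (len(resid) - n_params)
    return [
        (e * e / (n_params * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(resid, lev)
    ]


# Made-up data: the last observation lies far from the others in x.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 25.0]

b0, b1 = fit_simple_ols(x, y)
resid = residuals(x, y, b0, b1)
lev = leverages(x)
cooks = cooks_distances(resid, lev)

print("Durbin-Watson:", round(durbin_watson(resid), 3))
most_influential = max(range(len(cooks)), key=cooks.__getitem__)
print("Largest Cook's distance at row index:", most_influential)
```

The Durbin-Watson statistic is only meaningful when the rows are in collection order. Cutoffs for leverage and Cook's distance vary between references, so the sketch reports the most influential row rather than applying a fixed threshold.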

Determine why a model does not meet assumptions

If you determine that your model does not meet the previous criteria, you should:
  1. Determine whether your data are entered correctly, especially observations identified as unusual.
  2. Try to determine the cause of the problem. You may want to determine how sensitive your model is to the issue. For example, if you have an outlier, rerun the regression analysis without that observation and compare the results with those from the full data set.
  3. Consider using one of the possible solutions listed earlier.
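
The sensitivity check in step 2 can be sketched as follows: fit the model twice, once with all observations and once without the suspect row, then compare the coefficients. This is a minimal illustration using only the Python standard library. The helper names, the data, and the choice of suspect row are assumptions made for the example.

```python
from math import fsum


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = fsum(x) / n
    y_bar = fsum(y) / n
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def refit_without(x, y, index):
    """Refit the line after dropping the observation at position index."""
    x_kept = x[:index] + x[index + 1:]
    y_kept = y[:index] + y[index + 1:]
    return fit_line(x_kept, y_kept)


# Made-up data: the last row is the observation flagged as unusual.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 25.0]
suspect = len(x) - 1

full_intercept, full_slope = fit_line(x, y)
reduced_intercept, reduced_slope = refit_without(x, y, suspect)

print(f"All rows:        intercept={full_intercept:.3f}, slope={full_slope:.3f}")
print(f"Without suspect: intercept={reduced_intercept:.3f}, slope={reduced_slope:.3f}")
print(f"Change in slope: {reduced_slope - full_slope:.3f}")
```

A large change in the coefficients indicates that the conclusions depend heavily on that one observation. It does not by itself justify removing the observation; check first whether the value was recorded correctly.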