Methods and formulas for diagnostic measures in Fit Regression Model and Linear Regression

In This Topic

Leverages (Hi)
Leverages (Hi) with validation
Cook's distance
DFITS
Variance inflation factor (VIF)
Durbin-Watson statistic

Leverages (Hi)

Leverages are obtained from the hat matrix (H), which is an n x n projection matrix:

$$H = X (X'X)^{-1} X'$$

The leverage of the i^th observation is the i^th diagonal element, h_i, of H. If h_i is large, the i^th observation has unusual predictor values (X_1i, X_2i, ..., X_pi). That is, the predictor values are far from the mean vector (\bar{X}_1, \bar{X}_2, ..., \bar{X}_p), as measured by Mahalanobis distance.

Leverage values fall between 0 and 1. Minitab identifies observations with leverages over 3p/n or 0.99, whichever is smaller, with an X in the table of unusual observations. Observations with large leverages usually merit closer examination.

Notation

Term	Description
X	design matrix
h_i	i^th diagonal element of the hat matrix
p	number of terms in the model, including the constant
n	number of observations
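
The leverage calculation and the flagging rule above can be sketched in a few lines of NumPy. This is an illustrative sketch, not Minitab's implementation: the function names `leverages` and `flag_high_leverage` and the sample data are hypothetical, and the design matrix is assumed to have full column rank.

```python
import numpy as np

def leverages(X):
    """Return the diagonal of H = X (X'X)^{-1} X' for a design matrix X."""
    X = np.asarray(X, dtype=float)
    # Solve (X'X) A = X' rather than forming the inverse explicitly.
    A = np.linalg.solve(X.T @ X, X.T)        # shape (p, n)
    # h_i = x_i' (X'X)^{-1} x_i, i.e. the i-th diagonal element of X @ A.
    return np.einsum("ij,ji->i", X, A)

def flag_high_leverage(h, p):
    """Flag leverages above min(3p/n, 0.99), the cutoff described above."""
    n = len(h)
    return h > min(3 * p / n, 0.99)

# Hypothetical data: a constant column plus one predictor with one extreme value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])
X = np.column_stack([np.ones_like(x), x])

h = leverages(X)
flags = flag_high_leverage(h, p=X.shape[1])
print(np.round(h, 4))
print(flags)
```

The leverages of a full-rank design matrix sum to p, which gives a quick sanity check on the result.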

Leverages (Hi) with validation

Formula

With validation data, leverages for each row come from the following formula:

$$h_i = x_i' (X'X)^{-1} x_i$$

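For weighted regression, the formula includes the weight, where W is the diagonal matrix of weights for the training rows:

$$h_i = w_i x_i' (X'WX)^{-1} x_i$$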

Notation

Term	Description
X	design matrix for the rows in the training data set or the folds that act as the training data set
x_i	the vector of predictors in the i^th validation row
w_i	weight for the i^th validation row
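
As a rough illustration, the sketch below evaluates the quadratic form x_i'(X'X)^{-1}x_i for validation rows, using only the training rows to build X'X. The weighted branch assumes the form h_i = w_i x_i'(X'WX)^{-1}x_i, with W the diagonal matrix of training weights; that reading, the function name `validation_leverages`, and the sample data are assumptions made for this example.

```python
import numpy as np

def validation_leverages(X_train, X_valid, w_train=None, w_valid=None):
    """Leverage-style values for validation rows, based on the training design matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_valid = np.asarray(X_valid, dtype=float)
    # Unweighted regression is the special case where every weight equals 1.
    w_train = np.ones(len(X_train)) if w_train is None else np.asarray(w_train, dtype=float)
    w_valid = np.ones(len(X_valid)) if w_valid is None else np.asarray(w_valid, dtype=float)

    # X'WX built from the training rows only.
    XtWX = X_train.T @ (w_train[:, None] * X_train)
    # Columns of Q are (X'WX)^{-1} x_i for each validation row x_i.
    Q = np.linalg.solve(XtWX, X_valid.T)
    return w_valid * np.einsum("ij,ji->i", X_valid, Q)

# Hypothetical training rows (constant plus one predictor) and two validation rows.
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_train = np.column_stack([np.ones_like(x_train), x_train])
X_valid = np.array([[1.0, 3.0],     # at the training mean
                    [1.0, 10.0]])   # far outside the training range

h_valid = validation_leverages(X_train, X_valid)
print(h_valid)
```

Because a validation row does not contribute to X'X, its value is not bounded by 1 the way a training leverage is; a row far from the training data can produce a value well above 1.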

Cook's distance

Cook's distance, D, is an overall measure of the combined impact of an observation on all of the estimated regression coefficients. Minitab calculates D using leverage values and standardized residuals, and considers whether an observation is unusual with respect to both x- and y-values. Observations with large D values may be outliers.

Formula

Cook's distance is the distance between the coefficients calculated with and without the i^th observation. Minitab calculates Cook's distance without fitting a new regression equation each time an observation is omitted. This calculation is:

$$D_i = \frac{(b - b_{(i)})' X'X (b - b_{(i)})}{p s^2} = \frac{e_i^2}{p s^2} \cdot \frac{h_i}{(1 - h_i)^2}$$

Notation

Term	Description
e_i	i^th residual
h_i	i^th diagonal element of X(X'X)^{-1}X'
p	number of model parameters, including the constant
s²	mean square error
b	coefficient vector
b_(i)	coefficient vector calculated after deleting the i^th observation
X	design matrix
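
To illustrate why no refitting is needed, the sketch below computes Cook's distance two ways: from residuals and leverages of the single full fit, and by explicitly deleting each observation and refitting. The two should agree up to rounding. The function names and the data are hypothetical, and ordinary (unweighted) least squares is assumed.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance from one fit: (e_i^2 / (p s^2)) * h_i / (1 - h_i)^2."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b                                   # residuals
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))  # leverages
    s2 = e @ e / (n - p)                            # mean square error
    return (e**2 / (p * s2)) * h / (1 - h) ** 2

def cooks_distance_by_refit(X, y):
    """Same quantity via (b - b_(i))' X'X (b - b_(i)) / (p s^2), refitting n times."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (n - p)
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # drop the i-th observation
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = b - b_i
        d[i] = diff @ XtX @ diff / (p * s2)
    return d

# Hypothetical data: roughly linear, with an unusual response in the last row.
x = np.arange(1.0, 9.0)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 12.0])
X = np.column_stack([np.ones_like(x), x])

d_fast = cooks_distance(X, y)
d_refit = cooks_distance_by_refit(X, y)
print(np.round(d_fast, 4))
print(np.allclose(d_fast, d_refit))
```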

DFITS

DFITS combines the leverage and the studentized residual (deleted t residual) of an observation into one overall measure of how unusual the observation is. DFITS measures the influence of each observation on the fitted values in a regression or ANOVA model. Observations with large DFITS values may be outliers.

DFITS represents roughly the number of standard deviations that the fitted value changes when each observation is removed from the data set and the model is refit. Minitab can calculate DFITS without fitting a new regression equation each time an observation is omitted.

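Formula

$$DFITS_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sqrt{MSE_{(i)} h_i}} = e_i \sqrt{\frac{h_i}{MSE_{(i)} (1 - h_i)^2}}$$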

Notation

Term	Description
e_i	i^th residual
h_i	i^th diagonal element of X(X'X)^{-1}X'
X	design matrix
\hat{y}_i	i^th fitted response
\hat{y}_{i(i)}	fitted value for the i^th observation calculated without the i^th observation
MSE_(i)	mean square error calculated without the i ^th observation
n	number of observations
p	number of model parameters
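
The sketch below shows one way DFITS can be obtained from a single fit and checks it against an explicit leave-one-out refit. It relies on the standard identity SSE_(i) = SSE - e_i^2 / (1 - h_i) to get MSE_(i) without refitting, and it assumes p counts the constant. The function names and the data are hypothetical, and this is not presented as Minitab's exact computation.

```python
import numpy as np

def dfits(X, y):
    """DFITS from one fit: e_i * sqrt(h_i) / ((1 - h_i) * sqrt(MSE_(i)))."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    # Deleted mean square error via SSE_(i) = SSE - e_i^2 / (1 - h_i).
    mse_del = (e @ e - e**2 / (1 - h)) / (n - p - 1)
    return e * np.sqrt(h) / ((1 - h) * np.sqrt(mse_del))

def dfits_by_refit(X, y):
    """DFITS as (yhat_i - yhat_i(i)) / sqrt(MSE_(i) h_i), refitting n times."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ b
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # drop the i-th observation
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        r = y[keep] - X[keep] @ b_i
        mse_i = r @ r / (n - 1 - p)                 # MSE of the reduced fit
        out[i] = (yhat[i] - X[i] @ b_i) / np.sqrt(mse_i * h[i])
    return out

# Hypothetical data: roughly linear, with an unusual response in the last row.
x = np.arange(1.0, 9.0)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 12.0])
X = np.column_stack([np.ones_like(x), x])

dfits_fast = dfits(X, y)
dfits_refit = dfits_by_refit(X, y)
print(np.round(dfits_fast, 4))
print(np.allclose(dfits_fast, dfits_refit))
```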

Variance inflation factor (VIF)

The VIF for a predictor can be obtained by regressing that predictor on the remaining predictors and noting the R² value.

Formula

For predictor x_j, the VIF is:

$$VIF(x_j) = \frac{1}{1 - R^2(x_j)}$$

Notation

Term	Description
R²(x_j)	coefficient of determination with x_j as the response variable and the other terms in the model as the predictors
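
The regress-each-predictor procedure can be sketched as follows. The function name `vif` and the data are hypothetical; the input is assumed to hold the predictor columns only, and a constant is added to each auxiliary regression.

```python
import numpy as np

def vif(predictors):
    """VIF for each column: 1 / (1 - R^2) from regressing it on the other columns."""
    Z = np.asarray(predictors, dtype=float)
    n, k = Z.shape
    out = np.empty(k)
    for j in range(k):
        target = Z[:, j]
        # Auxiliary regression: constant plus every predictor except column j.
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical predictors; the first two columns are strongly correlated.
X_demo = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 1.0, 3.0],
    [3.0, 4.0, 8.0],
    [4.0, 3.0, 1.0],
    [5.0, 6.0, 9.0],
    [6.0, 5.0, 2.0],
    [7.0, 8.0, 7.0],
    [8.0, 7.0, 4.0],
])

vifs = vif(X_demo)
print(np.round(vifs, 3))
```

When every term is a plain predictor column, the same values appear on the diagonal of the inverse of the predictors' correlation matrix, which offers an independent check.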

Durbin-Watson statistic

Tests for the presence of autocorrelation in residuals by determining whether or not the correlation between two adjacent error terms is zero. The test is based upon an assumption that errors are generated by a first-order autoregressive process. Minitab assumes that the observations are in a meaningful order, such as time order.

First, Minitab multiplies the residuals by the square root of the weights. If you do not use weights, each weight is 1, and the weighted residuals equal the ordinary residuals.

The weighted residuals are used in the following formula:

$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

Notation

Term	Description
e_i	i^th residual
e_{i-1}	residual for the previous observation
n	number of observations
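
A minimal sketch of the calculation follows, including the weighting step. The function name `durbin_watson` and the residual values are hypothetical, and the residuals are assumed to already be in a meaningful order.

```python
import numpy as np

def durbin_watson(residuals, weights=None):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    if weights is not None:
        # Weighted residuals: multiply each residual by the square root of its weight.
        e = e * np.sqrt(np.asarray(weights, dtype=float))
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals in time order.
alternating = [1.0, -1.0, 1.0, -1.0]   # adjacent residuals have opposite signs
constant = [1.0, 1.0, 1.0, 1.0]        # adjacent residuals are identical

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

The statistic lies between 0 and 4. Values near 2 are consistent with no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.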