Leverage (Hi) measures the distance from an observation's x-value to the average of the x-values for all observations in the data set. Use leverage to identify observations that have unusual predictor values compared to the remaining data.
Observations with large leverage can have a large effect on the fitted value, and thus the regression model. For example, an observation that has a large leverage can cause a significant coefficient to seem insignificant. However, not all leverage points are unusual observations.
Investigate observations with leverage values greater than 3p/n, where p is the number of model terms (including the constant) and n is the number of observations. Minitab identifies observations with leverage values greater than 3p/n or 0.99, whichever is smaller, with an X in the table of unusual observations.
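The cutoff rule above can be sketched in a few lines of Python. This is an illustration only, not Minitab's implementation: it assumes a simple regression with one predictor and a constant (so p = 2), for which the leverage has the closed form 1/n + (x - mean)^2 / Sxx. The function names and the data are hypothetical.

```python
def leverages(x):
    """Leverage of each observation for a one-predictor model with a constant."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Leverage grows with the squared distance of x_i from the mean of x.
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def flag_high_leverage(x, p=2):
    """Return indices with leverage above min(3p/n, 0.99), plus the cutoff."""
    n = len(x)
    cutoff = min(3.0 * p / n, 0.99)
    flagged = [i for i, h in enumerate(leverages(x)) if h > cutoff]
    return flagged, cutoff


# Hypothetical data: nine evenly spaced x-values and one far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
h = leverages(x)
flagged, cutoff = flag_high_leverage(x)
print(flagged, cutoff)
```

With these values, only the last observation (x = 30) exceeds the cutoff of 0.6; its leverage is about 0.91. The leverages always sum to p, which is a quick check on the calculation.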
Geometrically, Cook's distance (D) is a measure of the distance between the fitted values calculated with and without the ith observation. Use Cook's distance to identify observations that have unusual predictor values compared to the remaining data and observations that the model does not fit well. Observations with large Cook's distance values can have a large effect on the fitted values, and thus on the regression model.
Investigate observations where D is greater than F(0.5, p, n-p), the median of an F-distribution with p and n-p degrees of freedom, where p is the number of model terms (including the constant) and n is the number of observations. Alternatively, compare the distance values to each other graphically, for example with a line plot. Observations with large distance values relative to other observations can be influential.
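As a rough sketch of the calculation, the Python below computes Cook's distance two ways for a one-predictor model with a constant (p = 2): with the usual shortcut formula based on residuals and leverages, and directly from the definition by refitting without the observation. The function names and data are hypothetical, and this is not Minitab's implementation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(x):
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def cooks_distances(x, y):
    """Shortcut formula: D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2."""
    n, p = len(x), 2  # p = 2 model terms: constant and slope
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    return [
        (e * e / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(resid, leverages(x))
    ]


def cooks_distance_by_refit(x, y, i):
    """Definition: scaled squared distance between fitted values
    computed with and without observation i."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    mse = sum((yi - f) ** 2 for yi, f in zip(y, fitted)) / (n - p)
    a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    shift = sum((f - (a0 + a1 * xi)) ** 2 for f, xi in zip(fitted, x))
    return shift / (p * mse)


# Hypothetical data: the last point is far from the others in x
# and does not follow their trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 20.0]
d = cooks_distances(x, y)
most_influential = d.index(max(d))
print(most_influential, d[most_influential])
```

The sketch stops short of the F(0.5, p, n-p) cutoff because the standard library has no F quantile function. A statistics package can supply it; for example, SciPy provides scipy.stats.f.ppf(0.5, p, n - p).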
DFITS represents approximately the number of standard deviations that the fitted value for an observation changes when that observation is removed from the data set and the model is refit. Use DFITS to identify observations that have unusual predictor values compared to the remaining data and observations that the model does not fit well. Observations with large DFITS values can have a large effect on the fitted values, and thus on the regression model.
Investigate observations whose DFITS value is greater in absolute value than 2*sqrt(p / n), where p is the number of model terms (including the constant) and n is the number of observations. DFITS can be negative, so compare its magnitude to the cutoff. Alternatively, compare the DFITS values to each other graphically, using a time series plot or a line plot. Observations with large DFITS values relative to other observations can be influential.
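The following Python sketch shows one way to compute DFITS and apply the 2*sqrt(p / n) cutoff. It assumes a one-predictor model with a constant (p = 2), uses the standard shortcut based on the deleted residual and the leverage, and includes a direct refit version for comparison. The function names and data are hypothetical; this is not Minitab's implementation.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(x):
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def dfits(x, y):
    """Shortcut: deleted t-residual times sqrt(h_i / (1 - h_i))."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    values = []
    for e, h in zip(resid, leverages(x)):
        # Error variance with observation i deleted, without refitting.
        s2_del = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t_del = e / math.sqrt(s2_del * (1.0 - h))
        values.append(t_del * math.sqrt(h / (1.0 - h)))
    return values


def dfits_by_refit(x, y, i):
    """Change in the fitted value at x[i] after deleting observation i,
    in units of its estimated standard deviation."""
    p = 2
    b0, b1 = fit_line(x, y)
    xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a0, a1 = fit_line(xs, ys)
    sse_del = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(xs, ys))
    s_del = math.sqrt(sse_del / (len(xs) - p))
    change = (b0 + b1 * x[i]) - (a0 + a1 * x[i])
    return change / (s_del * math.sqrt(leverages(x)[i]))


def flag_large_dfits(values, p=2):
    """Indices whose |DFITS| exceeds 2 * sqrt(p / n), plus the cutoff."""
    cutoff = 2.0 * math.sqrt(p / len(values))
    return [i for i, v in enumerate(values) if abs(v) > cutoff], cutoff


# Hypothetical data: the last point has an unusual x-value and is off-trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 20.0]
values = dfits(x, y)
flagged, cutoff = flag_large_dfits(values)
print(flagged, cutoff)
```

The comparison uses abs(v) because DFITS carries the sign of the residual. In this example the flagged observation has a large negative DFITS, which a one-sided comparison would miss.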
To determine how much effect an unusual observation has, fit the model with and without the observation and compare the coefficients, p-values, R², and other model information. If the model changes significantly when you remove the unusual observation, first determine whether the observation is a data entry or measurement error. If it is not, determine whether you omitted an important term (for example, an interaction term) or variable, or specified the model incorrectly. You might need to collect more data to reach a resolution.
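A minimal Python sketch of this with-and-without comparison, assuming a one-predictor model with a constant. It compares only the coefficients and R²; p-values are left out because they require a t-distribution, which the standard library does not provide. The function names and data are hypothetical.

```python
def fit_summary(x, y):
    """Intercept, slope, and R-squared for a one-predictor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {"intercept": intercept, "slope": slope, "r2": 1.0 - sse / sst}


def compare_with_and_without(x, y, i):
    """Fit on all observations, then again with observation i removed."""
    full = fit_summary(x, y)
    reduced = fit_summary(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    return full, reduced


# Hypothetical data: observation 9 (the last one) is the suspect point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2, 20.0]
full, reduced = compare_with_and_without(x, y, 9)
for name in ("intercept", "slope", "r2"):
    print(f"{name:>9}: with = {full[name]:.4f}, without = {reduced[name]:.4f}")
```

In this example the slope moves from roughly 0.52 to roughly 2.00 and R² rises above 0.99 when the observation is removed. A change that large would warrant checking the observation for a data entry or measurement error.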