Cook's distance (D) measures the effect that an observation has on the set of coefficients in a linear model. Cook's distance considers both the leverage value and the standardized residual of each observation to determine the observation's effect.
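As a rough illustration of how leverage and the standardized residual combine, the following sketch computes D for a simple linear regression with one predictor, so p = 2. The data values and the function name cooks_distance are hypothetical, chosen only so that the last observation sits far from the others in x. The formula used, D = (r^2 / p) * h / (1 - h), with r the standardized residual and h the leverage, is one standard way to write Cook's distance. It is not necessarily how any particular software package computes it.

```python
# Cook's distance for a simple linear regression y = b0 + b1*x (p = 2 terms).
# The data are made up for illustration only.

def cooks_distance(x, y):
    n = len(x)
    p = 2  # model terms: constant + slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage of each observation (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    distances = []
    for e, h in zip(residuals, leverage):
        std_resid = e / (mse * (1 - h)) ** 0.5  # standardized residual
        distances.append(std_resid ** 2 / p * h / (1 - h))
    return distances

x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 20.0]
d = cooks_distance(x, y)
for i, di in enumerate(d, start=1):
    print(f"obs {i}: D = {di:.3f}")
```

In this made-up data set the last observation has both high leverage and a sizable residual, so its D-value is far larger than the rest.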
Observations with a large D may be considered influential. A commonly used criterion is that D is large when it exceeds the median of the F-distribution with p and n-p degrees of freedom, F(0.5, p, n-p), where p is the number of model terms, including the constant, and n is the number of observations. Another way to examine the D-values is to compare them to one another on a graph, such as an individual value plot. Observations whose D-values are large relative to the others may be influential.
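The F-median criterion can be sketched in a few lines of Python. This example assumes SciPy is installed and uses scipy.stats.f.ppf to get the 0.5 quantile. The function name cooks_threshold and the D-values are hypothetical, standing in for the output of a model with p = 2 terms and n = 8 observations.

```python
from scipy.stats import f

def cooks_threshold(n, p):
    """Median of the F-distribution with (p, n - p) degrees of freedom."""
    return f.ppf(0.5, p, n - p)

# Hypothetical D-values from a model with p = 2 terms and n = 8 observations.
d_values = [0.28, 0.10, 0.01, 0.00, 0.05, 0.13, 0.16, 13.2]

cutoff = cooks_threshold(n=len(d_values), p=2)

# Report observation numbers (1-based) whose D exceeds the cutoff.
flagged = [i for i, d in enumerate(d_values, start=1) if d > cutoff]
print(f"cutoff = {cutoff:.3f}, flagged observations: {flagged}")
```

The cutoff is a rule of thumb rather than a formal test, so it is worth plotting the D-values as well and looking for points that stand apart from the rest.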
Influential observations have a disproportionate effect on the model and can produce misleading results. For example, including or excluding an influential point can change whether a coefficient is statistically significant. Influential observations can be leverage points, outliers, or both.
If you see an influential observation, first determine whether it results from a data-entry or measurement error. If it does, correct the error and refit the model. If the observation is neither a data-entry error nor a measurement error, determine how influential the observation is. First, fit the model with and without the observation. Then, compare the coefficients, p-values, R², and other model information. If the model changes substantially when you remove the influential observation, examine the model further to determine whether you have specified it incorrectly. You may need to gather more data to resolve the issue.
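The with-and-without comparison might look like the following minimal sketch for a one-predictor model. The data, the helper fit_line, and the index suspect are all hypothetical. Only the coefficients and R² are compared here; p-values are left out to keep the example free of external libraries.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - sse / syy

# Made-up data; the last observation is the one under review.
x = [1, 2, 3, 4, 5, 6, 7, 15]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 20.0]
suspect = 7  # zero-based index of the suspect observation (hypothetical)

# Fit once with every observation, once with the suspect one removed.
full = fit_line(x, y)
x_reduced = x[:suspect] + x[suspect + 1:]
y_reduced = y[:suspect] + y[suspect + 1:]
reduced = fit_line(x_reduced, y_reduced)

for label, (b0, b1, r2) in (("with", full), ("without", reduced)):
    print(f"{label:>7}: intercept = {b0:.3f}, slope = {b1:.3f}, R-sq = {r2:.3f}")
```

A large shift in the slope or in R² between the two fits suggests that the conclusions depend heavily on that single observation.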