The following methods and formulas for outlier tests include the Minitab calculations
for Dixon's test statistic and p-value, and for Grubbs' test statistic and p-value.

Dixon's test determines whether the most extreme value in a sample is an outlier. Dixon's test includes a choice of test statistics that overcome the potential masking effects of other extreme values in the sample. Dixon's test statistic is denoted by *r*_{ij} , where the subscripts *i* and *j* indicate the following:

- *i* indicates the number of extreme values on the same side (lower or upper) of the data as the suspected outlier. *i* = 1 or 2.
- *j* indicates the number of extreme values on the opposite side of the data. *j* = 0, 1, or 2.

For example, if the suspected outlier is the smallest value in the sample, but the sample also includes two unusually large values, then *r*_{12} is the appropriate test statistic. The test statistic *r*_{10} (also called Dixon's Q) is appropriate when the sample includes only one extreme value.
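As a sketch of the definition, *r*_{10} can be computed directly from the ordered sample as the gap between the suspected outlier and its nearest neighbor, divided by the sample range. The sample values below are made up for illustration:

```python
# Dixon's r_10 (Dixon's Q), computed from the definition on sorted data:
#   low outlier:  r_10 = (y_2 - y_1) / (y_n - y_1)
#   high outlier: r_10 = (y_n - y_{n-1}) / (y_n - y_1)

def dixon_r10_low(sample):
    """r_10 when the suspected outlier is the smallest value."""
    y = sorted(sample)
    return (y[1] - y[0]) / (y[-1] - y[0])

def dixon_r10_high(sample):
    """r_10 when the suspected outlier is the largest value."""
    y = sorted(sample)
    return (y[-1] - y[-2]) / (y[-1] - y[0])

# Illustrative data: 0.167 is the suspected low outlier.
data = [0.189, 0.167, 0.187, 0.183, 0.186, 0.182, 0.181, 0.184, 0.181, 0.177]
q_low = dixon_r10_low(data)   # (0.177 - 0.167) / (0.189 - 0.167) ≈ 0.455
```

The computed ratio would then be compared against the tabulated critical value for the chosen significance level and sample size.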

Critical values for Dixon's test statistics are tabulated in Rorabacher (1991).

The formula for the one-sided test depends on whether you test the smallest value, *y*_{1}, or the largest value, *y*_{n}. To test whether *y*_{1} is the outlier, use the following formula:
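The formula itself is not reproduced above; a sketch of the standard definition from Dixon (1951), for the ordered sample *y*_{1} ≤ *y*_{2} ≤ … ≤ *y*_{n}, is:

```latex
r_{ij} = \frac{y_{1+i} - y_1}{y_{n-j} - y_1}
```

For example, *r*_{10} = (*y*_{2} − *y*_{1}) / (*y*_{n} − *y*_{1}), and *r*_{12} excludes the two largest values from the range in the denominator.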

To test whether *y*_{n} is the outlier, use the following formula:
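This formula is also missing above; a sketch of the standard definition from Dixon (1951) for the largest value is:

```latex
r_{ij} = \frac{y_n - y_{n-i}}{y_n - y_{1+j}}
```

For example, *r*_{10} = (*y*_{n} − *y*_{n−1}) / (*y*_{n} − *y*_{1}), and *r*_{12} excludes the two smallest values from the range in the denominator.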

For the two-sided test, we use King's (1953) two-sided statistic related to *r*_{10}. The two-sided test statistic is given by:
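The expression is not reproduced above; presumably it takes the larger of the two one-sided ratios, i.e. a sketch of King's two-sided analogue of *r*_{10} is:

```latex
r'_{10} = \frac{\max\left(y_2 - y_1,\; y_n - y_{n-1}\right)}{y_n - y_1}
```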

Term | Description
---|---
r_{ij} | Dixon's test statistic (i = 1, 2; j = 0, 1, 2)
y_{i} | the i^{th} smallest value in the sample
n | the number of observations in the sample

- D.B. Rorabacher (1991). "Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon's 'Q' Parameter and Related Subrange Ratios at the 95 Percent Confidence Level," Analytical Chemistry, 63, 2, 139-146.
- E.P. King (1953). "On Some Procedures for the Rejection of Suspected Data," Journal of the American Statistical Association, Vol. 48, No. 263, 531-533.

If you test whether the smallest data value is an outlier, then the test statistic *G* is given by:
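The formula is missing above; the standard Grubbs statistic for the smallest value, in terms of the sample mean and standard deviation, is:

```latex
G = \frac{\bar{y} - y_1}{s}
```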

If you test whether the largest data value is an outlier, then *G* is given by:
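This formula is also missing above; the standard Grubbs statistic for the largest value is:

```latex
G = \frac{y_n - \bar{y}}{s}
```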

For a two-sided hypothesis, *G* is given by:
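The two-sided expression is missing above; the standard two-sided Grubbs statistic takes the largest absolute deviation from the mean:

```latex
G = \frac{\max_{i}\,\lvert y_i - \bar{y} \rvert}{s}
```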

Term | Description
---|---
ȳ | the sample mean
y_{i} | the i^{th} smallest value in the sample
s | the standard deviation of the sample
n | the number of observations in the sample

Assuming that the data are normally distributed, the Dixon statistics have the same distribution whether you test the smallest value or the largest value. So, without any loss of generality, we may focus on the statistics for detecting outliers in the high end of the data, namely:
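The list of statistics is not reproduced above; under this largest-value convention, they are presumably the six ratios:

```latex
r_{ij} = \frac{y_n - y_{n-i}}{y_n - y_{1+j}},
\qquad i = 1, 2;\quad j = 0, 1, 2
```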

According to Dixon (1951) and McBane (2006), the probability density functions of the distribution of the test statistics *r*_{ij} may be written as:

where C is the normalizing factor specified by:

and the Jacobian J(*x*,*v*,*r*) is specified by:

Using the transformations *t* = (1 + *r*^{2})*v*^{2} / 2 and *u*^{2} = 3*x*^{2} / 2, the density function may be rewritten as:

Minitab evaluates the inner integral using a 30-point Gauss-Laguerre quadrature. Minitab evaluates the outer integral using a 30-point Gauss-Hermite quadrature.
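As a sketch of the two quadrature rules named above, NumPy supplies the Gauss-Laguerre and Gauss-Hermite nodes and weights directly. The demonstration below applies each 30-point rule to an integral with a known value rather than to Dixon's actual density:

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss
from numpy.polynomial.hermite import hermgauss

# Gauss-Laguerre: integral_0^inf e^{-t} f(t) dt ~= sum_k w_k f(t_k)
t_nodes, w_lag = laggauss(30)
approx_lag = np.sum(w_lag * t_nodes**2)   # integral of t^2 e^{-t} dt = Gamma(3) = 2

# Gauss-Hermite: integral_{-inf}^{inf} e^{-u^2} g(u) du ~= sum_k w_k g(u_k)
u_nodes, w_herm = hermgauss(30)
approx_herm = np.sum(w_herm * u_nodes**2)  # integral of u^2 e^{-u^2} du = sqrt(pi)/2
```

Both rules integrate these test functions essentially exactly; in the actual calculation the integrands would be the inner and outer factors of the transformed density.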

The cumulative distribution functions for the family of test statistics are specified by:

Similar to McBane (2006), Minitab calculates *F*_{ij}(*r*) by numerically integrating the density function given above.

For any pair of subscripts (*i*, *j*), the p-value for the observed one-sided statistic, r, is specified by:
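The expression is missing above; presumably it is the upper tail of the cumulative distribution function:

```latex
p = 1 - F_{ij}(r)
```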

Using King's (1953) result, for any pair of subscripts (*i*, *j*), the p-value for the observed two-sided statistic, *r*, is specified by:
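This expression is also missing above; presumably the two-sided p-value is bounded by twice the one-sided tail probability:

```latex
p = \min\!\left(1,\; 2\left[1 - F_{ij}(r)\right]\right)
```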

Also, King observes that the above approximation becomes an equality for .

Term | Description
---|---
r_{ij} | the Dixon test statistic where i = 1, 2; j = 0, 1, 2
y_{i} | the i^{th} smallest value in the sample
n | the number of observations in the sample

- W.J. Dixon (1951). "Ratios Involving Extreme Values," Annals of Mathematical Statistics, 22(1), 68-78.
- E.P. King (1953). "On Some Procedures for the Rejection of Suspected Data," Journal of the American Statistical Association, Vol. 48, No. 263, 531-533.
- G.C. McBane (2006). "Programs to Compute Distribution Functions and Critical Values for Extreme Value Ratios for Outlier Detection," Journal of Statistical Software, Vol. 16, No. 3, 1-9.

The p-value for a one-sided test is:
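The formula is missing above; a sketch of the standard inversion of the Grubbs critical-value relation, in terms of the t-distributed variable *T* with *n* − 2 degrees of freedom, is:

```latex
p = \min\!\left(1,\; n \cdot P\!\left(T \ge t\right)\right),
\qquad
t = \sqrt{\frac{n\,(n-2)\,G^2}{(n-1)^2 - n\,G^2}}
```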

The p-value for the two-sided test is:
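As a hedged sketch, the two-sided bound can be computed by mapping *G* to a t value with *n* − 2 degrees of freedom and doubling the one-sided Bonferroni bound. The sample data are made up, and `scipy` is assumed available:

```python
import math
from scipy import stats

def grubbs_two_sided(sample):
    """Two-sided Grubbs statistic and a Bonferroni-type p-value bound."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    g = max(abs(x - mean) for x in sample) / s
    # Map G back to a t value with n - 2 degrees of freedom.
    t_val = math.sqrt(n * (n - 2) * g**2 / ((n - 1) ** 2 - n * g**2))
    # Upper bound on the two-sided p-value.
    p = min(1.0, 2 * n * stats.t.sf(t_val, n - 2))
    return g, p

# Illustrative data: 14.8 is the suspected outlier.
g, p = grubbs_two_sided([12.1, 12.3, 11.9, 12.0, 12.2, 14.8])
```

A one-sided p-value would use `n * stats.t.sf(t_val, n - 2)` instead of twice that quantity.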

If the following is true, then the p-value is exact.

If not, the calculated p-value represents an upper bound for the exact p-value. However, the upper bound is a very good approximation of the exact p-value.

Term | Description
---|---
G | Grubbs' test statistic
n | the number of observations in the sample
T | a random variable distributed as a t-distribution with n – 2 degrees of freedom