Methods and formulas for Outlier Test

The following methods and formulas for outlier tests include the Minitab calculations for Dixon's test statistics and p-values, and Grubbs' test statistic and p-value.

In This Topic

Dixon's test statistics
Grubbs' test statistic
P-values for Dixon's test statistics
P-values for Grubbs' test statistic

Dixon's test statistics

Dixon's test determines whether the most extreme value in a sample is an outlier. Dixon's test includes a choice of test statistics that overcome the potential masking effects of other extreme values in the sample. Dixon's test statistic is denoted by r_ij, where the subscripts i and j indicate the following:

i indicates the number of extreme values on the same side (lower or upper) of the data as the suspected outlier. i = 1 or 2.
j indicates the number of extreme values on the opposite side of the data. j = 0, 1, or 2.

For example, if the suspected outlier is the smallest value in the sample, but the sample also includes two unusually large values, then r₁₂ is the appropriate test statistic. The test statistic r₁₀ (also called Dixon's Q) is appropriate when the sample includes only one extreme value.

Critical values for Dixon's test statistics are tabulated in Rorabacher (1991).

One-sided test statistics

The formula for the one-sided test depends on whether you test the smallest value, y_1, or the largest value, y_n. To test whether y_1 is the outlier, use the following formula:

r_ij = (y_{1+i} − y_1) / (y_{n−j} − y_1)

To test whether y_n is the outlier, use the following formula:

r_ij = (y_n − y_{n−i}) / (y_n − y_{1+j})

Two-sided test statistics

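We define the two-sided test statistic in the same way that King (1953) defines the two-sided statistic related to r₁₀, that is, as the larger of the two one-sided statistics. The two-sided test statistic is given by:

r_ij = max{ (y_{1+i} − y_1) / (y_{n−j} − y_1), (y_n − y_{n−i}) / (y_n − y_{1+j}) }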

Notation

Term	Description
r_ij	Dixon's test statistic (i = 1, 2; j = 0, 1, 2)
y_i	the i^th smallest value in the sample
n	the number of observations in the sample
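
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation. It assumes the one-sided ratios r_ij = (y_{1+i} − y_1) / (y_{n−j} − y_1) for the smallest value and r_ij = (y_n − y_{n−i}) / (y_n − y_{1+j}) for the largest value, and takes the two-sided statistic as the larger of the two. The function name `dixon_statistic` and its `side` argument are hypothetical.

```python
def dixon_statistic(data, i=1, j=0, side="two-sided"):
    """Return Dixon's r_ij for the smallest value, the largest value, or both."""
    if i not in (1, 2) or j not in (0, 1, 2):
        raise ValueError("i must be 1 or 2 and j must be 0, 1, or 2")
    y = sorted(data)          # y[0] is y_1 and y[n - 1] is y_n
    n = len(y)
    if n < i + j + 2:
        raise ValueError("sample is too small for the requested r_ij")

    def ratio(numerator, denominator):
        if denominator == 0:
            raise ValueError("denominator is zero; the ratio is undefined")
        return numerator / denominator

    def low():
        # Smallest value suspected: (y_{1+i} - y_1) / (y_{n-j} - y_1)
        return ratio(y[i] - y[0], y[n - 1 - j] - y[0])

    def high():
        # Largest value suspected: (y_n - y_{n-i}) / (y_n - y_{1+j})
        return ratio(y[n - 1] - y[n - 1 - i], y[n - 1] - y[j])

    if side == "low":
        return low()
    if side == "high":
        return high()
    if side == "two-sided":
        return max(low(), high())
    raise ValueError("side must be 'low', 'high', or 'two-sided'")
```

For the sample 1, 2, 3, 4, 10, the high-side r₁₀ is (10 − 4) / (10 − 1) = 2/3, while the low-side r₁₀ is (2 − 1) / (10 − 1) = 1/9, so the two-sided r₁₀ is 2/3.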

References

D.B. Rorabacher (1991). "Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon's 'Q' Parameter and Related Subrange Ratios at the 95% Confidence Level," Analytical Chemistry, Vol. 63, No. 2, pages 139-146.
E.P. King (1953). "On Some Procedures for the Rejection of Suspected Data," Journal of the American Statistical Association, Vol. 48, No. 263, pages 531-533.

Grubbs' test statistic

Formula for the one-sided statistic

If you test whether the smallest data value is an outlier, then the test statistic G is given by:

G = (ȳ − y_1) / s

If you test whether the largest data value is an outlier, then G is given by:

G = (y_n − ȳ) / s

Formula for the two-sided statistic

For a two-sided hypothesis, G is given by:

G = max{ (ȳ − y_1) / s, (y_n − ȳ) / s }

Notation

Term	Description
ȳ	the sample mean
y_i	the i^th smallest value in the sample
s	the standard deviation of the sample
n	the number of observations in the sample
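
As a rough illustration of the notation above, the following Python sketch computes Grubbs' statistic as the distance of the extreme value from the sample mean in units of the sample standard deviation (n − 1 in the denominator). The function name `grubbs_statistic` and its `side` argument are hypothetical, and the sketch is not Minitab's implementation.

```python
import statistics


def grubbs_statistic(data, side="two-sided"):
    """Return Grubbs' G for the smallest value, the largest value, or both."""
    if len(data) < 3:
        raise ValueError("at least 3 observations are required")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)      # sample standard deviation
    if s == 0:
        raise ValueError("standard deviation is zero; G is undefined")

    low = (mean - min(data)) / s    # smallest value suspected
    high = (max(data) - mean) / s   # largest value suspected

    if side == "low":
        return low
    if side == "high":
        return high
    if side == "two-sided":
        return max(low, high)
    raise ValueError("side must be 'low', 'high', or 'two-sided'")
```

For the sample 1, 2, 3, 4, 10, the mean is 4 and the sample variance is 12.5, so the high-side G is 6 / √12.5, which is approximately 1.697.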

P-values for Dixon's test statistics

Assuming that the data are normally distributed, the Dixon statistics have the same distribution whether you test the smallest value or the largest value. So, without any loss of generality, we may focus on the statistics for detecting outliers in the high end of the data, namely:

r_ij = (y_n − y_{n−i}) / (y_n − y_{1+j})

Cumulative distribution function for the test statistic

According to Dixon (1951) and McBane (2006), the probability density functions of the distribution of the test statistics r_ij may be written as:

where C is the normalizing factor specified by:

and the Jacobian J(x,v,r) is specified by:

Using the transformation where t = (1 + r²)v² / 2 and u² = 3x² / 2, the density function may be rewritten as:

Minitab evaluates the inner integral using a 30-point Gauss-Laguerre quadrature. Minitab evaluates the outer integral using a 30-point Gauss-Hermite quadrature.

The cumulative distribution functions for the family of test statistics are specified by:

F_ij(r) = ∫₀^r f_ij(ρ) dρ

where f_ij is the probability density function of r_ij.

Similar to McBane (2006), Minitab calculates F_ij(r) using a 16-point Gauss-Legendre quadrature method.
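
The quadrature step can be illustrated with a small, self-contained sketch. The code below builds Gauss-Legendre nodes and weights with Newton's method and integrates a density over [0, r] with 16 points. Because the general density is not reproduced in the sketch, it uses the closed-form density of r₁₀ for n = 3 under normality, f(r) = 6√3 / (π(3 + (1 − 2r)²)) on [0, 1]. That density is an assumption of this example rather than a formula from this topic, the function names are hypothetical, and the code is not Minitab's implementation.

```python
import math


def legendre_and_derivative(n, x):
    """Evaluate the Legendre polynomial P_n and its derivative at x."""
    p_prev, p = 1.0, x
    for m in range(2, n + 1):
        p_prev, p = p, ((2 * m - 1) * x * p - (m - 1) * p_prev) / m
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(n):
    """Return the n nodes and weights of Gauss-Legendre quadrature on [-1, 1]."""
    nodes, weights = [], []
    for k in range(1, n + 1):
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))   # initial guess
        for _ in range(50):                              # Newton iterations
            p, dp = legendre_and_derivative(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def cdf_by_quadrature(density, r, points=16):
    """Approximate F(r) as the integral of density over [0, r]."""
    nodes, weights = gauss_legendre(points)
    half = r / 2.0
    # Map each node from [-1, 1] to [0, r] before evaluating the density.
    return half * sum(w * density(half * (x + 1.0)) for x, w in zip(nodes, weights))


def density_r10_n3(r):
    """Assumed closed-form density of r10 for n = 3 (illustration only)."""
    return 6.0 * math.sqrt(3.0) / (math.pi * (3.0 + (1.0 - 2.0 * r) ** 2))
```

Calling `cdf_by_quadrature(density_r10_n3, r)` gives an approximation of F₁₀(r) for a sample of size 3; the one-sided p-value for an observed r is then 1 minus that value.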

P-value for one-sided test

For any pair of subscripts (i, j), the p-value for the observed one-sided statistic, r, is specified by:

p = 1 − F_ij(r)

P-value for two-sided test

Using King's (1953) result, for any pair of subscripts (i, j), the p-value for the observed two-sided statistic, r, is specified by:

p ≈ 2[1 − F_ij(r)]

Also, King observes that the above approximation becomes an equality for .

Notation

Term	Description
r_ij	the Dixon test statistic where i = 1, 2; j = 0, 1, 2
y_i	the i^th smallest value in the sample
n	the number of observations in the sample
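
A simple way to sanity-check a Dixon p-value is simulation. The sketch below is a Monte Carlo cross-check, not the quadrature method described in this section and not Minitab's implementation: it draws standard normal samples of size n, computes the high-side ratio (y_n − y_{n−i}) / (y_n − y_{1+j}), optionally takes the larger of the high-side and low-side ratios for the two-sided case, and reports the fraction of simulated statistics that are at least as large as the observed value. The function name, the number of simulations, and the seed are assumptions of this example.

```python
import random


def dixon_pvalue_mc(r_obs, n, i=1, j=0, two_sided=False, n_sim=20000, seed=12345):
    """Estimate the p-value of an observed Dixon r_ij by simulation."""
    if i not in (1, 2) or j not in (0, 1, 2):
        raise ValueError("i must be 1 or 2 and j must be 0, 1, or 2")
    if n < i + j + 2:
        raise ValueError("sample size is too small for the requested r_ij")

    rng = random.Random(seed)   # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_sim):
        y = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        # High-side statistic: (y_n - y_{n-i}) / (y_n - y_{1+j})
        r = (y[n - 1] - y[n - 1 - i]) / (y[n - 1] - y[j])
        if two_sided:
            # Low-side statistic: (y_{1+i} - y_1) / (y_{n-j} - y_1)
            r = max(r, (y[i] - y[0]) / (y[n - 1 - j] - y[0]))
        if r >= r_obs:
            hits += 1
    return hits / n_sim
```

For n = 3 and r₁₀, an observed value of 0.941 should give a one-sided estimate near 0.05 and a two-sided estimate near 0.10, up to simulation error.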

References

W.J. Dixon (1951). "Ratios Involving Extreme Values," Annals of Mathematical Statistics, 22(1), 68-78.

E.P. King (1953). "On Some Procedures for the Rejection of Suspected Data," Journal of the American Statistical Association, Vol. 48, No. 263, pages 531-533.

G.C. McBane (2006). "Programs to Compute Distribution Functions and Critical Values for Extreme Value Ratios for Outlier Detection," Journal of Statistical Software, Vol. 16, No. 3, pages 1-9.

P-values for Grubbs' test statistic

Formula for a one-sided test

The p-value for a one-sided test is:

p = n · P( T > √[ n(n − 2)G² / ((n − 1)² − nG²) ] )

Formula for a two-sided test

The p-value for the two-sided test is:

p = 2n · P( T > √[ n(n − 2)G² / ((n − 1)² − nG²) ] )

Exact versus approximate p-values

If the following is true, then the p-value is exact.

If not, the calculated p-value represents an upper bound for the exact p-value. However, the upper bound is a very good approximation of the exact p-value.

Notation

Term	Description
G	Grubbs' test statistic
n	the number of observations in the sample
T	a random variable distributed as a t-distribution with n – 2 degrees of freedom
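
The Grubbs p-value can be sketched in Python using only the standard library. The example assumes the usual Bonferroni-type form, p = n · P(T > √[n(n − 2)G² / ((n − 1)² − nG²)]) for a one-sided test and twice that for a two-sided test, where T has a t-distribution with n − 2 degrees of freedom. The upper tail of T is obtained by Simpson's rule after the substitution t = √df · tan θ, which is a choice made for this sketch. Capping the result at 1 is also an assumption, and the function names are hypothetical; this is not Minitab's implementation.

```python
import math


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, where T has a t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    scale = math.exp(log_c) * math.sqrt(df)
    theta0 = math.atan(x / math.sqrt(df))

    # With t = sqrt(df) * tan(theta), the tail integral becomes
    # scale * integral of cos(theta)^(df - 1) over [theta0, pi / 2].
    def integrand(theta):
        return scale * math.cos(theta) ** (df - 1)

    h = (math.pi / 2 - theta0) / steps          # steps must be even
    total = integrand(theta0) + integrand(math.pi / 2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(theta0 + k * h)
    return total * h / 3


def grubbs_pvalue(g, n, two_sided=False):
    """Upper-bound p-value for an observed Grubbs statistic g with n observations."""
    if n < 3:
        raise ValueError("at least 3 observations are required")
    denominator = (n - 1) ** 2 - n * g * g
    if denominator <= 0:
        return 0.0                               # g is at its largest possible value
    t = math.sqrt(n * (n - 2) * g * g / denominator)
    p = n * t_upper_tail(t, n - 2)
    if two_sided:
        p *= 2
    return min(1.0, p)                           # cap is an assumption of this sketch
```

For n = 10, an observed one-sided G of 2.176 should give a p-value close to 0.05.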