Methods and formulas for Normality Test

Mean

A commonly used measure of the center of a batch of numbers. The mean is also called the average. It is the sum of all observations divided by the number of (nonmissing) observations.

Formula

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

Notation

Term	Description
x_i	i^th observation
N	number of nonmissing observations
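The definition above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation; the helper names `nonmissing` and `mean` are hypothetical, and treating `None` and NaN as missing values is an assumption.

```python
import math

def nonmissing(values):
    """Keep only observed values; None and NaN are treated as missing (assumption)."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]

def mean(values):
    """Sum of the nonmissing observations divided by their count N."""
    xs = nonmissing(values)
    if not xs:
        raise ValueError("no nonmissing observations")
    return sum(xs) / len(xs)

data = [4.0, 7.0, None, 9.0, float("nan")]
print("N =", len(nonmissing(data)), "mean =", mean(data))
```

Only the three observed values enter both the sum and the divisor, so missing entries do not pull the mean toward zero.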

Standard deviation (StDev)

The sample standard deviation provides a measure of the spread of your data. It is equal to the square root of the sample variance.

Formula

If the column contains x₁, x₂, ..., x_N, with mean x̄, then the standard deviation of the sample is:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

Notation

Term	Description
x_i	i^th observation
x̄	mean of the observations
N	number of nonmissing observations
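As a rough sketch, the sample standard deviation can be computed directly from its definition as the square root of the sample variance, assuming the usual N – 1 divisor. The function name `sample_stdev` is hypothetical, and the input is assumed to contain only nonmissing values.

```python
import math

def sample_stdev(xs):
    """Square root of the sample variance, using the N - 1 divisor."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sum(xs) / n
    squared_deviations = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_stdev(data))
```

For this data the mean is 5 and the squared deviations sum to 32, so the result is the square root of 32 / 7.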

N

Minitab displays the number of nonmissing observations in a sample.

Anderson-Darling

A² measures the area between the fitted line (which is based on the chosen distribution) and the nonparametric step function (which is based on the plot points). The statistic is a squared distance that is weighted more heavily in the tails of the distribution. A smaller Anderson-Darling value indicates that the distribution fits the data better.

The Anderson-Darling normality test is defined as:

H₀: The data follow a normal distribution

H₁: The data do not follow a normal distribution

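Formula

$$A^2 = -N - \frac{1}{N} \sum_{i=1}^{N} (2i - 1) \left[ \ln F(Y_i) + \ln\left(1 - F(Y_{N+1-i})\right) \right]$$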

Another quantitative measure for reporting the result of the normality test is the p-value. A small p-value is evidence against the null hypothesis.

If you know A², you can calculate the p-value. Let:

$$A'^2 = A^2 \left(1 + \frac{0.75}{N} + \frac{2.25}{N^2}\right)$$

Depending on A'², calculate p with one of the following equations:

If 13 > A'² > 0.600, then p = exp(1.2937 – 5.709 * A'² + 0.0186 * (A'²)²)
If 0.600 > A'² > 0.340, then p = exp(0.9177 – 4.279 * A'² – 1.38 * (A'²)²)
If 0.340 > A'² > 0.200, then p = 1 – exp(–8.318 + 42.796 * A'² – 59.938 * (A'²)²)
If A'² < 0.200, then p = 1 – exp(–13.436 + 101.14 * A'² – 223.73 * (A'²)²)
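The piecewise rules above translate into a short function. This is a hedged sketch with hypothetical names (`ad_adjusted`, `ad_p_value`) that rests on three assumptions: the adjustment is the common small-sample form A'² = A²(1 + 0.75/N + 2.25/N²); a value that falls exactly on a boundary (0.200, 0.340, 0.600) is assigned to the upper branch, because the rules use strict inequalities on both sides; and values of 13 or more, for which no rule is given, return 0.

```python
import math

def ad_adjusted(a2, n):
    """Assumed small-sample adjustment of A² for sample size n."""
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)

def ad_p_value(a2_adj):
    """Approximate p-value from the adjusted statistic, using the piecewise rules."""
    a = a2_adj
    if a >= 13:
        return 0.0  # assumption: no rule is listed for this range
    if a >= 0.600:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    if a >= 0.340:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    if a >= 0.200:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)

for a2 in (0.1, 0.3, 0.5, 1.0):
    print(a2, ad_p_value(ad_adjusted(a2, 20)))
```

Larger values of the adjusted statistic give smaller p-values, which is consistent with a small A² indicating a better fit.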

Notation

Term	Description
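F(Y_i)	Φ((Y_i – x̄) / s), where Φ is the cumulative distribution function of the standard normal distribution and x̄ and s are the sample mean and standard deviation
Y_i	ordered data
N	number of nonmissing observations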
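To make the statistic concrete, the sketch below computes A² with the usual computing formula, A² = –N – (1/N) Σ (2i – 1)[ln F(Y_i) + ln(1 – F(Y_(N+1–i)))]. Two things are assumptions here: that this is the formula intended, and that F is evaluated on observations standardized with the sample mean and standard deviation. The function name `anderson_darling` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling(xs):
    """A² for a normal distribution whose parameters are estimated from xs."""
    y = sorted(xs)                      # ordered data Y_1 <= ... <= Y_N
    n = len(y)
    xbar, s = mean(y), stdev(y)
    std_normal = NormalDist()           # standard normal distribution
    f = [std_normal.cdf((v - xbar) / s) for v in y]
    total = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
        for i in range(1, n + 1)        # f[n - i] is F(Y_(N+1-i)) in 1-based terms
    )
    return -n - total / n

std = NormalDist()
near_normal = [std.inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [math.exp(v) for v in near_normal]
print(anderson_darling(near_normal), anderson_darling(skewed))
```

The first sample is built from evenly spaced normal quantiles and the second is its exponential, so the second should produce the larger A². Extreme observations whose fitted cdf rounds to 0 or 1 would make the logarithm fail; this sketch does not guard against that.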

Ryan-Joiner

The Ryan-Joiner test provides a correlation coefficient, which indicates the correlation between your data and the normal scores of the order statistics of the data. If the correlation coefficient is near 1, your data fall close to the line on the normal probability plot. If it is less than the appropriate critical value, reject the null hypothesis of normality.

Formula

The correlation coefficient is calculated as:

$$R_p = \frac{\sum Y_i b_i}{\sqrt{s^2 (n - 1) \sum b_i^2}}$$

The normal scores of the order statistics have the following definition:

$$b_i = \Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$$

where Φ⁻¹ is the inverse cumulative distribution function of the standard normal distribution, n is the sample size, and i is the rank of the ordered observation. Assign tied observations the average of their ranks. For example, if two tied observations are in positions 5 and 6 in the ordered data, then assign each the rank of 5.5.
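The tie rule is easy to get wrong, so here is a small sketch of it. The function names are hypothetical, and the normal-score formula used, b_i = Φ⁻¹((i – 3/8) / (n + 1/4)), is an assumption about the definition referred to above.

```python
from statistics import NormalDist

def average_ranks(sorted_values):
    """1-based ranks for already sorted data; tied values share the mean of their positions."""
    n = len(sorted_values)
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and sorted_values[end + 1] == sorted_values[start]:
            end += 1
        shared = ((start + 1) + (end + 1)) / 2   # mean of first and last position
        for k in range(start, end + 1):
            ranks[k] = shared
        start = end + 1
    return ranks

def normal_scores(sorted_values):
    """Assumed definition: inverse standard normal cdf of (rank - 3/8) / (n + 1/4)."""
    n = len(sorted_values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25))
            for r in average_ranks(sorted_values)]

ordered = [1, 2, 3, 4, 5, 5, 7]
print(average_ranks(ordered))
print(normal_scores(ordered))
```

In the example, the two values of 5 occupy positions 5 and 6, so each receives the rank 5.5 and therefore the same normal score.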

The p-value is calculated using the correction factor, which depends on the sample size (n). Use the factor corresponding to your significance level. For example, if α = 0.05, use cor05.

If n ≥ 50

If n < 50

Then compare the correlation coefficient to the correction factor to determine the p-value:

If R_p > cor10, then p > 0.10.
If cor05 < R_p ≤ cor10, then:
If cor01 < R_p ≤ cor05, then:
If R_p ≤ cor01, then p < 0.01.

Notation

Term	Description
Y_i	ordered observations
b_i	normal scores of the order statistics
s²	sample variance
n	sample size
i	rank of the ordered data
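Putting the notation together, a hedged sketch of the correlation coefficient follows. It assumes R_p = Σ Y_i b_i / sqrt(s²(n – 1) Σ b_i²) and the normal scores b_i = Φ⁻¹((i – 3/8) / (n + 1/4)), with tied observations sharing their average rank. The function name `ryan_joiner` is hypothetical, and the correction factors and p-value steps are not reproduced.

```python
import math
from bisect import bisect_left, bisect_right
from statistics import NormalDist, variance

def ryan_joiner(xs):
    """Correlation between the ordered data and their normal scores."""
    y = sorted(xs)                                   # ordered observations Y_i
    n = len(y)
    std_normal = NormalDist()
    # Average rank of each value: mean of its first and last 1-based position.
    ranks = [(bisect_left(y, v) + 1 + bisect_right(y, v)) / 2 for v in y]
    b = [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
    s2 = variance(y)                                 # sample variance (n - 1 divisor)
    numerator = sum(yi * bi for yi, bi in zip(y, b))
    denominator = math.sqrt(s2 * (n - 1) * sum(bi * bi for bi in b))
    return numerator / denominator

scores = [NormalDist().inv_cdf((i - 0.375) / 10.25) for i in range(1, 11)]
straight = [3 + 2 * bi for bi in scores]     # exactly linear in the normal scores
skewed = [math.exp(bi) for bi in scores]     # right-skewed version of the same scores
print(ryan_joiner(straight), ryan_joiner(skewed))
```

Data that are an exact linear function of the normal scores give a coefficient of 1 up to rounding, while the skewed sample gives a smaller value.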

Kolmogorov-Smirnov

Formula

The Kolmogorov-Smirnov test is defined as:

H₀: The data follow a normal distribution
H₁: The data do not follow a normal distribution

The Kolmogorov-Smirnov test statistic is defined as:

$$D = \max(D^+, D^-)$$

To determine the p-value, Minitab uses an adjusted statistic (d^*), which accounts for the sample size (n):

$$d^* = D \left(\sqrt{n} - 0.01 + \frac{0.85}{\sqrt{n}}\right)$$

Compare d^* to the following critical values to determine the p-value:

If d^* < 0.775, then p > 0.15.
If 0.775 ≤ d^* < 0.819, then:
If 0.819 ≤ d^* < 0.895, then:
If 0.895 ≤ d^* < 0.995, then:
If 0.995 ≤ d^* < 1.035, then:
If d^* ≥ 1.035, then p < 0.01.

Notation

Term	Description
D⁺	max_i {i / n – Z_(i)}
D⁻	max_i {Z_(i) – (i – 1) / n}
Z_(i)	F(X_(i))
F(x)	cumulative distribution function of the normal distribution
X_(i)	i^th order statistic of a random sample, 1 ≤ i ≤ n
n	sample size
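The quantities in the notation table can be computed as in the sketch below. Three things are assumptions not stated in the text: the test statistic is D = max(D⁺, D⁻); F is a normal distribution whose mean and standard deviation are estimated from the sample; and the adjusted statistic uses the form d* = D(√n – 0.01 + 0.85/√n). The function names are hypothetical, and the interpolation steps for the p-value are not reproduced.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normal(xs):
    """Return (D+, D-, D) for a normal distribution fitted to xs."""
    x = sorted(xs)                               # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))       # assumption: parameters estimated from data
    z = [fitted.cdf(v) for v in x]               # Z_(i) = F(X_(i))
    d_plus = max(i / n - z[i - 1] for i in range(1, n + 1))
    d_minus = max(z[i - 1] - (i - 1) / n for i in range(1, n + 1))
    return d_plus, d_minus, max(d_plus, d_minus)

def adjusted_d(d, n):
    """Assumed sample-size adjustment; not taken from the text."""
    return d * (math.sqrt(n) - 0.01 + 0.85 / math.sqrt(n))

std = NormalDist()
near_normal = [std.inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [math.exp(v) for v in near_normal]
for name, sample in (("near normal", near_normal), ("skewed", skewed)):
    d_plus, d_minus, d = ks_normal(sample)
    print(name, d, adjusted_d(d, len(sample)))
```

The resulting d^* would then be compared with the critical values listed above (0.775 through 1.035) to bracket the p-value.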

Plot points

In general, the closer the points fall to the fitted line, the better the fit. Minitab provides two goodness-of-fit measures to help assess how the distribution fits your data.

Formula

The table below shows how the middle line is constructed:

Distribution	x coordinate	y coordinate
Normal	x	Φ⁻¹_norm

Notation

Term	Description
Φ⁻¹_norm	value returned for p by the inverse cdf for the standard normal distribution

Probability plots

The input data are plotted as the x-values. Minitab calculates the probability of occurrence without assuming a distribution. The y-scale on the graph resembles the y-scale on normal probability paper: it is spaced so that the points fall along a straight line if the data are from a normal distribution.
