Methods and formulas for One-Way ANOVA


Mean squares (MS)

Formula

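The calculation for the mean square for the factor follows:

$$\text{MS Factor} = \frac{\text{SS (Factor)}}{\text{DF (Factor)}}$$

The calculation for the mean square for error follows:

$$\text{MS Error} = \frac{\text{SS Error}}{\text{DF Error}}$$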

Notation

Term    Description
MS      Mean Square
SS      Sum of Squares
DF      Degrees of Freedom

Sum of Squares (SS)

Formula

The sum of squared distances. SS Total is the total variation in the data. SS (Factor) is the deviation of the estimated factor level mean around the overall mean. It is also known as the sum of squares between treatments. SS Error is the deviation of an observation from its corresponding factor level mean. It is also known as error within treatments.

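The calculations are:

$$\text{SS (Factor)} = \sum_{i} n_i \left(\bar{y}_{i.} - \bar{y}_{..}\right)^2$$

$$\text{SS Error} = \sum_{i} \sum_{j} \left(y_{ij} - \bar{y}_{i.}\right)^2$$

$$\text{SS Total} = \sum_{i} \sum_{j} \left(y_{ij} - \bar{y}_{..}\right)^2$$

Here $n_i$ is the number of observations at the i th factor level.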

Notation

Term                Description
$\bar{y}_{i.}$      mean of the observations at the i th factor level
$\bar{y}_{..}$      mean of all observations
$y_{ij}$            value of the j th observation at the i th factor level
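As an illustration of the three sums of squares, the sketch below computes them for a small made-up data set. The data and the function name `sums_of_squares` are hypothetical, and the formulas are the usual one-way ANOVA definitions described in the text, not Minitab's internal code.

```python
# Hypothetical data: three factor levels with three observations each.
data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}


def sums_of_squares(groups):
    """Return (SS Factor, SS Error, SS Total) for {level: observations}."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    level_means = {k: sum(ys) / len(ys) for k, ys in groups.items()}

    # Between-level variation: level mean around the overall mean.
    ss_factor = sum(
        len(ys) * (level_means[k] - grand_mean) ** 2 for k, ys in groups.items()
    )
    # Within-level variation: each observation around its level mean.
    ss_error = sum(
        (y - level_means[k]) ** 2 for k, ys in groups.items() for y in ys
    )
    # Total variation: each observation around the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    return ss_factor, ss_error, ss_total


ss_factor, ss_error, ss_total = sums_of_squares(data)
print(ss_factor, ss_error, ss_total)  # 54.0 6.0 60.0
```

For this data set, SS (Factor) and SS Error add up to SS Total, which is a quick consistency check on any implementation.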

Degrees of freedom (DF)

Formula

Indicates the number of independent elements in the sum of squares. The degrees of freedom for each component of the model are:
  • DF (Factor) = r – 1
  • DF Error = nT – r
  • DF Total = nT – 1

Notation

Term    Description
nT      total number of observations
r       number of factor levels

F-value

Formula

$$F = \frac{\text{MS Factor}}{\text{MS Error}}$$

The degrees of freedom for the numerator are r – 1. The degrees of freedom for the denominator are nT – r.

Notation

Term    Description
nT      total number of observations
r       number of factor levels
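To show how the degrees of freedom, mean squares, and F-value fit together, here is a minimal sketch that assumes F is the ratio of the factor mean square to the error mean square. The function name `anova_table` and the data are hypothetical.

```python
def anova_table(groups):
    """Return DF, MS and F for a one-way layout given {level: observations}."""
    n_t = sum(len(ys) for ys in groups.values())  # total number of observations
    r = len(groups)                               # number of factor levels
    grand_mean = sum(y for ys in groups.values() for y in ys) / n_t

    ss_factor = 0.0
    ss_error = 0.0
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        ss_factor += len(ys) * (level_mean - grand_mean) ** 2
        ss_error += sum((y - level_mean) ** 2 for y in ys)

    df_factor, df_error = r - 1, n_t - r
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df_factor": df_factor,
        "df_error": df_error,
        "df_total": n_t - 1,
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f": ms_factor / ms_error,
    }


table = anova_table({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
print(table["f"])  # 27.0
```

The sketch stops at the F-value. Turning F into a p-value requires the F distribution with (r – 1, nT – r) degrees of freedom, which is left to a statistics library.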

Grouping information table for multiple comparisons with a control

Minitab uses the confidence interval results for the difference between each level mean and the control level to obtain the grouping information. The grouping information is in a matrix with one column.

Minitab assigns the letter "A" to the control level.

If an interval contains 0, then the level mean is in the same group as the control level. Minitab assigns the letter "A" to the level mean.

If an interval does not contain zero, then no letter is assigned.
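The rule above can be sketched in a few lines. The function name `control_grouping`, the input layout, and the choice to treat an interval whose endpoint is exactly 0 as containing 0 are assumptions made for this illustration.

```python
def control_grouping(intervals, control):
    """Assign "A" to the control and to every level whose interval contains 0.

    intervals maps each non-control level to the (lower, upper) confidence
    limits for (level mean - control mean).
    """
    letters = {control: "A"}
    for level, (lower, upper) in intervals.items():
        # Assumption: endpoints are treated as inside the interval.
        letters[level] = "A" if lower <= 0 <= upper else ""
    return letters


# Hypothetical intervals for two levels compared with a control.
groups = control_grouping({"B": (-0.4, 1.9), "C": (0.7, 3.1)}, control="Ctrl")
print(groups)  # {'Ctrl': 'A', 'B': 'A', 'C': ''}
```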

Grouping information table for multiple pairwise comparisons

Minitab uses the confidence interval results for the difference between two level means to obtain the grouping information. The grouping information is in a matrix. If a term has k levels, the maximum dimension of the matrix is k x k. If all levels are in one group, the dimension is k x 1, with the letter "A" for all factor levels. If all levels are in different groups, the dimension is k x k, with letters on the diagonal only.

Minitab uses this algorithm to determine the content of the matrix:
  1. Sort all the least square means at different levels of a term in descending order, denoted as 1, 2, ... , k.
  2. Define a k x k matrix with value 0 in every cell where k = the number of factor levels.
  3. For column j, where j = 1, ... , k, Minitab does the following:
    a. Checks the confidence intervals of mean j – mean r, where r = j + 1, ... , k. If the interval for r contains 0, sets the cell in row r and column j, (r, j), to 1.
    b. Sets the (j, j) cell to 1 if at least one cell in column j has the value 1.
    c. Calculates the row sums from column 1 to column j for rows i = j + 1, ... , k. If min (all row sums) >= 1, terminates the loop; otherwise, increases j by 1 and goes to step a.
  4. For every row i, Minitab checks whether the sum of all column values for the row is ≥ 1. If the sum is zero, Minitab sets the cell in row i and column j to 1, where column j is the first column in the matrix that contains only 0 values. This procedure produces a matrix with values 1 and 0. The total number of groups is the number of nonzero columns.
  5. Minitab matches letters to columns (e.g. A to Column 1, B to Column 2, etc.) and assigns cells with value 1 the correct letter.
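The steps above can be sketched as follows. This is an illustration, not Minitab's code: the function name `grouping_letters` and the input layout are hypothetical. The sketch also assumes that letters are assigned to the nonzero columns in left-to-right order, so that an empty column in the middle of the matrix does not cause a letter to be skipped.

```python
from string import ascii_uppercase


def grouping_letters(contains_zero):
    """Return one letter string per level, levels sorted by descending mean.

    contains_zero[j][r] is True when the confidence interval for
    (mean j - mean r) contains 0.
    """
    k = len(contains_zero)
    m = [[0] * k for _ in range(k)]           # step 2: k x k matrix of zeros

    for j in range(k):                        # step 3: work column by column
        for r in range(j + 1, k):             # step a: compare j with lower means
            if contains_zero[j][r]:
                m[r][j] = 1
        if any(m[r][j] for r in range(k)):    # step b: level j joins its own group
            m[j][j] = 1
        below = range(j + 1, k)               # step c: stop once all later rows are covered
        if below and all(sum(m[i][: j + 1]) >= 1 for i in below):
            break

    for i in range(k):                        # step 4: ungrouped rows get their own column
        if sum(m[i]) == 0:
            for j in range(k):
                if not any(m[r][j] for r in range(k)):
                    m[i][j] = 1
                    break

    # Step 5 (assumption): letter the nonzero columns consecutively.
    used = [j for j in range(k) if any(m[r][j] for r in range(k))]
    letter = {j: ascii_uppercase[n] for n, j in enumerate(used)}
    return ["".join(letter[j] for j in used if m[i][j]) for i in range(k)]


# Three levels: 1 and 2 overlap, 2 and 3 overlap, 1 and 3 differ.
chain = [[True, True, False], [True, True, True], [False, True, True]]
print(grouping_letters(chain))  # ['A', 'AB', 'B']
```

Levels that share a letter are not significantly different. In the example, the middle level belongs to both groups because its intervals with the first and third levels both contain 0.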

Individual confidence intervals

Formula

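The confidence intervals are calculated for each factor level mean using the pooled standard deviation. The formula is:

$$\bar{y}_{i.} \pm t_{1-\alpha/2;\, n_T - r} \cdot \frac{S}{\sqrt{n_i}}$$

where $\bar{y}_{i.}$ is the mean and $n_i$ is the number of observations at the i th factor level.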

Notation

Term                          Description
nT                            total number of observations
r                             number of factor levels
S                             pooled standard deviation
$t_{1-\alpha/2;\, n_T - r}$   inverse cumulative distribution function from the t distribution at 1 − α/2 with nT – r degrees of freedom
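As a rough sketch of this interval, the function below assumes the usual form: level mean plus or minus the t critical value times S divided by the square root of the level's sample size. The name `individual_ci` is hypothetical, and the critical value is passed in by the caller rather than computed, so that the example needs only the standard library.

```python
import math


def individual_ci(level_mean, s, n_i, t_crit):
    """Confidence interval for one level mean using the pooled standard deviation.

    t_crit is the t quantile at 1 - alpha/2 with nT - r degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    half_width = t_crit * s / math.sqrt(n_i)
    return level_mean - half_width, level_mean + half_width


# Hypothetical level: mean 5, pooled S = 1, three observations,
# 95% confidence with 6 error degrees of freedom (t is about 2.447).
lower, upper = individual_ci(5.0, 1.0, 3, t_crit=2.447)
```

Because every level uses the same S and the same error degrees of freedom, intervals differ in width only through the level sample sizes.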

Mean

Formula

The average of the observations at a given factor level.

$$\bar{y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$$

Notation

Term        Description
$n_i$       number of observations at factor level i
$y_{ij}$    value of the j th observation at the i th factor level

Multiple comparisons

Minitab offers four different confidence interval methods for comparing multiple factor means in one-way analysis of variance when you assume equal variances between the groups: Tukey's, Fisher's, Dunnett's, and Hsu's MCB. The formulas for the confidence intervals follow.

Notation

Term           Description
$\bar{y}_i$    sample mean for the ith factor level
$n_i$          number of observations in level i
r              number of factor levels
s              pooled standard deviation or sqrt(MSE)
nT             total number of observations
α              probability of making a Type I error

Tukey

$$\bar{y}_i - \bar{y}_j \pm \frac{Q}{\sqrt{2}}\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

where Q = (1 − α) percentile of the studentized range distribution with r factor levels and nT – r degrees of freedom.

Fisher

$$\bar{y}_i - \bar{y}_j \pm t\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

where t = (1 − α/2) percentile of the Student's t-distribution with nT – r degrees of freedom.
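Tukey and Fisher intervals for a pair of level means share one shape and differ only in the critical value. The sketch below assumes the usual forms, with Q divided by the square root of 2 for Tukey and t for Fisher. The function name `pairwise_ci` is hypothetical, and the critical values are approximate table values passed in by the caller.

```python
import math


def pairwise_ci(mean_i, mean_j, s, n_i, n_j, crit):
    """Interval for (mean_i - mean_j) given a critical value.

    crit is t for Fisher's method or Q / sqrt(2) for Tukey's method.
    """
    half_width = crit * s * math.sqrt(1 / n_i + 1 / n_j)
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


# Hypothetical case: r = 3 levels, 3 observations each, s = 1, alpha = 0.05.
# Approximate table values: t(0.975; 6) = 2.447 and Q(0.95; 3, 6) = 4.339.
fisher = pairwise_ci(8.0, 5.0, 1.0, 3, 3, crit=2.447)
tukey = pairwise_ci(8.0, 5.0, 1.0, 3, 3, crit=4.339 / math.sqrt(2))
```

With these inputs the Tukey interval is wider than the Fisher interval, which reflects that Tukey's method controls the error rate across all pairwise comparisons at once.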

Dunnett

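$$\bar{y}_i - \bar{y}_c \pm d\, s \sqrt{\frac{1}{n_i} + \frac{1}{n_c}}$$

where the subscript c denotes the control level. To see how d is calculated, refer to page 63 in Hsu [1].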

Hsu's MCB

We give formulas for the case where all group sizes are equal to n. Formulas for unequal group sizes are found in Hsu [1]. Suppose you chose the best to be the largest mean, and you want the confidence interval for the ith mean minus the largest of the others.

The lower endpoint is the smaller of zero and the formula that follows:

$$\bar{y}_i - \max_{j \neq i} \bar{y}_j - d\, s \sqrt{\frac{2}{n}}$$

The upper endpoint is the larger of zero and the formula that follows:

$$\bar{y}_i - \max_{j \neq i} \bar{y}_j + d\, s \sqrt{\frac{2}{n}}$$

To see how d is calculated, refer to page 83 in Hsu [1].

When the best is the smallest of the level means, the formulas are the same, except that max is replaced by min.
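A minimal sketch of the equal-sample-size case follows. It assumes the endpoints are the difference from the best of the other means, plus or minus d times s times the square root of 2/n, and then constrained to include zero as described above. The function name `mcb_intervals` is hypothetical, and the value of d in the example is a placeholder, not a real critical value.

```python
import math


def mcb_intervals(means, s, n, d, best="largest"):
    """Hsu's MCB intervals for equal group sizes n.

    means maps level -> sample mean; d is the critical value from Hsu's
    method, supplied by the caller.
    """
    pick = max if best == "largest" else min
    half_width = d * s * math.sqrt(2 / n)
    intervals = {}
    for level, mean in means.items():
        others = [v for name, v in means.items() if name != level]
        diff = mean - pick(others)            # ith mean minus the best of the others
        lower = min(0.0, diff - half_width)   # lower endpoint never exceeds zero
        upper = max(0.0, diff + half_width)   # upper endpoint never falls below zero
        intervals[level] = (lower, upper)
    return intervals


# Hypothetical means with s = 1, n = 3 and a placeholder d = 2.0.
mcb = mcb_intervals({"A": 2.0, "B": 5.0, "C": 8.0}, s=1.0, n=3, d=2.0)
```

In this example the interval for level C has a lower endpoint of zero, and the intervals for A and B have an upper endpoint of zero, which is how the method flags C as the best level.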

Acknowledgment

We are very grateful for assistance in the design and implementation of multiple comparisons from Jason C. Hsu.

References

  1. J.C. Hsu (1996). Multiple Comparisons: Theory and Methods. Chapman & Hall.

Pooled standard deviation

Formula

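The common variance for all observations. The pooled variance is:

$$S^2 = \frac{\sum_{i} \sum_{j} \left(y_{ij} - \bar{y}_i\right)^2}{n_T - r}$$

The pooled standard deviation is the square root of the above formula. An equivalent form follows:

$$S = \sqrt{\frac{\sum_{i} (n_i - 1)\, s_i^2}{n_T - r}}$$

The pooled standard deviation is equivalent to S, which is in the output and is equal to:

$$S = \sqrt{\text{MS Error}}$$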

Notation

Term           Description
$y_{ij}$       j th observation of the response for the i th factor level
$\bar{y}_i$    sample mean for factor level i
$n_i$          number of observations for the ith factor level
nT             total number of observations
$s_i^2$        variance of the observations for the ith factor level
r              number of levels of the factor
MS             Mean Square
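The two forms of the pooled standard deviation can be checked against each other numerically. The sketch below uses Python's standard `statistics.variance` for the per-level sample variances. The function names and the data are hypothetical.

```python
import math
import statistics


def pooled_sd_from_variances(groups):
    """Pooled SD from per-level sample variances weighted by (n_i - 1)."""
    n_t = sum(len(ys) for ys in groups)
    r = len(groups)
    weighted = sum((len(ys) - 1) * statistics.variance(ys) for ys in groups)
    return math.sqrt(weighted / (n_t - r))


def pooled_sd_from_deviations(groups):
    """Pooled SD from squared deviations around each level mean."""
    n_t = sum(len(ys) for ys in groups)
    r = len(groups)
    ss_error = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups for y in ys)
    return math.sqrt(ss_error / (n_t - r))


# Hypothetical data with unequal group sizes; each level needs at least
# two observations for its sample variance to be defined.
levels = [[1, 2, 3], [4, 6], [7, 8, 9, 12]]
s_a = pooled_sd_from_variances(levels)
s_b = pooled_sd_from_deviations(levels)
```

Both functions return the same value up to floating-point rounding, since (n_i − 1) times the sample variance of a level equals the sum of squared deviations within that level.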

P-value

Used in hypothesis tests to help you decide whether to reject or fail to reject a null hypothesis. The p-value is the probability of obtaining a test statistic that is at least as extreme as the actual calculated value, if the null hypothesis is true. A commonly used cut-off value for the p-value is 0.05. For example, if the calculated p-value of a test statistic is less than 0.05, you reject the null hypothesis.

Residuals

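$$e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$$

Notation

Term              Description
$y_{ij}$          jth value of the response for the ith factor level
$\hat{y}_{ij}$    jth fitted value for the ith factor level
$\bar{y}_{i.}$    mean of the response for the ith factor level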

R-sq

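$$R^2 = 1 - \frac{\text{SS Error}}{\text{SS Total}}$$

Another presentation of the formula is:

$$R^2 = \frac{\text{SS (Factor)}}{\text{SS Total}}$$

$R^2$ can also be calculated as the squared correlation of $y$ and $\hat{y}$.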

Notation

Term         Description
SS           Sum of Squares
$y$          response variable
$\hat{y}$    fitted response variable

R-sq (adj)

$$R^2(\text{adj}) = 1 - \frac{\text{MS Error}}{\text{SS Total} / \text{DF Total}}$$

Notation

Term    Description
MS      Mean Square
SS      Sum of Squares
DF      Degrees of Freedom
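For a quick numerical feel, the sketch below computes R-sq and R-sq (adj) from the sums of squares and degrees of freedom. It assumes the common definitions: one minus the ratio of SS Error to SS Total, and one minus the ratio of MS Error to SS Total divided by DF Total. The function names and input values are hypothetical.

```python
def r_squared(ss_error, ss_total):
    """Proportion of total variation explained by the factor."""
    return 1 - ss_error / ss_total


def r_squared_adj(ss_error, ss_total, df_error, df_total):
    """R-squared adjusted for the degrees of freedom used by the model."""
    ms_error = ss_error / df_error
    return 1 - ms_error / (ss_total / df_total)


# Hypothetical ANOVA quantities: SS Error = 6, SS Total = 60,
# DF Error = 6, DF Total = 8.
r2 = r_squared(6, 60)
r2_adj = r_squared_adj(6, 60, 6, 8)
```

Here R-sq is 0.9 and the adjusted value is slightly lower, about 0.867, because the adjustment penalizes the degrees of freedom spent on the factor.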

R-sq (pred)

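$$R^2(\text{pred}) = 1 - \frac{\sum_{i=1}^{n} \left(\frac{e_i}{1 - h_i}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$

While the calculations for $R^2(\text{pred})$ can produce negative values, Minitab displays zero for these cases.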

Notation

Term         Description
$y_i$        i th observed response value
$\bar{y}$    mean response
n            number of observations
$e_i$        i th residual
$h_i$        i th diagonal element of $X(X'X)^{-1}X'$
X            design matrix
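In a one-way layout the leverage of an observation is 1 divided by the number of observations at its factor level, so R-sq (pred) can be sketched without building the design matrix. The code assumes the usual PRESS-based definition (one minus the sum of squared e_i / (1 − h_i) over the total sum of squares) and truncates negative results to zero as described above. The function name `r_squared_pred` and the data are hypothetical.

```python
def r_squared_pred(groups):
    """Predicted R-squared for {level: observations} in a one-way layout.

    Each level needs at least two observations; with a single observation
    the leverage is 1 and the deleted residual is undefined.
    """
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

    press = 0.0
    for ys in groups.values():
        level_mean = sum(ys) / len(ys)
        h = 1 / len(ys)  # leverage of every observation at this level
        press += sum(((y - level_mean) / (1 - h)) ** 2 for y in ys)

    # Negative values are reported as zero.
    return max(0.0, 1 - press / ss_total)


r2_pred = r_squared_pred({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
```

For this data set PRESS is 13.5 and SS Total is 60, so the result is 0.775, lower than the ordinary R-sq of 0.9 for the same data.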

S

An estimate of σ, the within-sample standard deviation. Note that $S^2$ = MS Error, so that:

$$S = \sqrt{\text{MS Error}}$$

This is equivalent to the pooled standard deviation used in calculating the individual confidence intervals.

Standard deviation (StDev)

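$$s_i = \sqrt{\frac{\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_{i.}\right)^2}{n_i - 1}}$$

Notation

Term              Description
$y_{ij}$          observations at the i th factor level
$\bar{y}_{i.}$    mean of observations at the i th factor level
$n_i$             number of observations at the i th factor level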