Data considerations for balanced ANOVA

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The data should include only categorical factors

If your design contains covariates, use Fit General Linear Model.

The categorical factors can be crossed or nested, and they can be fixed or random.

For more information on factors, go to "Factors and factor levels", "What are factors, crossed factors, and nested factors?", and "What is the difference between fixed and random factors?".

The design must be balanced unless you have a one-way design
A balanced design has the same number of observations for each treatment combination.
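To make the definition concrete, the following Python sketch counts the observations in each treatment combination and reports whether every combination has the same count. It assumes the data are a list of dictionaries with one dictionary per observation, and it treats the listed factors as crossed. The function name `is_balanced` and the example data are hypothetical; this is not a Minitab command.

```python
from collections import Counter
from itertools import product


def is_balanced(rows, factors):
    """Return True if every combination of levels of the crossed `factors`
    has the same number of observations (hypothetical helper)."""
    # Count the observations in each observed treatment combination.
    counts = Counter(tuple(row[f] for f in factors) for row in rows)
    # Every possible combination of the observed levels must appear;
    # an empty cell has 0 observations, so the design is unbalanced.
    levels = [sorted({row[f] for row in rows}) for f in factors]
    if any(cell not in counts for cell in product(*levels)):
        return False
    # Balanced means all cells share one count.
    return len(set(counts.values())) == 1


# Hypothetical data: 2 machines x 2 operators, 1 observation per cell.
data = [
    {"Machine": 1, "Operator": "A", "Strength": 10.2},
    {"Machine": 1, "Operator": "B", "Strength": 9.8},
    {"Machine": 2, "Operator": "A", "Strength": 11.0},
    {"Machine": 2, "Operator": "B", "Strength": 10.5},
]
print(is_balanced(data, ["Machine", "Operator"]))      # True
print(is_balanced(data[:3], ["Machine", "Operator"]))  # False: cell (2, "B") is empty
```

Adding a second observation to only one cell would also make `is_balanced` return False, because the cell counts would no longer be equal.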

The requirement for balanced data extends to nested factors. Suppose A has 3 levels and B is nested within A. If B has 4 levels within the first level of A, then B must also have 4 levels within the second and third levels of A. Minitab tells you if the nesting is unbalanced. The data must remain balanced after observations with missing values are omitted.
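The nested requirement, including the rule that balance must survive the removal of missing values, can be sketched in Python as follows. This is an illustration under stated assumptions: each observation is a dictionary, a missing response is stored as `None`, and the helper `nesting_is_balanced` is hypothetical. In practice, Minitab reports unbalanced nesting itself.

```python
from collections import Counter, defaultdict


def nesting_is_balanced(rows, outer, inner, response):
    """Check that `inner` (B) is balanced within `outer` (A) after
    omitting observations with a missing response (hypothetical helper)."""
    # Omit observations whose response is missing (assumed to be None).
    complete = [r for r in rows if r[response] is not None]
    inner_levels = defaultdict(set)  # A level -> levels of B seen within it
    cell_counts = Counter()          # (A level, B level) -> number of observations
    for r in complete:
        inner_levels[r[outer]].add(r[inner])
        cell_counts[(r[outer], r[inner])] += 1
    # B must have the same number of levels within every level of A ...
    same_num_levels = len({len(s) for s in inner_levels.values()}) == 1
    # ... and every A-B cell must hold the same number of observations.
    same_cell_size = len(set(cell_counts.values())) == 1
    return same_num_levels and same_cell_size


# Hypothetical data: A has 3 levels, B has 2 levels within each level of A.
rows = [{"A": a, "B": b, "y": 10.0 + a + b} for a in (1, 2, 3) for b in (1, 2)]
print(nesting_is_balanced(rows, "A", "B", "y"))  # True
rows[5]["y"] = None  # the only observation for A = 3, B = 2 is now missing
print(nesting_is_balanced(rows, "A", "B", "y"))  # False
```

The second check fails because, once the missing observation is omitted, B has only 1 level within the third level of A.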

If your design is not balanced, use Fit General Linear Model.

For more information about balanced designs, go to Balanced and unbalanced designs.

Nested factors must use the same set of subscripts
The subscripts that identify the levels of B must be the same within each level of A. For example, if B has 4 levels within each of the 3 levels of A, the levels of B cannot be coded (1 2 3 4) in level 1 of A, (5 6 7 8) in level 2 of A, and (9 10 11 12) in level 3 of A. Instead, use (1 2 3 4) within every level of A.
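If B was recorded with labels that are unique across the whole experiment, such as 1 through 12, a Python sketch like the following recodes them as subscripts that restart at 1 within each level of A. It assumes that the order in which a B label first appears within a level of A defines its subscript; `recode_nested` is a hypothetical helper, not a Minitab command.

```python
def recode_nested(a_values, b_values):
    """Map B labels to subscripts 1, 2, ... that restart within each
    level of A (hypothetical helper)."""
    mapping = {}     # (A level, original B label) -> new subscript
    next_index = {}  # A level -> last subscript assigned within that level
    recoded = []
    for a, b in zip(a_values, b_values):
        key = (a, b)
        if key not in mapping:
            # First time this B label appears within this level of A.
            next_index[a] = next_index.get(a, 0) + 1
            mapping[key] = next_index[a]
        recoded.append(mapping[key])
    return recoded


a = [1] * 4 + [2] * 4 + [3] * 4
b = list(range(1, 13))  # B coded 1-12, which balanced ANOVA does not accept
print(recode_nested(a, b))  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
```

Repeated observations of the same B label within a level of A keep the same subscript, so replicates stay grouped correctly.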
The response variable should be continuous
If the response variable is categorical, your model is less likely to meet the assumptions of the analysis, to accurately describe your data, or to make useful predictions.
  • If your response variable has two categories, such as pass and fail, use Fit Binary Logistic Model.
  • If your response variable contains three or more categories that have a natural order, such as strongly disagree, disagree, neutral, agree, and strongly agree, use Ordinal Logistic Regression.
  • If your response variable contains three or more categories that do not have a natural order, such as scratch, dent, and tear, use Nominal Logistic Regression.
  • If your response variable counts occurrences, such as the number of defects, use Fit Poisson Model.
Each observation should be independent of all other observations
If your observations are dependent, your results might not be valid. Consider the following points to determine whether your observations are independent:
  • If an observation provides no information about the value of another observation, the observations are independent.
  • If an observation provides information about another observation, the observations are dependent.
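One informal check for dependence that is tied to the order of data collection, offered here as an assumption rather than a step this guideline prescribes, is the lag-1 autocorrelation of the residuals taken in collection order. Values near 0 are consistent with independence, while values well above or below 0 suggest that neighboring observations are related. The Python sketch below uses a hypothetical helper, `lag1_autocorrelation`, and it does not detect every kind of dependence.

```python
def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a sequence, such as residuals listed in
    the order the data were collected (hypothetical helper)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values are constant")
    # Sum of products of each deviation with the next one in order.
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom


print(lag1_autocorrelation([1, 2, 3, 4, 5, 6]))  # 0.5: residuals drift upward over time
print(lag1_autocorrelation([1, -1, 1, -1]))      # -0.75: residuals alternate in sign
```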
The sample data should be selected randomly

Random samples are used to make generalizations, or inferences, about a population. If your data were not collected randomly, your results might not represent the population.
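As a minimal sketch of drawing a simple random sample, where every subset of the chosen size is equally likely, the following Python code selects units from a list that represents the population. The sampling frame of part IDs and the helper name `simple_random_sample` are hypothetical; the standard-library `random.sample` function does the selection without replacement.

```python
import random


def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct units without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population_ids, n)


# Hypothetical sampling frame of 500 parts.
parts = [f"Part-{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(parts, 20, seed=1)
print(sample)
```

Recording the seed lets someone else reproduce exactly which units were selected.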

Collect data using best practices
To ensure that your results are valid, consider the following guidelines:
  • Make certain that the data represent the population of interest.
  • Collect enough data to provide the necessary precision.
  • Measure variables as accurately and precisely as possible.
  • Record the data in the order they are collected.
The model should provide a good fit to the data

If the model does not fit the data, the results can be misleading. In the output, use residual plots and model summary statistics to determine how well the model fits the data.
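For the simplest case, a one-way design, the residuals behind the residual plots and the model summary statistics can be sketched directly, because each fitted value is the mean of its factor level. This Python sketch uses the standard definitions S = sqrt(SSE / (n - k)), R-sq = 1 - SSE / SST, and R-sq(adj) = 1 - (SSE / (n - k)) / (SST / (n - 1)), where n is the number of observations and k is the number of levels. The function `one_way_fit` and the data are hypothetical, and the code is not how Minitab computes its output internally.

```python
def one_way_fit(groups):
    """Residuals, S, R-sq, and R-sq(adj) for a one-way ANOVA
    (hypothetical helper). `groups` maps each level to its responses."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    if n <= k:
        raise ValueError("need more observations than levels")
    grand_mean = sum(all_y) / n
    residuals = {}
    sse = 0.0
    for level, ys in groups.items():
        fitted = sum(ys) / len(ys)  # fitted value is the level mean
        residuals[level] = [y - fitted for y in ys]
        sse += sum(r * r for r in residuals[level])
    sst = sum((y - grand_mean) ** 2 for y in all_y)
    s = (sse / (n - k)) ** 0.5
    r_sq = 1 - sse / sst
    r_sq_adj = 1 - (sse / (n - k)) / (sst / (n - 1))
    return residuals, s, r_sq, r_sq_adj


res, s, r_sq, r_sq_adj = one_way_fit({"A": [1, 2, 3], "B": [4, 5, 6]})
print(res)  # {'A': [-1.0, 0.0, 1.0], 'B': [-1.0, 0.0, 1.0]}
print(f"S = {s:.3f}, R-sq = {r_sq:.3f}, R-sq(adj) = {r_sq_adj:.3f}")
# S = 1.000, R-sq = 0.771, R-sq(adj) = 0.714
```

Plotting these residuals against the fitted values or against collection order gives the kind of residual plots used to judge fit.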