Data considerations for One-Way ANOVA

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The data should include only one categorical variable that is a fixed factor

For more information on factors, go to Factors and factor levels and Fixed and random factors.

The response variable should be continuous
If the response variable is categorical, your model is less likely to meet the assumptions of the analysis, to accurately describe your data, or to make useful predictions.
  • If your response variable has two categories, such as pass and fail, use Fit Binary Logistic Model.
  • If your response variable contains three or more categories that have a natural order, such as strongly disagree, disagree, neutral, agree, and strongly agree, use Ordinal Logistic Regression.
  • If your response variable contains three or more categories that do not have a natural order, such as scratch, dent, and tear, use Nominal Logistic Regression.
  • If your response variable counts occurrences, such as the number of defects, use Fit Poisson Model.
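
The guidelines above amount to a lookup from the kind of response variable to a suggested analysis. As a minimal sketch, the Python below encodes that lookup. The function name `suggest_analysis` and the keys `"continuous"`, `"binary"`, `"ordinal"`, `"nominal"`, and `"count"` are hypothetical labels chosen for illustration; they are not part of any software interface.

```python
def suggest_analysis(response_kind):
    """Return the analysis suggested by the guidelines for a kind of response.

    The keys below are illustrative labels (assumptions), not software options.
    """
    suggestions = {
        "continuous": "One-Way ANOVA",
        "binary": "Fit Binary Logistic Model",        # two categories, e.g. pass/fail
        "ordinal": "Ordinal Logistic Regression",     # 3+ categories with a natural order
        "nominal": "Nominal Logistic Regression",     # 3+ categories without a natural order
        "count": "Fit Poisson Model",                 # counts of occurrences, e.g. defects
    }
    if response_kind not in suggestions:
        raise ValueError(f"Unknown response kind: {response_kind!r}")
    return suggestions[response_kind]
```

For example, `suggest_analysis("ordinal")` returns the analysis listed above for ordered categories.
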
The sample data should come from a normal population, or each sample size should be at least 15 or 20

If each group's sample size is at least 15 or 20, depending on the number of groups, the test performs very well even with skewed and nonnormal distributions. If the sample sizes are smaller than that, the results might be misleading when the data are nonnormal.

The actual sample size that you need depends on the number of groups in your data, as follows:
  • If you have 2-9 groups, the sample size for each group should be at least 15.
  • If you have 10-12 groups, the sample size for each group should be at least 20.
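
The sample size guideline above can be checked before running the analysis. The following is a minimal sketch, assuming the data are held in a dictionary that maps each group name to its list of observations. The function names `required_group_size` and `check_sample_sizes` are hypothetical. Because the guideline only covers 2 to 12 groups, the sketch returns `None` outside that range instead of guessing a threshold.

```python
def required_group_size(n_groups):
    """Minimum per-group sample size from the guideline, or None if not covered."""
    if 2 <= n_groups <= 9:
        return 15
    if 10 <= n_groups <= 12:
        return 20
    return None  # the guideline does not state a value for other group counts


def check_sample_sizes(groups):
    """Report, for each group, whether it meets the guideline.

    `groups` maps a group name to a list of observations (assumed layout).
    """
    required = required_group_size(len(groups))
    if required is None:
        raise ValueError("The guideline covers only 2 to 12 groups.")
    return {name: len(values) >= required for name, values in groups.items()}
```

A group flagged `False` is one for which the normality of the data matters more, so it may be worth checking the distribution or considering the nonparametric alternative.
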

If you are not confident that your data follow a normal distribution and you do not meet the sample size guidelines, use Kruskal-Wallis Test.

Each observation should be independent from all other observations
If your observations are dependent, your results might not be valid. Consider the following points to determine whether your observations are independent:
  • If an observation provides no information about the value of another observation, the observations are independent.
  • If an observation provides information about another observation, the observations are dependent.

If you have dependent observations, go to Analyzing a repeated measures design. For more information about samples, go to How are dependent and independent samples different.

Collect data using best practices
To ensure that your results are valid, consider the following guidelines:
  • Make certain that the data represent the population of interest.
  • Collect enough data to provide the necessary precision.
  • Measure variables as accurately and precisely as possible.
  • Record the data in the order in which they are collected.

The model should provide a good fit to the data

If the model does not fit the data, the results can be misleading. In the output, use the residual plots, the diagnostic statistics for unusual observations, and the model summary statistics to determine how well the model fits the data.
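
To make the idea of model fit concrete, the sketch below computes residuals and two common summary statistics for a one-way model by hand: the standard error of the regression (S) and R-squared. It assumes that the fitted value for each observation is its group mean, which is the standard one-way ANOVA model, and that the data are held in a dictionary of group name to observations. The function name `one_way_fit` is hypothetical, and the statistics reported by your software may include others beyond these two.

```python
from math import sqrt
from statistics import mean


def one_way_fit(groups):
    """Return (residuals, s, r_sq) for a one-way model fitted with group means.

    `groups` maps a group name to a list of numeric observations (assumed layout).
    """
    # Fitted value for every observation in a group is that group's mean.
    fitted = {name: mean(values) for name, values in groups.items()}
    residuals = {
        name: [y - fitted[name] for y in values]
        for name, values in groups.items()
    }

    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)

    # Error (within-group) and total sums of squares.
    sse = sum(r * r for rs in residuals.values() for r in rs)
    sst = sum((y - grand_mean) ** 2 for y in all_values)

    df_error = len(all_values) - len(groups)
    if df_error <= 0 or sst == 0:
        raise ValueError("Need more observations than groups and some variation.")

    s = sqrt(sse / df_error)   # typical distance of observations from fitted values
    r_sq = 1 - sse / sst       # share of total variation explained by the factor
    return residuals, s, r_sq
```

Residuals that are large for a few observations, or that spread very differently across groups, are the kind of pattern that the residual plots and unusual-observation diagnostics are meant to reveal.
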