Data considerations for Kruskal-Wallis Test

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

Note

If you use a parametric analysis as an alternative to the Kruskal-Wallis test, verify that your data meet the data requirements of that analysis. The data requirements for parametric analyses are not always compatible with the requirements for nonparametric analyses, such as the Kruskal-Wallis test.

The data should include only one categorical variable that is a fixed factor

For more information on factors, go to Factors and factor levels and Fixed and random factors.

The response variable should be continuous
If the response variable is categorical, your model is less likely to meet the assumptions of the analysis, to accurately describe your data, or to make useful predictions.
  • If your response variable has two categories, such as pass and fail, use Fit Binary Logistic Model.
  • If your response variable contains three or more categories that have a natural order, such as strongly disagree, disagree, neutral, agree, and strongly agree, use Ordinal Logistic Regression.
  • If your response variable contains three or more categories that do not have a natural order, such as scratch, dent, and tear, use Nominal Logistic Regression.
  • If your response variable counts occurrences, such as the number of defects, use Fit Poisson Model.
The sample data do not need to be normally distributed
The distributions of the groups should have the same shape and spread, and contain no outliers.
  • If the distributions of the groups include outliers, use Mood’s Median Test.
  • If the groups are normally distributed, consider using One-Way ANOVA because it has more power.
The sample size for each group should be less than 15 or 20 observations, or your process should be better represented by the median

Nonparametric tests tend to have less power than parametric tests. Also, parametric tests can perform well with nonnormal data given a sufficiently large sample size. Consider using a parametric test even with nonnormal data unless your sample size is very small or the median is more meaningful for your study.

If your data meet the following sample size guidelines, consider using One-Way ANOVA because it will perform very well with skewed and nonnormal distributions, and it has more power.
  • The data contain 2–9 groups and the sample size for each group is at least 15.
  • The data contain 10–12 groups and the sample size for each group is at least 20.
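The two sample size guidelines above can be expressed as a small check. The following is a minimal sketch in Python, not part of any statistical package. The function name anova_guideline_met and the example group sizes are hypothetical. The sketch returns None when the number of groups falls outside the 2–12 range, because the guidelines above do not cover that case.

```python
from typing import Optional, Sequence


def anova_guideline_met(group_sizes: Sequence[int]) -> Optional[bool]:
    """Check the sample size guidelines for considering One-Way ANOVA.

    group_sizes holds the number of observations in each group.
    Returns True or False when the guidelines apply, and None when the
    number of groups is outside the 2-12 range that the guidelines cover.
    """
    n_groups = len(group_sizes)
    if 2 <= n_groups <= 9:
        required = 15  # 2-9 groups: at least 15 observations per group
    elif 10 <= n_groups <= 12:
        required = 20  # 10-12 groups: at least 20 observations per group
    else:
        return None  # assumption: no guidance is given for this case
    # Every group must reach the threshold, so test the smallest group.
    return min(group_sizes) >= required


# Hypothetical group sizes for three groups.
print(anova_guideline_met([18, 16, 22]))  # → True
print(anova_guideline_met([18, 12, 22]))  # → False
```

A result of True only suggests that One-Way ANOVA is worth considering. The Kruskal-Wallis test can still be the better choice if the median is more meaningful for your study.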
The sample size for each group should be at least five
If a sample has fewer than five observations, the p-value can be inaccurate.
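Before you run the analysis, you can screen for groups that fall below this minimum. The following is a minimal sketch in Python. The function name undersized_groups, the group labels, and the measurements are hypothetical and are used only for illustration.

```python
from typing import Dict, List, Sequence

# Minimum number of observations per group, from the guideline above.
MIN_GROUP_SIZE = 5


def undersized_groups(samples: Dict[str, Sequence[float]]) -> List[str]:
    """Return the labels of groups that have fewer than five observations."""
    return [
        label
        for label, values in samples.items()
        if len(values) < MIN_GROUP_SIZE
    ]


# Hypothetical measurements for two groups.
samples = {
    "line_a": [4.1, 3.9, 4.4, 4.0, 4.2],
    "line_b": [3.8, 4.5, 4.1],
}
print(undersized_groups(samples))  # → ['line_b']
```

If the returned list is not empty, collect more data for those groups where possible, because the p-value can be inaccurate.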
Each observation should be independent from all other observations
If your observations are dependent, your results might not be valid. Consider the following points to determine whether your observations are independent:
  • If an observation provides no information about the value of another observation, the observations are independent.
  • If an observation provides information about another observation, the observations are dependent.

If you have dependent observations, go to Analyzing a repeated measures design. For more information about samples, go to How are dependent and independent samples different?.

Collect data using best practices
To ensure that your results are valid, consider the following guidelines:
  • Make certain that the data represent the population of interest.
  • Collect enough data to provide the necessary precision.
  • Measure variables as accurately and precisely as possible.
  • Record the data in the order they are collected.