The average rank is the average of the ranks for all observations within each sample. Minitab uses the average rank to calculate the H statistic, which is the test statistic for the Kruskal-Wallis test.
To calculate the average rank, Minitab first ranks the combined samples. Minitab assigns the smallest observation a rank of 1, the second smallest observation a rank of 2, and so on. If two or more observations are tied, Minitab assigns each tied observation the average of the ranks that those observations would otherwise occupy. Minitab then calculates the average rank for each sample.
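The ranking steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the data and the helper name `average_ranks` are hypothetical, and the value 12 appears twice so that the tie rule is exercised.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover every observation tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of ranks i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

# Hypothetical data: three samples, with one tie (12 occurs in A and in B).
samples = {
    "A": [10, 12, 15],
    "B": [12, 18, 20],
    "C": [22, 25, 30],
}

# Rank the combined samples, then average the ranks within each sample.
combined = [x for values in samples.values() for x in values]
ranks = average_ranks(combined)

average_rank = {}
start = 0
for name, values in samples.items():
    group_ranks = ranks[start:start + len(values)]
    average_rank[name] = sum(group_ranks) / len(group_ranks)
    start += len(values)

overall_average_rank = (len(combined) + 1) / 2

print(average_rank)          # {'A': 2.5, 'B': 4.5, 'C': 8.0}
print(overall_average_rank)  # 5.0
```

In this example, the two tied values would occupy ranks 2 and 3, so each receives a rank of 2.5. Sample C has an average rank above the overall average rank of 5, which is consistent with its values being the highest.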
When a group's average rank is higher than the overall average rank, the observation values in that group tend to be higher than those of the other groups.
A boxplot provides a graphical summary of the distribution of each sample. The boxplot makes it easy to compare the shape, the central tendency, and the variability of the samples.
Use a boxplot to examine the spread of the data and to identify any potential outliers. Boxplots are best when the sample size is greater than 20.
Examine the spread of your data to determine whether your data appear to be skewed. When data are skewed, the majority of the data are located on the high or low side of the graph. Skewed data indicate that the data might not be normally distributed. Often, skewness is easiest to detect with an individual value plot, a histogram, or a boxplot.
Data that are severely skewed can affect the validity of the p-value if your sample is small (< 20 values). If your data are severely skewed and you have a small sample, consider increasing your sample size.
Outliers, which are data values that are far away from other data values, can strongly affect your results. Often, outliers are easiest to identify on a boxplot.
Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (special causes). Then, repeat the analysis.
The degrees of freedom (DF) equals the number of groups in your data minus 1. Under the null hypothesis, the chi-square distribution with the specified degrees of freedom approximates the distribution of the test statistic. Minitab uses the chi-square distribution to estimate the p-value for this test.
H is the test statistic for the Kruskal-Wallis test. Under the null hypothesis, the chi-square distribution approximates the distribution of H. The approximation is reasonably accurate when no group has fewer than five observations.
Minitab uses the test statistic to calculate the p-value, which you use to decide whether the differences between the medians are statistically significant. The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
A sufficiently high test statistic indicates that at least one difference between the medians is statistically significant.
You can use the test statistic to determine whether to reject the null hypothesis. However, using the p-value of the test to make the same determination is usually more practical and convenient.
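As a worked illustration of how H, DF, and the p-value relate, the following sketch computes them for a small hypothetical data set. It assumes the standard form of the Kruskal-Wallis statistic without a tie adjustment, H = 12 / (N(N + 1)) × Σ n_j (R̄_j − R̄)², and it is not a description of Minitab's internal code. The data contain no ties and have exactly three groups, so DF = 2 and the chi-square upper-tail probability reduces to exp(−H / 2); other group counts would need a general chi-square survival function.

```python
import math

# Hypothetical data: three groups of five observations, no tied values.
samples = {
    "A": [27, 31, 35, 38, 42],
    "B": [29, 40, 44, 47, 51],
    "C": [45, 49, 53, 56, 60],
}

# Rank the combined observations (1 = smallest). With no ties, plain ranks suffice.
combined = sorted(x for values in samples.values() for x in values)
rank_of = {value: position for position, value in enumerate(combined, start=1)}

n_total = len(combined)
overall_average_rank = (n_total + 1) / 2

# Sum over groups of n_j * (group average rank - overall average rank)^2.
weighted_squares = 0.0
for values in samples.values():
    group_average_rank = sum(rank_of[x] for x in values) / len(values)
    weighted_squares += len(values) * (group_average_rank - overall_average_rank) ** 2

h_statistic = 12 / (n_total * (n_total + 1)) * weighted_squares
degrees_of_freedom = len(samples) - 1

# Special case: for 2 degrees of freedom, P(chi-square > H) = exp(-H / 2).
assert degrees_of_freedom == 2
p_value = math.exp(-h_statistic / 2)

print(f"H = {h_statistic:.2f}, DF = {degrees_of_freedom}, p = {p_value:.4f}")
# H = 8.88, DF = 2, p = 0.0118
```

With a significance level of 0.05, the p-value of about 0.012 would lead you to reject the null hypothesis for these hypothetical data.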
An individual value plot displays the individual values in each sample. The individual value plot makes it easy to compare the samples. Each circle represents one observation. An individual value plot is especially useful when your sample size is small.
Use an individual value plot to examine the spread of the data and to identify any potential outliers. Individual value plots are best when the sample size is less than 50.
Examine the spread of your data to determine whether your data appear to be skewed. When data are skewed, the majority of the data are located on the high or low side of the graph. Skewed data indicate that the data might not be normally distributed. Often, skewness is easiest to detect with an individual value plot, a histogram, or a boxplot.
Outliers, which are data values that are far away from other data values, can strongly affect your results. Often, outliers are easy to identify on an individual value plot.
Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (special causes). Then, repeat the analysis.
The median is the midpoint of the data set. This midpoint value is the point at which half of the observations are above the value and half of the observations are below the value. The median is determined by ranking the observations. If your data contain an odd number of observations, the median is the observation at position (N + 1) / 2 in the ranked order. If your data contain an even number of observations, the median is the average of the observations at positions N / 2 and (N / 2) + 1.
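The position rule for the median can be written out directly. The function name `median_by_position` and the sample values below are hypothetical and serve only to illustrate the odd and even cases.

```python
def median_by_position(values):
    """Median from the ranked observations, using 1-based positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: the observation at position (N + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even N: average of the observations at positions N / 2 and (N / 2) + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position([7, 1, 5]))     # 5
print(median_by_position([7, 1, 5, 3]))  # 4.0
```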
The sample median is an estimate of the population median of each group. The overall median is the median of all observations.
The sample size (N) is the total number of observations in each group.
The sample size affects the confidence interval and the power of the test.
Usually, a larger sample yields a narrower confidence interval. A larger sample size also gives the test more power to detect a difference. For more information, go to "What is power?"
The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
Use the p-value to determine whether any of the differences between the medians are statistically significant.
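The z-value indicates how the average rank for each group compares to the average rank of all observations. A positive z-value means that the group's average rank is higher than the overall average rank, and a negative z-value means that it is lower.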
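One way to make this comparison concrete is to standardize each group's average rank. The sketch below assumes the form z_j = (R̄_j − R̄) / sqrt((N + 1)(N / n_j − 1) / 12), where R̄ = (N + 1) / 2 is the overall average rank; treat that formula as an assumption here and check it against the methods documentation for your software. The average ranks and group sizes are hypothetical inputs.

```python
import math

# Hypothetical inputs: average rank and size of each group (15 observations in total).
average_rank = {"A": 4.0, "B": 7.6, "C": 12.4}
group_size = {"A": 5, "B": 5, "C": 5}

n_total = sum(group_size.values())
overall_average_rank = (n_total + 1) / 2

z_value = {}
for name, rank in average_rank.items():
    # Assumed standard error of a group's average rank under the null hypothesis.
    standard_error = math.sqrt((n_total + 1) * (n_total / group_size[name] - 1) / 12)
    z_value[name] = (rank - overall_average_rank) / standard_error

for name, z in z_value.items():
    print(f"{name}: z = {z:.2f}")
# A: z = -2.45
# B: z = -0.24
# C: z = 2.69
```

In this example, group C lies furthest above the overall average rank and group A lies furthest below it, while group B is close to the overall average.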