The observed count is the actual number of observations in a sample that belong to a category.
The expected count is the frequency that would be expected in a cell, on average, if the variables are independent. Minitab calculates the expected counts as the product of the row and column totals, divided by the total number of observations.
You can compare the observed and expected values for each cell in the output table. In these results, the first number in each cell is the observed count and the second number is the expected count. The third number is the standardized residual, which is described later in this section.
If two variables are associated, then the distribution of observations for one variable will differ depending on the category of the second variable. If two variables are independent, then the distribution of observations for one variable will be similar for all categories of the second variable. In this example, from column 1, row 2 of the table, the observed count is 76, and the expected count is 60.78. The observed count seems to be much larger than would be expected if the variables were independent.
|  | 1st shift | 2nd shift | 3rd shift | All |
|---|---|---|---|---|
| 1 | 48 | 47 | 48 | 143 |
|  | 56.08 | 46.97 | 39.96 |  |
|  | -1.0788 | 0.0050 | 1.2726 |  |
| 2 | 76 | 47 | 32 | 155 |
|  | 60.78 | 50.91 | 43.31 |  |
|  | 1.9516 | -0.5476 | -1.7184 |  |
| 3 | 36 | 40 | 34 | 110 |
|  | 43.14 | 36.13 | 30.74 |  |
|  | -1.0867 | 0.6443 | 0.5889 |  |
| All | 160 | 134 | 114 | 408 |
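To make the expected-count rule concrete, here is a minimal Python sketch that applies "row total × column total ÷ total number of observations" to the observed counts in the table above. The names `observed` and `expected_counts` are illustrative. This is plain arithmetic, not Minitab's own implementation.

```python
# Observed counts from the example table (rows 1-3 by the 1st, 2nd, and 3rd shifts).
observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]


def expected_counts(table):
    """Return the expected count for each cell if the two variables are independent:
    (row total * column total) / total number of observations."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{e:.2f}" for e in row))
```

The printed values should match the second number in each cell of the table.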
Use the marginal counts to understand how the counts are distributed between the categories.
In these results, the row totals are 143 (row 1), 155 (row 2), and 110 (row 3), and the column totals are 160 (column 1), 134 (column 2), and 114 (column 3). The row totals and the column totals each sum to 408, which is the total number of observations.
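The marginal counts are simple sums. The short sketch below computes them. It assumes that rows are machines and columns are shifts, as the examples later in this section suggest. The variable names are hypothetical.

```python
# Observed counts; rows are assumed to be machines 1-3, columns the three shifts.
observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]

row_totals = [sum(row) for row in observed]        # marginal count for each row
col_totals = [sum(col) for col in zip(*observed)]  # marginal count for each column
grand_total = sum(row_totals)

# The row totals and the column totals both add up to the number of observations.
assert grand_total == sum(col_totals)
print(row_totals, col_totals, grand_total)  # [143, 155, 110] [160, 134, 114] 408
```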
Minitab displays each cell's contribution to the chi-square statistic, which quantifies how much of the total chi-square statistic is attributable to each cell's divergence.
Minitab calculates each cell's contribution to the chi-square statistic as the square of the difference between the observed and expected values for a cell, divided by the expected value for that cell. The chi-square statistic is the sum of these values for all cells.
|  | 1st shift | 2nd shift | 3rd shift | All |
|---|---|---|---|---|
| 1 | 48 | 47 | 48 | 143 |
|  | 56.08 | 46.97 | 39.96 |  |
|  | 1.1637 | 0.0000 | 1.6195 |  |
| 2 | 76 | 47 | 32 | 155 |
|  | 60.78 | 50.91 | 43.31 |  |
|  | 3.8088 | 0.2998 | 2.9530 |  |
| 3 | 36 | 40 | 34 | 110 |
|  | 43.14 | 36.13 | 30.74 |  |
|  | 1.1809 | 0.4151 | 0.3468 |  |
| All | 160 | 134 | 114 | 408 |
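As a check on the contributions shown in the table, the following sketch computes (observed − expected)² / expected for each cell and sums the contributions to obtain the chi-square statistic. It is a stdlib-only illustration. The names `contribution` and `contributions` are hypothetical.

```python
observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def contribution(o, e):
    """One cell's contribution to the Pearson chi-square statistic."""
    return (o - e) ** 2 / e


contributions = []
for r, row in zip(row_totals, observed):
    expected_row = [r * c / n for c in col_totals]  # expected counts for this row
    contributions.append([contribution(o, e) for o, e in zip(row, expected_row)])

# The chi-square statistic is the sum of the contributions over all cells.
chi_square = sum(sum(row) for row in contributions)

for row in contributions:
    print("  ".join(f"{c:.4f}" for c in row))
print(f"Chi-square: {chi_square:.3f}")
```

The largest contribution comes from row 2, column 1. That cell is also where the observed count (76) is furthest above its expected count.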
|  | Chi-Square | DF | P-Value |
|---|---|---|---|
| Pearson | 11.788 | 4 | 0.019 |
| Likelihood Ratio | 11.816 | 4 | 0.019 |
The Pearson chi-square statistic (χ²) is based on the squared differences between the observed and the expected frequencies.
The likelihood-ratio chi-square statistic (G²) is based on the ratio of the observed to the expected frequencies.
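Written out, the Pearson statistic is χ² = Σ (O − E)² / E and the likelihood-ratio statistic is G² = 2 Σ O ln(O / E). Both sums run over all cells, where O is the observed count and E is the expected count. The sketch below computes both statistics for the example table using only the standard library. `chi_square_statistics` is a hypothetical helper, not a Minitab function.

```python
import math

observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]


def chi_square_statistics(table):
    """Return (Pearson chi-square, likelihood-ratio G^2) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g2 = 0.0
    for r, row in zip(row_totals, table):
        for c, o in zip(col_totals, row):
            e = r * c / n                      # expected count under independence
            pearson += (o - e) ** 2 / e        # squared difference, scaled by E
            if o > 0:                          # empty cells add nothing to G^2
                g2 += 2 * o * math.log(o / e)  # log of the ratio O / E
    return pearson, g2


pearson, g2 = chi_square_statistics(observed)
print(f"Pearson: {pearson:.3f}  Likelihood Ratio: {g2:.3f}")
```

When the observed counts are close to the expected counts, the two statistics tend to agree, as they do here.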
Use the chi-square statistics to test whether the variables are associated.
In these results, the two chi-square statistics are very similar (11.788 for Pearson and 11.816 for likelihood ratio). Use the p-values to evaluate the significance of the chi-square statistics.
When the expected counts are small, your results may be misleading. For more information, see Data considerations for Chi-Square Test for Association.
The degrees of freedom (DF) is the number of independent pieces of information in a statistic. For a table, DF is (number of rows – 1) multiplied by (number of columns – 1). The "All" row and column of marginal totals are not counted.
Minitab uses the degrees of freedom to determine the p-value associated with the test statistic.
In these results, the table has 3 rows and 3 columns, so the degrees of freedom (DF) is (3 – 1) × (3 – 1) = 4.
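The DF rule is simple arithmetic on the table dimensions. The sketch below uses a hypothetical helper, `degrees_of_freedom`, and the 3 × 3 dimensions of the example table with the margins excluded.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """DF for a chi-square test of association on an r x c table (margins excluded)."""
    return (n_rows - 1) * (n_cols - 1)


# The example table has 3 rows and 3 columns, not counting the "All" margins.
df = degrees_of_freedom(3, 3)
print(df)  # 4
```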
The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.
Use the p-value to determine whether to reject or fail to reject the null hypothesis, which states that there is no association between the two categorical variables.
Minitab uses the chi-square statistic to determine the p-value.
Minitab does not display the p-value when any expected count is less than 1 because the results can be invalid.
In these results, the p-value is 0.019. At a significance level of α = 0.05, the p-value is less than α, so you reject the null hypothesis and can conclude that the variables are associated.
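Minitab computes the p-value for you. For illustration, the sketch below evaluates the upper tail of the chi-square distribution using only the Python standard library. `chi2_sf` is a hypothetical helper that uses the closed-form expressions for integer degrees of freedom: a finite series times `math.exp` for even DF, and `math.erfc` plus a finite series for odd DF. If SciPy is available, `scipy.stats.chi2.sf` should give the same values. The significance level α = 0.05 is an assumed value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-square random variable X with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-y) * sum_{i=0}^{k-1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2k + 1: erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{k-1} y^(j + 1/2) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for j in range(df // 2):
        total += math.exp(-y) * term
        term *= y / (j + 1.5)  # Gamma(j + 5/2) = (j + 3/2) * Gamma(j + 3/2)
    return total


alpha = 0.05                  # assumed significance level
p_value = chi2_sf(11.788, 4)  # Pearson statistic and DF from the results table
print(f"p-value = {p_value:.3f}")
print("Reject H0: the variables are associated" if p_value < alpha else "Fail to reject H0")
```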
The raw residuals are the differences between the observed counts and the expected counts. You can compare them in the output table, where the third number in each cell is the raw residual. For example, in row 2, column 1, the raw residual is 76 − 60.784 = 15.216.
|  | 1st shift | 2nd shift | 3rd shift | All |
|---|---|---|---|---|
| 1 | 48 | 47 | 48 | 143 |
|  | 56.08 | 46.97 | 39.96 |  |
|  | -8.078 | 0.034 | 8.044 |  |
| 2 | 76 | 47 | 32 | 155 |
|  | 60.78 | 50.91 | 43.31 |  |
|  | 15.216 | -3.907 | -11.309 |  |
| 3 | 36 | 40 | 34 | 110 |
|  | 43.14 | 36.13 | 30.74 |  |
|  | -7.137 | 3.873 | 3.265 |  |
| All | 160 | 134 | 114 | 408 |
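The raw residuals in the table can be reproduced by subtracting each unrounded expected count from the observed count, as in this sketch. The variable names are illustrative. The final checks reflect a property of the method: every row and column of raw residuals sums to zero, because the expected counts have the same margins as the observed counts.

```python
observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Raw residual = observed count - expected count (using the unrounded expected count).
raw_residuals = [
    [o - r * c / n for o, c in zip(row, col_totals)]
    for r, row in zip(row_totals, observed)
]

for row in raw_residuals:
    print("  ".join(f"{res:.3f}" for res in row))

# Row and column sums of the raw residuals are zero, up to floating-point error.
assert all(abs(sum(row)) < 1e-9 for row in raw_residuals)
assert all(abs(sum(col)) < 1e-9 for col in zip(*raw_residuals))
```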
The standardized residuals are the raw residuals (or the difference between the observed counts and expected counts), divided by the square root of the expected counts.
You can compare the standardized residuals in the output table to see which categories have the largest difference between the observed counts and the expected counts, relative to the expected counts. These categories appear to depart most from independence. For example, you can assess the standardized residuals to see how machine and shift are associated in producing defects. In these results, the standardized residual with the largest absolute value is 1.9516, in row 2 for the 1st shift.
|  | 1st shift | 2nd shift | 3rd shift | All |
|---|---|---|---|---|
| 1 | 48 | 47 | 48 | 143 |
|  | 56.08 | 46.97 | 39.96 |  |
|  | -1.0788 | 0.0050 | 1.2726 |  |
| 2 | 76 | 47 | 32 | 155 |
|  | 60.78 | 50.91 | 43.31 |  |
|  | 1.9516 | -0.5476 | -1.7184 |  |
| 3 | 36 | 40 | 34 | 110 |
|  | 43.14 | 36.13 | 30.74 |  |
|  | -1.0867 | 0.6443 | 0.5889 |  |
| All | 160 | 134 | 114 | 408 |
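The standardized residuals follow directly from the definition above: (observed − expected) / √expected. The sketch below computes them for the example table and then finds the cell with the largest absolute value. The names are hypothetical, and row and column indices are 0-based in the code.

```python
import math

observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Standardized residual = (observed - expected) / sqrt(expected).
std_residuals = []
for r, row in zip(row_totals, observed):
    expected_row = [r * c / n for c in col_totals]
    std_residuals.append([(o - e) / math.sqrt(e) for o, e in zip(row, expected_row)])

for row in std_residuals:
    print("  ".join(f"{z:.4f}" for z in row))

# Cell (0-based row, column) with the largest absolute standardized residual.
largest = max(
    ((i, j) for i in range(len(observed)) for j in range(len(col_totals))),
    key=lambda ij: abs(std_residuals[ij[0]][ij[1]]),
)
print("Largest |standardized residual|: row", largest[0] + 1, "column", largest[1] + 1)
```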
The adjusted residuals are the raw residuals (or the difference between the observed counts and expected counts) divided by an estimate of the standard error. Use adjusted residuals to account for the variation due to the sample size.
You can compare the adjusted residuals in the output table to see which categories have the largest difference between the expected counts and the observed counts, after accounting for sample size. For example, you can see which combination of machine and shift has the largest difference between the expected number of defectives and the actual number of defectives. In these results, the largest adjusted residual is 3.1788, in row 2 for the 1st shift.
|  | 1st shift | 2nd shift | 3rd shift | All |
|---|---|---|---|---|
| 1 | 48 | 47 | 48 | 143 |
|  | 56.08 | 46.97 | 39.96 |  |
|  | -1.7169 | 0.0076 | 1.8602 |  |
| 2 | 76 | 47 | 32 | 155 |
|  | 60.78 | 50.91 | 43.31 |  |
|  | 3.1788 | -0.8485 | -2.5707 |  |
| 3 | 36 | 40 | 34 | 110 |
|  | 43.14 | 36.13 | 30.74 |  |
|  | -1.6309 | 0.9199 | 0.8117 |  |
| All | 160 | 134 | 114 | 408 |
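The text does not give Minitab's exact standard-error formula. The sketch below assumes the common form of the adjusted residual, (O − E) / √(E × (1 − row total / n) × (1 − column total / n)), which reproduces the values in the table above. Treat it as an illustration under that assumption. The variable names are hypothetical.

```python
import math

observed = [
    [48, 47, 48],
    [76, 47, 32],
    [36, 40, 34],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Adjusted residual = (O - E) / sqrt(E * (1 - row total / n) * (1 - column total / n)).
adj_residuals = []
for r, row in zip(row_totals, observed):
    adj_row = []
    for c, o in zip(col_totals, row):
        e = r * c / n                                         # expected count
        std_error = math.sqrt(e * (1 - r / n) * (1 - c / n))  # assumed SE estimate
        adj_row.append((o - e) / std_error)
    adj_residuals.append(adj_row)

for row in adj_residuals:
    print("  ".join(f"{a:.4f}" for a in row))
```

Because the denominator shrinks the expected count by both margins, the adjusted residuals are larger in magnitude than the standardized residuals for the same cells.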