The contingency table tallies observations according to multiple categorical variables. The rows and columns in the table correspond to the categorical variables. The table includes marginal totals for each level of the variables.
The contingency table for simple correspondence analysis is a two-way table that tallies observations for two variables. You can also categorize observations for three or four variables by using the Combine sub-dialog box to cross the variables and create the rows and/or columns of a two-way table.
Use the contingency table to view the observed frequency for each cell defined by a row category and column category. Use the column and row totals to see the total frequency for each category.
Discipline | A | B | C | D | E | Total |
---|---|---|---|---|---|---|
Geology | 3.000 | 19.000 | 39.000 | 14.000 | 10.000 | 85.000 |
Biochemistry | 1.000 | 2.000 | 13.000 | 1.000 | 12.000 | 29.000 |
Chemistry | 6.000 | 25.000 | 49.000 | 21.000 | 29.000 | 130.000 |
Zoology | 3.000 | 15.000 | 41.000 | 35.000 | 26.000 | 120.000 |
Physics | 10.000 | 22.000 | 47.000 | 9.000 | 26.000 | 114.000 |
Engineering | 3.000 | 11.000 | 25.000 | 15.000 | 34.000 | 88.000 |
Microbiology | 1.000 | 6.000 | 14.000 | 5.000 | 11.000 | 37.000 |
Botany | 0.000 | 12.000 | 34.000 | 17.000 | 23.000 | 86.000 |
Statistics | 2.000 | 5.000 | 11.000 | 4.000 | 7.000 | 29.000 |
Mathematics | 2.000 | 11.000 | 37.000 | 8.000 | 20.000 | 78.000 |
Total | 31.000 | 128.000 | 310.000 | 129.000 | 198.000 | 796.000 |
This two-way contingency table shows the observed counts of researchers in each academic discipline and funding category (A, B, C, D, E). The Total column indicates that the disciplines with the most researchers are Chemistry (130), Zoology (120), and Physics (114). The Total row indicates that funding category C contains more researchers than any other category (310). Among the individual cells, researchers in Chemistry who are classified under funding category C have the highest observed frequency (49).
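As a minimal sketch of how the marginal totals arise, the following Python snippet re-creates the observed counts from the table and sums them by row and by column. The variable names (`observed`, `row_totals`, `col_totals`) are illustrative assumptions and are not part of Minitab.

```python
# Observed counts copied from the contingency table
# (rows: academic disciplines, columns: funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}
categories = ["A", "B", "C", "D", "E"]

# Marginal totals: one per discipline (row) and one per funding category (column).
row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = dict(zip(categories, (sum(col) for col in zip(*observed.values()))))
grand_total = sum(row_totals.values())

for name, total in row_totals.items():
    print(f"{name:<13}{total:>5}")
print(col_totals, grand_total)
```

The row and column sums should agree with the Total column and Total row of the table, which is a quick way to check that the counts were entered correctly.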
The expected frequency is the count of observations that is expected in a cell, on average, if the variables are independent. Minitab calculates the expected counts as the product of the row and column totals, divided by the total number of observations.
Discipline | A | B | C | D | E |
---|---|---|---|---|---|
Geology | 3.310 | 13.668 | 33.103 | 13.775 | 21.143 |
Biochemistry | 1.129 | 4.663 | 11.294 | 4.700 | 7.214 |
Chemistry | 5.063 | 20.905 | 50.628 | 21.068 | 32.337 |
Zoology | 4.673 | 19.296 | 46.734 | 19.447 | 29.849 |
Physics | 4.440 | 18.332 | 44.397 | 18.475 | 28.357 |
Engineering | 3.427 | 14.151 | 34.271 | 14.261 | 21.889 |
Microbiology | 1.441 | 5.950 | 14.410 | 5.996 | 9.204 |
Botany | 3.349 | 13.829 | 33.492 | 13.937 | 21.392 |
Statistics | 1.129 | 4.663 | 11.294 | 4.700 | 7.214 |
Mathematics | 3.038 | 12.543 | 30.377 | 12.641 | 19.402 |
This expected frequency table shows the expected counts of researchers in each academic discipline and funding category (A, B, C, D, E), assuming that funding category and academic discipline are independent. Because Chemistry has the largest row total and funding category C has the largest column total, the combination of those categories has the highest expected count (50.628, or approximately 51).
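The expected counts can be reproduced from the observed counts alone. The sketch below applies the rule stated above (row total times column total, divided by the total number of observations) in plain Python; the names `observed` and `expected` are illustrative assumptions, not Minitab identifiers.

```python
# Observed counts copied from the contingency table
# (rows: academic disciplines, columns: funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)  # total number of observations

# Expected count under independence: (row total * column total) / n.
expected = {
    name: [row_totals[name] * col_total / n for col_total in col_totals]
    for name in observed
}

for name, values in expected.items():
    print(f"{name:<13}" + "".join(f"{value:>9.3f}" for value in values))
```

For example, the Chemistry and category C cell is 130 × 310 / 796, which rounds to the 50.628 shown in the table.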
The observed − expected frequency is the difference between the count of actual observations in the cell and the count of observations in the cell that you expect if the variables are independent.
Use the difference between the observed and expected frequencies to look for evidence of possible associations in the data. If two variables are associated, then the distribution of observations for one variable differs depending on the category of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively large. If the two variables are independent, then the distribution of observations for one variable is similar for all categories of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively small.
Discipline | A | B | C | D | E |
---|---|---|---|---|---|
Geology | -0.310 | 5.332 | 5.897 | 0.225 | -11.143 |
Biochemistry | -0.129 | -2.663 | 1.706 | -3.700 | 4.786 |
Chemistry | 0.937 | 4.095 | -1.628 | -0.068 | -3.337 |
Zoology | -1.673 | -4.296 | -5.734 | 15.553 | -3.849 |
Physics | 5.560 | 3.668 | 2.603 | -9.475 | -2.357 |
Engineering | -0.427 | -3.151 | -9.271 | 0.739 | 12.111 |
Microbiology | -0.441 | 0.050 | -0.410 | -0.996 | 1.796 |
Botany | -3.349 | -1.829 | 0.508 | 3.063 | 1.608 |
Statistics | 0.871 | 0.337 | -0.294 | -0.700 | -0.214 |
Mathematics | -1.038 | -1.543 | 6.623 | -4.641 | 0.598 |
In this table, the magnitude of the difference between the observed count and the expected count is relatively large for Zoology and funding category D (15.553) and for Engineering and funding category E (12.111). For these cells, the observed counts are greater than the counts that you would expect if the variables were independent. The magnitude of the difference is also relatively large for Geology and funding category E (-11.143). For this cell, the observed count is smaller than the count that you would expect if the variables were independent. These differences suggest that considerably more Engineering researchers, and considerably fewer Geology researchers, were in funding category E (unfunded) than would be expected under independence.
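To locate the cells with the largest differences without scanning the table by eye, one option is to compute observed minus expected for every cell and then take the maximum and minimum. This is a minimal sketch in plain Python; the dictionary `diffs` and the other names are illustrative assumptions.

```python
# Observed counts copied from the contingency table
# (rows: academic disciplines, columns: funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}
categories = ["A", "B", "C", "D", "E"]

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Observed minus expected for every (discipline, category) cell.
diffs = {}
for name, counts in observed.items():
    for j, category in enumerate(categories):
        expected = row_totals[name] * col_totals[j] / n
        diffs[(name, category)] = counts[j] - expected

largest = max(diffs, key=diffs.get)    # observed most above expected
smallest = min(diffs, key=diffs.get)   # observed most below expected
print(largest, round(diffs[largest], 3))
print(smallest, round(diffs[smallest], 3))
```

Raw differences are not scaled by the size of the expected count, so a large difference in a cell with a large expected count may matter less than the same difference in a sparse cell.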
Minitab displays each cell's contribution to the chi-square statistic as the chi-square distance. The chi-square distance for each cell quantifies how much of the total chi-square statistic is attributable to each cell's divergence.
Minitab calculates each cell's contribution to the chi-square statistic as the square of the difference between the observed and expected values for a cell, divided by the expected value for that cell. The total chi-square is the sum of the values for all cells.
You can compare the chi-square distances for each cell to assess which cells contribute most to the total chi-square. If the observed and expected cell frequencies differ greatly, the chi-square distance for the cell is larger. Therefore, a larger chi-square distance in a cell suggests a stronger association between the row and column categories than is expected by chance.
Discipline | A | B | C | D | E | Total |
---|---|---|---|---|---|---|
Geology | 0.029 | 2.080 | 1.050 | 0.004 | 5.873 | 9.036 |
Biochemistry | 0.015 | 1.521 | 0.258 | 2.913 | 3.176 | 7.882 |
Chemistry | 0.173 | 0.802 | 0.052 | 0.000 | 0.344 | 1.373 |
Zoology | 0.599 | 0.957 | 0.703 | 12.438 | 0.496 | 15.194 |
Physics | 6.964 | 0.734 | 0.153 | 4.859 | 0.196 | 12.906 |
Engineering | 0.053 | 0.702 | 2.508 | 0.038 | 6.700 | 10.001 |
Microbiology | 0.135 | 0.000 | 0.012 | 0.166 | 0.351 | 0.663 |
Botany | 3.349 | 0.242 | 0.008 | 0.673 | 0.121 | 4.393 |
Statistics | 0.671 | 0.024 | 0.008 | 0.104 | 0.006 | 0.814 |
Mathematics | 0.354 | 0.190 | 1.444 | 1.704 | 0.018 | 3.710 |
Total | 12.343 | 7.252 | 6.196 | 22.899 | 17.282 | 65.972 |
In this table, the cell for Zoology and funding category D is 12.438, which accounts for the largest contribution to the total chi-square (65.972). Of the row categories, Zoology (15.194), Physics (12.906), and Engineering (10.001) contribute most to the total chi-square. Of the column categories, funding levels D (22.899) and E (17.282) contribute most to the total chi-square.
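The chi-square distances can be recomputed from the observed counts as a check on the table. The sketch below follows the calculation described above, squaring the difference between observed and expected and dividing by the expected value for each cell, then summing by row, by column, and overall. All names are illustrative assumptions rather than Minitab identifiers.

```python
# Observed counts copied from the contingency table
# (rows: academic disciplines, columns: funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Per-cell contribution: (observed - expected)^2 / expected.
chi2 = {}
for name, counts in observed.items():
    contributions = []
    for count, col_total in zip(counts, col_totals):
        expected = row_totals[name] * col_total / n
        contributions.append((count - expected) ** 2 / expected)
    chi2[name] = contributions

row_chi2 = {name: sum(values) for name, values in chi2.items()}
col_chi2 = [sum(col) for col in zip(*chi2.values())]
total_chi2 = sum(row_chi2.values())

print(f"Zoology, D: {chi2['Zoology'][3]:.3f}")
print(f"Total chi-square: {total_chi2:.3f}")
```

Sorting `row_chi2` or `col_chi2` by value gives the ranking of row and column categories discussed above.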
Cell inertia is the chi-square distance for the cell divided by the total frequency for the contingency table. The sum of all the cell inertias is the total inertia, or simply the inertia. The relative inertia for a cell is the cell inertia divided by the total inertia. Because both quantities are divided by the same total frequency, the relative inertia for a cell equals the cell's chi-square distance divided by the total chi-square. The relative inertia for a row is the sum of the cell inertias for the row divided by the total inertia. The relative inertia for a column is the sum of the cell inertias for the column divided by the total inertia.
Use relative inertia to assess the strength of the associations between categories and how much each cell, row, or column contributes to the total inertia. Higher values generally indicate a stronger association and a larger share of the total deviation from the expected values.
Discipline | A | B | C | D | E | Total |
---|---|---|---|---|---|---|
Geology | 0.000 | 0.032 | 0.016 | 0.000 | 0.089 | 0.137 |
Biochemistry | 0.000 | 0.023 | 0.004 | 0.044 | 0.048 | 0.119 |
Chemistry | 0.003 | 0.012 | 0.001 | 0.000 | 0.005 | 0.021 |
Zoology | 0.009 | 0.015 | 0.011 | 0.189 | 0.008 | 0.230 |
Physics | 0.106 | 0.011 | 0.002 | 0.074 | 0.003 | 0.196 |
Engineering | 0.001 | 0.011 | 0.038 | 0.001 | 0.102 | 0.152 |
Microbiology | 0.002 | 0.000 | 0.000 | 0.003 | 0.005 | 0.010 |
Botany | 0.051 | 0.004 | 0.000 | 0.010 | 0.002 | 0.067 |
Statistics | 0.010 | 0.000 | 0.000 | 0.002 | 0.000 | 0.012 |
Mathematics | 0.005 | 0.003 | 0.022 | 0.026 | 0.000 | 0.056 |
Total | 0.187 | 0.110 | 0.094 | 0.347 | 0.262 | 1.000 |
The Relative Inertias table shows the relative contribution of each cell to the total chi-square statistic. The higher the relative inertia in a cell, the greater the association between the row and column categories. In this table, the cell for Zoology and funding category D has the highest relative inertia (0.189), which indicates the strongest association in the table. The Total column and Total row give the relative inertia for each row and column. For example, Zoology accounts for 0.230 of the total inertia and funding category D accounts for 0.347.
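The inertia definitions can be traced step by step in code. The following sketch, in plain Python with illustrative variable names that are assumptions and not Minitab identifiers, divides each cell's chi-square distance by the total frequency to get the cell inertia, sums the cell inertias to get the total inertia, and then takes the ratio to get the relative inertia.

```python
# Observed counts copied from the contingency table
# (rows: academic disciplines, columns: funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Cell inertia: chi-square distance for the cell divided by the total frequency.
cell_inertia = {}
for name, counts in observed.items():
    values = []
    for count, col_total in zip(counts, col_totals):
        expected = row_totals[name] * col_total / n
        values.append((count - expected) ** 2 / expected / n)
    cell_inertia[name] = values

# Total inertia: sum of all cell inertias.
total_inertia = sum(sum(values) for values in cell_inertia.values())

# Relative inertia: cell inertia as a fraction of the total inertia.
relative = {
    name: [value / total_inertia for value in values]
    for name, values in cell_inertia.items()
}
row_relative = {name: sum(values) for name, values in relative.items()}
col_relative = [sum(col) for col in zip(*relative.values())]

print(f"Zoology, D: {relative['Zoology'][3]:.3f}")
print(f"Zoology row: {row_relative['Zoology']:.3f}")
```

By construction the relative inertias sum to 1 across all cells, which matches the 1.000 in the bottom-right corner of the table.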