Frequencies and chi-square distances for Simple Correspondence Analysis

Find definitions and interpretation for every statistic that is provided for frequencies and chi-square distances for simple correspondence analysis.

Contingency Table

The contingency table tallies observations according to multiple categorical variables. The rows and columns in the table correspond to the categorical variables. The table includes marginal totals for each level of the variables.

The contingency table for simple correspondence analysis is a two-way table that tallies observations for two variables. You can also categorize observations for three or four variables by using the Combine sub-dialog box to cross the variables and create the rows and/or columns of a two-way table.

Interpretation

Use the contingency table to view the observed frequency for each cell defined by a row category and column category. Use the column and row totals to see the total frequency for each category.

Contingency Table

                    A        B        C        D        E    Total
Geology         3.000   19.000   39.000   14.000   10.000   85.000
Biochemistry    1.000    2.000   13.000    1.000   12.000   29.000
Chemistry       6.000   25.000   49.000   21.000   29.000  130.000
Zoology         3.000   15.000   41.000   35.000   26.000  120.000
Physics        10.000   22.000   47.000    9.000   26.000  114.000
Engineering     3.000   11.000   25.000   15.000   34.000   88.000
Microbiology    1.000    6.000   14.000    5.000   11.000   37.000
Botany          0.000   12.000   34.000   17.000   23.000   86.000
Statistics      2.000    5.000   11.000    4.000    7.000   29.000
Mathematics     2.000   11.000   37.000    8.000   20.000   78.000
Total          31.000  128.000  310.000  129.000  198.000  796.000

This two-way contingency table shows the observed counts of researchers in each academic discipline and funding category (A, B, C, D, E). The Total column indicates that the disciplines with the most researchers are Chemistry (130), Zoology (120), and Physics (114). The Total row indicates that more researchers are classified under funding category C (310) than under any other category. Among the individual cells, researchers in Chemistry who are classified under funding category C have the highest observed frequency (49).

Expected Frequencies

The expected frequency is the count of observations that is expected in a cell, on average, if the variables are independent. Minitab calculates the expected counts as the product of the row and column totals, divided by the total number of observations.
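The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula (row total × column total ÷ total number of observations), not Minitab's implementation; the function name `expected_frequencies` and the dictionary layout are assumptions made for this example. The counts are the observed frequencies from the example contingency table in this topic.

```python
# Observed counts from the example contingency table.
# Rows are disciplines; columns are funding categories A, B, C, D, E.
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def expected_frequencies(table):
    """Hypothetical helper: expected count = row total * column total / n."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)  # total number of observations
    return {
        name: [row_total * col_total / n for col_total in col_totals]
        for name, row_total in zip(table, row_totals)
    }


expected = expected_frequencies(observed)

# Chemistry (row total 130) and category C (column total 310), with n = 796.
print(f"Chemistry, C: {expected['Chemistry'][2]:.3f}")
```

Up to rounding, the printed value should agree with the Chemistry and funding category C cell of the Expected Frequencies table.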

Expected Frequencies

                    A        B        C        D        E
Geology         3.310   13.668   33.103   13.775   21.143
Biochemistry    1.129    4.663   11.294    4.700    7.214
Chemistry       5.063   20.905   50.628   21.068   32.337
Zoology         4.673   19.296   46.734   19.447   29.849
Physics         4.440   18.332   44.397   18.475   28.357
Engineering     3.427   14.151   34.271   14.261   21.889
Microbiology    1.441    5.950   14.410    5.996    9.204
Botany          3.349   13.829   33.492   13.937   21.392
Statistics      1.129    4.663   11.294    4.700    7.214
Mathematics     3.038   12.543   30.377   12.641   19.402

This expected frequency table shows the expected counts of researchers in each academic discipline and funding category (A, B, C, D, E), assuming that funding category and academic discipline are independent. Because Chemistry has the largest row total and funding category C has the largest column total, the combination of those categories has the highest expected value (50.628, or approximately 51).

Observed – Expected Frequencies

The observed − expected frequency is the difference between the count of actual observations in the cell and the count of observations in the cell that you expect if the variables are independent.

Interpretation

Use the difference between the observed and expected frequencies to look for evidence of possible associations in the data. If two variables are associated, then the distribution of observations for one variable differs depending on the category of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively large. If the two variables are independent, then the distribution of observations for one variable is similar for all categories of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively small.
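As a small illustration of this comparison, the following Python sketch ranks a handful of cells by the magnitude of observed − expected. The observed and expected values are copied from the tables in this topic; the choice of cells and the variable names are assumptions made for the example.

```python
# (discipline, funding category) -> (observed count, expected count),
# copied from the Contingency Table and Expected Frequencies table.
cells = {
    ("Zoology", "D"): (35.0, 19.447),
    ("Engineering", "E"): (34.0, 21.889),
    ("Geology", "E"): (10.0, 21.143),
    ("Microbiology", "B"): (6.0, 5.950),
}

# Observed minus expected for each selected cell.
differences = {cell: obs - exp for cell, (obs, exp) in cells.items()}

# Largest magnitude first: these cells depart most from independence.
ranked = sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)

for (discipline, category), diff in ranked:
    direction = "more than expected" if diff > 0 else "fewer than expected"
    print(f"{discipline:<13}{category}  {diff:+8.3f}  ({direction})")
```

The sign shows the direction of the departure and the magnitude shows its size, so a cell such as Microbiology and funding category B, with a difference close to zero, is consistent with independence.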

Observed - Expected Frequencies

                    A        B        C        D        E
Geology        -0.310    5.332    5.897    0.225  -11.143
Biochemistry   -0.129   -2.663    1.706   -3.700    4.786
Chemistry       0.937    4.095   -1.628   -0.068   -3.337
Zoology        -1.673   -4.296   -5.734   15.553   -3.849
Physics         5.560    3.668    2.603   -9.475   -2.357
Engineering    -0.427   -3.151   -9.271    0.739   12.111
Microbiology   -0.441    0.050   -0.410   -0.996    1.796
Botany         -3.349   -1.829    0.508    3.063    1.608
Statistics      0.871    0.337   -0.294   -0.700   -0.214
Mathematics    -1.038   -1.543    6.623   -4.641    0.598

In this table, the magnitude of the difference between the observed count and the expected count is relatively large for Zoology and funding category D (15.553) and for Engineering and funding category E (12.111). For these cells, the observed counts are greater than the counts that you would expect if the variables were independent. The magnitude of the difference is also relatively large for Geology and funding category E (-11.143). For this cell, the observed count is smaller than the count that you would expect if the variables were independent. Because funding category E represents researchers who are unfunded, you can conclude that considerably more Engineering researchers were unfunded than expected, and considerably fewer Geology researchers were unfunded than expected.

Chi-Square Distances

Minitab displays each cell's contribution to the chi-square statistic as the chi-square distance. The chi-square distance for each cell quantifies how much of the total chi-square statistic is attributable to each cell's divergence.

Minitab calculates each cell's contribution to the chi-square statistic as the square of the difference between the observed and expected values for a cell, divided by the expected value for that cell. The total chi-square is the sum of the values for all cells.
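The following Python sketch works through that calculation for the example data: for each cell it squares the difference between the observed and expected counts, divides by the expected count, and then sums over all cells. It is a minimal illustration under the assumption that no expected count is zero, not a reproduction of Minitab's code; the function name `chi_square_distances` is hypothetical.

```python
# Observed counts from the example contingency table.
# Rows are disciplines; columns are funding categories A, B, C, D, E.
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def chi_square_distances(table):
    """Hypothetical helper: (observed - expected)^2 / expected for every cell.

    Assumes every row total and column total is positive, so that no
    expected count is zero.
    """
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    distances = {}
    for name, counts, row_total in zip(table, rows, row_totals):
        cells = []
        for obs, col_total in zip(counts, col_totals):
            exp = row_total * col_total / n  # expected count under independence
            cells.append((obs - exp) ** 2 / exp)
        distances[name] = cells
    return distances


distances = chi_square_distances(observed)

# The total chi-square is the sum of the contributions of all cells.
total_chi_square = sum(sum(cells) for cells in distances.values())

print(f"Zoology, D:       {distances['Zoology'][3]:.3f}")
print(f"Total chi-square: {total_chi_square:.3f}")
```

Up to rounding, the results should agree with the Chi-Square Distances table in this topic.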

Interpretation

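Compare the chi-square distances to assess which cells contribute most to the total chi-square. The more the observed frequency differs from the expected frequency, relative to the size of the expected frequency, the larger the chi-square distance for the cell. Therefore, a larger chi-square distance indicates a greater departure from what you would expect if the row and column categories were independent. Because the difference is squared, the chi-square distance does not show whether the observed count is higher or lower than expected; use the observed − expected frequencies to determine the direction.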

Chi-Square Distances

                    A        B        C        D        E    Total
Geology         0.029    2.080    1.050    0.004    5.873    9.036
Biochemistry    0.015    1.521    0.258    2.913    3.176    7.882
Chemistry       0.173    0.802    0.052    0.000    0.344    1.373
Zoology         0.599    0.957    0.703   12.438    0.496   15.194
Physics         6.964    0.734    0.153    4.859    0.196   12.906
Engineering     0.053    0.702    2.508    0.038    6.700   10.001
Microbiology    0.135    0.000    0.012    0.166    0.351    0.663
Botany          3.349    0.242    0.008    0.673    0.121    4.393
Statistics      0.671    0.024    0.008    0.104    0.006    0.814
Mathematics     0.354    0.190    1.444    1.704    0.018    3.710
Total          12.343    7.252    6.196   22.899   17.282   65.972

In this table, the cell for Zoology and funding category D is 12.438, which accounts for the largest contribution to the total chi-square (65.972). Of the row categories, Zoology (15.194), Physics (12.906), and Engineering (10.001) contribute most to the total chi-square. Of the column categories, funding levels D (22.899) and E (17.282) contribute most to the total chi-square.

Relative Inertias

Cell inertia is the chi-square distance for the cell divided by the total frequency for the contingency table. The sum of all the cell inertias is the total inertia, or simply the inertia. The relative inertia for a cell is the cell inertia divided by the total inertia. The relative inertia for a row is the sum of the cell inertias for the row divided by the total inertia. The relative inertia for a column is the sum of the cell inertias for the column divided by the total inertia.
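These definitions can be traced with a short Python sketch. It starts from the row totals of the chi-square distances reported in this topic and the total frequency of the example table (796), so the results are only as precise as those rounded inputs; the variable names are assumptions made for the example. Because both the row inertia and the total inertia are divided by the same total frequency, the relative inertia of a row works out to the row's share of the total chi-square.

```python
# Total frequency of the example contingency table.
n = 796

# Row totals of the chi-square distances, copied from the table in this topic.
chi_square_by_row = {
    "Geology": 9.036,
    "Biochemistry": 7.882,
    "Chemistry": 1.373,
    "Zoology": 15.194,
    "Physics": 12.906,
    "Engineering": 10.001,
    "Microbiology": 0.663,
    "Botany": 4.393,
    "Statistics": 0.814,
    "Mathematics": 3.710,
}

# Inertia of each row: its chi-square contribution divided by n.
row_inertia = {name: chi / n for name, chi in chi_square_by_row.items()}

# Total inertia: the sum of all inertias (equivalently, total chi-square / n).
total_inertia = sum(row_inertia.values())

# Relative inertia: each row's share of the total inertia.
relative_inertia = {name: value / total_inertia for name, value in row_inertia.items()}

print(f"Total inertia: {total_inertia:.4f}")
for name, value in sorted(relative_inertia.items(), key=lambda item: -item[1]):
    print(f"{name:<13}{value:.3f}")
```

The relative inertias sum to 1, and up to rounding they should agree with the Total column of the Relative Inertias table.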

Interpretation

Use relative inertia to assess the strength of the associations between categories and contributions to variation in the data. Higher values generally indicate a stronger association and a greater proportion of the total variability from expected values in the data.

Relative Inertias

                    A        B        C        D        E    Total
Geology         0.000    0.032    0.016    0.000    0.089    0.137
Biochemistry    0.000    0.023    0.004    0.044    0.048    0.119
Chemistry       0.003    0.012    0.001    0.000    0.005    0.021
Zoology         0.009    0.015    0.011    0.189    0.008    0.230
Physics         0.106    0.011    0.002    0.074    0.003    0.196
Engineering     0.001    0.011    0.038    0.001    0.102    0.152
Microbiology    0.002    0.000    0.000    0.003    0.005    0.010
Botany          0.051    0.004    0.000    0.010    0.002    0.067
Statistics      0.010    0.000    0.000    0.002    0.000    0.012
Mathematics     0.005    0.003    0.022    0.026    0.000    0.056
Total           0.187    0.110    0.094    0.347    0.262    1.000

The Relative Inertias table shows the relative contribution of each cell to the total chi-square statistic. The higher the relative inertia in a cell, the greater the association between the row and column categories. In this table, the cell for Zoology and funding category D has the highest relative inertia (0.189), which indicates the strongest association in the table. The Total column and Total row give the relative inertia for each row and column. Of the rows, Zoology (0.230) accounts for the largest share of the total inertia. Of the columns, funding category D (0.347) accounts for the largest share.
