Frequencies and chi-square distances for Simple Correspondence Analysis

Find definitions and interpretation for every statistic that is provided for frequencies and chi-square distances for simple correspondence analysis.

Contingency Table

The contingency table tallies observations according to multiple categorical variables. The rows and columns in the table correspond to the categorical variables. The table includes marginal totals for each level of the variables.

The contingency table for simple correspondence analysis is a two-way table that tallies observations for two variables. You can also categorize observations for three or four variables by using the Combine sub-dialog box to cross the variables and create the rows and/or columns of a two-way table.

Interpretation

Use the contingency table to view the observed frequency for each cell defined by a row category and column category. Use the column and row totals to see the total frequency for each category.

Contingency Table

                      A        B        C        D        E    Total
Geology           3.000   19.000   39.000   14.000   10.000   85.000
Biochemistry      1.000    2.000   13.000    1.000   12.000   29.000
Chemistry         6.000   25.000   49.000   21.000   29.000  130.000
Zoology           3.000   15.000   41.000   35.000   26.000  120.000
Physics          10.000   22.000   47.000    9.000   26.000  114.000
Engineering       3.000   11.000   25.000   15.000   34.000   88.000
Microbiology      1.000    6.000   14.000    5.000   11.000   37.000
Botany            0.000   12.000   34.000   17.000   23.000   86.000
Statistics        2.000    5.000   11.000    4.000    7.000   29.000
Mathematics       2.000   11.000   37.000    8.000   20.000   78.000
Total            31.000  128.000  310.000  129.000  198.000  796.000

This two-way contingency table shows the observed counts of researchers in each academic discipline and funding category (A, B, C, D, E). The Total column indicates that the disciplines with the most researchers are Chemistry (130), Zoology (120), and Physics (114). The Total row indicates that funding category C contains more researchers (310) than any other category. Among the individual cells, researchers in Chemistry who are classified under funding category C have the highest observed frequency (49).

Expected Frequencies

The expected frequency is the count of observations that is expected in a cell, on average, if the variables are independent. Minitab calculates the expected counts as the product of the row and column totals, divided by the total number of observations.

Expected Frequencies

                      A        B        C        D        E
Geology           3.310   13.668   33.103   13.775   21.143
Biochemistry      1.129    4.663   11.294    4.700    7.214
Chemistry         5.063   20.905   50.628   21.068   32.337
Zoology           4.673   19.296   46.734   19.447   29.849
Physics           4.440   18.332   44.397   18.475   28.357
Engineering       3.427   14.151   34.271   14.261   21.889
Microbiology      1.441    5.950   14.410    5.996    9.204
Botany            3.349   13.829   33.492   13.937   21.392
Statistics        1.129    4.663   11.294    4.700    7.214
Mathematics       3.038   12.543   30.377   12.641   19.402

This expected frequency table shows the expected counts of researchers in each academic discipline and funding category (A, B, C, D, E), assuming that funding category and academic discipline are independent. Because Chemistry has the largest row total and funding category C has the largest column total, the combination of those categories has the highest expected count (50.628, or approximately 51).
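The expected-count rule described above (row total times column total, divided by the total number of observations) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Minitab's implementation; the function name `expected_frequencies` is a hypothetical helper, and the counts are copied from the contingency table.

```python
# Observed counts from the contingency table (columns are funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / N.

    Hypothetical helper for illustration only.
    """
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    return {
        name: [sum(counts) * c / n for c in col_totals]
        for name, counts in table.items()
    }


expected = expected_frequencies(observed)
print(round(expected["Chemistry"][2], 3))  # Chemistry, category C -> 50.628
```

The printed value matches the Chemistry and category C cell in the Expected Frequencies table.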

Observed – Expected Frequencies

The observed − expected frequency is the difference between the count of actual observations in the cell and the count of observations in the cell that you expect if the variables are independent.

Interpretation

Use the difference between the observed and expected frequencies to look for evidence of possible associations in the data. If two variables are associated, then the distribution of observations for one variable differs depending on the category of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively large. If the two variables are independent, then the distribution of observations for one variable is similar for all categories of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively small.

Observed - Expected Frequencies

                      A        B        C        D        E
Geology          -0.310    5.332    5.897    0.225  -11.143
Biochemistry     -0.129   -2.663    1.706   -3.700    4.786
Chemistry         0.937    4.095   -1.628   -0.068   -3.337
Zoology          -1.673   -4.296   -5.734   15.553   -3.849
Physics           5.560    3.668    2.603   -9.475   -2.357
Engineering      -0.427   -3.151   -9.271    0.739   12.111
Microbiology     -0.441    0.050   -0.410   -0.996    1.796
Botany           -3.349   -1.829    0.508    3.063    1.608
Statistics        0.871    0.337   -0.294   -0.700   -0.214
Mathematics      -1.038   -1.543    6.623   -4.641    0.598

In this table, the magnitude of the difference between the observed count and the expected count is relatively large for Zoology and funding category D (15.553) and for Engineering and funding category E (12.111). For these cells, the observed counts are greater than the counts that you would expect if the variables were independent. The magnitude of the difference is also relatively large for Geology and funding category E (-11.143). For this cell, the observed count is smaller than the count that you would expect if the variables were independent. Therefore, you can conclude that considerably more Engineering researchers were in funding category E (unfunded) than expected, and considerably fewer Geology researchers were in that category than expected.
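To locate the cells with the largest differences without scanning the table by eye, you can compute the differences and rank them by magnitude. The following Python sketch assumes the observed counts from the contingency table; the names `differences` and `ranked` are hypothetical and are not part of Minitab.

```python
# Observed counts from the contingency table (columns are funding categories A-E).
categories = ["A", "B", "C", "D", "E"]
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def differences(table):
    """Observed minus expected, with expected = row total * column total / N."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    result = {}
    for name, counts in table.items():
        row_total = sum(counts)
        result[name] = [o - row_total * c / n for o, c in zip(counts, col_totals)]
    return result


diffs = differences(observed)

# Flatten to (row, column, difference) and sort by magnitude, largest first.
ranked = sorted(
    ((name, categories[j], d) for name, row in diffs.items() for j, d in enumerate(row)),
    key=lambda cell: abs(cell[2]),
    reverse=True,
)
for name, category, d in ranked[:3]:
    print(f"{name:<12} {category} {d:8.3f}")
```

The three cells printed are the ones discussed above: Zoology and D, Engineering and E, and Geology and E.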

Chi-Square Distances

Minitab displays each cell's contribution to the chi-square statistic as the chi-square distance. The chi-square distance for a cell quantifies how much of the total chi-square statistic is attributable to the difference between the observed and expected frequencies in that cell.

Minitab calculates each cell's contribution to the chi-square statistic as the square of the difference between the observed and expected values for a cell, divided by the expected value for that cell. The total chi-square is the sum of the values for all cells.

Interpretation

You can compare the chi-square distances for each cell to assess which cells contribute most to the total chi-square. If the observed and expected cell frequencies differ greatly, the chi-square distance for the cell is larger. Therefore, a larger chi-square distance in a cell suggests a stronger association between the row and column categories than is expected by chance.

Chi-Square Distances

                      A        B        C        D        E    Total
Geology           0.029    2.080    1.050    0.004    5.873    9.036
Biochemistry      0.015    1.521    0.258    2.913    3.176    7.882
Chemistry         0.173    0.802    0.052    0.000    0.344    1.373
Zoology           0.599    0.957    0.703   12.438    0.496   15.194
Physics           6.964    0.734    0.153    4.859    0.196   12.906
Engineering       0.053    0.702    2.508    0.038    6.700   10.001
Microbiology      0.135    0.000    0.012    0.166    0.351    0.663
Botany            3.349    0.242    0.008    0.673    0.121    4.393
Statistics        0.671    0.024    0.008    0.104    0.006    0.814
Mathematics       0.354    0.190    1.444    1.704    0.018    3.710
Total            12.343    7.252    6.196   22.899   17.282   65.972

In this table, the cell for Zoology and funding category D is 12.438, which accounts for the largest contribution to the total chi-square (65.972). Of the row categories, Zoology (15.194), Physics (12.906), and Engineering (10.001) contribute most to the total chi-square. Of the column categories, funding levels D (22.899) and E (17.282) contribute most to the total chi-square.
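The cell formula described above, (observed − expected)² / expected, and its row, column, and grand totals can be reproduced with a short Python sketch. It is an illustration under the assumption that the observed counts are those in the contingency table; `chi_square_distances` is a hypothetical helper name, not a Minitab function.

```python
# Observed counts from the contingency table (columns are funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def chi_square_distances(table):
    """Per-cell contribution to chi-square: (observed - expected)^2 / expected."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    result = {}
    for name, counts in table.items():
        row_total = sum(counts)
        row = []
        for o, c in zip(counts, col_totals):
            e = row_total * c / n  # expected count under independence
            row.append((o - e) ** 2 / e)
        result[name] = row
    return result


distances = chi_square_distances(observed)
row_sums = {name: sum(row) for name, row in distances.items()}
col_sums = [sum(col) for col in zip(*distances.values())]
total_chi_square = sum(row_sums.values())

print(round(distances["Zoology"][3], 3))  # Zoology, category D -> 12.438
print(round(total_chi_square, 3))         # total chi-square
```

The row and column sums correspond to the Total column and Total row of the Chi-Square Distances table, up to rounding in the displayed values.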

Relative Inertias

Cell inertia is the chi-square distance for the cell divided by the total frequency for the contingency table. The sum of all the cell inertias is the total inertia, or simply the inertia. The relative inertia for a cell is the cell inertia divided by the total inertia. The relative inertia for a row is the sum of the cell inertias for the row divided by the total inertia. The relative inertia for a column is the sum of the cell inertias for the column divided by the total inertia.

Interpretation

Use relative inertia to assess the strength of the associations between categories and their contributions to variation in the data. Higher values generally indicate a stronger association and a larger share of the total departure of the observed frequencies from the expected frequencies.

Relative Inertias

                      A        B        C        D        E    Total
Geology           0.000    0.032    0.016    0.000    0.089    0.137
Biochemistry      0.000    0.023    0.004    0.044    0.048    0.119
Chemistry         0.003    0.012    0.001    0.000    0.005    0.021
Zoology           0.009    0.015    0.011    0.189    0.008    0.230
Physics           0.106    0.011    0.002    0.074    0.003    0.196
Engineering       0.001    0.011    0.038    0.001    0.102    0.152
Microbiology      0.002    0.000    0.000    0.003    0.005    0.010
Botany            0.051    0.004    0.000    0.010    0.002    0.067
Statistics        0.010    0.000    0.000    0.002    0.000    0.012
Mathematics       0.005    0.003    0.022    0.026    0.000    0.056
Total             0.187    0.110    0.094    0.347    0.262    1.000

The Relative Inertias table shows the relative contribution of each cell to the total chi-square statistic. The higher the relative inertia in a cell, the greater the association between the row and column categories. In this table, the cell for Zoology and funding category D has the highest relative inertia (0.189), which indicates the strongest association in the table. The Total column and Total row give the relative inertia for each row and each column.
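The chain of definitions above (cell inertia, total inertia, relative inertia) can be followed step by step in a short Python sketch. It is a minimal illustration that assumes the observed counts from the contingency table; `relative_inertias` is a hypothetical helper name and is not Minitab's implementation.

```python
# Observed counts from the contingency table (columns are funding categories A-E).
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}


def relative_inertias(table):
    """Return (relative inertia per cell, total inertia)."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    cell_inertia = {}
    for name, counts in table.items():
        row_total = sum(counts)
        row = []
        for o, c in zip(counts, col_totals):
            e = row_total * c / n              # expected count under independence
            row.append(((o - e) ** 2 / e) / n)  # chi-square distance / total frequency
        cell_inertia[name] = row
    total_inertia = sum(sum(row) for row in cell_inertia.values())
    relative = {
        name: [v / total_inertia for v in row]
        for name, row in cell_inertia.items()
    }
    return relative, total_inertia


relative, total_inertia = relative_inertias(observed)
print(round(relative["Zoology"][3], 3))    # Zoology, category D -> 0.189
print(round(sum(relative["Zoology"]), 3))  # relative inertia for the Zoology row
```

Because every cell inertia is divided by the same total frequency, the relative inertia of a cell equals its chi-square distance divided by the total chi-square, and the relative inertias sum to 1.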