Frequencies and chi-square distances for Simple Correspondence Analysis

Find definitions and interpretation for every statistic that is provided for frequencies and chi-square distances for simple correspondence analysis.

In This Topic

Contingency Table
Expected Frequencies
Observed – Expected Frequencies
Chi-Square Distances
Relative Inertias

Contingency Table

The contingency table tallies observations according to multiple categorical variables. The rows and columns in the table correspond to the categorical variables. The table includes marginal totals for each level of the variables.

The contingency table for simple correspondence analysis is a two-way table that tallies observations for two variables. You can also categorize observations for three or four variables by using the Combine sub-dialog box to cross the variables and create the rows and/or columns of a two-way table.

Interpretation

Use the contingency table to view the observed frequency for each cell defined by a row category and column category. Use the column and row totals to see the total frequency for each category.

Contingency Table

	A	B	C	D	E	Total
Geology	3.000	19.000	39.000	14.000	10.000	85.000
Biochemistry	1.000	2.000	13.000	1.000	12.000	29.000
Chemistry	6.000	25.000	49.000	21.000	29.000	130.000
Zoology	3.000	15.000	41.000	35.000	26.000	120.000
Physics	10.000	22.000	47.000	9.000	26.000	114.000
Engineering	3.000	11.000	25.000	15.000	34.000	88.000
Microbiology	1.000	6.000	14.000	5.000	11.000	37.000
Botany	0.000	12.000	34.000	17.000	23.000	86.000
Statistics	2.000	5.000	11.000	4.000	7.000	29.000
Mathematics	2.000	11.000	37.000	8.000	20.000	78.000
Total	31.000	128.000	310.000	129.000	198.000	796.000

This two-way contingency table shows the observed counts of researchers in each academic discipline and funding category (A, B, C, D, E). The Total column indicates that the largest groups of researchers are in Chemistry (130), Zoology (120), and Physics (114). The Total row indicates that funding category C contains more researchers than any other category (310). Among the individual cells, researchers in Chemistry who are classified under funding category C have the highest observed frequency (49).
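The marginal totals described here can be reproduced with a few lines of code. The following is a minimal sketch in Python, not part of Minitab; the variable names (`observed`, `row_totals`, `col_totals`) are illustrative assumptions, and the counts are copied from the contingency table.

```python
# Observed counts copied from the contingency table
# (columns are funding categories A through E).
columns = ["A", "B", "C", "D", "E"]
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}

# Row totals: one per academic discipline.
row_totals = {name: sum(counts) for name, counts in observed.items()}

# Column totals: one per funding category.
col_totals = {
    col: sum(counts[j] for counts in observed.values())
    for j, col in enumerate(columns)
}

# Grand total: the total number of observations in the table.
grand_total = sum(row_totals.values())

# Categories with the largest marginal totals.
largest_row = max(row_totals, key=row_totals.get)
largest_col = max(col_totals, key=col_totals.get)

print(largest_row, row_totals[largest_row])
print(largest_col, col_totals[largest_col])
print(grand_total)
```

The printed values should match the Total column, the Total row, and the grand total of the contingency table.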

Expected Frequencies

The expected frequency is the count of observations that is expected in a cell, on average, if the variables are independent. Minitab calculates the expected counts as the product of the row and column totals, divided by the total number of observations.

Expected Frequencies

	A	B	C	D	E
Geology	3.310	13.668	33.103	13.775	21.143
Biochemistry	1.129	4.663	11.294	4.700	7.214
Chemistry	5.063	20.905	50.628	21.068	32.337
Zoology	4.673	19.296	46.734	19.447	29.849
Physics	4.440	18.332	44.397	18.475	28.357
Engineering	3.427	14.151	34.271	14.261	21.889
Microbiology	1.441	5.950	14.410	5.996	9.204
Botany	3.349	13.829	33.492	13.937	21.392
Statistics	1.129	4.663	11.294	4.700	7.214
Mathematics	3.038	12.543	30.377	12.641	19.402

This expected frequency table shows the expected counts of researchers in each academic discipline and funding category (A, B, C, D, E), assuming that funding category and academic discipline are independent. Because Chemistry has the largest row total (130) and funding category C has the largest column total (310), the combination of those categories has the highest expected count (50.628, or approximately 51).
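The expected count calculation (row total times column total, divided by the total number of observations) can be sketched as follows. This is an illustrative Python example, not Minitab's implementation; the variable names are assumptions, and the counts come from the contingency table.

```python
# Observed counts copied from the contingency table
# (columns are funding categories A through E).
columns = ["A", "B", "C", "D", "E"]
observed = {
    "Geology":      [3, 19, 39, 14, 10],
    "Biochemistry": [1, 2, 13, 1, 12],
    "Chemistry":    [6, 25, 49, 21, 29],
    "Zoology":      [3, 15, 41, 35, 26],
    "Physics":      [10, 22, 47, 9, 26],
    "Engineering":  [3, 11, 25, 15, 34],
    "Microbiology": [1, 6, 14, 5, 11],
    "Botany":       [0, 12, 34, 17, 23],
    "Statistics":   [2, 5, 11, 4, 7],
    "Mathematics":  [2, 11, 37, 8, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [
    sum(counts[j] for counts in observed.values())
    for j in range(len(columns))
]
grand_total = sum(row_totals.values())

# Expected count under independence:
# (row total * column total) / grand total.
expected = {
    name: [row_totals[name] * col_totals[j] / grand_total
           for j in range(len(columns))]
    for name in observed
}

for name, values in expected.items():
    print(name, [round(v, 3) for v in values])
```

Rounded to three decimal places, the printed rows should agree with the Expected Frequencies table.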

Observed – Expected Frequencies

The observed − expected frequency is the difference between the count of actual observations in the cell and the count of observations in the cell that you expect if the variables are independent.

Interpretation

Use the difference between the observed and expected frequencies to look for evidence of possible associations in the data. If two variables are associated, then the distribution of observations for one variable differs depending on the category of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively large. If the two variables are independent, then the distribution of observations for one variable is similar for all categories of the second variable. As a result, the magnitude of the difference between the observed frequency and the expected frequency is relatively small.

Observed - Expected Frequencies

	A	B	C	D	E
Geology	-0.310	5.332	5.897	0.225	-11.143
Biochemistry	-0.129	-2.663	1.706	-3.700	4.786
Chemistry	0.937	4.095	-1.628	-0.068	-3.337
Zoology	-1.673	-4.296	-5.734	15.553	-3.849
Physics	5.560	3.668	2.603	-9.475	-2.357
Engineering	-0.427	-3.151	-9.271	0.739	12.111
Microbiology	-0.441	0.050	-0.410	-0.996	1.796
Botany	-3.349	-1.829	0.508	3.063	1.608
Statistics	0.871	0.337	-0.294	-0.700	-0.214
Mathematics	-1.038	-1.543	6.623	-4.641	0.598

In this table, the magnitude of the difference between the observed count and the expected count is relatively large for Zoology and funding category D (15.553) and for Engineering and funding category E (12.111). For these cells, the observed counts are greater than the counts that you would expect if the variables were independent. The magnitude of the difference is also relatively large for Geology and funding category E (-11.143). For this cell, the observed count is smaller than the count that you would expect if the variables were independent. Therefore, in funding category E (the unfunded category in this example), there are considerably more Engineering researchers and considerably fewer Geology researchers than expected.
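To find the cells with the largest differences without scanning the table by eye, the differences can be ranked by absolute value. The following Python sketch is illustrative only; the names `diff`, `ranked`, and `top3` are assumptions, and the counts come from the contingency table.

```python
rows = ["Geology", "Biochemistry", "Chemistry", "Zoology", "Physics",
        "Engineering", "Microbiology", "Botany", "Statistics", "Mathematics"]
cols = ["A", "B", "C", "D", "E"]

# Observed counts copied from the contingency table.
observed = [
    [3, 19, 39, 14, 10],
    [1, 2, 13, 1, 12],
    [6, 25, 49, 21, 29],
    [3, 15, 41, 35, 26],
    [10, 22, 47, 9, 26],
    [3, 11, 25, 15, 34],
    [1, 6, 14, 5, 11],
    [0, 12, 34, 17, 23],
    [2, 5, 11, 4, 7],
    [2, 11, 37, 8, 20],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Observed minus expected, where expected = row total * column total / n.
diff = [
    [observed[i][j] - row_totals[i] * col_totals[j] / n
     for j in range(len(cols))]
    for i in range(len(rows))
]

# Rank all cells by the magnitude of the difference, largest first.
ranked = sorted(
    ((rows[i], cols[j], diff[i][j])
     for i in range(len(rows)) for j in range(len(cols))),
    key=lambda cell: abs(cell[2]),
    reverse=True,
)
top3 = ranked[:3]

for row_name, col_name, value in top3:
    print(row_name, col_name, round(value, 3))
```

The sign of each printed difference shows the direction: a positive value means more observations than expected under independence, and a negative value means fewer.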

Chi-Square Distances

Minitab displays each cell's contribution to the chi-square statistic as the chi-square distance. The chi-square distance for a cell quantifies how much of the total chi-square statistic is attributable to the difference between that cell's observed and expected frequencies.

Minitab calculates each cell's contribution to the chi-square statistic as the square of the difference between the observed and expected values for a cell, divided by the expected value for that cell. The total chi-square is the sum of the values for all cells.
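As a sketch, the calculation described above can be written in symbols. The notation is introduced here for illustration and does not appear in the Minitab output: O_ij is the observed count in row i and column j, E_ij is the expected count, r_i is the row total, c_j is the column total, and n is the total number of observations.

```latex
E_{ij} = \frac{r_i \, c_j}{n}, \qquad
\chi^2_{ij} = \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}}, \qquad
\chi^2 = \sum_{i} \sum_{j} \chi^2_{ij}
```

For example, for Zoology and funding category D, the observed count is 35 and the expected count is 19.447, so the chi-square distance is (15.553)² / 19.447 ≈ 12.438.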

Interpretation

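Compare the chi-square distances across cells to assess which cells contribute most to the total chi-square. The more the observed and expected frequencies for a cell differ, relative to the expected frequency, the larger the chi-square distance for that cell. Therefore, a larger chi-square distance suggests a stronger departure from independence for that combination of row and column categories. Because the difference is squared, the chi-square distance does not show whether the observed count is above or below the expected count. Use the observed − expected frequencies to see the direction.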

Chi-Square Distances

	A	B	C	D	E	Total
Geology	0.029	2.080	1.050	0.004	5.873	9.036
Biochemistry	0.015	1.521	0.258	2.913	3.176	7.882
Chemistry	0.173	0.802	0.052	0.000	0.344	1.373
Zoology	0.599	0.957	0.703	12.438	0.496	15.194
Physics	6.964	0.734	0.153	4.859	0.196	12.906
Engineering	0.053	0.702	2.508	0.038	6.700	10.001
Microbiology	0.135	0.000	0.012	0.166	0.351	0.663
Botany	3.349	0.242	0.008	0.673	0.121	4.393
Statistics	0.671	0.024	0.008	0.104	0.006	0.814
Mathematics	0.354	0.190	1.444	1.704	0.018	3.710
Total	12.343	7.252	6.196	22.899	17.282	65.972

In this table, the chi-square distance for Zoology and funding category D is 12.438, which is the largest cell contribution to the total chi-square (65.972). Of the row categories, Zoology (15.194), Physics (12.906), and Engineering (10.001) contribute most to the total chi-square. Of the column categories, funding categories D (22.899) and E (17.282) contribute most to the total chi-square.
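The cell, row, and column contributions in this table can be reproduced with a short calculation. The following Python sketch is illustrative and is not Minitab's implementation; the names `chi_sq`, `row_contrib`, and `col_contrib` are assumptions, and the counts come from the contingency table.

```python
rows = ["Geology", "Biochemistry", "Chemistry", "Zoology", "Physics",
        "Engineering", "Microbiology", "Botany", "Statistics", "Mathematics"]
cols = ["A", "B", "C", "D", "E"]

# Observed counts copied from the contingency table.
observed = [
    [3, 19, 39, 14, 10],
    [1, 2, 13, 1, 12],
    [6, 25, 49, 21, 29],
    [3, 15, 41, 35, 26],
    [10, 22, 47, 9, 26],
    [3, 11, 25, 15, 34],
    [1, 6, 14, 5, 11],
    [0, 12, 34, 17, 23],
    [2, 5, 11, 4, 7],
    [2, 11, 37, 8, 20],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Chi-square distance per cell: (observed - expected)^2 / expected.
chi_sq = []
for i, row in enumerate(observed):
    chi_row = []
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_row.append((obs - exp) ** 2 / exp)
    chi_sq.append(chi_row)

# Contributions summed by row category and by column category.
row_contrib = {rows[i]: sum(chi_sq[i]) for i in range(len(rows))}
col_contrib = {
    cols[j]: sum(chi_sq[i][j] for i in range(len(rows)))
    for j in range(len(cols))
}
total_chi_sq = sum(row_contrib.values())

print(round(total_chi_sq, 3))
print({name: round(v, 3) for name, v in row_contrib.items()})
print({name: round(v, 3) for name, v in col_contrib.items()})
```

The results should agree with the Chi-Square Distances table, apart from small differences caused by rounding in the displayed values.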

Relative Inertias

Cell inertia is the chi-square distance for the cell divided by the total frequency for the contingency table. The sum of all the cell inertias is the total inertia, or simply the inertia. The relative inertia for a cell is the cell inertia divided by the total inertia. The relative inertia for a row is the sum of the cell inertias for the row divided by the total inertia. The relative inertia for a column is the sum of the cell inertias for the column divided by the total inertia.

Interpretation

Use relative inertia to assess how much each cell, row, or column contributes to the total inertia. Higher values generally indicate a stronger association between the categories and a greater share of the total deviation from the expected frequencies.

Relative Inertias

	A	B	C	D	E	Total
Geology	0.000	0.032	0.016	0.000	0.089	0.137
Biochemistry	0.000	0.023	0.004	0.044	0.048	0.119
Chemistry	0.003	0.012	0.001	0.000	0.005	0.021
Zoology	0.009	0.015	0.011	0.189	0.008	0.230
Physics	0.106	0.011	0.002	0.074	0.003	0.196
Engineering	0.001	0.011	0.038	0.001	0.102	0.152
Microbiology	0.002	0.000	0.000	0.003	0.005	0.010
Botany	0.051	0.004	0.000	0.010	0.002	0.067
Statistics	0.010	0.000	0.000	0.002	0.000	0.012
Mathematics	0.005	0.003	0.022	0.026	0.000	0.056
Total	0.187	0.110	0.094	0.347	0.262	1.000

The Relative Inertias table shows the proportion of the total chi-square statistic that each cell contributes. The higher the relative inertia in a cell, the greater the association between the row and column categories. In this table, the cell for Zoology and funding category D has the highest relative inertia (0.189), which indicates the strongest association in the table. The Total column and the Total row show the relative inertia for each row category and each column category.
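The inertia definitions can be checked with a short calculation. The following Python sketch is illustrative and is not Minitab's implementation; the names `cell_inertia`, `total_inertia`, `relative`, `row_relative`, and `col_relative` are assumptions, and the counts come from the contingency table.

```python
rows = ["Geology", "Biochemistry", "Chemistry", "Zoology", "Physics",
        "Engineering", "Microbiology", "Botany", "Statistics", "Mathematics"]
cols = ["A", "B", "C", "D", "E"]

# Observed counts copied from the contingency table.
observed = [
    [3, 19, 39, 14, 10],
    [1, 2, 13, 1, 12],
    [6, 25, 49, 21, 29],
    [3, 15, 41, 35, 26],
    [10, 22, 47, 9, 26],
    [3, 11, 25, 15, 34],
    [1, 6, 14, 5, 11],
    [0, 12, 34, 17, 23],
    [2, 5, 11, 4, 7],
    [2, 11, 37, 8, 20],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Chi-square distance per cell: (observed - expected)^2 / expected.
chi_sq = [
    [(observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
     / (row_totals[i] * col_totals[j] / n)
     for j in range(len(cols))]
    for i in range(len(rows))
]

# Cell inertia: chi-square distance divided by the total frequency.
cell_inertia = [[value / n for value in row] for row in chi_sq]

# Total inertia: the sum of all cell inertias.
total_inertia = sum(sum(row) for row in cell_inertia)

# Relative inertia: cell inertia divided by total inertia.
relative = [[value / total_inertia for value in row] for row in cell_inertia]

# Relative inertia for each row category and each column category.
row_relative = {rows[i]: sum(relative[i]) for i in range(len(rows))}
col_relative = {
    cols[j]: sum(relative[i][j] for i in range(len(rows)))
    for j in range(len(cols))
}

print(round(total_inertia, 4))
print({name: round(v, 3) for name, v in row_relative.items()})
print({name: round(v, 3) for name, v in col_relative.items()})
```

Because every cell inertia is divided by the same total frequency, the relative inertia for a cell equals that cell's chi-square distance divided by the total chi-square, and the relative inertias sum to 1.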