Method for Simple Correspondence Analysis

Simple correspondence analysis performs a weighted principal components analysis of a contingency table. If the contingency table has r rows and c columns, the number of underlying dimensions is the smaller of (r − 1) and (c − 1). As with principal components, variability is partitioned, but rather than partitioning the total variance, simple correspondence analysis partitions the Pearson χ² statistic (the same statistic that is calculated in the χ² test for association).

Traditionally, correspondence analysis uses χ² / n, which is termed inertia or total inertia, rather than χ². The inertias associated with all the principal components add up to the total inertia. Ideally, the first one, two, or three components account for most of the total inertia.
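The relationship between χ² and total inertia can be sketched in a few lines of Python with NumPy. The table values and the function names pearson_chi2 and total_inertia are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x4 contingency table; the counts are illustrative only.
table = np.array([[20, 10,  5, 15],
                  [10, 25, 10,  5],
                  [ 5, 10, 30,  5]], dtype=float)

def pearson_chi2(counts):
    """Pearson chi-square statistic for a two-way table of counts."""
    n = counts.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
    return ((counts - expected) ** 2 / expected).sum()

def total_inertia(counts):
    """Total inertia: the chi-square statistic divided by the grand total n."""
    return pearson_chi2(counts) / counts.sum()

chi2 = pearson_chi2(table)
inertia = total_inertia(table)
print(f"chi-square = {chi2:.4f}, total inertia = {inertia:.4f}")
```

Because the inertia is χ² divided by n, it does not grow with the sample size the way χ² does, which may be one reason it is the conventional quantity to partition.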

Lower dimensional subspaces are spanned by principal components, also called principal axes. The first principal axis is chosen so that it accounts for the maximum amount of the total inertia; the second principal axis is chosen so that it accounts for the maximum amount of the remaining inertia; and so on. The first principal axis spans the best one-dimensional subspace (closest to the profiles using an appropriate metric); the first two principal axes span the best two-dimensional subspace; and so on. These subspaces are nested, which means that the best one-dimensional subspace is a subspace of the best two-dimensional subspace, and so on.
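One common way to obtain the inertia of each principal axis is a singular value decomposition of the matrix of standardized residuals. The sketch below assumes that formulation; the source does not say which algorithm is used. The table values and the function name principal_inertias are hypothetical.

```python
import numpy as np

# Hypothetical 3x4 contingency table; the counts are illustrative only.
table = np.array([[20, 10,  5, 15],
                  [10, 25, 10,  5],
                  [ 5, 10, 30,  5]], dtype=float)

def principal_inertias(counts):
    """Inertia of each principal axis, largest first (SVD-based sketch)."""
    P = counts / counts.sum()          # correspondence matrix (proportions)
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals: (observed - expected) / sqrt(expected)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    # Singular values are returned in descending order
    sv = np.linalg.svd(S, compute_uv=False)
    d = min(counts.shape) - 1          # number of underlying dimensions
    return sv[:d] ** 2

inertias = principal_inertias(table)
proportion = inertias / inertias.sum()   # share of total inertia per axis
cumulative = np.cumsum(proportion)       # share explained by the first k axes

for k, (lam, cum) in enumerate(zip(inertias, cumulative), start=1):
    print(f"axis {k}: inertia = {lam:.4f}, cumulative proportion = {cum:.3f}")
```

The cumulative proportions show how much of the total inertia the first one, two, or three axes account for.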

The principal coordinate for row profile i and component (axis) k is the coordinate of the projection of row profile i onto component k. The row standardized coordinates for component k are the row principal coordinates for component k divided by the square root of the kth inertia.

Likewise, the principal coordinate for column profile j and component k is the coordinate of the projection of column profile j onto component k. The column standardized coordinates for component k are the column principal coordinates for component k divided by the square root of the kth inertia.
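The link between principal and standardized coordinates can be illustrated with the following sketch. It assumes the SVD-based formulation of correspondence analysis, which the source does not spell out; the table values and the function name ca_coordinates are hypothetical.

```python
import numpy as np

# Hypothetical 3x4 contingency table; the counts are illustrative only.
table = np.array([[20, 10,  5, 15],
                  [10, 25, 10,  5],
                  [ 5, 10, 30,  5]], dtype=float)

def ca_coordinates(counts):
    """Row/column principal and standardized coordinates (SVD-based sketch)."""
    P = counts / counts.sum()
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    d = min(counts.shape) - 1          # keep only the d underlying dimensions
    U, sv, V = U[:, :d], sv[:d], Vt[:d].T
    row_std = U / np.sqrt(r)[:, None]  # row standardized coordinates
    col_std = V / np.sqrt(c)[:, None]  # column standardized coordinates
    row_principal = row_std * sv       # scale axis k by sqrt(k-th inertia)
    col_principal = col_std * sv
    return r, c, sv ** 2, row_principal, row_std, col_principal, col_std

r, c, inertias, row_principal, row_std, col_principal, col_std = ca_coordinates(table)

# Dividing principal coordinates by sqrt(inertia) recovers standardized ones.
print(np.allclose(row_principal / np.sqrt(inertias), row_std))
print(np.allclose(col_principal / np.sqrt(inertias), col_std))
```

In this formulation, the mass-weighted sum of squares of the principal coordinates on axis k equals the kth inertia, while the standardized coordinates have a mass-weighted sum of squares of 1 on every axis.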

The contingency table can be analyzed in terms of row profiles or column profiles. A row profile is a list of row proportions that are calculated from the counts in the contingency table. Specifically, the profile for row i is (n_i1 / n_i., n_i2 / n_i., ... , n_ic / n_i.), where n_ij is the frequency in row i and column j of the table and n_i. is the sum of the frequencies in row i. A column profile is a list of column proportions. Specifically, the profile for column j is (n_1j / n_.j, n_2j / n_.j, ... , n_rj / n_.j), where n_.j is the sum of the frequencies in column j.
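As a small illustration of these definitions, the profiles can be computed directly from the counts. The table values are hypothetical.

```python
import numpy as np

# Hypothetical 3x4 contingency table; the counts are illustrative only.
table = np.array([[20, 10,  5, 15],
                  [10, 25, 10,  5],
                  [ 5, 10, 30,  5]], dtype=float)

# Row profiles: each count divided by its row total n_i.
# Row i of this array is the profile for row i of the table.
row_profiles = table / table.sum(axis=1, keepdims=True)

# Column profiles: each count divided by its column total n_.j
# Column j of this array is the profile for column j of the table.
col_profiles = table / table.sum(axis=0, keepdims=True)

print(row_profiles)
print(col_profiles)
```

Every row profile sums to 1 across its c entries, and every column profile sums to 1 across its r entries.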

The two analyses are mathematically equivalent. The analysis that you use depends on your application. Most of the time, a researcher is interested in studying either how the row profiles differ from each other or how the column profiles differ from each other.

Row profiles are vectors of length c and therefore lie in a c-dimensional space (similarly, column profiles lie in an r-dimensional space). Because this dimension is usually too high to allow easy interpretation, you should try to find a subspace of lower dimension (preferably not more than two or three) that lies close to all the row profile points (or column profile points). You can then project the profile points onto this subspace and study the projections. If the projections are close to the profiles, you do not lose much information. Working in two or three dimensions allows you to study the data more easily and, in particular, allows you to examine plots. This process is analogous to choosing a small number of principal components to summarize the variability of continuous data.
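The loss of information from projecting can be illustrated by comparing distances between profiles before and after projection. The sketch below assumes that distances between profiles are measured with the chi-square distance, the usual metric in correspondence analysis, and that coordinates come from an SVD of the standardized residuals. The 4x5 table and the helper chi2_distance are hypothetical.

```python
import numpy as np

# Hypothetical 4x5 contingency table (so d = 3); counts are illustrative only.
table = np.array([[30, 10,  5, 10,  5],
                  [10, 25, 10,  5, 10],
                  [ 5, 10, 30, 10,  5],
                  [10,  5, 10, 25, 10]], dtype=float)

P = table / table.sum()
r, c = P.sum(axis=1), P.sum(axis=0)    # row and column masses
profiles = P / r[:, None]              # row profiles

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
d = min(table.shape) - 1
F = (U[:, :d] * sv[:d]) / np.sqrt(r)[:, None]   # row principal coordinates

def chi2_distance(a, b):
    """Chi-square distance between two row profiles (weights 1 / column mass)."""
    return np.sqrt(((a - b) ** 2 / c).sum())

exact = chi2_distance(profiles[0], profiles[1])   # distance in the full space
full = np.linalg.norm(F[0] - F[1])                # all d principal axes
planar = np.linalg.norm(F[0, :2] - F[1, :2])      # first two axes only

print(f"chi-square distance: {exact:.4f}")
print(f"distance using all {d} axes: {full:.4f}")
print(f"distance using 2 axes: {planar:.4f}")
```

Under these assumptions, the distance computed from all d axes reproduces the chi-square distance, and the two-axis distance can only be smaller or equal. The closer the two values, the less information the two-dimensional plot loses for that pair of profiles.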

If d is the smaller of (r − 1) and (c − 1), then the row profiles (or equivalently the column profiles) lie in a d-dimensional subspace of the full c-dimensional space (or equivalently the full r-dimensional space). Thus, there are at most d principal components.
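This dimension bound can be checked numerically. The sketch below assumes the row profiles are centered at the average row profile (the column masses) before their rank is measured, so that the subspace passes through the origin; the table values are hypothetical.

```python
import numpy as np

# Hypothetical 3x4 contingency table; the counts are illustrative only.
table = np.array([[20, 10,  5, 15],
                  [10, 25, 10,  5],
                  [ 5, 10, 30,  5]], dtype=float)

row_profiles = table / table.sum(axis=1, keepdims=True)

# Average row profile: the column totals divided by the grand total.
average_profile = table.sum(axis=0) / table.sum()

# Deviations of each row profile from the average profile.
centered = row_profiles - average_profile

d = min(table.shape) - 1                   # smaller of (r - 1) and (c - 1)
rank = np.linalg.matrix_rank(centered)     # dimension spanned by the deviations

print(f"d = {d}, rank of centered row profiles = {rank}")
```

The rank cannot exceed d because the deviations in each row sum to zero (at most c − 1 dimensions) and the mass-weighted deviations sum to zero across rows (at most r − 1 dimensions).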