A contingency table is a table that tallies observations by multiple categorical variables. The table's rows and columns correspond to the levels of these categorical variables.
For example, after a recent election between two candidates, an exit poll recorded the gender and vote of 100 randomly selected voters and tabulated the data as follows:
Gender | Candidate A | Candidate B | All |
---|---|---|---|
Male | 28 | 20 | 48 |
Female | 39 | 13 | 52 |
All | 67 | 33 | 100 |
This contingency table tallies responses by gender and vote. The count at the intersection of row $i$ and column $j$ is denoted $n_{ij}$; it is the number of observations that have that combination of levels. For example, $n_{12}$ is the number of male respondents who voted for Candidate B (20).
The table also includes marginal totals for each level of the variables. The marginal totals for the rows show that 52 of the respondents were female. Marginal totals for columns show that 67 respondents voted for Candidate A. Also, the grand total shows that the sample size is 100.
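As a minimal sketch of how the marginal totals arise, the Python below stores the exit-poll counts in a nested list and sums across rows and down columns. The names `n`, `row_totals`, `col_totals`, and `grand_total` are illustrative only; indexing is 0-based, so `n[0][1]` holds the count for row 1, column 2 (male, Candidate B).

```python
# Exit-poll counts: rows are gender, columns are candidate.
rows = ["Male", "Female"]
cols = ["Candidate A", "Candidate B"]

# n[i][j] is the count for row i and column j (0-based, so n[0][1] is row 1, column 2).
n = [
    [28, 20],
    [39, 13],
]

# Row marginal totals: sum each row across the columns.
row_totals = [sum(row) for row in n]

# Column marginal totals: sum each column down the rows.
col_totals = [sum(col) for col in zip(*n)]

# Grand total: the sample size.
grand_total = sum(row_totals)

for name, total in zip(rows, row_totals):
    print(f"{name}: {total}")
for name, total in zip(cols, col_totals):
    print(f"{name}: {total}")
print(f"All: {grand_total}")
```

The row totals reproduce the "All" column of the table, and the column totals reproduce the "All" row.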
Contingency tables can also reveal associations between the two variables. Use a chi-square test or Fisher's exact test to determine whether the observed counts differ significantly from the expected counts under the null hypothesis of no association. For example, you could test whether an association exists between gender and vote.
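The following is a minimal, standard-library sketch of the Pearson chi-square test of association for the gender-by-vote table. Under the null hypothesis of no association, each expected count is (row total × column total) / grand total. Assumptions: no continuity correction is applied, and the `math.erfc` shortcut for the p-value is valid only for 1 degree of freedom (a 2×2 table). Larger tables need a general chi-square distribution function, and tables with small expected counts usually call for Fisher's exact test instead.

```python
import math

# Observed counts from the exit poll (rows: Male, Female; columns: Candidate A, Candidate B).
observed = [
    [28, 20],
    [39, 13],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under no association: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print("expected counts:", expected)
print(f"chi-square = {chi_sq:.4f}, df = {df}, p = {p_value:.4f}")
```

For these data the statistic is about 3.14 with 1 degree of freedom, so the test does not indicate an association at the 0.05 level.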
The simplest contingency tables are two-way tables that tally the responses by two variables. You can categorize observations by three or more variables by "crossing" them. In the previous voting example, you could also classify the responses by employment status as follows:
Gender / employment | Candidate A | Candidate B | Total |
---|---|---|---|
Male / employed | 18 | 19 | 37 |
Male / unemployed | 10 | 1 | 11 |
Female / employed | 33 | 10 | 43 |
Female / unemployed | 6 | 3 | 9 |
Total | 67 | 33 | 100 |
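Summing a crossed table over one of its variables recovers the simpler table. The sketch below collapses the three-way table over employment status and reproduces the two-way gender-by-vote counts shown earlier. The dictionary layout is an illustrative choice, not a Minitab data format.

```python
from collections import defaultdict

# Three-way table: (gender, employment) -> (Candidate A count, Candidate B count).
three_way = {
    ("Male", "employed"): (18, 19),
    ("Male", "unemployed"): (10, 1),
    ("Female", "employed"): (33, 10),
    ("Female", "unemployed"): (6, 3),
}

# Collapse over employment status by adding the rows that share a gender.
two_way = defaultdict(lambda: [0, 0])
for (gender, _employment), (a, b) in three_way.items():
    two_way[gender][0] += a
    two_way[gender][1] += b

for gender, (a, b) in two_way.items():
    print(f"{gender}: Candidate A = {a}, Candidate B = {b}, Total = {a + b}")
```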
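Simple correspondence analysis can detect associations in contingency tables that categorize data by more than two variables. To perform a simple correspondence analysis in Minitab, choose.

You can use to calculate the odds ratio and its confidence interval. For example, the following tables show heart attacks among subjects who took a placebo and subjects who took aspirin, first as a two-way contingency table and then in stacked form in worksheet columns C1 to C3: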
Group | Heart Attack | No Heart Attack |
---|---|---|
Placebo | 189 | 10845 |
Aspirin | 104 | 10933 |
C1 | C2 | C3 |
---|---|---|
Group | Heart Attack | Count |
Placebo | Yes | 189 |
Placebo | No | 10845 |
Aspirin | Yes | 104 |
Aspirin | No | 10933 |
The odds ratio is (189 × 10933) / (10845 × 104) = 1.8321. This means that the odds of having a heart attack for a person taking the placebo are 1.8321 times the odds for a person taking aspirin. You can be 95% confident that the true odds ratio is between 1.44 and 2.3308.
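As a cross-check, here is a minimal sketch of the sample odds ratio and a large-sample 95% confidence interval. Assumption: the interval is the Wald (Woolf) interval on the log scale, with standard error √(1/a + 1/b + 1/c + 1/d). This is a common textbook method, but it is not necessarily the exact method Minitab uses, so the last digits of the bounds can differ slightly from the values above.

```python
import math

# 2x2 table from the aspirin example.
a, b = 189, 10845  # Placebo: heart attack, no heart attack
c, d = 104, 10933  # Aspirin: heart attack, no heart attack

# Sample odds ratio: odds of a heart attack on placebo / odds on aspirin.
odds_ratio = (a * d) / (b * c)

# Large-sample (Wald) standard error of log(odds ratio).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Approximate 97.5th percentile of the standard normal distribution.
z = 1.959964

# Build the interval on the log scale, then exponentiate back.
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"odds ratio = {odds_ratio:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

The odds ratio matches the 1.8321 above, and the interval agrees with the reported bounds to about two decimal places.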
The data in this example are from page 20 of A. Agresti (1996), An Introduction to Categorical Data Analysis, John Wiley & Sons, Inc.