Select the method or formula of your choice.

Fisher's exact test is a test of independence. The test is based on an exact distribution rather than on the approximate chi-square distribution used for the Pearson's and likelihood ratio tests. Fishers exact test is useful when the expected cell counts are low and the chi-square approximation is not very good.

The p-value is based on a hypergeometric distribution with the following parameters:

- population size
- total number of observations
- number of successes in the population
- number of observations in the first row
- sample size
- number of observations in the first column

Suppose you want to calculate the p-value for Fisher's exact test for
cookie preference between adults and children.

Child | Adult | Row Total | |
---|---|---|---|

Sugar |
9 | 1 | 10 |

Chocolate
Chip |
2 | 8 | 10 |

Column Total |
11 | 9 | 20 |

Fisher showed that the probability of obtaining this set of values follows the hypergeometric distribution
where in the example looks like this:

Child | Adult | Row Total | |
---|---|---|---|

Sugar |
a | b | a+b |

Chocolate
Chip |
c | d | c+d |

Column Total |
a+c | b+d | a+b+c+d |

In the case of a 2x2 matrix, you can calculate the p-value of the test by summing all the p-values which are less than the conditional probability of the actual matrix. Thus, p_{cutoff}.

For this example, the sum of the p-values that are less than or equal to the p_{cutoff} for the other possible matrices is 0.0054775.

McNemar's test compares proportions that are observed before and after a treatment. For example, you can use McNemar's test to determine whether a training program changes the proportion of participants who correctly answer a question.

Observations for McNemar's test can be summarized in a two-by-two table, as shown below.

After Treatment |
|||

Before Treatment |
Condition True | Condition Not True | Total |

Condition True | n_{11} |
n_{12} |
n_{1.} |

Condition Not True | n_{21} |
n_{22} |
n_{2.} |

Total | n_{·1} |
n_{·2} |
n_{··} |

The condition for the training example is a correct answer. Thus, *n*_{21} represents the number of participants who answer the question correctly after training but not before training. And *n*_{12} represents the number of participants who answer the question correctly before training but not after training. The total number of participants is represented by *n*_{..}.

Let *δ* be the difference between the marginal probabilities, *p*_{1.}- *p*_{.1}, in the population. The estimated difference, , is given by the following formula:

An approximate 100(1 – α)% confidence interval is given by the following formula:

where *α* is the significance level for the test, *z* _{α/2} is the z-score associated with a tail probability of *α*/2, and SE is given by the following formula:

The null hypothesis is *δ* = 0. The exact p-value for the test of the null hypothesis is calculated as:

where X is a random variable that is drawn from a binomial distribution with an event probability of 0.5 and a number of trials equal to *n*_{21} + *n*_{12}.

The test assumes no three-way interaction exists. The purpose of the test is to assess the degree of relationship between two dichotomous variables while controlling for a nuisance variable. The CMH statistic is compared with a chi-square percentile with one degree of freedom.

The Cochran-Mantel-Haenszel (CMH) test applies only if three or more classification variables exist, and the first two variables have two levels each. All variables beyond the first two are treated as a single variable Z for the purposes of the CMH test, with each combination of levels treated as a level of Z.

Term | Description |
---|---|

k | level of Z |

n _{11k} | number of observations in first row, first column |

n _{1+k} | number of observations in first row |

n _{+1k} | number of observations in first column |

n _{++k} | total number of observations |

n _{2+k} | number of observations in second row |

n _{+2k} | number of observations in second column |