Select the method or formula of your choice.

- Cohen's kappa statistic (unknown standard)
- Cohen's kappa statistic (known standard)
- Testing significance of Cohen's kappa
- Fleiss' kappa statistic (unknown standard)
- Testing significance of Fleiss' kappa (unknown standard)
- Fleiss' kappa statistic (known standard)
- Testing the significance of Fleiss' kappa (known standard)

Use Cohen's kappa statistic when classifications are nominal. When the standard is not known and you choose to obtain Cohen's kappa, Minitab will calculate the statistic when the data satisfy the following conditions:

- Within appraiser — there are exactly two trials for each appraiser
- Between appraisers — there are exactly two appraisers each having one trial only

For a particular response value, kappa can be calculated by collapsing all responses that are not equal to that value into a single category. You can then use the resulting 2×2 table to calculate kappa.

When the true standard is unknown, Minitab estimates Cohen's kappa by:

Rows give the rating from Trial A (or Appraiser A); columns give the rating from Trial B (or Appraiser B).

| Trial A \ Trial B | 1 | 2 | ... | k | Total |
|---|---|---|---|---|---|
| 1 | p_{11} | p_{12} | ... | p_{1k} | p_{1+} |
| 2 | p_{21} | p_{22} | ... | p_{2k} | p_{2+} |
| ... | ... | ... | ... | ... | ... |
| k | p_{k1} | p_{k2} | ... | p_{kk} | p_{k+} |
| Total | p_{+1} | p_{+2} | ... | p_{+k} | 1 |
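The kappa estimate itself is not written out above; in the notation of the table, with p_{i+} and p_{+i} the row and column marginal proportions, the standard Cohen's kappa estimate is:

```latex
\hat{\kappa} = \frac{P_o - P_e}{1 - P_e},
\qquad
P_o = \sum_{i=1}^{k} p_{ii},
\qquad
P_e = \sum_{i=1}^{k} p_{i+}\, p_{+i}
```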

| Term | Description |
|---|---|
| P_{o} | the observed proportion of agreement |
| p_{ii} | each value in the diagonal of the two-way table |
| P_{e} | the expected proportion of agreement when the ratings are independent |
| n_{ij} | the number of samples in the i^{th} row and the j^{th} column |
| N | the total number of samples |
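The two-trial calculation can be sketched in Python. This is a minimal illustration of the formula, not Minitab's implementation; the function name `cohens_kappa` is ours.

```python
from collections import Counter

def cohens_kappa(trial_a, trial_b):
    """Cohen's kappa for two trials (or two appraisers) with nominal ratings.

    trial_a, trial_b: equal-length sequences, one rating per sample.
    """
    if len(trial_a) != len(trial_b):
        raise ValueError("both trials must rate the same samples")
    n = len(trial_a)
    categories = set(trial_a) | set(trial_b)
    # P_o: observed proportion of agreement (the diagonal of the two-way table)
    p_o = sum(a == b for a, b in zip(trial_a, trial_b)) / n
    # P_e: expected agreement from the row and column marginal proportions
    rows, cols = Counter(trial_a), Counter(trial_b)
    p_e = sum((rows[c] / n) * (cols[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

For example, ratings [1, 1, 2, 2] and [1, 1, 2, 1] give P_o = 0.75 and P_e = 0.5, so kappa = 0.5.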

Use Cohen's kappa statistic when classifications are nominal. When the standard is known and you choose to obtain Cohen's kappa, Minitab will calculate the statistic using the formulas below.

When the true standard is known, first calculate kappa using the data from each trial and the known standard. The kappa coefficient for the agreement of the trials with the known standard is then the mean of these per-trial kappa coefficients.

Rows give the rating from the trial; columns give the rating of the known standard.

| Trial A \ Standard | 1 | 2 | ... | k | Total |
|---|---|---|---|---|---|
| 1 | p_{11} | p_{12} | ... | p_{1k} | p_{1+} |
| 2 | p_{21} | p_{22} | ... | p_{2k} | p_{2+} |
| ... | ... | ... | ... | ... | ... |
| k | p_{k1} | p_{k2} | ... | p_{kk} | p_{k+} |
| Total | p_{+1} | p_{+2} | ... | p_{+k} | 1 |

| Term | Description |
|---|---|
| P_{o} | the observed proportion of agreement |
| p_{ii} | each value in the diagonal of the two-way table |
| P_{e} | the expected proportion of agreement when the ratings are independent |
| n_{ij} | the number of samples in the i^{th} row and the j^{th} column |
| N | the total number of samples |
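The per-trial-then-average procedure can be sketched as follows. This is an illustrative sketch, not Minitab's code; the function names are ours.

```python
from collections import Counter

def kappa_vs_standard(trial, standard):
    """Cohen's kappa treating the known standard as the second trial."""
    n = len(trial)
    categories = set(trial) | set(standard)
    p_o = sum(a == s for a, s in zip(trial, standard)) / n
    rows, cols = Counter(trial), Counter(standard)
    p_e = sum((rows[c] / n) * (cols[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def kappa_known_standard(trials, standard):
    """Known-standard kappa: the mean of the per-trial kappa coefficients."""
    return sum(kappa_vs_standard(t, standard) for t in trials) / len(trials)
```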

To test the null hypothesis that the ratings are independent (so that kappa = 0), use:

z = kappa / SE of kappa

This is a one-sided test. Under the null hypothesis, z follows the standard normal distribution. Reject the null hypothesis if z is greater than the upper-tail critical value z_α of the standard normal distribution.

The standard error of kappa for each trial and the standard is:
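The standard-error formula is not reproduced above. One classical large-sample expression for the null standard error (Fleiss, Cohen, and Everitt, 1969), using the row and column marginal proportions p_{i+} and p_{+i} of the trial-versus-standard table, is:

```latex
\operatorname{SE}_0(\hat{\kappa}) =
\frac{1}{(1 - P_e)\sqrt{N}}
\sqrt{P_e + P_e^{2} - \sum_{i=1}^{k} p_{i+}\, p_{+i}\,\bigl(p_{i+} + p_{+i}\bigr)}
```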

| Term | Description |
|---|---|
| P_{e} | the expected proportion of agreement when the ratings are independent |
| N | the total number of samples |
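The one-sided z test can be sketched with the standard library only. This is an illustrative helper of ours, not Minitab's code; it takes the kappa estimate and its standard error as inputs.

```python
import math

def kappa_z_test(kappa, se_kappa):
    """One-sided test of H0: kappa = 0 against H1: kappa > 0."""
    z = kappa / se_kappa
    # Upper-tail p-value from the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```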

There are two cases for calculating the kappa statistics.

- Case 1—Agreement within each appraiser: calculate the kappa coefficients that represent the agreement within each appraiser. Here m = the number of trials within each appraiser, and m is assumed to be > 1. The analyst is interested in the agreement among the m trials within each appraiser, under the assumption that the appraiser does not remember the ratings from previous trials.
- Case 2—Agreement between all appraisers: calculate the kappa coefficients that represent the agreement between all appraisers. Here m = the total number of trials across all appraisers. The number of appraisers is assumed to be > 1; the number of trials per appraiser may be 1 or more than 1. The analyst is interested in the agreement of all the appraisers.

Define x_{ij} to be the number of ratings on sample i into category j, where i is from 1 to n, and j is from 1 to k.

The overall kappa coefficient is defined by:
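The defining expression, written out in terms of the quantities defined below, is the usual chance-corrected agreement:

```latex
K = \frac{P_o - P_e}{1 - P_e}
```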

where:

P_{o} is the observed proportion of the pairwise agreement among the m trials.

P_{e} is the expected proportion of agreement if the ratings from one trial are independent of another.

p_{j} represents the overall proportion of ratings in category j.
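In terms of the counts x_{ij}, these quantities are conventionally estimated (Fleiss, 1971) as:

```latex
P_o = \frac{1}{nm(m-1)}\left(\sum_{i=1}^{n}\sum_{j=1}^{k} x_{ij}^{2} - nm\right),
\qquad
P_e = \sum_{j=1}^{k} p_j^{2},
\qquad
p_j = \frac{1}{nm}\sum_{i=1}^{n} x_{ij}
```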

Substituting P_{o} and P_{e} into K, the overall kappa coefficient is estimated by:
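Carrying out the substitution gives a closed form consistent with the definitions of P_o, P_e, and p_j above:

```latex
\hat{K} =
\frac{\displaystyle \sum_{i=1}^{n}\sum_{j=1}^{k} x_{ij}^{2}
      \;-\; nm\Bigl[1 + (m-1)\sum_{j=1}^{k} p_j^{2}\Bigr]}
     {\displaystyle nm(m-1)\Bigl(1 - \sum_{j=1}^{k} p_j^{2}\Bigr)}
```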

where:

| Term | Description |
|---|---|
| k | the total number of categories |
| m | the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers |
| n | the number of samples |
| x_{ij} | the number of ratings on sample i into category j |

For measuring agreement with respect to classifications into a single one of the k categories, say the j^{th}, one may combine all categories, other than the one of current interest, into a single category and apply the above equation. The resulting formula for the kappa statistic for the j^{th} category is:
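Collapsing to the two categories "j" and "not j" reduces the calculation to the standard Fleiss form for a single category (with q_j = 1 − p_j):

```latex
K_j = 1 - \frac{\displaystyle \sum_{i=1}^{n} x_{ij}\,(m - x_{ij})}
             {n m (m-1)\, p_j\, q_j},
\qquad q_j = 1 - p_j
```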

where:

| Term | Description |
|---|---|
| k | the total number of categories |
| m | the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers |
| n | the number of samples |
| x_{ij} | the number of ratings on sample i into category j |
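The overall Fleiss' kappa computation can be sketched from the x_{ij} table directly. This is an illustrative sketch of the formula, not Minitab's implementation; it assumes every sample receives the same number m of ratings.

```python
def fleiss_kappa(x):
    """Overall Fleiss' kappa from an n-by-k table x, where x[i][j] counts how
    many of the m ratings placed sample i in category j."""
    n = len(x)
    k = len(x[0])
    m = sum(x[0])  # ratings per sample, assumed equal across samples
    # p_j: overall proportion of ratings in category j
    p = [sum(row[j] for row in x) / (n * m) for j in range(k)]
    # P_o: observed pairwise agreement among the m ratings of each sample
    p_o = sum(c * (c - 1) for row in x for c in row) / (n * m * (m - 1))
    p_e = sum(pj * pj for pj in p)
    return (p_o - p_e) / (1 - p_e)
```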

The null hypothesis, H_{0}, is kappa = 0. The alternative hypothesis, H_{1}, is kappa > 0.

Under the null hypothesis, Z is approximately normally distributed and is used to calculate the p-values.

To test whether kappa > 0, use the following Z statistic:
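Consistent with the hypotheses above, the statistic divides the kappa estimate by its null standard error:

```latex
Z = \frac{\hat{K}}{\sqrt{\operatorname{Var}(\hat{K})}}
```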

Var (K) is calculated by:
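The variance expression is not reproduced above. A standard large-sample null variance for the overall kappa, due to Fleiss (1971), in terms of n, m, and the category proportions p_j, is:

```latex
\operatorname{Var}(\hat{K}) \approx
\frac{2}{nm(m-1)} \cdot
\frac{\displaystyle \sum_{j=1}^{k} p_j^{2}
      - (2m-3)\Bigl(\sum_{j=1}^{k} p_j^{2}\Bigr)^{2}
      + 2(m-2)\sum_{j=1}^{k} p_j^{3}}
     {\displaystyle \Bigl(1 - \sum_{j=1}^{k} p_j^{2}\Bigr)^{2}}
```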

To test whether kappa > 0 for the j^{th} category, use the following Z statistic:
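As for the overall statistic, the category-level statistic divides the estimate by its null standard error:

```latex
Z_j = \frac{\hat{K}_j}{\sqrt{\operatorname{Var}(\hat{K}_j)}}
```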

Var (K_{j}) is calculated by:
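A corresponding null variance for the category-level kappa, obtained by substituting the two collapsed categories (p_j and q_j = 1 − p_j) into the overall Fleiss variance expression, is:

```latex
\operatorname{Var}(\hat{K}_j) \approx
\frac{(m-1) - 2(2m-3)\, p_j q_j}{n m (m-1)\, p_j q_j},
\qquad q_j = 1 - p_j
```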

| Term | Description |
|---|---|
| K | the overall kappa statistic |
| K_{j} | the kappa statistic for the j^{th} category |
| k | the total number of categories |
| m | the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers |
| n | the number of samples |
| x_{ij} | the number of ratings on sample i into category j |

Use the following steps to calculate overall kappa and kappa for a specific category when the standard rating for each sample is known.

Assume there are m trials.

See the formulas from Fleiss' kappa statistic (unknown standard).

- For each trial, calculate kappa using the ratings from the trial and the ratings given by the standard. In other words, treat the standard as another trial, and use the unknown-standard kappa formulas for two trials to estimate kappa.
- Repeat the calculation for all m trials. This gives m overall kappa values and m category-specific kappa values.

The overall kappa with known standard is then equal to the average of all the m overall kappa values.

In the same way, the kappa for a specific category with known standard is the average of all the m kappa for specific category values.

The null hypothesis, H_{0}, is kappa = 0. The alternative hypothesis, H_{1}, is kappa > 0.

Under the null hypothesis, Z is approximately normally distributed and is used to calculate the p-values.

The test statistic is Z = K / sqrt(Var(K)), where K is the kappa statistic and Var(K) is the variance of the kappa statistic.

See the formulas from Fleiss' kappa statistic (unknown standard).

Assume there are m trials.

- For each trial, calculate the variance of kappa using the ratings from the trial and the ratings given by the standard. In other words, treat the standard as the second trial, and use the variance formulas for the two-trial, unknown-standard case to calculate the variance.
- Repeat the calculation for all m trials. This gives m variances for overall kappa and m variances for category-specific kappa.

The variance of overall kappa with known standards is then equal to the sum of the m variances for overall kappa divided by m^{2}.

Similarly, the variance of kappa for a specific category with known standard equals the sum of the m variances for the kappa for a specific category divided by m^{2}.
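The combining step described above can be sketched as a small helper. This is an illustration of the averaging rule, not Minitab's code; it assumes the per-trial kappas and variances have already been computed against the standard.

```python
def combine_known_standard(kappas, variances):
    """Combine m per-trial (trial vs. standard) kappa results.

    Returns (mean of the m kappas, sum of the m variances / m**2).
    """
    m = len(kappas)
    return sum(kappas) / m, sum(variances) / m ** 2
```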