Methods and formulas for kappa statistics for Attribute Agreement Analysis

Cohen's kappa statistic (unknown standard)

Use Cohen's kappa statistic when classifications are nominal. When the standard is not known and you choose to obtain Cohen's kappa, Minitab will calculate the statistic when the data satisfy the following conditions:

Within appraiser: each appraiser has exactly two trials
Between appraisers: there are exactly two appraisers, each with exactly one trial

For a particular response value, kappa can be calculated by collapsing all responses that are not equal to that value into a single category. You can then use the resulting 2×2 table to calculate kappa.
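The collapsing step can be sketched as follows. This is a minimal illustration, not Minitab's implementation: the function name `collapse_to_2x2`, the 0-based `value` index, and the counts are all assumptions made for the example.

```python
def collapse_to_2x2(table, value):
    """Collapse a k x k table of counts into a 2 x 2 table for one response.

    table[i][j] is the number of samples rated i in trial A and j in trial B.
    value is the 0-based index of the response of interest (an assumption
    of this sketch; the source does not prescribe an indexing scheme).
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    both = table[value][value]                                     # A = value, B = value
    a_only = sum(table[value][j] for j in range(k) if j != value)  # A = value, B = other
    b_only = sum(table[i][value] for i in range(k) if i != value)  # A = other, B = value
    neither = total - both - a_only - b_only                       # A = other, B = other
    return [[both, a_only], [b_only, neither]]


# Illustrative counts for three response values (made-up data).
counts = [[10, 2, 1],
          [3, 12, 2],
          [1, 1, 8]]
print(collapse_to_2x2(counts, 0))  # [[10, 3], [4, 23]]
```

The 2×2 result keeps the grand total unchanged, so it can be passed to the same kappa calculation used for the full table.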

Formulas

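When the true standard is unknown, Minitab estimates Cohen's kappa by:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

The proportions come from the two-way table of the two trials (or the two appraisers):

	Trial B (or Appraiser B)
Trial A (or Appraiser A)	1	2	...	k	Total
1	p_11	p_12	...	p_1k	p_1+
2	p_21	p_22	...	p_2k	p_2+
...	...	...	...	...	...
k	p_k1	p_k2	...	p_kk	p_k+
Total	p_+1	p_+2	...	p_+k	1

$$P_o = \sum_{i=1}^{k} p_{ii}, \qquad P_e = \sum_{i=1}^{k} p_{i+}\,p_{+i}, \qquad p_{ij} = \frac{n_{ij}}{N}$$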

Notation

Term	Description
P_o	the observed proportion of agreement
p_ii	each value in the diagonal of the two-way table
P_e	the expected proportion of agreement if the two sets of ratings are independent
n_ij	the number of samples in the ith row and the jth column
N	the total number of samples
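As a rough sketch of the calculation, the following assumes the usual definition of Cohen's kappa, kappa = (P_o − P_e) / (1 − P_e), with P_o the sum of the diagonal proportions and P_e the sum of the products of matching row and column totals. The function name `cohen_kappa` and the counts are illustrative, not taken from Minitab.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of counts n_ij.

    Rows are trial A (or appraiser A); columns are trial B (or appraiser B).
    """
    k = len(table)
    n_total = sum(sum(row) for row in table)                    # N
    p = [[table[i][j] / n_total for j in range(k)] for i in range(k)]
    row_tot = [sum(p[i]) for i in range(k)]                     # p_i+
    col_tot = [sum(p[i][j] for i in range(k)) for j in range(k)]  # p_+j
    p_o = sum(p[i][i] for i in range(k))                        # observed agreement
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k))        # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Made-up 2 x 2 example: 35 of 50 samples on the diagonal.
example = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(example), 4))  # 0.4
```

Here P_o = 0.7 and P_e = 0.5, so kappa = 0.2 / 0.5 = 0.4. The sketch does not guard against P_e = 1, which happens when every rating falls in one category.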

Cohen's kappa statistic (known standard)

Use Cohen's kappa statistic when classifications are nominal. When the standard is known and you choose to obtain Cohen's kappa, Minitab will calculate the statistic using the formulas below.

Minitab calculates a kappa coefficient for each trial against the known standard. The kappa coefficient for the agreement of the trials with the known standard is the mean of these per-trial coefficients.

Formulas

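When the true standard is known, first calculate kappa for each trial from the two-way table of that trial's ratings against the known standard:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

	Standard
Trial A	1	2	...	k	Total
1	p_11	p_12	...	p_1k	p_1+
2	p_21	p_22	...	p_2k	p_2+
...	...	...	...	...	...
k	p_k1	p_k2	...	p_kk	p_k+
Total	p_+1	p_+2	...	p_+k	1

$$P_o = \sum_{i=1}^{k} p_{ii}, \qquad P_e = \sum_{i=1}^{k} p_{i+}\,p_{+i}, \qquad p_{ij} = \frac{n_{ij}}{N}$$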

Notation

Term	Description
P_o	the observed proportion of agreement
p_ii	each value in the diagonal of the two-way table
P_e	the expected proportion of agreement if the trial's ratings are independent of the standard
n_ij	the number of samples in the ith row and the jth column
N	the total number of samples
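The known-standard procedure (one kappa per trial against the standard, then the mean) might look like the sketch below. It assumes the usual Cohen's kappa definition, (P_o − P_e) / (1 − P_e). The function names and the ratings are hypothetical.

```python
def cohen_kappa_vs_standard(trial, standard, categories):
    """Cohen's kappa for one trial's ratings against the known standard."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    table = [[0] * k for _ in range(k)]
    for rating, truth in zip(trial, standard):
        table[index[rating]][index[truth]] += 1       # rows: trial, columns: standard
    n = len(standard)
    p_o = sum(table[i][i] for i in range(k)) / n
    # Row total times column total for each category, scaled to proportions.
    p_e = sum(sum(table[i]) * sum(row[i] for row in table) for i in range(k)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def mean_kappa_vs_standard(trials, standard, categories):
    """Mean of the per-trial kappa coefficients."""
    kappas = [cohen_kappa_vs_standard(t, standard, categories) for t in trials]
    return sum(kappas) / len(kappas)


# Made-up ratings: four samples, two trials, categories "G" and "B".
standard = ["G", "G", "B", "B"]
trials = [["G", "G", "B", "B"],   # matches the standard exactly: kappa = 1
          ["G", "B", "B", "B"]]   # one miss: kappa = 0.5
print(mean_kappa_vs_standard(trials, standard, ["G", "B"]))  # 0.75
```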

Testing significance of Cohen's kappa

To test the null hypothesis that the ratings are independent (so that kappa = 0), use:

z = kappa / SE of kappa

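This is a one-sided test. Under the null hypothesis, z approximately follows the standard normal distribution. Reject the null hypothesis at significance level α if z exceeds the upper α critical value of the standard normal distribution, or equivalently if the p-value is less than α.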

Formulas

The standard error of kappa for each trial and the standard is:

$$SE_\kappa = \sqrt{\frac{P_e}{N\,(1 - P_e)}}$$

Notation

Term	Description
P_e	the expected proportion of agreement if the ratings are independent
N	the total number of samples
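A minimal sketch of the one-sided test follows. The standard error is assumed here to take the form sqrt(P_e / (N (1 − P_e))), which uses only the two terms listed in the notation. Treat that form as an assumption of the example rather than a confirmed statement of Minitab's calculation. The function name and inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def cohen_kappa_z_test(kappa, p_e, n_samples):
    """One-sided z test of H0: kappa = 0 against H1: kappa > 0.

    Assumes SE = sqrt(P_e / (N * (1 - P_e))); this form is an assumption
    of the sketch.
    """
    se = sqrt(p_e / (n_samples * (1 - p_e)))
    z = kappa / se
    p_value = 1 - NormalDist().cdf(z)   # upper-tail probability
    return z, p_value


# Illustrative inputs: kappa = 0.4, P_e = 0.5, N = 50 samples.
z, p_value = cohen_kappa_z_test(0.4, 0.5, 50)
print(round(z, 3))  # 2.828
```

With these inputs the p-value is well below 0.05, so the null hypothesis of independent ratings would be rejected at that level.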

Fleiss' kappa statistic (unknown standard)

Minitab calculates Fleiss' kappa statistics in two cases.

Case 1 (agreement within each appraiser): Calculate the kappa coefficients that represent the agreement within each appraiser. In this case, m is the number of trials within each appraiser and must be greater than 1. The analysis examines the agreement among the m trials within each appraiser, assuming that the appraiser does not remember the ratings from previous trials when making each trial.
Case 2 (agreement between all appraisers): Calculate the kappa coefficients that represent the agreement between all appraisers. In this case, m is the total number of trials across all appraisers. The number of appraisers must be greater than 1; the number of trials per appraiser may be 1 or more. The analysis examines the agreement among all the appraisers.

Formulas for overall kappa

Define x_ij to be the number of ratings on sample i into category j, where i is from 1 to n, and j is from 1 to k.

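The overall kappa coefficient is defined by:

$$K = \frac{P_o - P_e}{1 - P_e}$$

where:

P_o is the observed proportion of pairwise agreement among the m trials:

$$P_o = \frac{1}{n\,m\,(m-1)} \left( \sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}^2 - n\,m \right)$$

P_e is the expected proportion of agreement if the ratings from one trial are independent of those from another:

$$P_e = \sum_{j=1}^{k} p_j^2$$

p_j is the overall proportion of ratings in category j:

$$p_j = \frac{1}{n\,m} \sum_{i=1}^{n} x_{ij}$$

Substituting P_o and P_e into K, the overall kappa coefficient is estimated by:

$$K = 1 - \frac{n\,m^2 - \sum_{i=1}^{n} \sum_{j=1}^{k} x_{ij}^2}{n\,m\,(m-1) \sum_{j=1}^{k} p_j\,(1 - p_j)}$$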

where:

Term	Description
k	the total number of categories
m	the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers.
n	the number of samples
x_ij	the number of ratings on sample i into category j
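The overall statistic can be sketched from a table of counts x_ij. The example assumes the standard Fleiss definitions: P_o = (Σ x_ij² − nm) / (nm(m − 1)), P_e = Σ p_j², and p_j = Σ_i x_ij / (nm). The function name `fleiss_kappa` and the data are illustrative.

```python
def fleiss_kappa(x):
    """Overall Fleiss' kappa.

    x[i][j] is the number of ratings of sample i in category j.
    Every row must sum to m, the number of trials.
    """
    n = len(x)              # number of samples
    k = len(x[0])           # number of categories
    m = sum(x[0])           # number of trials per sample
    if any(sum(row) != m for row in x):
        raise ValueError("every sample needs the same number of ratings")
    p = [sum(row[j] for row in x) / (n * m) for j in range(k)]   # p_j
    sum_sq = sum(v * v for row in x for v in row)                # sum of x_ij^2
    p_o = (sum_sq - n * m) / (n * m * (m - 1))                   # pairwise agreement
    p_e = sum(pj * pj for pj in p)                               # chance agreement
    return (p_o - p_e) / (1 - p_e)


# Made-up data: 4 samples, 3 trials each, 2 categories.
ratings = [[3, 0],
           [0, 3],
           [2, 1],
           [1, 2]]
print(round(fleiss_kappa(ratings), 4))  # 0.3333
```

For this data P_o = 16/24 and P_e = 0.5, giving K = 1/3. The substituted form gives the same value: 1 − (36 − 28) / (24 × 0.5) = 1/3.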

Formulas for kappa for a single category

To measure agreement with respect to classification into a single one of the k categories, say the jth, combine all categories other than the one of interest into a single category and apply the equation above. The resulting formula for the kappa statistic for the jth category is:

$$K_j = 1 - \frac{\sum_{i=1}^{n} x_{ij}\,(m - x_{ij})}{n\,m\,(m-1)\,p_j\,(1 - p_j)}$$

Here p_j is the overall proportion of ratings in category j.

where:

Term	Description
k	the total number of categories
m	the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers.
n	the number of samples
x_ij	the number of ratings on sample i into category j
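A short sketch of the single-category calculation, assuming the standard Fleiss form K_j = 1 − Σ_i x_ij (m − x_ij) / (n m (m − 1) p_j (1 − p_j)). The function name, the 0-based category index, and the data are made up for illustration.

```python
def fleiss_kappa_category(x, j):
    """Fleiss' kappa for category j (0-based index, an assumption here).

    x[i][j] is the number of ratings of sample i in category j.
    """
    n = len(x)                                       # number of samples
    m = sum(x[0])                                    # number of trials per sample
    p_j = sum(row[j] for row in x) / (n * m)         # overall proportion in category j
    # Pairs of ratings on a sample that split between category j and the rest.
    disagree = sum(row[j] * (m - row[j]) for row in x)
    return 1 - disagree / (n * m * (m - 1) * p_j * (1 - p_j))


# Made-up data: 4 samples, 3 trials each, 3 categories.
ratings3 = [[3, 0, 0],
            [0, 3, 0],
            [0, 0, 3],
            [2, 1, 0]]
print(round(fleiss_kappa_category(ratings3, 0), 4))  # 0.6571
print(round(fleiss_kappa_category(ratings3, 2), 4))  # 1.0
```

Category 2 is never split within a sample, so its kappa is 1. Category 0 is split on the last sample, which lowers its kappa to 23/35.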

Testing significance of Fleiss' kappa (unknown standard)

The null hypothesis, H₀, is kappa = 0. The alternative hypothesis, H₁, is kappa > 0.

Under the null hypothesis, Z approximately follows the standard normal distribution and is used to calculate the p-value.

Formulas

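To test whether kappa > 0, use the following Z statistic:

$$Z = \frac{K}{\sqrt{\mathrm{Var}(K)}}$$

Var(K) is calculated by:

$$\mathrm{Var}(K) = \frac{2}{n\,m\,(m-1)} \cdot \frac{\left( \sum_{j=1}^{k} p_j (1 - p_j) \right)^2 - \sum_{j=1}^{k} p_j (1 - p_j)(1 - 2 p_j)}{\left( \sum_{j=1}^{k} p_j (1 - p_j) \right)^2}$$

To test whether kappa > 0 for the jth category, use the following Z statistic:

$$Z = \frac{K_j}{\sqrt{\mathrm{Var}(K_j)}}$$

Var(K_j) is calculated by:

$$\mathrm{Var}(K_j) = \frac{2}{n\,m\,(m-1)}$$

Here p_j is the overall proportion of ratings in category j.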

Notation

Term	Description
K	the overall kappa statistic
K_j	the kappa statistic for the jth category
k	the total number of categories
m	the number of trials—for case 1, m = the number of trials for each appraiser; for case 2, m = the number of trials for all appraisers.
n	the number of samples
x_ij	the number of ratings on sample i into category j
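The two tests can be sketched together. The variance forms below are the standard Fleiss large-sample null variances and are assumed here, not quoted from Minitab: Var(K) = 2 [S² − Σ p_j(1 − p_j)(1 − 2p_j)] / (n m (m − 1) S²) with S = Σ p_j(1 − p_j), and Var(K_j) = 2 / (n m (m − 1)). Function names and data are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def fleiss_overall_test(x):
    """Return (K, Var(K), Z, one-sided p-value) for overall Fleiss' kappa."""
    n, k, m = len(x), len(x[0]), sum(x[0])
    p = [sum(row[j] for row in x) / (n * m) for j in range(k)]
    pq = [pj * (1 - pj) for pj in p]
    s = sum(pq)                                         # sum of p_j (1 - p_j)
    sum_sq = sum(v * v for row in x for v in row)
    kappa = 1 - (n * m * m - sum_sq) / (n * m * (m - 1) * s)
    skew = sum(pq[j] * (1 - 2 * p[j]) for j in range(k))
    var = 2 * (s * s - skew) / (n * m * (m - 1) * s * s)
    z = kappa / sqrt(var)
    return kappa, var, z, 1 - NormalDist().cdf(z)


def fleiss_category_test(x, j):
    """Return (K_j, Var(K_j), Z, one-sided p-value) for category j (0-based)."""
    n, m = len(x), sum(x[0])
    p_j = sum(row[j] for row in x) / (n * m)
    disagree = sum(row[j] * (m - row[j]) for row in x)
    kappa_j = 1 - disagree / (n * m * (m - 1) * p_j * (1 - p_j))
    var_j = 2 / (n * m * (m - 1))
    z = kappa_j / sqrt(var_j)
    return kappa_j, var_j, z, 1 - NormalDist().cdf(z)


# Made-up data: 4 samples, 3 trials each, 2 categories.
data = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa, var, z_stat, p_val = fleiss_overall_test(data)
print(round(kappa, 4), round(var, 4), round(z_stat, 4))  # 0.3333 0.0833 1.1547
```

With only two categories, the overall and per-category results coincide, which is a convenient sanity check on both functions.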

Fleiss' kappa statistic (known standard)

Use the following steps to calculate overall kappa and kappa for a specific category when the standard rating for each sample is known.

Assume there are m trials.

Note

See the formulas from Fleiss' kappa statistic (unknown standard).

For each trial, calculate kappa using the ratings from the trial and the ratings given by the standard. In other words, treat the standard as another trial, and use the unknown-standard kappa formulas for two trials to estimate kappa.
Repeat the calculation for all m trials.
You now have m overall kappa values and m kappa values for each specific category.

The overall kappa with known standard is the average of the m overall kappa values.

In the same way, the kappa for a specific category with known standard is the average of the m kappa values for that category.
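The steps above can be sketched as follows: pair each trial with the standard, build a two-rating count table, apply the unknown-standard formula, and average. The sketch assumes the standard Fleiss definitions of P_o and P_e. All function names and ratings are hypothetical.

```python
def fleiss_kappa(x):
    """Overall Fleiss' kappa from counts x[i][j] (unknown-standard form)."""
    n, k, m = len(x), len(x[0]), sum(x[0])
    p = [sum(row[j] for row in x) / (n * m) for j in range(k)]
    sum_sq = sum(v * v for row in x for v in row)
    p_o = (sum_sq - n * m) / (n * m * (m - 1))
    p_e = sum(pj * pj for pj in p)
    return (p_o - p_e) / (1 - p_e)


def counts_trial_vs_standard(trial, standard, categories):
    """Treat the standard as a second trial: two ratings per sample."""
    index = {c: i for i, c in enumerate(categories)}
    x = []
    for rating, truth in zip(trial, standard):
        row = [0] * len(categories)
        row[index[rating]] += 1
        row[index[truth]] += 1
        x.append(row)
    return x


def fleiss_kappa_known_standard(trials, standard, categories):
    """Average of the per-trial kappas against the standard."""
    kappas = [fleiss_kappa(counts_trial_vs_standard(t, standard, categories))
              for t in trials]
    return sum(kappas) / len(kappas)


# Made-up ratings: four samples, two trials.
std = ["G", "G", "B", "B"]
all_trials = [["G", "G", "B", "B"],   # kappa = 1 against the standard
              ["G", "B", "B", "B"]]   # kappa = 7/15 against the standard
print(round(fleiss_kappa_known_standard(all_trials, std, ["G", "B"]), 4))  # 0.7333
```

The same averaging applies to a single category: compute the category kappa for each trial paired with the standard, then take the mean.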

Test the significance of Fleiss' kappa (known standard)

The null hypothesis, H₀, is kappa = 0. The alternative hypothesis, H₁, is kappa > 0.

Under the null hypothesis, Z approximately follows the standard normal distribution and is used to calculate the p-value:

$$Z = \frac{K}{\sqrt{\mathrm{Var}(K)}}$$

where K is the kappa statistic and Var(K) is the variance of the kappa statistic.

Note

See the formulas from Fleiss' kappa statistic (unknown standard).

Assume there are m trials.

For each trial, calculate the variance of kappa using the ratings from the trial and the ratings given by the standard. In other words, treat the standard as the second trial, and use the variance formulas for the two-trial, unknown-standard case to calculate the variance.
Repeat the calculation for all m trials.
You now have m variances for overall kappa and m variances for the kappa of each specific category.

The variance of overall kappa with known standard is the sum of the m variances for overall kappa divided by m².

Similarly, the variance of kappa for a specific category with known standard is the sum of the m variances for that category's kappa divided by m².
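Combining the per-trial results is simple arithmetic, sketched below. The function name `combine_known_standard` and the inputs are illustrative. The per-trial kappas and variances are assumed to have been computed already with the two-trial, unknown-standard formulas.

```python
from math import sqrt
from statistics import NormalDist


def combine_known_standard(kappas, variances):
    """Combine m per-trial kappas and variances for the known-standard test.

    Returns (mean kappa, combined variance, Z, one-sided p-value).
    """
    m = len(kappas)
    kappa = sum(kappas) / m             # average of the m kappa values
    var = sum(variances) / m ** 2       # sum of the m variances divided by m^2
    z = kappa / sqrt(var)
    p_value = 1 - NormalDist().cdf(z)   # upper tail, for H1: kappa > 0
    return kappa, var, z, p_value


# Illustrative inputs for m = 2 trials (made-up values).
k_mean, v_comb, z_comb, p_comb = combine_known_standard([1.0, 7 / 15], [0.25, 0.25])
print(round(k_mean, 4), v_comb)  # 0.7333 0.125
```

Dividing the summed variances by m² is the variance of a mean of m terms when the per-trial kappas are treated as uncorrelated.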