What is kappa?

Kappa measures the degree of agreement of the nominal or ordinal assessments made by multiple appraisers when assessing the same samples.

For example, 45 patients are assessed by two different doctors for a particular disease. How often will the doctors' diagnosis of the condition (positive or negative) agree? A different example of nominal assessments is inspectors rating defects on TV screens. Do they consistently agree on their classifications of bubbles, divots, and dirt?

Interpreting kappa values

Kappa values range from −1 to +1. The higher the value of kappa, the stronger the agreement. When:
  • Kappa = 1, perfect agreement exists.
  • Kappa = 0, agreement is the same as would be expected by chance.
  • Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG1 suggests that a kappa value of at least 0.75 indicates good agreement. However, larger kappa values, such as 0.90, are preferred.
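
As an illustration of how kappa relates observed agreement to chance agreement, the following minimal Python sketch (not Minitab output) works through the two-doctor example above. The counts in the 2×2 table and the variable names are hypothetical and are only for illustration.

    import numpy as np

    # Hypothetical diagnoses for 45 patients: rows are doctor A's calls,
    # columns are doctor B's calls, in the order (positive, negative).
    table = np.array([[20,  4],
                      [ 3, 18]])

    n = table.sum()
    p_observed = np.trace(table) / n          # proportion of patients the doctors agree on
    p_chance = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2   # agreement expected by chance

    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"observed = {p_observed:.2f}, chance = {p_chance:.2f}, kappa = {kappa:.2f}")
    # kappa = 1 means perfect agreement, 0 means chance-level, < 0 means worse than chance.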

When you have ordinal ratings, such as defect severity ratings on a scale of 1-5, Kendall's coefficients, which take ordering into consideration, are usually more appropriate statistics for measuring association than kappa alone.

A comparison of Fleiss' kappa and Cohen's kappa

Minitab can calculate both Fleiss' kappa and Cohen's kappa. Cohen's kappa is a popular statistic for measuring assessment agreement between two raters. Fleiss' kappa is a generalization of Cohen's kappa to more than 2 raters. In Attribute Agreement Analysis, Minitab calculates Fleiss' kappa by default and offers the option to calculate Cohen's kappa when appropriate.
Note

Minitab can calculate Cohen's kappa when your data satisfy the following requirements:

  • To calculate Cohen's kappa for Within Appraiser, you must have 2 trials for each appraiser.
  • To calculate Cohen's kappa for Between Appraisers, you must have exactly 2 appraisers, each with 1 trial.
  • To calculate Cohen's kappa for Each Appraiser vs Standard and All Appraisers vs Standard, you must provide a standard for each sample.

Fleiss' kappa and Cohen's kappa use different methods to estimate the probability that agreement occurs by chance. Fleiss' kappa assumes that the appraisers are selected at random from a group of available appraisers, while Cohen's kappa assumes that the appraisers are specifically chosen and fixed. Because of these different assumptions, the two statistics estimate the probability of chance agreement differently.
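
To make that difference concrete, the sketch below (plain Python, hypothetical pass/fail ratings from two appraisers) computes the chance-agreement term both ways: Cohen's version multiplies each appraiser's own marginal proportions, while Fleiss' version pools all ratings into a single distribution. In this simple two-rater sketch the observed agreement is the same for both, so the chance term is the only place the two kappas differ.

    import numpy as np

    # Hypothetical pass(1)/fail(0) ratings from two appraisers on 10 parts.
    rater_a = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
    rater_b = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
    categories = [0, 1]

    p_observed = (rater_a == rater_b).mean()    # same for both kappas

    # Cohen: chance agreement from each appraiser's own marginal distribution.
    p_a = np.array([(rater_a == c).mean() for c in categories])
    p_b = np.array([(rater_b == c).mean() for c in categories])
    pe_cohen = (p_a * p_b).sum()

    # Fleiss: chance agreement from the pooled distribution of all ratings.
    pooled = np.concatenate([rater_a, rater_b])
    p_pooled = np.array([(pooled == c).mean() for c in categories])
    pe_fleiss = (p_pooled ** 2).sum()

    print(f"Cohen's kappa:  {(p_observed - pe_cohen) / (1 - pe_cohen):.3f}")
    print(f"Fleiss' kappa:  {(p_observed - pe_fleiss) / (1 - pe_fleiss):.3f}")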

What is Kendall's coefficient of concordance (KCC)?

Kendall's coefficient of concordance indicates the degree of association of ordinal assessments made by multiple appraisers when assessing the same samples. Kendall's coefficient is commonly used in attribute agreement analysis.

Interpreting Kendall's coefficient of concordance values

Kendall's coefficient values can range from 0 to 1. The higher the value of the coefficient, the stronger the association. Usually, Kendall's coefficients of 0.9 or higher are considered very good. A high or significant Kendall's coefficient means that the appraisers are applying essentially the same standard when assessing the samples.
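
The sketch below shows how Kendall's coefficient of concordance can be computed for a small, hypothetical set of severity ratings. It uses the basic formula W = 12·S / (m²(n³ − n)), where S is the sum of squared deviations of the samples' rank totals, m is the number of appraisers, and n is the number of samples. It omits the usual correction for tied ratings, so it slightly understates the coefficient when ties are present; treat it as an approximation rather than a reproduction of Minitab's calculation.

    import numpy as np
    from scipy.stats import rankdata

    # Hypothetical severity ratings (scale 1-5) from 3 appraisers on 6 samples.
    ratings = np.array([[1, 2, 3, 4, 5, 5],    # appraiser 1
                        [2, 1, 3, 4, 4, 5],    # appraiser 2
                        [1, 2, 4, 3, 5, 5]])   # appraiser 3

    m, n = ratings.shape                               # m appraisers, n samples
    ranks = np.apply_along_axis(rankdata, 1, ratings)  # rank each appraiser's ratings
    rank_totals = ranks.sum(axis=0)                    # total rank each sample receives
    s = ((rank_totals - rank_totals.mean()) ** 2).sum()

    w = 12 * s / (m**2 * (n**3 - n))   # no tie correction; understates W when ties exist
    print(f"Kendall's coefficient of concordance = {w:.2f}")   # 0 = no association, 1 = perfect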

What is Kendall's correlation coefficient?

If you provide a known rating for each sample, Minitab also calculates Kendall's correlation coefficients: one for each appraiser, to assess that appraiser's agreement with the known standard, and an overall coefficient that represents the agreement of all appraisers with the standard. The correlation coefficient helps you determine whether an appraiser is consistent but inaccurate.

Interpreting Kendall's correlation coefficient

Kendall's coefficient values can range from −1 to 1. A positive value indicates positive association. A negative value indicates negative association. The higher the magnitude, the stronger the association.

Use the Kendall's correlation coefficients and their p-values to choose between two opposing hypotheses, based on your sample data:
  • H0: There is no association between ratings of all appraisers and the known standard.
  • H1: Ratings by all appraisers are associated with the known standard.

The p-value is the probability of obtaining a sample with a Kendall's correlation coefficient at least as extreme as the one you observed if the null hypothesis (H0) is true. If the p-value is less than or equal to a predetermined level of significance (α-level), then you reject the null hypothesis and claim support for the alternative hypothesis.
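
The Python sketch below applies that decision rule with scipy.stats.kendalltau, using hypothetical ratings and a hypothetical known standard. Minitab's exact calculation of the coefficient and p-value may differ in detail, but the comparison against the α-level works the same way.

    from scipy.stats import kendalltau

    # Hypothetical known standard and one appraiser's ratings for 10 samples (scale 1-5).
    standard  = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
    appraiser = [1, 2, 2, 2, 3, 4, 4, 5, 5, 5]
    alpha = 0.05                                  # predetermined significance level

    tau, p_value = kendalltau(appraiser, standard)
    print(f"Kendall's correlation = {tau:.2f}, p-value = {p_value:.3f}")

    if p_value <= alpha:
        print("Reject H0: the appraiser's ratings are associated with the known standard.")
    else:
        print("Fail to reject H0: no evidence of association with the known standard.")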

Should I use a kappa statistic or one of the Kendall's coefficients?

  • When your classifications are nominal (true/false, good/bad, crispy/crunchy/soggy), use kappa.
  • When your classifications are ordinal (ratings made on a scale), in addition to kappa statistics, use Kendall's coefficient of concordance.
  • When your classifications are ordinal and you have a known standard for each trial, in addition to kappa statistics, use Kendall's correlation coefficient.

Kappa statistics represent absolute agreement between ratings, while Kendall's coefficients measure the association between ratings. Therefore, kappa statistics treat all misclassifications equally, but Kendall's coefficients do not. For instance, Kendall's coefficients consider the consequences of misclassifying a perfect (rating = 5) object as bad (rating = 1) to be more serious than misclassifying it as very good (rating = 4).
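
The toy comparison below (Python, with made-up ratings and scipy.stats.kendalltau) illustrates that point: both error patterns leave the proportion of exact matches, which is what unweighted kappa is built from, unchanged, but rating a 5 as a 1 lowers Kendall's correlation far more than rating it as a 4.

    from scipy.stats import kendalltau

    # Hypothetical known standard for 8 samples rated on a 1-5 scale.
    standard = [1, 2, 3, 4, 5, 5, 4, 3]

    # Each appraiser misclassifies exactly one sample (a part whose standard rating is 5):
    rates_5_as_4 = [1, 2, 3, 4, 4, 5, 4, 3]   # small, adjacent-category error
    rates_5_as_1 = [1, 2, 3, 4, 1, 5, 4, 3]   # large, opposite-end error

    for name, ratings in [("5 rated as 4", rates_5_as_4), ("5 rated as 1", rates_5_as_1)]:
        exact = sum(r == s for r, s in zip(ratings, standard)) / len(standard)
        tau, _ = kendalltau(ratings, standard)
        print(f"{name}: exact-match agreement = {exact:.2f}, Kendall's correlation = {tau:.2f}")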

1 Automotive Industry Action Group (AIAG) (2010). Measurement Systems Analysis Reference Manual, 4th edition. Chrysler, Ford, General Motors Supplier Quality Requirements Task Force.