Kappa measures the degree of agreement of the nominal or ordinal assessments made by multiple appraisers when assessing the same samples.
For example, 45 patients are assessed by two different doctors for a particular disease. How often will the doctors' diagnosis of the condition (positive or negative) agree? A different example of nominal assessments is inspectors rating defects on TV screens. Do they consistently agree on their classifications of bubbles, divots, and dirt?
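As a rough illustration of the calculation, the sketch below computes Cohen's kappa for two hypothetical doctors rating the same patients as positive or negative. The data (10 patients rather than the 45 in the example) and the plain-Python implementation are illustrative only, not Minitab output.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two appraisers rating the same samples."""
    n = len(ratings_a)
    # Observed agreement: proportion of samples where the appraisers agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two appraisers' marginal proportions,
    # summed over categories (Cohen's definition, with fixed appraisers).
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses (positive/negative) from two doctors for 10 patients.
doctor_1 = ["+", "+", "-", "-", "+", "-", "+", "+", "-", "-"]
doctor_2 = ["+", "+", "-", "+", "+", "-", "+", "-", "-", "-"]
print(round(cohens_kappa(doctor_1, doctor_2), 3))  # 0.6 for this data
```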
The AIAG suggests that a kappa value of at least 0.75 indicates good agreement; larger kappa values, such as 0.90, are preferred.
When you have ordinal ratings, such as defect severity ratings on a scale of 1-5, Kendall's coefficients, which take ordering into consideration, are usually more appropriate statistics to determine association than kappa alone.
Minitab can calculate Cohen's kappa when your data satisfy certain requirements.
Fleiss' kappa and Cohen's kappa use different methods to estimate the probability that agreement occurs by chance. Fleiss' kappa assumes that the appraisers are selected at random from a group of available appraisers, whereas Cohen's kappa assumes that the appraisers are specifically chosen and fixed. As a result, the two statistics estimate the probability of chance agreement differently.
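A minimal sketch of that difference for two fixed appraisers: Cohen's chance agreement multiplies each appraiser's own marginal proportions, while Fleiss' pools the ratings from both appraisers before squaring. The data and function name here are hypothetical.

```python
from collections import Counter

def chance_agreement(ratings_a, ratings_b):
    """Return (Cohen's p_e, Fleiss' p_e) for two appraisers."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Cohen: appraisers are fixed, so each keeps their own marginal distribution.
    p_e_cohen = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Fleiss: appraisers are a random sample, so their ratings are pooled.
    pooled = count_a + count_b
    p_e_fleiss = sum((pooled[c] / (2 * n)) ** 2 for c in categories)
    return p_e_cohen, p_e_fleiss

appraiser_1 = ["good", "good", "good", "bad", "bad"]
appraiser_2 = ["good", "good", "bad", "bad", "bad"]
print(chance_agreement(appraiser_1, appraiser_2))  # (0.48, 0.5) for this data
```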
Kendall's coefficient of concordance indicates the degree of association of ordinal assessments made by multiple appraisers when assessing the same samples. Kendall's coefficient is commonly used in attribute agreement analysis.
Kendall's coefficient values can range from 0 to 1. The higher the value, the stronger the association; coefficients of 0.9 or higher are usually considered very good. A high or significant Kendall's coefficient means that the appraisers are applying essentially the same standard when assessing the samples.
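The sketch below computes Kendall's coefficient of concordance from scratch for three hypothetical appraisers, using the basic formula without the tie correction that a full implementation (or Minitab) would apply.

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's coefficient of concordance (no tie correction).

    `ratings` is an (m appraisers x n samples) array of ordinal scores.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # Rank each appraiser's scores across the samples (ties get average ranks).
    ranks = np.apply_along_axis(rankdata, 1, ratings)
    # Sum of ranks each sample received, and its deviation from the mean sum.
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical severity ratings (1-5) from 3 appraisers for 6 parts.
ratings = [
    [1, 2, 3, 4, 5, 5],
    [1, 3, 3, 4, 4, 5],
    [2, 2, 3, 5, 4, 5],
]
print(round(kendalls_w(ratings), 3))
```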
If you provide a known rating for each sample, Minitab also calculates Kendall's correlation coefficients: a coefficient for each appraiser, which indicates that appraiser's agreement with the known standard, and an overall coefficient, which represents the agreement of all appraisers with the standard. The correlation coefficient helps you determine whether an appraiser is consistent but inaccurate.
Kendall's coefficient values can range from −1 to 1. A positive value indicates positive association. A negative value indicates negative association. The higher the magnitude, the stronger the association.
The p-value is the probability of obtaining a sample with a Kendall's correlation coefficient at least as extreme as the one observed, if the null hypothesis (H0) is true. If the p-value is less than or equal to a predetermined level of significance (α-level), you reject the null hypothesis and claim support for the alternative hypothesis.
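As an illustration of that decision rule, the sketch below uses SciPy's kendalltau (a Kendall rank correlation, used here as a stand-in for Minitab's calculation) to compare each of two hypothetical appraisers with a known standard and test the result against α = 0.05.

```python
from scipy.stats import kendalltau

# Known standard ratings for 8 samples, plus two hypothetical appraisers.
standard = [1, 2, 2, 3, 4, 4, 5, 5]
appraisers = {
    "Appraiser A": [1, 2, 3, 3, 4, 4, 5, 5],   # close to the standard
    "Appraiser B": [2, 1, 3, 2, 5, 3, 4, 4],   # noisier ratings
}

alpha = 0.05
for name, ratings in appraisers.items():
    tau, p_value = kendalltau(ratings, standard)
    verdict = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"{name}: tau = {tau:.3f}, p = {p_value:.3f} -> {verdict}")
```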
Kappa statistics represent absolute agreement between ratings, while Kendall's coefficients measure the association between ratings. Kappa statistics treat all misclassifications equally, but Kendall's coefficients do not. For instance, Kendall's coefficients consider misclassifying a perfect object (rating = 5) as bad (rating = 1) to be a more serious error than misclassifying it as very good (rating = 4).
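A small worked comparison, using sklearn's cohen_kappa_score and SciPy's kendalltau on made-up ratings: misrating a single perfect part as a 4 or as a 1 gives the same kappa, but a noticeably lower Kendall coefficient in the severe case.

```python
from scipy.stats import kendalltau
from sklearn.metrics import cohen_kappa_score

# Known standard and two hypothetical appraisers who each misrate one sample:
# one calls a perfect part (5) very good (4), the other calls it bad (1).
standard    = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
mild_miss   = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4]
severe_miss = [1, 2, 3, 4, 5, 1, 2, 3, 4, 1]

for name, ratings in [("5 rated as 4", mild_miss), ("5 rated as 1", severe_miss)]:
    kappa = cohen_kappa_score(standard, ratings)
    tau, _ = kendalltau(standard, ratings)
    print(f"{name}: kappa = {kappa:.3f}, tau = {tau:.3f}")
# With these made-up ratings both kappas come out equal (0.875), while
# Kendall's tau drops much further for the severe misclassification.
```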