Find definitions and interpretation guidance for the kappa statistics.

The responses are the levels of the categories in the data. For example, if the appraisers use a 1–5 scale, the responses are 1–5.

Kappa is the ratio of the proportion of times that the appraisers agree (corrected for chance agreement) to the maximum proportion of times that the appraisers could agree (corrected for chance agreement).
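As a sketch of that definition, Cohen's kappa for two appraisers can be computed from the observed proportion of agreement and the proportion expected by chance. This is a minimal Python illustration of the formula, not Minitab's implementation:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two appraisers rating the same samples.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the proportion of agreement
    expected by chance from each appraiser's marginal rates.
    """
    n = len(ratings_a)
    # Observed proportion of times the appraisers agree
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of marginals
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two appraisers rate the same 10 parts on a 1-5 scale (made-up data)
a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5, 1, 2, 3, 3, 5]
print(round(cohens_kappa(a, b), 3))  # 0.875
```

Here the appraisers agree on 9 of 10 parts (p_o = 0.9), but roughly 20% agreement would be expected by chance alone, so kappa lands below the raw agreement rate.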

Use kappa statistics to assess the degree of agreement of the nominal or ordinal ratings made by multiple appraisers when the appraisers evaluate the same samples.

Minitab can calculate both Fleiss's kappa and Cohen's kappa. Cohen's kappa is a popular statistic for measuring assessment agreement between 2 raters. Fleiss's kappa is a generalization of Cohen's kappa for more than 2 raters. In Attribute Agreement Analysis, Minitab calculates Fleiss's kappa by default.

Minitab can calculate Cohen's kappa when your data satisfy the following requirements:

- To calculate Cohen's kappa for Within Appraiser, you must have 2 trials for each appraiser.
- To calculate Cohen's kappa for Between Appraisers, you must have 2 appraisers with 1 trial.
- To calculate Cohen's kappa for Each Appraiser vs Standard and All Appraisers vs Standard, you must provide a standard for each sample.
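For more than 2 raters, Fleiss's kappa works from a table of how many appraisers placed each sample in each category. A minimal sketch of the standard Fleiss formula (again, an illustration rather than Minitab's code):

```python
def fleiss_kappa(table):
    """Fleiss's kappa from an N x k table: table[i][j] is the number
    of appraisers who assigned sample i to category j, with the same
    number of appraisers m rating every sample."""
    n_samples = len(table)
    m = sum(table[0])  # appraisers per sample
    # Per-sample agreement P_i = (sum of n_ij^2 - m) / (m * (m - 1))
    p_i = [(sum(n * n for n in row) - m) / (m * (m - 1)) for row in table]
    p_bar = sum(p_i) / n_samples
    # Overall proportion of ratings in each category
    k = len(table[0])
    p_j = [sum(row[j] for row in table) / (n_samples * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)  # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# 3 appraisers rate 4 samples into 2 categories (pass/fail), made-up data
table = [[3, 0], [2, 1], [3, 0], [0, 3]]
print(round(fleiss_kappa(table), 3))  # 0.625
```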

Kappa values range from –1 to +1. The higher the value of kappa, the stronger the agreement, as follows:

- When Kappa = 1, perfect agreement exists.
- When Kappa = 0, agreement is the same as would be expected by chance.
- When Kappa < 0, agreement is weaker than expected by chance; this rarely occurs.

The AIAG suggests that a kappa value of at least 0.75 indicates good agreement. However, larger kappa values, such as 0.90, are preferred.

When you have ordinal ratings, such as defect severity ratings on a scale of 1–5, Kendall's coefficients, which account for ordering, are usually more appropriate statistics to determine association than kappa alone.
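To see why ordering matters, consider a rank-agreement measure in the spirit of Kendall's coefficients. The sketch below counts concordant versus discordant sample pairs (this is Goodman and Kruskal's gamma, a simplified stand-in for the Kendall's coefficients Minitab reports): a kappa statistic scores a near-miss and a far-miss identically, but a rank statistic does not.

```python
from itertools import combinations

def concordance(x, y):
    """(C - D) / (C + D): the balance of sample pairs that both
    appraisers order the same way (concordant) versus oppositely
    (discordant), with tied pairs ignored."""
    conc = disc = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc)

# Ordinal severity ratings (1-5) on 8 parts, made-up data.
# Both alternatives disagree with the standard on 2 parts,
# so a kappa would penalize them equally.
standard = [1, 2, 3, 4, 5, 1, 2, 3]
close    = [1, 2, 3, 4, 5, 2, 1, 3]  # misses by one severity level
far      = [1, 2, 3, 4, 5, 5, 5, 3]  # misses by several levels

print(concordance(standard, close) > concordance(standard, far))  # True
```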

For more information, see Kappa statistics and Kendall's coefficients.

The standard error for an estimated kappa statistic measures the precision of the estimate. The smaller the standard error, the more precise the estimate.

Z is the z-value, which is the approximate normal test statistic. Minitab uses the z-value to determine the p-value.

The p-value is a probability that measures the evidence against the null hypothesis. Lower p-values provide stronger evidence against the null hypothesis.
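Putting these pieces together: the z-value is the estimated kappa divided by its standard error, and the p-value follows from the standard normal distribution. The sketch below assumes the conventional one-sided kappa test (H0: kappa = 0 versus Ha: kappa > 0); the kappa and SE values are illustrative, not from a real study.

```python
from math import erfc, sqrt

def kappa_z_test(kappa, se_kappa):
    """z statistic and one-sided p-value for H0: kappa = 0
    (agreement is due to chance) versus Ha: kappa > 0.
    Uses the normal survival function: P(Z > z) = erfc(z / sqrt(2)) / 2."""
    z = kappa / se_kappa
    p = 0.5 * erfc(z / sqrt(2))
    return z, p

z, p = kappa_z_test(0.75, 0.25)  # illustrative kappa and SE
print(round(z, 2))  # 3.0
print(p < 0.05)     # True: strong evidence against H0
```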

Use the p-value for kappa to determine whether to reject or fail to reject the following null hypotheses:

- H₀ for Within Appraiser: the agreement within appraiser is due to chance.
- H₀ for Each Appraiser vs Standard: the agreement between appraisers' ratings and the standard is due to chance.
- H₀ for Between Appraisers: the agreement between appraisers is due to chance.
- H₀ for All Appraisers vs Standard: the agreement between all appraisers' ratings and the standard is due to chance.

To determine whether agreement is due to chance, compare the p-value to the significance level. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates that the risk of concluding that the appraisers are in agreement—when, actually, they are not—is 5%.

- P-value ≤ α: The appraiser agreement is not due to chance (Reject H₀). If the p-value is less than or equal to the significance level, you reject the null hypothesis and conclude that the appraiser agreement is significantly different from what would be achieved by chance.
- P-value > α: The appraiser agreement is due to chance (Fail to reject H₀). If the p-value is larger than the significance level, you fail to reject the null hypothesis because you do not have enough evidence to conclude that the appraiser agreement is different from what would be achieved by chance.