Attribute Agreement Analysis


Provides an evaluation of how well appraisers can match each other and the standard (expert). You can use attribute agreement analysis for binary data (good or bad), nominal data (yellow, blue, brown), or ordinal data (1, 2, 3, 4, where categories are value-ordered). Evaluation includes various % agreement analyses as well as Kappa and Kendall’s metrics.

Answers the questions:
  • Does the same appraiser reach the same conclusion when evaluating the same samples repeatedly?
  • Do different appraisers reach the same conclusions?
  • Do my appraisers reach the correct conclusions?
When to Use
  • Start of project: Verify you can consistently measure categorical process outputs before attempting to perform a baseline analysis.
  • Mid-project: Verify you can consistently measure appropriate categorical process inputs.
  • End of project: Verify you can consistently measure categorical process outputs after improvements have been made.
  • End of project: Verify you can consistently measure key categorical inputs that need to be controlled to maintain the improvements.


A discrete Y that can be binary (for example, good or bad), nominal, or ordinal.


  1. Select representative units from each possible process outcome. For example, you want roughly equal numbers of parts rated good or bad; red, yellow, or green; or 1, 2, 3, 4, or 5.
  2. Obtain expert opinion regarding standard values.
  3. Have multiple appraisers (three or more is recommended) evaluate each part multiple times.
  4. Set up your measurement data in a single column or in multiple columns:
    • When using single columns, place all measurement data in one column, the corresponding part identifiers in a second column, and the corresponding appraiser name or number in a third column.
    • When using multiple columns, enter part identifiers in the first column, and then the measurements for each trial by each appraiser in consecutive columns. In Minitab, specify the number of columns for each appraiser, and then enter the standard (expert) value for each part (for either data layout).
  5. If the data are ordinal, choose that option.
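The single-column (long) layout described in step 4 can be sketched in plain Python. The field names and ratings below are illustrative assumptions, not requirements of any particular tool:

```python
# Long ("single column") layout sketch: one row per individual assessment,
# with the part identifier, appraiser, trial number, and rating together.
assessments = [
    {"Part": 1, "Appraiser": "Ed",  "Trial": 1, "Rating": "Pass"},
    {"Part": 1, "Appraiser": "Ed",  "Trial": 2, "Rating": "Pass"},
    {"Part": 1, "Appraiser": "Ann", "Trial": 1, "Rating": "Pass"},
    {"Part": 1, "Appraiser": "Ann", "Trial": 2, "Rating": "Fail"},
    {"Part": 2, "Appraiser": "Ed",  "Trial": 1, "Rating": "Fail"},
    {"Part": 2, "Appraiser": "Ed",  "Trial": 2, "Rating": "Fail"},
    {"Part": 2, "Appraiser": "Ann", "Trial": 1, "Rating": "Fail"},
    {"Part": 2, "Appraiser": "Ann", "Trial": 2, "Rating": "Fail"},
]

# Expert standard (step 2): one known value per part.
standard = {1: "Pass", 2: "Fail"}

# Attach the standard to each row, as an agreement analysis would need it.
for row in assessments:
    row["Standard"] = standard[row["Part"]]
```

The multiple-column layout stores the same information with one row per part and one column per appraiser-trial combination; the long layout above is usually easier to aggregate.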


General Guidelines

  • Operators can be replaced by another factor (for example, you have three different gages that are supposed to be identical, or you want to evaluate three different operating temperatures).
  • You might need at least 50 items to effectively answer the questions posed in the Summary, provided the correct appraisals (for example, good or bad) are fairly evenly divided.

Example and Comments

Assessment Disagreement Table

  Appraiser   # Pass/Fail   Percent   # Fail/Pass   Percent   # Mixed   Percent
  Ed                4          9.30         2          3.51        1       1.00
  Ann               5         11.63         0          0.00        1       1.00
  Ted               2          4.65         0          0.00       10      10.00

  # Pass/Fail: Assessments across trials = Pass / standard = Fail.
  # Fail/Pass: Assessments across trials = Fail / standard = Pass.
  # Mixed: Assessments across trials are not identical.
Notes on the output above:
  • This table provides data on whether a given appraiser is biased toward the producer (high percentage of Pass/Fail), biased toward the consumer (high percentage of Fail/Pass), cannot agree with themselves (high percentage of # Mixed), or makes few errors.
  • In this case, Ann never calls a "good" part "bad" (# Fail/Pass); however, 11.63% of the time she calls "bad" parts "good" (# Pass/Fail).
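The three disagreement categories in the table can be computed directly from long-format data. The sketch below uses made-up assessments for a single appraiser (two trials on each of two parts); `disagreement_counts` is a hypothetical helper, not part of any tool's API:

```python
from collections import defaultdict

# Illustrative assessments for one appraiser: (part, rating, standard) triples,
# two trials per part. Values are invented to mirror the table's categories.
rows = [
    (1, "Pass", "Fail"), (1, "Pass", "Fail"),   # consistently Pass where standard is Fail
    (2, "Pass", "Pass"), (2, "Fail", "Pass"),   # disagrees with self -> Mixed
]

def disagreement_counts(rows):
    """Classify each part's trials the way the disagreement table does."""
    by_part = defaultdict(list)
    for part, rating, std in rows:
        by_part[part].append((rating, std))
    counts = {"# Pass/Fail": 0, "# Fail/Pass": 0, "# Mixed": 0}
    for trials in by_part.values():
        ratings = {r for r, _ in trials}
        std = trials[0][1]                      # standard is fixed per part
        if len(ratings) > 1:
            counts["# Mixed"] += 1              # appraiser not repeatable on this part
        elif ratings == {"Pass"} and std == "Fail":
            counts["# Pass/Fail"] += 1          # biased toward the producer
        elif ratings == {"Fail"} and std == "Pass":
            counts["# Fail/Pass"] += 1          # biased toward the consumer
    return counts

print(disagreement_counts(rows))
# → {'# Pass/Fail': 1, '# Fail/Pass': 0, '# Mixed': 1}
```

Dividing each count by the number of parts gives the percentages reported in the table.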

Interpretation of the "All Appraisers vs. Standard" Output

Assessment Agreement

  # Inspected   # Matched   Percent          95% CI
      100            83       83.00   (74.18, 89.77)

  # Matched: All appraisers' assessments agree with the known standard.

Fleiss' Kappa Statistics

  Response      Kappa    SE Kappa         Z   P(vs > 0)
  Fail       0.869267   0.0408248   21.2926      0.0000
  Pass       0.869267   0.0408248   21.2926      0.0000
Notes on the output above:
  • A typical guideline for percentage matched assessment agreement is a minimum of 80% to 90% for the case of 3 operators and 2 trials. In this example, 83% is marginally acceptable.
  • The guideline for the Kappa statistic is a minimum of 0.75 and preferably > 0.9 for the case of 100 samples. The Kappa statistic’s accuracy declines with smaller sample sizes. In this case, the Kappa of 0.869 is acceptable.
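For intuition about where a Kappa value like 0.869 comes from, the textbook Fleiss' kappa formula can be implemented in a few lines. The ratings matrix below is made up for illustration (it is not this example's data), and `fleiss_kappa` is a hypothetical helper; Minitab's per-response output may differ in detail:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa from an (n_subjects x n_categories) matrix of rating
    counts, where each row sums to the number of raters per subject."""
    counts = np.asarray(counts, dtype=float)
    n, _ = counts.shape
    m = counts.sum(axis=1)[0]                   # raters per subject (assumed constant)
    p_j = counts.sum(axis=0) / (n * m)          # overall category proportions
    P_i = (np.square(counts).sum(axis=1) - m) / (m * (m - 1))  # per-subject agreement
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)            # chance-corrected agreement

# 5 parts, 3 appraisers each, two categories (Fail, Pass):
ratings = [[3, 0], [0, 3], [3, 0], [2, 1], [0, 3]]
print(round(fleiss_kappa(ratings), 3))
# → 0.732
```

Kappa is observed agreement corrected for the agreement expected by chance, which is why it is a sterner test than raw percent matched.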

For the ordinal data case, the output includes Kendall’s statistics, as shown below:

Assessment Agreement

  # Inspected   # Matched   Percent          95% CI
       50            35       70.00   (55.39, 82.14)

  # Matched: All appraisers' assessments agree with the known standard.

Fleiss' Kappa Statistics

  Response      Kappa    SE Kappa         Z   P(vs > 0)
  -2         0.902252   0.0577350   15.6275      0.0000
  -1         0.924193   0.0577350   16.0075      0.0000
  0          0.807302   0.0577350   13.9829      0.0000
  1          0.803252   0.0577350   13.9127      0.0000
  2          0.847198   0.0577350   14.6739      0.0000
  Overall    0.857454   0.0291269   29.4385      0.0000

Kendall's Correlation Coefficient

      Coef     SE Coef         Z        P
  0.910976   0.0398410   22.8619   0.0000
Notes on the output above:
  • Kendall’s correlation coefficient: If the p-value is less than 0.05, conclude that the alternative hypothesis is true: agreement exists among the appraisers and between the appraisers and the standard.
  • Kendall’s correlation coefficient ranges from -1 to 1. High, positive values imply a strong association and correct appraisals, which may or may not be of practical significance (depending on the situation).
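The idea behind Kendall's coefficient can be illustrated with a plain textbook tau-a calculation on ordinal ratings. This is a sketch, not Minitab's exact statistic (Minitab's version handles ties and multiple appraisers differently); `kendall_tau` and the ratings are invented for illustration:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Tied pairs count as neither concordant nor discordant."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
    discordant = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return (concordant - discordant) / len(pairs)

# One appraiser's ordinal ratings vs. the known standard (illustrative 1-5 scale):
standard  = [1, 2, 2, 3, 4, 5]
appraiser = [1, 2, 3, 3, 4, 5]
print(round(kendall_tau(standard, appraiser), 3))
# → 0.867
```

Because tau rewards pairs ranked in the same order, it credits an appraiser who is consistently "close" on an ordinal scale, unlike Kappa, which treats every mismatch as equally wrong.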