Attribute agreement analysis

Use an attribute agreement analysis to evaluate how well appraisers can match each other and the standard (expert). You can use an attribute agreement analysis for binary data (good or bad), nominal data (yellow, blue, brown), or ordinal data (1, 2, 3, 4, where categories are value-ordered). Evaluation includes various % agreement analyses in addition to Kappa and Kendall’s metrics.

Answers the questions (a brief computational sketch follows the list):
  • Does the same appraiser reach the same conclusion when evaluating the same samples repeatedly?
  • Do different appraisers reach the same conclusions?
  • Do my appraisers reach the correct conclusions?
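
These three questions correspond to within-appraiser agreement (repeatability), between-appraiser agreement, and agreement with the standard. The following is a minimal sketch of those checks in Python; the data frame and its column names (Part, Appraiser, Trial, Rating, Standard) are hypothetical placeholders for your own appraisal data, not output from any particular tool.

    # Minimal sketch: the three agreement checks, computed from long-format data.
    # All column names and values below are hypothetical; adapt to your worksheet.
    import pandas as pd

    ratings = pd.DataFrame({
        "Part":      [1, 1, 1, 1, 2, 2, 2, 2],
        "Appraiser": ["Ed", "Ed", "Ann", "Ann", "Ed", "Ed", "Ann", "Ann"],
        "Trial":     [1, 2, 1, 2, 1, 2, 1, 2],
        "Rating":    ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Fail"],
        "Standard":  ["Pass", "Pass", "Pass", "Pass", "Fail", "Fail", "Fail", "Fail"],
    })

    # 1. Within appraiser: all of an appraiser's trials on a part agree with each other.
    within = (ratings.groupby(["Appraiser", "Part"])["Rating"]
                     .nunique().eq(1).groupby("Appraiser").mean() * 100)

    # 2. Between appraisers: every rating of a part (all appraisers, all trials) is identical.
    between = ratings.groupby("Part")["Rating"].nunique().eq(1).mean() * 100

    # 3. Versus the standard: every rating an appraiser gives a part matches the standard.
    vs_standard = (ratings.assign(ok=ratings["Rating"].eq(ratings["Standard"]))
                          .groupby(["Appraiser", "Part"])["ok"].all()
                          .groupby("Appraiser").mean() * 100)

    print(within, between, vs_standard, sep="\n")
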
When to Use
  • Start of project: Verify you can consistently measure categorical process outputs before attempting to perform a baseline analysis.
  • Mid-project: Verify you can consistently measure appropriate categorical process inputs.
  • End of project: Verify you can consistently measure categorical process outputs after improvements have been made.
  • End of project: Verify you can consistently measure key categorical inputs that need to be controlled to maintain the improvements.

Data

Your data must be discrete Y, which can be binary, nominal, or ordinal.

Guidelines

  • Operators can be replaced by another factor (for example, you have three different gages that are supposed to be identical, or you want to evaluate three different operating temperatures).
  • You might need at least 50 items to answer the previous questions effectively, provided the standard ratings (for example, good and bad) are roughly evenly split across the items.

Example and Comments

Assessment Disagreement Table

Appraiser   # Pass/Fail   Percent   # Fail/Pass   Percent   # Mixed   Percent
Ed                    4      9.30             2      3.51         1      1.00
Ann                   5     11.63             0      0.00         1      1.00
Ted                   2      4.65             0      0.00        10     10.00

# Pass/Fail: assessments across trials = Pass; standard = Fail.
# Fail/Pass: assessments across trials = Fail; standard = Pass.
# Mixed: assessments across trials are not identical.
Notes on the preceding output:
  • This table provides data on whether a given appraiser is biased toward the producer (high percentage of Pass/Fail), biased toward the consumer (high percentage of Fail/Pass), cannot agree with themselves (high percentage of # Mixed), or makes few errors.
  • In this case, Ann never calls a "good" part "bad" (# Fail/Pass); however, 11.63% of the time she calls a "bad" part "good" (# Pass/Fail).
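
The counts above can be reproduced from raw appraisal data. Below is a minimal sketch, assuming a long-format table with one row per appraiser, part, and trial; the column names (Part, Appraiser, Rating, Standard) and the Pass/Fail coding are hypothetical, and the sketch assumes every appraiser rates every part.

    # Minimal sketch of the Assessment Disagreement counts from long-format data.
    # Column names are hypothetical; assumes every appraiser rates every part.
    import pandas as pd

    def disagreement_table(ratings: pd.DataFrame) -> pd.DataFrame:
        std = ratings.drop_duplicates("Part").set_index("Part")["Standard"]
        n_fail_parts = (std == "Fail").sum()     # denominator for # Pass/Fail percent
        n_pass_parts = (std == "Pass").sum()     # denominator for # Fail/Pass percent
        rows = []
        for appraiser, grp in ratings.groupby("Appraiser"):
            sets = grp.groupby("Part")["Rating"].agg(set)    # ratings given across trials
            mixed = sets.apply(lambda s: len(s) > 1)         # trials disagree with each other
            all_pass = sets.apply(lambda s: s == {"Pass"})
            all_fail = sets.apply(lambda s: s == {"Fail"})
            pass_fail = (all_pass & (std == "Fail")).sum()   # rated Pass, standard says Fail
            fail_pass = (all_fail & (std == "Pass")).sum()   # rated Fail, standard says Pass
            rows.append({"Appraiser": appraiser,
                         "# Pass/Fail": pass_fail,
                         "Pass/Fail %": round(100 * pass_fail / n_fail_parts, 2),
                         "# Fail/Pass": fail_pass,
                         "Fail/Pass %": round(100 * fail_pass / n_pass_parts, 2),
                         "# Mixed": int(mixed.sum()),
                         "Mixed %": round(100 * mixed.sum() / len(sets), 2)})
        return pd.DataFrame(rows)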

Interpretation of the "All Appraisers vs. Standard" Output

Assessment Agreement

# Inspected   # Matched   Percent   95% CI
        100          83     83.00   (74.18, 89.77)

# Matched: all appraisers' assessments agree with the known standard.
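
The 95% confidence interval for percent matched is consistent with an exact (Clopper-Pearson) binomial interval for 83 matched items out of 100. A minimal sketch under that assumption:

    # Sketch: exact (Clopper-Pearson) 95% CI for 83 of 100 matched items,
    # assuming this is the interval type behind the table above.
    from scipy.stats import beta

    n, x, alpha = 100, 83, 0.05
    lower = beta.ppf(alpha / 2, x, n - x + 1)        # approx. 0.7418
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x)    # approx. 0.8977
    print(f"{100 * x / n:.2f}% matched, 95% CI ({100 * lower:.2f}, {100 * upper:.2f})")
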

Fleiss' Kappa Statistics

Response      Kappa   SE Kappa        Z   P(vs > 0)
Fail       0.869267  0.0408248  21.2926      0.0000
Pass       0.869267  0.0408248  21.2926      0.0000
Notes on the preceding output:
  • A typical guideline for percentage matched assessment agreement is a minimum of 80% to 90% for the case of 3 operators and 2 trials. In this example, 83% is marginally acceptable.
  • The guideline for the Kappa statistic is a minimum of 0.75 and preferably > 0.9 for the case of 100 samples. The Kappa statistic’s accuracy declines with smaller sample sizes. In this case, the Kappa of 0.869 is acceptable.
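
For reference, the overall Fleiss' kappa (chance-corrected agreement across multiple ratings per item) can be computed from a count matrix as sketched below. This is the textbook "agreement among multiple raters" form of the statistic; the kappas in the output above are reported per response category and against the known standard, so the sketch illustrates the logic rather than reproducing Minitab's exact numbers. The toy data are made up.

    # Sketch: textbook overall Fleiss' kappa from a count matrix where
    # counts[i, j] = number of ratings that place item i in category j.
    import numpy as np

    def fleiss_kappa(counts) -> float:
        counts = np.asarray(counts, dtype=float)
        n = counts.shape[0]                   # number of items
        m = counts[0].sum()                   # ratings per item (appraisers x trials)
        p_j = counts.sum(axis=0) / (n * m)    # overall proportion of each category
        P_i = (np.square(counts).sum(axis=1) - m) / (m * (m - 1))  # per-item agreement
        P_bar, P_e = P_i.mean(), np.square(p_j).sum()
        return (P_bar - P_e) / (1 - P_e)      # chance-corrected agreement

    # Toy example: 5 items, 6 ratings each (e.g., 3 appraisers x 2 trials), Fail/Pass.
    print(round(fleiss_kappa([[6, 0], [0, 6], [1, 5], [6, 0], [0, 6]]), 3))  # about 0.86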

For the ordinal data case, the output includes Kendall’s statistics, as shown below:

Assessment Agreement

# Inspected   # Matched   Percent   95% CI
         50          35     70.00   (55.39, 82.14)

# Matched: all appraisers' assessments agree with the known standard.

Fleiss' Kappa Statistics

Response      Kappa   SE Kappa        Z   P(vs > 0)
-2         0.902252  0.0577350  15.6275      0.0000
-1         0.924193  0.0577350  16.0075      0.0000
0          0.807302  0.0577350  13.9829      0.0000
1          0.803252  0.0577350  13.9127      0.0000
2          0.847198  0.0577350  14.6739      0.0000
Overall    0.857454  0.0291269  29.4385      0.0000

Kendall's Correlation Coefficient

    Coef    SE Coef        Z        P
0.910976  0.0398410  22.8619   0.0000
Notes on the preceding output:
  • Kendall’s correlation coefficient: If the p-value is less than 0.05, conclude that the alternative hypothesis is true: agreement exists among the appraisers and between the appraisers and the standard.
  • Kendall’s correlation coefficient ranges from -1 to 1. High, positive values imply a strong association and correct appraisals, which may or may not be of practical significance (depending on the situation).
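
As a rough cross-check on the ordinal association between appraisals and the standard, Kendall's tau-b can be computed on the paired values, as in the sketch below. Minitab's Kendall's correlation coefficient is a closely related rank statistic but is not computed identically, and the ratings shown here are hypothetical, so treat this as illustrative only.

    # Sketch: Kendall's tau-b between pooled ordinal appraisals and the standard.
    # Hypothetical data; approximates, but does not reproduce, the coefficient above.
    from scipy.stats import kendalltau

    appraisals = [-2, -1, -1, 0, 1, 1, 2, 2, 0, -2]   # one appraisal per row
    standard   = [-2, -1, -1, 0, 1, 2, 2, 2, 0, -2]   # matching standard values

    tau, p_value = kendalltau(appraisals, standard)
    print(f"tau-b = {tau:.3f}, p = {p_value:.4f}")    # small p indicates real agreement
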

How-to

  1. Select representative units from each possible process outcome. For example, you want a roughly equal number of parts rated good or bad; red, yellow, or green; or 1, 2, 3, 4, or 5.
  2. Obtain expert opinion regarding standard values.
  3. Have multiple appraisers (three or more is recommended) evaluate each part multiple times.
  4. Set up your measurement data in a single column or in multiple columns (a small layout sketch follows this list):
    • When using the single-column layout, place all measurement data in one column, the corresponding part identifiers in a second column, and the corresponding appraiser name or number in a third column.
    • When using the multiple-column layout, enter part identifiers and then the measurements for each trial by each appraiser in consecutive columns. In Minitab, specify the number of columns for each appraiser, and then enter the standard/attribute values for each part (this applies to either layout).
  5. If the data are ordinal, choose that option.
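
Below is a minimal sketch of the two layouts described in step 4, using hypothetical column names and pandas purely to illustrate how the worksheet is arranged; it is not a required tool or format.

    # Sketch: the stacked (single-column) layout and the equivalent wide
    # (multiple-column) layout. All names and values are illustrative.
    import pandas as pd

    stacked = pd.DataFrame({
        "Rating":    ["Pass", "Pass", "Fail", "Fail", "Fail", "Fail"],
        "Part":      [1, 1, 1, 2, 2, 2],
        "Appraiser": ["Ed", "Ann", "Ted", "Ed", "Ann", "Ted"],
        "Standard":  ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail"],
    })

    # Wide layout: one row per part, one column per appraiser (add more columns
    # for additional trials).
    wide = stacked.pivot(index="Part", columns="Appraiser", values="Rating")
    print(stacked, wide, sep="\n\n")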

For more information, go to Insert an analysis capture tool.
