Data considerations for Attribute Agreement Analysis

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

Appraisers should evaluate samples in a random order: To ensure that the data collection order does not influence the results, each appraiser should evaluate all samples in random order within a replicate. After all appraisers have evaluated all samples one time, repeat the process for each remaining replicate.
You can have a known reference rating for each sample: A reference value (also called master value) is the known and correct rating for a standard sample. For example, you have a set of standard fabric samples with a known and correct print quality rating. You use these samples to assess appraisers' ability to rate print quality correctly.
You should have at least 50 samples for an adequate study: You need at least 50 samples to obtain adequate estimates of agreement. Select samples from the entire range of process variation. Having fewer replicates of many samples is better than having many replicates of fewer samples.
Appraisers should rate each sample at least twice: To assess the ability of an appraiser to consistently evaluate the same sample, each appraiser should rate each sample at least twice in random order. Replication is important, but can be tedious. When planning resources, remember that it is better to have more samples evaluated in random order with fewer replicates than to have more replicates of fewer samples that are not evaluated in random order.
You should have at least 3 appraisers for an adequate study: For the best results, include 3 to 5 appraisers in your study. You should not have fewer than 3 appraisers in the study, unless fewer than 3 appraisers actually use the measurement system. If you suspect that there are large differences between appraisers, consider using more than 5 appraisers. If you identify differences between appraisers, such as an appraiser whose accuracy is lower than that of the other appraisers, you can often improve consistency with training. Select appraisers who are representative of all the appraisers who use the measurement system. If you perform the study with only the best (or worst) appraisers, the results will be biased and will not provide an accurate estimate of appraiser differences. The best way to avoid this bias is to select the appraisers for the study at random.
Appraisers should rate approximately the same number of samples from each category: For the best results, you should have a moderately balanced mix of samples from the different categories so that you can evaluate the appraisers' ability to rate samples from each category with similar precision. If you have a smaller percentage of samples from one category, the estimates for that category may be less precise. When the response is binary (such as pass/fail or yes/no), you need several samples that are marginally acceptable and several that are marginally unacceptable. For example, a reasonable number of the samples that pass should be barely passing.
The attribute agreement analysis must be balanced: Each appraiser must evaluate each sample the same number of times.
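
The randomization and balance guidelines above can be illustrated with a short planning script. This is only a minimal sketch in Python, not part of any statistical package: the function names make_run_order and is_balanced, the appraiser labels, and the sample numbering are all hypothetical and chosen for illustration. It assumes 3 appraisers, 50 samples, and 2 replicates, shuffles the sample order independently for each appraiser within each replicate, and then checks that every appraiser rates every sample the same number of times.

```python
import random
from collections import Counter


def make_run_order(appraisers, samples, replicates, seed=None):
    """Build a list of (replicate, appraiser, sample) trials.

    Within each replicate, every appraiser sees every sample once,
    in an independently shuffled order. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    plan = []
    for rep in range(1, replicates + 1):
        for appraiser in appraisers:
            order = list(samples)
            rng.shuffle(order)  # random order within this replicate
            plan.extend((rep, appraiser, sample) for sample in order)
    return plan


def is_balanced(plan):
    """Return True if each appraiser rated each sample equally often."""
    counts = Counter((appraiser, sample) for _, appraiser, sample in plan)
    appraisers = {appraiser for _, appraiser, _ in plan}
    samples = {sample for _, _, sample in plan}
    expected = {(a, s) for a in appraisers for s in samples}
    # Every appraiser/sample pair must appear, all with the same count.
    return set(counts) == expected and len(set(counts.values())) == 1


# Assumed study size: 3 appraisers x 50 samples x 2 replicates.
plan = make_run_order(["A", "B", "C"], range(1, 51), replicates=2, seed=1)
print(len(plan))          # 300 trials
print(is_balanced(plan))  # True
```

Passing a fixed seed makes the run order reproducible, which can help when the plan has to be printed and shared before data collection. If a trial is lost during collection, is_balanced returns False for the recorded data, which signals that the study no longer meets the balance requirement.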