Assessment at UNCW

General Education Assessment

2015 Annual Report

A number of papers were scored in common by each pair or trio of faculty scorers so that interrater reliability could be assessed (86 of the 100 papers, or 86%, though not all dimensions were necessarily scored on every common paper). More than 85% of all common scores were either in agreement or within one score level for the four DV dimensions. The following table shows the reliability measures for Diversity.

Dimension                                            Percent Agreement    Plus Percent Adjacent    Krippendorff's Alpha

DV 1 Factual Knowledge

DV 2 Knowledge of Diverse Perspectives and Roots

DV 3 Examining Diversity, History, and Culture

DV 4 Evaluating Claims and Theories
Interrater reliability is a measure of the degree of agreement between scorers, and it provides information about the trustworthiness of the data. It helps answer the question: would a different set of scorers at a different time arrive at the same conclusions? In practice, interrater reliability is enhanced over time through scorer discussion, as well as through improvements to the scoring rubric. Percent Agreement, Percent Agreement Plus Adjacent, and Krippendorff's Alpha all measure scorer agreement. The UNCW benchmark is .67 for Krippendorff's Alpha. See A Note on Interrater Reliability Measures for a more complete discussion of these statistics and the determination of benchmark levels.
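For readers who want to see how these statistics are computed, the following is a minimal sketch for the two-scorer case. The function names are illustrative, and the sketch assumes an interval difference function for Krippendorff's Alpha; this is an assumption for clarity, not a description of UNCW's actual computation (which may use an ordinal metric and handle pairs/trios of scorers).

```python
from collections import Counter

def percent_agreement(a, b, adjacent=0):
    """Share of paired scores differing by at most `adjacent` rubric levels.

    adjacent=0 gives Percent Agreement; adjacent=1 gives
    Percent Agreement Plus Adjacent.
    """
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= adjacent)
    return hits / len(a)

def krippendorff_alpha_interval(a, b):
    """Krippendorff's alpha for two scorers with an interval difference
    function, delta^2 = (x - y)^2. Computed as 1 - D_obs / D_exp via the
    coincidence matrix. Assumes at least two distinct values occur."""
    # Coincidence counts: each scored paper contributes both ordered
    # pairs (x, y) and (y, x).
    coincidences = Counter()
    for x, y in zip(a, b):
        coincidences[(x, y)] += 1
        coincidences[(y, x)] += 1
    # Marginal totals per score value.
    values = Counter()
    for (x, _), cnt in coincidences.items():
        values[x] += cnt
    n = sum(values.values())  # total pairable values = 2 * number of papers
    d_obs = sum(cnt * (x - y) ** 2
                for (x, y), cnt in coincidences.items()) / n
    d_exp = sum(values[x] * values[y] * (x - y) ** 2
                for x in values for y in values) / (n * (n - 1))
    return 1 - d_obs / d_exp
```

For example, with hypothetical scores `[1, 2, 3, 3]` and `[1, 2, 3, 2]`, percent agreement is 0.75 and percent agreement plus adjacent is 1.0, illustrating why the "plus adjacent" figure is always at least as high as exact agreement.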

Comparing the results of the reliability indices for this study to the benchmark of .67 for Krippendorff's Alpha, no dimension of the rubric meets this standard. Looking at percent agreement plus adjacent (that is, scores that were within one level of each other), we find that all dimensions had at least 86.8% of scores in agreement or within one level of each other.