Assessment at UNCW

General Education Assessment

2015 Annual Report

A subset of papers was scored in common by each pair of faculty scorers so that interrater reliability could be assessed (107 of the 174 papers, or 61.5%, though not every dimension was necessarily scored for every common paper). More than 70% of all common scores were either in agreement or within one score level for the five GC dimensions. The following table shows the reliability measures for Global Citizenship.

Dimension                            N    Percent Agreement   Percent Agreement Plus Adjacent   Krippendorff's Alpha
GC 1 Factual Knowledge               107  47.7%               84.1%                             .4933
GC 2 Knowledge of Connections        84   33.3%               76.2%                             -.1948
GC 3 Use of Diverse Cultural Frames  107  57.9%               94.4%                             .3444
GC 4 Tolerance of Differences        107  44.9%               88.8%                             .4516
GC 5 Ethical Responsibility          75   53.3%               72.0%                             .0254

Interrater reliability is a measure of the degree of agreement between scorers and provides information about the trustworthiness of the data. It helps answer the question: would a different set of scorers at a different time arrive at the same conclusions? In practice, interrater reliability is enhanced over time through scorer discussion, as well as through improvements to the scoring rubric. Percent Agreement, Percent Agreement Plus Adjacent, and Krippendorff's Alpha measure scorer agreement. The UNCW benchmark is .67 for Krippendorff's Alpha. See A Note on Interrater Reliability Measures for a more complete discussion of these statistics and the determination of benchmark levels.
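
As a rough illustration of how these three statistics relate to the scores in the table, the following Python sketch computes percent agreement, percent agreement plus adjacent, and Krippendorff's Alpha for one pair of scorers. It is a minimal sketch, not the procedure actually used in this study: the scores are made up, the two-scorer layout and the ordinal treatment of rubric levels are assumptions, and alpha is obtained from the third-party krippendorff package (the report does not specify how alpha was calculated).

```python
# Minimal sketch of the three reliability statistics reported above.
# Assumptions (not from the report): two scorers per paper, rubric scores on an
# ordinal scale, unscored dimensions marked as None, and Krippendorff's Alpha
# computed at the ordinal level via the "krippendorff" PyPI package.
import numpy as np
import krippendorff  # pip install krippendorff


def agreement_rates(scores_a, scores_b):
    """Percent agreement and percent agreement plus adjacent for two scorers.

    Pairs where either scorer left the dimension unscored are skipped, which
    mirrors the varying N per dimension in the table above.
    """
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if a is not None and b is not None]
    n = len(pairs)
    exact = sum(1 for a, b in pairs if a == b)
    adjacent = sum(1 for a, b in pairs if abs(a - b) <= 1)
    return n, 100 * exact / n, 100 * adjacent / n


# Illustrative (made-up) scores for one dimension; None = not scored in common.
scorer_1 = [1, 2, 2, 3, 1, None, 2, 3]
scorer_2 = [1, 3, 2, 2, 1, 2,    2, 1]

n, pct_exact, pct_adjacent = agreement_rates(scorer_1, scorer_2)

# Krippendorff's Alpha: rows are scorers, columns are papers, NaN = missing.
reliability_data = np.array(
    [[np.nan if s is None else s for s in scorer_1],
     [np.nan if s is None else s for s in scorer_2]],
    dtype=float,
)
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="ordinal")

print(f"N = {n}, agreement = {pct_exact:.1f}%, "
      f"plus adjacent = {pct_adjacent:.1f}%, alpha = {alpha:.4f}")
```

The sketch shows why the two percentage measures can look favorable while alpha remains low: percent agreement ignores how much agreement would be expected by chance, whereas alpha discounts it, which is why UNCW's .67 benchmark applies to alpha rather than to the raw percentages.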

Comparing the results of the reliability indices for this study to the benchmark of .67 for Krippendorff's Alpha, no dimension of the rubric meets that standard. Looking at percent agreement plus adjacent (that is, scores that were in agreement or within one level of each other), we find that all five dimensions had at least 72% of common scores in agreement or within one level, ranging from 72.0% for GC 5 Ethical Responsibility to 94.4% for GC 3 Use of Diverse Cultural Frames.