Unfolding the phenomenon of inter-rater agreement: a stepwise analytic approach for in-depth examination was proposed

Bjørn Slaug, Oliver Schilling, Tina Helle, Susanne Iwarsson, Gunilla Carlsson, Åse Brandt

Publication: Contribution to journal › Journal article › Research › Peer review


Objective: The overall objective was to unfold the phenomenon of inter-rater agreement: to identify potential sources of variation in agreement data and to explore how they can be statistically accounted for. The ultimate aim was to propose recommendations for in-depth examination of agreement, in order to improve the reliability of assessment instruments.

Study Design and Setting: Utilizing a sample in which 10 rater pairs had assessed the presence/absence of 188 environmental barriers using a systematic rating form, a raters × items dataset was generated (N = 1,880). In addition to common agreement indices, relative shares of agreement variation were calculated. Multilevel regression analysis was carried out, using rater and item characteristics as predictors of agreement variation.
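The "common agreement indices" for binary (presence/absence) ratings by a rater pair typically include percent agreement and Cohen's kappa. As a minimal sketch of how such indices are computed, assuming hypothetical rating vectors rather than the study's actual data:

```python
# Sketch: percent agreement and Cohen's kappa for one rater pair rating
# binary (present = 1 / absent = 0) items. Hypothetical data, not the
# study's; with 188 items per pair the vectors would be length 188.

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length binary rating vectors."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    p_a1 = sum(a) / n                              # rater A's "present" rate
    p_b1 = sum(b) / n                              # rater B's "present" rate
    # Chance agreement: both say "present" or both say "absent"
    pe = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (po - pe) / (1 - pe)

rater_a = [1, 1, 0, 0, 1, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(rater_a, rater_b))  # observed 6/8 agreement -> kappa 0.5
```

Kappa discounts the agreement expected by chance, which is why it is preferred over raw percent agreement when barrier prevalence is very high or very low.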

Results: The raters accounted for 6–11% of the agreement variation, the items for 33–39%, and the contexts for 53–60%. Multilevel regression analysis showed that barrier prevalence and raters’ familiarity with using standardized instruments had the strongest impact on agreement, although, for study design reasons, contextual characteristics were not included as predictors.
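The relative shares reported above come from partitioning the variation in pairwise agreement outcomes across grouping factors. A crude illustration of the idea, using between-group sums of squares on hypothetical 0/1 agreement records (the study itself used proper multilevel random-effects models, which this sketch does not reproduce):

```python
# Rough sketch: share of variation in a 0/1 agreement outcome that lies
# between groups (rater pairs vs. items), via between-group sums of
# squares. Hypothetical data; a multilevel model would estimate these
# shares as variance components instead.
from collections import defaultdict

def between_group_ss(records, key):
    """Between-group sum of squares of the 'agree' outcome, grouped by key."""
    grand = sum(r["agree"] for r in records) / len(records)
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["agree"])
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2
               for v in groups.values())

# Toy long-format dataset: one row per (rater pair, item) with agree in {0, 1}.
records = [{"pair": p, "item": i, "agree": (p + i) % 2}
           for p in range(3) for i in range(4)]

grand_mean = sum(r["agree"] for r in records) / len(records)
total_ss = sum((r["agree"] - grand_mean) ** 2 for r in records)
for key in ("pair", "item"):
    share = between_group_ss(records, key) / total_ss
    print(key, round(share, 2))
```

In this toy data every pair has the same mean agreement, so the "pair" share is zero while the "item" share is positive, mirroring the study's finding that items explained far more agreement variation than raters did.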

Conclusion: Supported by a conceptual analysis, we propose an approach for in-depth examination of agreement variation as a strategy for increasing the level of inter-rater agreement. By identifying and limiting the most important sources of disagreement, instrument reliability can ultimately be improved.

Keywords: inter-rater, reliability, sources of disagreement, kappa, methodology, recommendations

Journal: Journal of Clinical Epidemiology
Status: Published - 2011