TY - JOUR
T1 - Performance and Agreement When Annotating Chest X-ray Text Reports—A Preliminary Step in the Development of a Deep Learning-Based Prioritization and Detection System
AU - Li, Dana
AU - Pehrson, Lea Marie
AU - Bonnevie, Rasmus
AU - Fraccaro, Marco
AU - Thrane, Jakob
AU - Tøttrup, Lea
AU - Lauridsen, Carsten Ammitzbøl
AU - Butt Balaganeshan, Sedrah
AU - Jankovic, Jelena
AU - Andersen, Tobias Thostrup
AU - Mayar, Alyas
AU - Hansen, Kristoffer Lindskov
AU - Carlsen, Jonathan Frederik
AU - Darkner, Sune
AU - Nielsen, Michael Bachmann
N1 - Publisher Copyright: © 2023 by the authors.
PY - 2023/3
Y1 - 2023/3
AB - A chest X-ray report is a communicative tool and can be used as data for developing artificial intelligence-based decision support systems. For both purposes, consistent understanding and labeling are important. Our aim was to investigate how readers would comprehend and annotate 200 chest X-ray reports. Reports written between 1 January 2015 and 11 March 2022 were selected based on search words. Annotators included three board-certified radiologists, two trained radiologists (physicians), two radiographers (radiological technicians), a non-radiological physician, and a medical student. Consensus labels assigned by two or more of the experienced radiologists were considered the “gold standard”. The Matthews correlation coefficient (MCC) was calculated to assess annotation performance, and descriptive statistics were used to assess agreement between individual annotators and labels. The intermediate radiologist had the best correlation with the “gold standard” (MCC 0.77), followed by the novice radiologist and the medical student (MCC 0.71 for both), the novice radiographer (MCC 0.65), the non-radiological physician (MCC 0.64), and the experienced radiographer (MCC 0.57). Our findings showed that, when developing an artificial intelligence-based support system and trained radiologists are not available, annotations from non-radiological annotators with basic and general knowledge may align more closely with radiologists’ annotations than those from medical staff sub-specialized outside of diagnostic radiology.
KW - agreement
KW - artificial intelligence
KW - chest X-ray
KW - data
KW - deep learning
KW - development
KW - performance
KW - radiologists
KW - text annotation
UR - http://www.scopus.com/inward/record.url?scp=85152616848&partnerID=8YFLogxK
U2 - 10.3390/diagnostics13061070
DO - 10.3390/diagnostics13061070
M3 - Journal article
SN - 2075-4418
VL - 13
JO - Diagnostics
JF - Diagnostics
IS - 6
M1 - 1070
ER -