1. Artificial Intelligence for Retinopathy of Prematurity: Validation of a Vascular Severity Scale against International Expert Diagnosis
- Author
Campbell, J. Peter, Chiang, Michael F., Chen, Jimmy S., Moshfeghi, Darius M., Nudleman, Eric, Ruambivoonsuk, Paisan, Cherwek, Hunter, Cheung, Carol Y., Singh, Praveer, Kalpathy-Cramer, Jayashree, Ostmo, Susan, Eydelman, Malvina, Chan, R.V. Paul, and Capone, Antonio
- Subjects
Diagnostic Imaging; Ophthalmoscopy; Artificial Intelligence; Infant, Newborn; Humans; Reproducibility of Results; Gestational Age; Retinopathy of Prematurity; Article
- Abstract
OBJECTIVE: To validate a vascular severity score as an appropriate output for artificial intelligence (AI) Software as a Medical Device (SaMD) for retinopathy of prematurity (ROP) through comparison with ordinal disease severity labels for stage and plus disease assigned by the International Classification of ROP, 3rd edition (ICROP3) committee.
DESIGN: Validation study of an AI-based ROP vascular severity score.
SUBJECTS, PARTICIPANTS, AND/OR CONTROLS: 34 ROP experts from the ICROP3 committee.
METHODS: Two separate datasets of 30 fundus photographs each, one for stage (0–5) and one for plus disease (plus, pre-plus, neither), were labeled by members of the ICROP3 committee using an open-source platform. Averaging these results produced a continuous label for plus (1–9) and stage (1–3) for each image. Experts were also asked to compare the images with one another in terms of relative severity of plus disease. Each image was also labeled with a vascular severity score from the Imaging and Informatics in ROP deep learning (i-ROP DL) system, and this score was compared with each grader’s diagnostic labels for correlation, as well as with the ophthalmoscopic diagnosis of stage.
MAIN OUTCOME MEASURES: Weighted kappa and Pearson correlation coefficients (CC) were calculated between each pair of grader classification labels for stage and plus disease. The Elo algorithm was also used to convert each expert’s pairwise comparisons into an ordered set of images from least to most severe.
RESULTS: The mean weighted kappa and CC for all inter-observer pairs for plus disease image comparison were 0.67 and 0.88, respectively. The vascular severity score was highly correlated with both the average plus disease classification (CC = 0.90, p < 0.001) and the ophthalmoscopic diagnosis of stage (p < 0.001 by ANOVA) among all experts.
CONCLUSIONS: The ROP vascular severity score correlates well with the ICROP3 committee members’ labels for plus disease and stage, which had significant inter-grader variability. Generation of a consensus for a validated scoring system for ROP SaMD can facilitate global innovation and regulatory authorization of these technologies.
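The abstract states that the Elo algorithm converted each expert's pairwise comparisons into an ordering of images from least to most severe, but gives no implementation details. The following is a minimal Python sketch of how such a conversion might work, not the study's actual code. The starting rating (1500), K-factor (32), and 400-point scale are conventional chess Elo defaults assumed here for illustration, and the function names and image identifiers are hypothetical.

```python
def elo_ratings(comparisons, k=32.0, initial=1500.0):
    """Return an Elo rating per image from (more_severe, less_severe) pairs.

    k and initial are assumed defaults, not values reported by the study.
    """
    ratings = {}
    for more_severe, less_severe in comparisons:
        r_more = ratings.setdefault(more_severe, initial)
        r_less = ratings.setdefault(less_severe, initial)
        # Expected probability that the image judged more severe "wins".
        expected = 1.0 / (1.0 + 10 ** ((r_less - r_more) / 400.0))
        # The update is zero-sum: what one image gains, the other loses.
        delta = k * (1.0 - expected)
        ratings[more_severe] = r_more + delta
        ratings[less_severe] = r_less - delta
    return ratings


def rank_by_severity(comparisons, **kwargs):
    """Order images from least to most severe by their Elo rating."""
    ratings = elo_ratings(comparisons, **kwargs)
    return sorted(ratings, key=ratings.get)


# Hypothetical judgments from one grader: first image is the more severe one.
judgments = [("img_b", "img_a"), ("img_c", "img_b"), ("img_c", "img_a")]
print(rank_by_severity(judgments))
```

With these three hypothetical judgments the ordering is `img_a`, `img_b`, `img_c`. Because Elo updates depend on the order in which comparisons are processed, a real analysis would likely need to fix or randomize that order; the abstract does not say how this was handled.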
- Published
2022