1. Towards Accountable AI-Assisted Eye Disease Diagnosis: Workflow Design, External Validation, and Continual Learning
- Author
Chen, Qingyu, Keenan, Tiarnan D L, Agron, Elvira, Allot, Alexis, Guan, Emily, Duong, Bryant, Elsawy, Amr, Hou, Benjamin, Xue, Cancan, Bhandari, Sanjeeb, Broadhead, Geoffrey, Cousineau-Krieger, Chantal, Davis, Ellen, Gensheimer, William G, Grasic, David, Gupta, Seema, Haddock, Luis, Konstantinou, Eleni, Lamba, Tania, Maiberger, Michele, Mantopoulos, Dimosthenis, Mehta, Mitul C, Nahri, Ayman G, AL-Nawaflh, Mutaz, Oshinsky, Arnold, Powell, Brittany E, Purt, Boonkit, Shin, Soo, Stiefel, Hillary, Thavikulwat, Alisa T, Wroblewski, Keith James, Chung, Tham Yih, Cheung, Chui Ming Gemmy, Cheng, Ching-Yu, Chew, Emily Y, Hribar, Michelle R., Chiang, Michael F., and Lu, Zhiyong
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnostic accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (the AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy and elevating the F1-score from 42 to 54 in the Singapore population.
- Published
2024