Jing Cao, Kun You, Jingxin Zhou, Mingyu Xu, Peifang Xu, Lei Wen, Shengzhan Wang, Kai Jin, Lixia Lou, Yao Wang, and Juan Ye
Summary: Background: Clinical application of artificial intelligence is limited due to the lack of interpretability and expandability in complex clinical settings. We aimed to develop an eye diseases screening system with improved interpretability and expandability based on a lesion-level dissection and tested the clinical expandability and auxiliary ability of the system. Methods: The four-hierarchical interpretable eye diseases screening system (IEDSS) based on a novel structural pattern named lesion atlas was developed to identify 30 eye diseases and conditions using a total of 32,026 ultra-wide field images collected from the Second Affiliated Hospital of Zhejiang University, School of Medicine (SAHZU), the First Affiliated Hospital of University of Science and Technology of China (FAHUSTC), and the Affiliated People's Hospital of Ningbo University (APHNU) in China between November 1, 2016 to February 28, 2022. The performance of IEDSS was compared with ophthalmologists and classic models trained with image-level labels. We further evaluated IEDSS in two external datasets, and tested it in a real-world scenario and an extended dataset with new phenotypes beyond the training categories. The accuracy (ACC), F1 score and confusion matrix were calculated to assess the performance of IEDSS. Findings: IEDSS reached average ACCs (aACC) of 0·9781 (95%CI 0·9739-0·9824), 0·9660 (95%CI 0·9591-0·9730) and 0·9709 (95%CI 0·9655-0·9763), frequency-weighted average F1 scores of 0·9042 (95%CI 0·8957-0·9127), 0·8837 (95%CI 0·8714-0·8960) and 0·8874 (95%CI 0·8772-0·8972) in datasets of SAHZU, APHNU and FAHUSTC, respectively. IEDSS reached a higher aACC (0·9781, 95%CI 0·9739-0·9824) compared with a multi-class image-level model (0·9398, 95%CI 0·9329-0·9467), a classic multi-label image-level model (0·9278, 95%CI 0·9189-0·9366), a novel multi-label image-level model (0·9241, 95%CI 0·9151-0·9331) and a lesion-level model without Adaboost (0·9381, 95%CI 0·9299-0·9463). In the real-world scenario, the aACC of IEDSS (0·9872, 95%CI 0·9828-0·9915) was higher than that of the senior ophthalmologist (SO) (0·9413, 95%CI 0·9321-0·9504, p = 0·000) and the junior ophthalmologist (JO) (0·8846, 95%CI 0·8722-0·8971, p = 0·000). IEDSS remained strong performance (ACC = 0·8560, 95%CI 0·8252-0·8868) compared with JO (ACC = 0·784, 95%CI 0·7479-0·8201, p= 0·003) and SO (ACC = 0·8500, 95%CI 0·8187-0·8813, p = 0·789) in the extended dataset. Interpretation: IEDSS showed excellent and stable performance in identifying common eye conditions and conditions beyond the training categories. The transparency and expandability of IEDSS could tremendously increase the clinical application range and the practical clinical value of it. It would enhance the efficiency and reliability of clinical practice, especially in remote areas with a lack of experienced specialists. Funding: National Natural Science Foundation Regional Innovation and Development Joint Fund (U20A20386), Key research and development program of Zhejiang Province (2019C03020), Clinical Medical Research Centre for Eye Diseases of Zhejiang Province (2021E50007).