Li Xingyi, Ping Zuo, Yulong Xiong, Xiaoyun Yang, Guohua Wan, Shouxing Hu, Jia Ding, Beitong Zhou, Hongling Zhu, Yonge Li, Hang Yin, Cheng Cheng, Fan Lin, Binran Wang, Jingyi Wang, and Ye Yuan
Background: Electrocardiography (ECG) is a fundamental diagnostic tool that is frequently used in clinical practice to test for heart conditions. Computer-aided ECG interpretation has recently become more widely investigated. However, research on the concurrent ECG diagnosis of multiple heart abnormalities is limited, and a market-applicable multi-label automatic diagnosis framework that covers a wide range of arrhythmias, with better-than-human accuracy on multi-label ECGs, has not yet been developed. We therefore aimed to engineer a deep learning approach for the automated multi-label diagnosis of impulse or conduction abnormalities by real-time ECG analysis. Methods: We used a dataset of 70,692 patients, all adults aged ≥18 years. A total of 180,112 ECGs (standard 10-second, 12-lead format) covering 21 distinct rhythm classes, including various impulse or conduction abnormalities, were used for the diagnosis of arrhythmias at both the multi-label and single-label levels. Base rhythm labels were assessed by trained personnel under the supervision of cardiologists. To compile a balanced dataset, we ensured that each abnormality had a sample size of at least 4,000 ECGs, except idioventricular rhythm (1,780 ECGs, owing to the rarity of this condition). We allocated ECGs to a training or a validation dataset for training the multi-label diagnostic model, and evaluated model performance on a test dataset by calculating the area under the receiver operating characteristic (ROC) curve (AUC), as well as accuracy, sensitivity, specificity, and the F1 score. The test dataset was annotated by the consensus of a committee of board-certified, actively practicing cardiologists. Results on the test dataset were compared with diagnoses made by 53 ECG cardiologists with a wide range of ECG interpretation experience (range = 0-12+ years).
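As an illustration of the evaluation metrics named in the Methods, the following minimal sketch computes per-label sensitivity, specificity, and F1 score from binary multi-label annotations and averages them across labels. It assumes macro-averaging (an unweighted mean over rhythm classes); the abstract does not state which averaging scheme was used. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    label: int,
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for one label across all ECGs."""
    tp = fp = tn = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        t, p = true_row[label], pred_row[label]
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_metrics(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Macro-averaged sensitivity, specificity, and F1 (assumed averaging)."""
    n_labels = len(y_true[0])
    sens: List[float] = []
    spec: List[float] = []
    f1: List[float] = []
    for k in range(n_labels):
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, k)
        sens.append(safe_div(tp, tp + fn))          # true positive rate
        spec.append(safe_div(tn, tn + fp))          # true negative rate
        f1.append(safe_div(2 * tp, 2 * tp + fp + fn))
    return {
        "sensitivity": sum(sens) / n_labels,
        "specificity": sum(spec) / n_labels,
        "f1": sum(f1) / n_labels,
    }


# Toy example: 4 ECGs, 3 hypothetical rhythm classes; the second ECG
# carries two labels, and the model misses one of them.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Each row is one ECG and each column one rhythm class, so an ECG with several concurrent abnormalities simply has more than one entry set to 1.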
Findings: At the multi-label level, our convolutional neural network (CNN) approach to diagnosing heart abnormalities achieved AUC scores above 0.913 for most of the diseases, with an average score of 0.985 (95% confidence interval (CI) 0.988-0.994). The model obtained an average F1 score of 0.890 (95% CI 0.877-0.903), a sensitivity of 0.869 (95% CI 0.851-0.887), and a specificity of 0.995 (95% CI 0.994-0.996). The average F1 score of our deep learning model exceeded that of human cardiologists regardless of ECG interpretation experience (0-12 years: 0.829). The CNN also showed equal or better performance in average sensitivity and accuracy compared with the cardiologists. At the single-label level, the CNN achieved an average F1 score of 0.918, higher than the cardiologists' average of 0.798. Interpretation: Our automatic ECG diagnosis system outperforms human cardiologists in accurately classifying particular waveforms into clinically defined categories, in distinguishing a wide range of distinct arrhythmias, and in handling both single-label and multi-label ECGs. Our study is therefore a promising foundation for the deployment of computational decision-support systems in practical clinical applications. Funding Statement: Project supported by the Major International (Regional) Joint Research Project, the Ministry of Management Science, the National Natural Science Foundation of China (Grant No. 71520107003). Declaration of Interests: Dr. Yuan reports grants from the National Natural Science Foundation of China during the conduct of the study. In addition, Dr. Yuan has a patent, "An automatic ECG classification method, system and equipment based on deep learning algorithm," pending to Ye Yuan, Xiaoyun Yang, Hongling Zhu, Yiran Wang, Cheng Cheng, Xingyi Li, Hang Yin, Jingyi Wang. There are no other conflicts of interest.
Ethics Approval Statement: To protect patients' privacy and prevent data breaches, we used anonymized data. Written informed consent was therefore not required for this study, since the ECG samples were appropriately anonymized and de-identified in accordance with the Health Insurance Portability and Accountability Act Safe Harbor provision. The study design was evaluated and exempted from full review by the Huazhong University of Science and Technology Institutional Review Board.