1. Development of a Machine Learning Model Using Electrocardiogram Signals to Improve Pulmonary Embolism Screening
- Author
- Girish N. Nadkarni, Isotta Landi, Benjamin S. Glicksberg, Sukrit Narula, Hossein Honarvar, Edgar Argulian, Arvind Kumar, Robert Freeman, Yeraz Khachatoorian, Shan P. Zhao, Arsalan Rehmani, Sulaiman S. Somani, Shawn Lee, Alexander C. Kagen, Adam Russak, Suraj Jaladanki, Andrew Kim, Shelly Teng, Matthew A. Levin, and Jessica K. De Freitas
- Subjects
medicine.medical_specialty, Pulmonary angiogram, business.industry, Retrospective cohort study, Institutional review board, medicine.disease, Pulmonary embolism, Feature (computer vision), Test set, Emergency medicine, medicine, Geneva score, business, Sensitivity analyses
- Abstract
Background: Clinical scoring systems for pulmonary embolism (PE) screening have low specificity and contribute to CT pulmonary angiogram (CTPA) overuse. We assessed whether deep learning models using an existing and routinely collected data modality, electrocardiogram (ECG) waveforms, can increase specificity for PE detection.
Methods: We use clinical variables, annotated CTPAs, and ECG waveform and morphological parameter data from five hospitals to conduct a retrospective cohort study and develop three models to predict PE likelihood: an ECG model using only ECG waveform data, an EHR model using tabular clinical data, and a Fusion model integrating tabular clinical data and an embedded representation of the ECG waveform. We benchmark the best model against four clinical scores: Wells’ Criteria, Revised Geneva Score, Pulmonary Embolism Rule-Out Criteria, and 4-Level Pulmonary Embolism Clinical Probability Score. Finally, we investigate model robustness through feature sensitivity analyses and assess performance parity across demographic subgroups.
Findings: We create a dataset linking 23,793 CTPAs (10·0% PE-positive) and 320,746 ECGs from 21,183 patients for model development and testing. We find that the Fusion model (area under the receiver-operating characteristic curve [AUROC] 0·81 ± 0·01) outperforms both the ECG model (AUROC 0·59 ± 0·01) and the EHR model (AUROC 0·65 ± 0·01). On a sample of 100 patients from the test set, the Fusion model has greater specificity (0·18) and better performance (AUROC 0·84 ± 0·01) than all four clinical scores (AUROC 0·50-0·58, specificity 0·00-0·05). The model also retains superiority over the clinical scores in feature sensitivity analyses (AUROC 0·66 to 0·84) and achieves comparable performance across sex (AUROC 0·81) and racial (AUROC 0·77 to 0·84) subgroups.
Interpretation: Integration of electrocardiogram waveforms with traditional clinical variables synergistically increases prediction performance and specificity for PE detection in patients with at least moderate suspicion for PE. Funding: National Center for Advancing Translational Sciences, National Institutes of Health. Declaration of Interest: All authors declare no competing interests with this study. Ethical Approval: The study was approved by the Mount Sinai Institutional Review Board.
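The abstract compares models by two metrics, specificity and AUROC. As a reading aid, the following is a minimal sketch of how these quantities are conventionally computed from binary labels and predicted scores. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def specificity(y_true, y_pred):
    """True negatives divided by all actual negatives: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration (1 = PE-positive, 0 = PE-negative).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

# Hypothetical threshold; the abstract does not state the one used in the study.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("specificity:", specificity(y_true, y_pred))
print("AUROC:", auroc(y_true, scores))
```

Specificity depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the abstract reports both.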
- Published
- 2021