Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems
- Authors
Andrew J. Codlin, Jacob Creswell, Santat Sudrungrot, Collins N. Titahong, Lekha Puri, Bishwa Rai, E. Jane Carter, Melissa S. Sander, Zhi Zhen Qin, Lal Mani Adhikari, and Sylvain N. Laah
- Subjects
Adult; DNA, Bacterial; Male; Tuberculosis; Radiography; Population; lcsh:Medicine; Diagnostic accuracy; Sensitivity and Specificity; 030218 nuclear medicine & medical imaging; 03 medical and health sciences; Population screening; 0302 clinical medicine; Deep Learning; Nepal; Pulmonary tuberculosis; Medicine; Humans; Mass Screening; 030212 general & internal medicine; Cameroon; lcsh:Science; Tuberculosis, Pulmonary; Retrospective Studies; Multidisciplinary; lcsh:R; Multi site; Mycobacterium tuberculosis; Middle Aged; Triage; Data Accuracy; Area Under Curve; Female; Radiography, Thoracic; lcsh:Q; Radiology; Artificial intelligence; Nucleic Acid Amplification Techniques
- Abstract
Deep learning (DL) neural networks have only recently been used to interpret chest radiographs (CXRs) to screen and triage people for pulmonary tuberculosis (TB). No published study has compared multiple DL systems across multiple populations. We conducted a retrospective evaluation of three DL systems (CAD4TB, Lunit INSIGHT, and qXR) for detecting TB-associated abnormalities in chest radiographs from outpatients in Nepal and Cameroon. All 1196 individuals received an Xpert MTB/RIF assay and a CXR that was read by two groups of radiologists and by the DL systems. Xpert was used as the reference standard. The areas under the receiver operating characteristic curve (AUCs) of the three systems were similar: Lunit (0.94, 95% CI: 0.93–0.96), qXR (0.94, 95% CI: 0.92–0.97), and CAD4TB (0.92, 95% CI: 0.90–0.95). When matched to the sensitivity of the radiologists, the specificities of the DL systems were significantly higher, with one exception. Using DL systems to read CXRs could reduce the number of Xpert MTB/RIF tests needed by 66% while maintaining sensitivity at 95% or better. Using a universal cutoff score resulted in different performance at each site, highlighting the need to select cutoff scores based on the population screened. These DL systems should be considered by TB programs where human resources are constrained and automated technology is available.
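The abstract relies on a few standard diagnostic-accuracy calculations: the AUC, a cutoff chosen to keep sensitivity at 95% or better against the Xpert reference, and the share of Xpert tests avoided when only people scoring at or above that cutoff are tested. The Python below is a minimal sketch of those calculations. It assumes each DL system outputs a continuous abnormality score per radiograph and that each Xpert result is coded as 1 (positive) or 0 (negative). The function names and the toy data are hypothetical illustrations, not the study's analysis code or results.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating standard diagnostic-accuracy calculations.
# They are NOT the study authors' code; scores and labels are assumed inputs.


def split_by_reference(scores: List[float], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Split DL abnormality scores by the reference result (1 = Xpert-positive)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one Xpert-positive and one Xpert-negative case")
    return pos, neg


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos, neg = split_by_reference(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def cutoff_for_sensitivity(scores: List[float], labels: List[int], target: float = 0.95) -> float:
    """Highest cutoff t such that flagging every score >= t keeps sensitivity >= target."""
    pos, _ = split_by_reference(scores, labels)
    pos.sort(reverse=True)
    n = len(pos)
    # Smallest number of positives k that must be flagged to reach the target sensitivity.
    k = next(k for k in range(1, n + 1) if k / n >= target)
    return pos[k - 1]


def sens_spec(scores: List[float], labels: List[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called abnormal."""
    pos, neg = split_by_reference(scores, labels)
    sens = sum(s >= cutoff for s in pos) / len(pos)
    spec = sum(s < cutoff for s in neg) / len(neg)
    return sens, spec


def xpert_tests_avoided(scores: List[float], cutoff: float) -> float:
    """Fraction of people NOT referred for Xpert when only scores >= cutoff are tested."""
    referred = sum(s >= cutoff for s in scores)
    return 1 - referred / len(scores)


# Toy, made-up data (not from the study): scores in [0, 1], labels from Xpert.
scores = [0.92, 0.81, 0.40, 0.65, 0.30, 0.12, 0.08, 0.05, 0.50, 0.20]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

t = cutoff_for_sensitivity(scores, labels, target=0.95)
print(round(auc(scores, labels), 3))                      # 0.905
print(t, [round(v, 3) for v in sens_spec(scores, labels, t)])  # 0.4 [1.0, 0.714]
print(xpert_tests_avoided(scores, t))                     # 0.5
```

The pairwise AUC is O(positives × negatives), which is acceptable for a cohort of roughly a thousand images. Because the cutoff is derived from each dataset's own positives, running `cutoff_for_sensitivity` separately on each site's data reflects the abstract's point that a single universal cutoff can perform differently across populations.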
- Published
- 2019