1. Sociodemographic biases in a commercial AI model for intracranial hemorrhage detection.
- Author
- Trang A, Putman K, Savani D, Chatterjee D, Zhao J, Kamel P, Jeudy JJ, Parekh VS, and Yi PH
- Subjects
- Humans, Female, Male, Retrospective Studies, Middle Aged, Socioeconomic Factors, Aged, Bias, Artificial Intelligence, Predictive Value of Tests, Intracranial Hemorrhages diagnostic imaging, Tomography, X-Ray Computed, Sensitivity and Specificity
- Abstract
Purpose: To evaluate whether a commercial AI tool for intracranial hemorrhage (ICH) detection on head CT exhibited sociodemographic biases.
Methods: Our retrospective study reviewed 9736 consecutive adult non-contrast head CT scans performed between November 2021 and February 2022 in a single healthcare system. Each CT scan was evaluated by a commercial ICH AI tool and a board-certified neuroradiologist; ground truth was defined as final radiologist determination of ICH presence/absence. After evaluating the AI tool's aggregate diagnostic performance, sub-analyses based on sociodemographic groups (age, sex, race, ethnicity, insurance status, and Area of Deprivation Index [ADI] scores) assessed for biases. χ² or Fisher's exact tests evaluated for statistical significance at p ≤ 0.05.
Results: Our patient population was 50% female (mean age 60 ± 19 years). The AI tool had an aggregate accuracy of 93% [9060/9736], sensitivity of 85% [1140/1338], specificity of 94% [7920/8398], positive predictive value (PPV) of 71% [1140/1618], and negative predictive value (NPV) of 98% [7920/8118]. Sociodemographic biases were identified, including lower PPV for patients who were female (67.3% [441/656] vs. 72.7% [699/962], p = 0.02), Black (66.7% [454/681] vs. 73.2% [686/937], p = 0.005), non-Hispanic/non-Latino (69.7% [1038/1490] vs. 95.4% [417/437], p = 0.009), and who had Medicaid/Medicare (69.9% [754/1078]) or private (66.5% [228/343]) primary insurance (p = 0.003). Lower sensitivity was seen for patients in the third quartile of national (78.8% [241/306], p = 0.001) and state ADI scores (79.0% [22/287], p = 0.001).
Conclusions: In our healthcare system, a commercial AI tool had lower performance for ICH detection than previously reported and demonstrated several sociodemographic biases.
(© 2024. The Author(s), under exclusive licence to American Society of Emergency Radiology (ASER).)
- Published
- 2024
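The performance figures in the abstract all derive from 2×2 counts, so they can be recomputed directly. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, the false-negative and false-positive counts are obtained by subtraction from the reported denominators, and the sex comparison uses the female PPV count of 441 of 656 (the count consistent with the reported 67.3% and with the aggregate 1140 true positives). The abstract does not say whether a continuity correction was applied, so the uncorrected Pearson χ² statistic is assumed here.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic metrics from 2x2 confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test (1 df) for the table [[a, b], [c, d]].

    Assumption: no continuity correction; the abstract does not specify one.
    For 1 degree of freedom the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Aggregate counts taken from the abstract's bracketed fractions.
TP = 1140
FN = 1338 - TP   # ICH-positive scans the AI tool missed
TN = 7920
FP = 8398 - TN   # ICH-negative scans the AI tool flagged

aggregate = diagnostic_metrics(TP, FP, TN, FN)
for name, value in aggregate.items():
    print(f"{name}: {value:.1%}")

# PPV by sex: true positives vs. false positives among AI-positive scans.
female_tp, female_flagged = 441, 656
male_tp, male_flagged = 699, 962
chi2_sex, p_sex = chi_square_2x2(
    female_tp, female_flagged - female_tp,
    male_tp, male_flagged - male_tp,
)
print(f"PPV by sex: chi2 = {chi2_sex:.2f}, p = {p_sex:.3f}")
```

The same `chi_square_2x2` call applies to any of the two-group comparisons in the Results (for example, Black vs. non-Black PPV) by substituting the corresponding bracketed counts; comparisons with small cell counts would call for Fisher's exact test instead, which this sketch does not implement.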