Back to Search Start Over

Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset

Authors :
Dai, Yujie
Sullivan, Brian
Montout, Axel
Dillon, Amy
Waller, Chris
Acs, Peter
Denholm, Rachel
Williams, Philip
Hay, Alastair D
Santos-Rodriguez, Raul
Dowsey, Andrew
Publication Year :
2024

Abstract

The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces significant challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary tract infections (UTIs) and develop predictive models focused on data quality, fairness and transparency. A comprehensive data pre-processing and curation pipeline transforms the raw EHR data into a structured format suitable for AI modeling. Given the limited availability and biases of ground truth UTI outcomes, we introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines. Using this framework, we built pairwise XGBoost models to differentiate UTI risk categories with explainable AI techniques to identify key predictors while ensuring interpretability. Our findings reveal differences in clinical and demographic factors across risk groups, offering insights into UTI risk stratification and progression. This study demonstrates the added value of AI-driven insights into UTI clinical decision-making while prioritizing interpretability, transparency, and fairness, underscoring the importance of sound data practices in advancing health outcomes.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2411.17645
Document Type :
Working Paper