An integrated approach to geographic validation helped scrutinize prediction model performance and its variability
- Author
Tsvetan R. Yordanov, Ricardo R. Lopes, Anita C.J. Ravelli, Marije Vis, Saskia Houterman, Henk Marquering, Ameen Abu-Hanna, Graduate School, Medical Informatics, APH - Methodology, Biomedical Engineering and Physics, Radiology and Nuclear Medicine, ACS - Heart failure & arrhythmias, APH - Health Behaviors & Chronic Diseases, ARD - Amsterdam Reproduction and Development, Cardiology, ACS - Atherosclerosis & ischemic syndromes, ACS - Pulmonary hypertension & thrombosis, APH - Aging & Later Life, ANS - Brain Imaging, and ANS - Neurovascular Disorders
- Subjects
Epidemiology, Calibration, Discrimination, Subgroup discovery, Heterogeneity, Multicenter, Prediction models
- Abstract
Objectives: To illustrate in-depth validation of prediction models developed on multicenter data.
Methods: For each hospital in a multicenter registry, we evaluated the predictive performance of a 30-day mortality prediction model for transcatheter aortic valve implantation (TAVI) using the Netherlands Heart Registration (NHR) dataset. We measured discrimination and calibration per hospital in a leave-center-out analysis (LCOA). Meta-analysis was used to calculate I² values per performance metric from the LCOA and to compute mean and confidence interval (CI) estimates. Case-mix differences between studies were inspected using the framework of Debray et al. for understanding external validation. We also aimed to discover subgroups (SGs) with high model prediction error (PE) and their distribution over the centers.
Results: We studied 16 hospitals with 11,599 TAVI patients and an early mortality of 3.7%. The model's area under the curve (AUC) varied widely between hospitals, from 0.59 to 0.79, and miscalibration occurred in seven hospitals. The mean AUC from meta-analysis was 0.68 (95% CI 0.65-0.70). I² values were 0%, 74%, and 0% for AUC, calibration intercept, and calibration slope, respectively. Between-hospital case-mix differences were substantial, and model transportability was low. One SG was discovered with marked global PE and was associated with poor performance on validation centers.
Conclusion: The illustrated combination of approaches provides useful insights for inspecting multicenter-based prediction models, and it exposes their limitations in transportability and performance variability when applied to different populations.
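The I² values reported in the abstract quantify how much of the between-hospital variation in a performance metric exceeds what chance alone would explain. As a minimal sketch, assuming the standard definition of I² based on Cochran's Q with inverse-variance weights (the abstract does not specify the exact meta-analysis model the authors used), the calculation could look as follows. The function name `i_squared` and the per-hospital numbers are hypothetical and for illustration only; they are not data from the study.

```python
def i_squared(estimates, std_errors):
    """Return (pooled_estimate, cochran_q, i2_percent).

    Assumption: inverse-variance (fixed-effect) weights and the standard
    formula I^2 = max(0, (Q - df) / Q) * 100, with df = k - 1.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights: more precise hospitals count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2 is truncated at 0 when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


if __name__ == "__main__":
    # Hypothetical per-hospital calibration intercepts and standard errors
    # (illustrative values, NOT taken from the NHR dataset).
    intercepts = [-0.40, 0.10, 0.35, -0.05, 0.60]
    ses = [0.15, 0.20, 0.18, 0.25, 0.22]
    pooled, q, i2 = i_squared(intercepts, ses)
    print(f"pooled={pooled:.3f}  Q={q:.2f}  I2={i2:.1f}%")
```

Under this definition, an I² of 0% (as reported for AUC and calibration slope) means the observed spread between hospitals is no larger than expected from sampling error, while a value such as 74% (as reported for the calibration intercept) indicates substantial heterogeneity between hospitals.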
- Published
- 2023