A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data
- Source :
- Khalid, S, Yang, C, Blacketer, C, Duarte-Salles, T, Fernández-Bertolín, S, Kim, C, Park, R W, Park, J, Schuemie, M J, Sena, A G, Suchard, M A, You, S C, Rijnbeek, P R & Reps, J M 2021, 'A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data', Computer Methods and Programs in Biomedicine, vol. 211, 106394.
- Publication Year :
- 2021
Abstract
- Background and objective: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logis
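
The abstract compares models by their internal and external validation discrimination. As a rough illustration only, the sketch below assumes discrimination is summarized by the area under the ROC curve (AUROC), which this record does not state, and computes it on two small made-up score sets. The `auroc` helper, the variable names, and all numbers are hypothetical; none of this is the OHDSI software or the study's data.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the fraction of (positive, negative) pairs in which
    the positive case has the higher predicted risk; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = died within 30 days, 0 = survived.
# "internal" stands in for held-out data from the development database,
# "external" for a different database scored with the same model.
internal_labels = [1, 1, 0, 0, 1, 0]
internal_scores = [0.9, 0.8, 0.3, 0.4, 0.6, 0.2]
external_labels = [1, 0, 1, 0, 0, 1]
external_scores = [0.7, 0.6, 0.4, 0.3, 0.5, 0.8]

internal_auc = auroc(internal_labels, internal_scores)
external_auc = auroc(external_labels, external_scores)
print(f"internal AUROC: {internal_auc:.3f}")
print(f"external AUROC: {external_auc:.3f}")
```

A drop from the internal to the external value, as in this toy data, is the kind of gap that external validation across several databases is meant to expose.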
Details
- Database :
- OAIster
- Journal :
- Khalid, S, Yang, C, Blacketer, C, Duarte-Salles, T, Fernández-Bertolín, S, Kim, C, Park, R W, Park, J, Schuemie, M J, Sena, A G, Suchard, M A, You, S C, Rijnbeek, P R & Reps, J M 2021, 'A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data', Computer Methods and Programs in Biomedicine, vol. 211, 106394.
- Notes :
- application/pdf, English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1313637905
- Document Type :
- Electronic Resource