Investigating the impact of development and internal validation design when training prognostic models using a retrospective cohort in big US observational healthcare data

Authors :: Reps, Jenna M
Ryan, Patrick
Rijnbeek, P R
Medical Informatics
Source :: BMJ Open, 11(12):e050146. BMJ Publishing Group, BMJ Open
Publication Year :: 2021
Abstract:
Objective: The internal validation of prediction models aims to quantify the generalisability of a model. We aim to determine the impact, if any, that the choice of development and internal validation design has on the internal performance bias and model generalisability in big data (n~500 000).
Design: Retrospective cohort.
Setting: Primary and secondary care; three US claims databases.
Participants: 1 200 769 patients pharmaceutically treated for their first occurrence of depression.
Methods: We investigated the impact of the development/validation design across 21 real-world prediction questions. Model discrimination and calibration were assessed. We trained LASSO logistic regression models using US claims data and internally validated the models using eight different designs: ‘no test/validation set’, ‘test/validation set’ and cross-validation with 3-fold, 5-fold or 10-fold with and without a test set. We then externally validated each model in two new US claims databases. We estimated the internal validation bias per design by empirically comparing the differences between the estimated internal performance and external performance.
Results: The differences between the models’ internal estimated performances and external performances were largest for the ‘no test/validation set’ design. This indicates that, even with large data, the ‘no test/validation set’ design causes models to overfit. The seven alternative designs included some validation process to select the hyperparameters and a fair testing process to estimate internal performance. These designs had similar internal performance estimates and performed similarly when externally validated in the two external databases.
Conclusions: Even with big data, it is important to use some validation process to select the optimal hyperparameters and to fairly assess internal performance using a test set or cross-validation.
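To make the design comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes a patient-level random split, and it omits the LASSO fitting step entirely. The helper names `split_design`, `auc` and `internal_validation_bias` are hypothetical. `split_design` holds out a test fraction (a value such as 0.25 is an illustrative assumption, not a figure from the abstract) and assigns the remaining patients to k cross-validation folds used for hyperparameter selection. `auc` computes discrimination, and `internal_validation_bias` takes the difference between an internal and an external estimate, mirroring the abstract's comparison.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_design(n_patients: int, test_fraction: float, n_folds: int,
                 seed: int = 42) -> Tuple[List[int], Dict[int, int]]:
    """Hypothetical helper: hold out a random test set, then assign the
    remaining (training) patients to folds numbered 1..n_folds.

    Returns (test_indices, fold_of), where fold_of maps each training
    patient index to its fold. Assumes one row per patient.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    rng = random.Random(seed)
    idx = list(range(n_patients))
    rng.shuffle(idx)
    n_test = int(round(n_patients * test_fraction))
    test = sorted(idx[:n_test])
    train = idx[n_test:]
    # Round-robin over shuffled training patients keeps fold sizes within 1.
    fold_of = {p: (i % n_folds) + 1 for i, p in enumerate(train)}
    return test, fold_of


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random
    non-case (ties count 0.5). O(n_pos * n_neg), fine for a sketch only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def internal_validation_bias(internal_auc: float, external_auc: float) -> float:
    """Hypothetical helper: positive values mean the internal estimate
    was optimistic relative to external performance."""
    return internal_auc - external_auc


if __name__ == "__main__":
    # Illustrative design: 25% test set plus 3-fold CV on the rest.
    test, fold_of = split_design(n_patients=20, test_fraction=0.25, n_folds=3)
    print(len(test), len(fold_of))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under these assumptions, a ‘no test/validation set’ design roughly corresponds to `test_fraction=0` with no fold-based hyperparameter search, so the internal estimate is computed on the same patients used for fitting; this is the setting in which the abstract reports the largest gap between internal and external performance.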

Subjects :: Logistic Models
Bias
statistics & research methods
Humans
Health Informatics
General Medicine
preventive medicine
Prognosis
Delivery of Health Care
Retrospective Studies

Details

Language :: English
ISSN :: 20446055
Database :: OpenAIRE
Journal :: BMJ Open, 11(12):e050146. BMJ Publishing Group, BMJ Open
Accession number :: edsair.doi.dedup.....4127fa65583accb032eeafe0bd8512b2
