Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study.

Authors :
Appelbaum, Limor
Cambronero, José P.
Stevens, Jennifer P.
Horng, Steven
Pollick, Karla
Silva, George
Haneuse, Sebastien
Piatkowski, Gail
Benhaga, Nordine
Duey, Stacey
Stevenson, Mary A.
Mamon, Harvey
Kaplan, Irving D.
Rinard, Martin C.
Source :
European Journal of Cancer. Jan2021, Vol. 143, p19-30. 12p.
Publication Year :
2021

Abstract

Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at a late, incurable stage. We sought to determine whether individuals at high risk of developing PDAC could be identified early using routinely collected data.

Electronic health record (EHR) databases from two independent hospitals in Boston, Massachusetts, covering inpatient, outpatient and emergency care from 1979 through 2017, were used with case–control matching. PDAC cases were selected using International Classification of Diseases 9/10 codes and validated with tumour registries. A data-driven feature selection approach was used to develop neural networks and L2-regularised logistic regression (LR) models on training data (594 cases, 100,787 controls), which were compared with a published model based on hand-selected diagnoses ('baseline'). Model performance was validated on an external database (408 cases, 160,185 controls). Three prediction lead times (180, 270 and 365 days) were considered.

The LR model had the best performance, with an area under the curve (AUC) of 0.71 (confidence interval [CI]: 0.67–0.76) for the training set and 0.68 (CI: 0.65–0.71) for the validation set, 365 days before diagnosis. Data-driven feature selection improved results over 'baseline' (AUC = 0.55; CI: 0.52–0.58). Of 156,485 individuals, the LR model flags 2692 (CI: 2592–2791) as high risk 365 days in advance, identifying 25 (CI: 16–36) cancer patients. Risk stratification showed that the high-risk group had a cancer rate 3 to 5 times the prevalence in our data set.

A simple EHR model, based on diagnoses, can identify individuals at high risk of PDAC up to one year in advance. This inexpensive, systematic approach may serve as the first sieve for selecting individuals for PDAC screening programs.

• Medical records can be used to identify people at high risk of pancreatic cancer.
• The high-risk group is identified 6–12 months before diagnosis, allowing early detection.
• A data-driven approach is superior to hand-selected features for model prediction.
• External validation of the model shows generalisability to new data.
[ABSTRACT FROM AUTHOR]
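The abstract summarises discrimination as an AUC and risk stratification as a cancer rate several times the prevalence. The Python sketch below illustrates those quantities on toy data under stated assumptions. It is not the authors' code. It fits an L2-regularised logistic regression by plain batch gradient descent, which is an assumption because the abstract does not say how the model was fitted. It computes AUC as the Mann–Whitney probability that a random case outscores a random control, and it computes a lift ratio. The feature matrix and the values of `lam`, `lr` and `epochs` are hypothetical. Only the counts 25, 2692, 408 and 160,185 come from the abstract.

```python
import math

# Hypothetical sketch, NOT the authors' implementation. Toy binary features stand in
# for diagnosis-code indicators; the paper's data-driven feature selection is not shown.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimise mean log-loss + (lam/2)*||w||^2 by batch gradient descent.
    The intercept b is not penalised (an assumed, common convention)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Gradient step: data term averaged over n, plus the L2 term lam * w.
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def auc(scores, labels):
    """AUC as P(random case scores above random control); ties count 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def lift(cases_flagged, n_flagged, prevalence):
    """Cancer rate among flagged patients divided by the base-rate prevalence."""
    return (cases_flagged / n_flagged) / prevalence

# Toy training data (made up): two hypothetical diagnosis indicators per patient.
X = [[1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_l2_logreg(X, y)
train_scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
print("weights:", w, "training AUC:", auc(train_scores, y))

# AUC on hand-picked scores: 3 cases, 4 controls (one case/control tie at 0.4).
print(auc([0.9, 0.4, 0.7, 0.3, 0.5, 0.1, 0.4], [1, 1, 1, 0, 0, 0, 0]))  # 0.875

# Lift from the abstract's counts: 25 cancers among 2692 flagged. ASSUMPTION: the
# prevalence uses the full validation cohort (408 cases, 160,185 controls); the
# abstract's 156,485 denominator may correspond to a different case count.
prev = 408 / (408 + 160185)
print(round(lift(25, 2692, prev), 2))  # 3.66
```

Under that prevalence assumption the lift is about 3.7, inside the 3 to 5 range the abstract reports. The exact figure depends on which cohort and lead time define the base rate.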

Details

Language :
English
ISSN :
09598049
Volume :
143
Database :
Academic Search Index
Journal :
European Journal of Cancer
Publication Type :
Academic Journal
Accession number :
147855224
Full Text :
https://doi.org/10.1016/j.ejca.2020.10.019