103 results on '"Jenna Reps"'
Search Results
2. Adaptation and validation of a coding algorithm for the Charlson Comorbidity Index in administrative claims data using the SNOMED CT standardized vocabulary
- Author
-
Stephen P. Fortin, Jenna Reps, and Patrick Ryan
- Subjects
Charlson comorbidity index, SNOMED, Common data model, Quan, Standardized vocabulary, Validation, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Objectives: The Charlson comorbidity index (CCI), the most ubiquitous comorbid risk score, predicts one-year mortality among hospitalized patients and provides a single aggregate measure of patient comorbidity. The Quan adaptation of the CCI revised the CCI coding algorithm for applications to administrative claims data using the International Classification of Diseases (ICD). The purpose of the current study is to adapt and validate a coding algorithm for the CCI using the SNOMED CT standardized vocabulary, one of the most commonly used vocabularies for data collection in healthcare databases in the U.S. Methods: The SNOMED CT coding algorithm for the CCI was adapted through the direct translation of the Quan coding algorithms followed by manual curation by clinical experts. The performance of the SNOMED CT and Quan coding algorithms was compared in the context of a retrospective cohort study of inpatient visits occurring during the calendar years of 2013 and 2018 contained in two U.S. administrative claims databases. Differences in the CCI or frequency of individual comorbid conditions were assessed using standardized mean differences (SMD). Performance in predicting one-year mortality among hospitalized patients was measured based on the c-statistic of logistic regression models. Results: For each database and calendar year combination, no significant differences in the CCI or frequency of individual comorbid conditions were observed between vocabularies (SMD ≤ 0.10). Specifically, the difference in CCI measured using the SNOMED CT vs. Quan coding algorithms was highest in MDCD in 2013 (3.75 vs. 3.6; SMD = 0.03) and lowest in DOD in 2018 (3.93 vs. 3.86; SMD = 0.02). Similarly, as indicated by the c-statistic, there was no evidence of a difference in the performance between coding algorithms in predicting one-year mortality (SNOMED CT vs. Quan coding algorithms, range: 0.725–0.789 vs. 0.723–0.787, respectively). A total of 700 of 5,348 (13.1%) ICD code mappings were inconsistent between coding algorithms. The most common cause of discrepant codes was multiple ICD codes mapping to a SNOMED CT code (n = 560), of which 213 were deemed clinically relevant, thereby leading to information gain. Conclusion: The current study repurposed an important tool for conducting observational research to use the SNOMED CT standardized vocabulary.
- Published
- 2022
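The abstract above compares the two coding algorithms with standardized mean differences and treats SMD ≤ 0.10 as negligible. As a rough sketch of that statistic (not the authors' code), the snippet below computes an SMD for a continuous score and for a binary comorbidity flag. The function names and all input values are hypothetical, and the pooled standard deviation uses the common equal-weight form, which the abstract does not specify.

```python
import math


def standardized_mean_difference(mean_a, sd_a, mean_b, sd_b):
    """SMD for a continuous measure: difference in means over the pooled SD."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return (mean_a - mean_b) / pooled_sd


def smd_binary(p_a, p_b):
    """SMD for a binary indicator, using the variance p * (1 - p) of each group."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2.0)
    if pooled_sd == 0:
        return 0.0
    return (p_a - p_b) / pooled_sd


# Hypothetical inputs, chosen only for illustration (not taken from the study).
score_smd = standardized_mean_difference(4.0, 2.0, 3.8, 2.0)
flag_smd = smd_binary(0.30, 0.28)
print(f"continuous SMD = {score_smd:.3f}, binary SMD = {flag_smd:.3f}")
```

Under the threshold quoted in the abstract, an absolute value at or below 0.10 from either function would be read as no meaningful difference between vocabularies.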
3. Privacy-Preserving Federated Model Predicting Bipolar Transition in Patients With Depression: Prediction Model Development Study
- Author
-
Dong Yun Lee, Byungjin Choi, Chungsoo Kim, Egill Fridgeirsson, Jenna Reps, Myoungsuk Kim, Jihyeong Kim, Jae-Won Jang, Sang Youl Rhee, Won-Woo Seo, Seunghoon Lee, Sang Joon Son, and Rae Woong Park
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
- Abstract
Background: Mood disorder has emerged as a serious concern for public health; in particular, bipolar disorder has a less favorable prognosis than depression. Although prompt recognition of depression conversion to bipolar disorder is needed, early prediction is challenging due to overlapping symptoms. Recently, there have been attempts to develop a prediction model by using federated learning. Federated learning in medical fields is a method for training multi-institutional machine learning models without patient-level data sharing. Objective: This study aims to develop and validate a federated, differentially private multi-institutional bipolar transition prediction model. Methods: This retrospective study enrolled patients diagnosed with the first depressive episode at 5 tertiary hospitals in South Korea. We developed models for predicting bipolar transition by using data from 17,631 patients in 4 institutions. Further, we used data from 4541 patients for external validation from 1 institution. We created standardized pipelines to extract large-scale clinical features from the 4 institutions without any code modification. Moreover, we performed feature selection in a federated environment for computational efficiency and applied differential privacy to gradient updates. Finally, we compared the federated and the 4 local models developed with each hospital's data on internal and external validation data sets. Results: In the internal data set, 279 out of 17,631 patients showed bipolar disorder transition. In the external data set, 39 out of 4541 patients showed bipolar disorder transition. The average performance of the federated model in the internal test (area under the curve [AUC] 0.726) and external validation (AUC 0.719) data sets was higher than that of the other locally developed models (AUC 0.642-0.707 and AUC 0.642-0.699, respectively). In the federated model, classifications were driven by several predictors such as the Charlson index (low scores were associated with bipolar transition, which may be due to younger age), severe depression, anxiolytics, young age, and visiting months (the bipolar transition was associated with seasonality, especially during the spring and summer months). Conclusions: We developed and validated a differentially private federated model by using distributed multi-institutional psychiatric data with standardized pipelines in a real-world environment. The federated model performed better than models using local data only.
- Published
- 2023
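The abstract above describes training across institutions without sharing patient-level data and applying differential privacy to gradient updates. The sketch below shows one generic way such a round can look: each site computes a local logistic-regression gradient, clips it, adds Gaussian noise, and a coordinator averages the noisy updates. It is an illustration under assumptions, not the study's pipeline: the model, the function names, and the `clip_norm`, `noise_std`, and `lr` values are all hypothetical, and calibrating the noise to a formal privacy budget is not shown.

```python
import math
import random


def local_gradient(weights, rows):
    """Average logistic-loss gradient over one site's (features, label) rows."""
    grad = [0.0] * len(weights)
    for x, y in rows:
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi / len(rows)
    return grad


def privatize(grad, clip_norm, noise_std, rng):
    """Clip the gradient to clip_norm, then add Gaussian noise per coordinate."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, noise_std) for g in grad]


def federated_round(weights, sites, lr=0.1, clip_norm=1.0, noise_std=0.01, seed=0):
    """One round: only noisy, clipped gradients leave each site."""
    rng = random.Random(seed)
    updates = [
        privatize(local_gradient(weights, rows), clip_norm, noise_std, rng)
        for rows in sites
    ]
    avg = [sum(u[j] for u in updates) / len(updates) for j in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]


# Two hypothetical sites with toy (features, label) rows.
toy_sites = [
    [([1.0, 1.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, -1.0], 0)],
]
new_weights = federated_round([0.0, 0.0], toy_sites)
```

In this sketch the raw rows never leave `local_gradient`; what a coordinator sees is limited to the output of `privatize`.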
4. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models
- Author
-
Chongliang Luo, Md. Nazmul Islam, Natalie E. Sheils, John Buresh, Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Mackenzie Edmondson, Rui Duan, Jiayi Tong, Arielle Marks-Anglin, Jiang Bian, Zhaoyi Chen, Talita Duarte-Salles, Sergio Fernández-Bertolín, Thomas Falconer, Chungsoo Kim, Rae Woong Park, Stephen R. Pfohl, Nigam H. Shah, Andrew E. Williams, Hua Xu, Yujia Zhou, Ebbing Lautenbach, Jalpa A. Doshi, Rachel M. Werner, David A. Asch, and Yong Chen
- Subjects
Science
- Abstract
A lossless, one-shot and privacy-preserving distributed algorithm was revealed for fitting linear mixed models on multi-site data. The algorithm was applied to a study of 120,609 COVID-19 patients using only minimal aggregated data from each of 14 sites.
- Published
- 2022
5. International cohort study indicates no association between alpha-1 blockers and susceptibility to COVID-19 in benign prostatic hyperplasia patients
- Author
-
Akihiko Nishimura, Junqing Xie, Kristin Kostka, Talita Duarte-Salles, Sergio Fernández Bertolín, María Aragón, Clair Blacketer, Azza Shoaibi, Scott L. DuVall, Kristine Lynch, Michael E. Matheny, Thomas Falconer, Daniel R. Morales, Mitchell M. Conover, Seng Chan You, Nicole Pratt, James Weaver, Anthony G. Sena, Martijn J. Schuemie, Jenna Reps, Christian Reich, Peter R. Rijnbeek, Patrick B. Ryan, George Hripcsak, Daniel Prieto-Alhambra, and Marc A. Suchard
- Subjects
treatment for SARS CoV-2, observational study, electronic health records, federated data model, causal inference, open science, Therapeutics. Pharmacology, RM1-950
- Abstract
Purpose: Alpha-1 blockers, often used to treat benign prostatic hyperplasia (BPH), have been hypothesized to prevent COVID-19 complications by minimising cytokine storm release. The proposed treatment based on this hypothesis currently lacks support from reliable real-world evidence, however. We leverage an international network of large-scale healthcare databases to generate comprehensive evidence in a transparent and reproducible manner. Methods: In this international cohort study, we deployed electronic health records from Spain (SIDIAP) and the United States (Department of Veterans Affairs, Columbia University Irving Medical Center, IQVIA OpenClaims, Optum DOD, Optum EHR). We assessed the association between alpha-1 blocker use and risks of three COVID-19 outcomes (diagnosis, hospitalization, and hospitalization requiring intensive services) using a prevalent-user active-comparator design. We estimated hazard ratios using state-of-the-art techniques to minimize potential confounding, including large-scale propensity score matching/stratification and negative control calibration. We pooled database-specific estimates through random effects meta-analysis. Results: Our study overall included 2.6 and 0.46 million users of alpha-1 blockers and of alternative BPH medications. We observed no significant difference in their risks for any of the COVID-19 outcomes, with our meta-analytic HR estimates being 1.02 (95% CI: 0.92–1.13) for diagnosis, 1.00 (95% CI: 0.89–1.13) for hospitalization, and 1.15 (95% CI: 0.71–1.88) for hospitalization requiring intensive services. Conclusion: We found no evidence of the hypothesized reduction in risks of the COVID-19 outcomes from the prevalent-use of alpha-1 blockers; further research is needed to identify effective therapies for this novel disease.
- Published
- 2022
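The abstract above pools database-specific hazard ratios through random effects meta-analysis but does not say which estimator was used. As a sketch of the general idea, the snippet below uses the classical DerSimonian-Laird estimator, which is an assumption here and may differ from the study's method. The function name and the input values are hypothetical.

```python
import math


def pool_random_effects(log_hrs, std_errs):
    """DerSimonian-Laird pooling of per-database log hazard ratios.

    Returns (pooled HR, lower 95% bound, upper 95% bound, tau^2).
    """
    variances = [se ** 2 for se in std_errs]
    w = [1.0 / v for v in variances]
    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(log_hrs)
    # Between-database variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
        tau2,
    )


# Hypothetical per-database estimates (log HR, standard error).
hr, lower, upper, tau2 = pool_random_effects([0.05, -0.02, 0.10], [0.08, 0.10, 0.20])
print(f"pooled HR {hr:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau^2 = {tau2:.3f}")
```

When the databases disagree more than their standard errors explain, `tau2` becomes positive and the pooled interval widens accordingly.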
6. Correction to: Adaptation and validation of a coding algorithm for the Charlson Comorbidity Index in administrative claims data using the SNOMED CT standardized vocabulary
- Author
-
Stephen P. Fortin, Jenna Reps, and Patrick Ryan
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7
- Published
- 2023
7. Risk Factors for Interstitial Cystitis in the General Population and in Individuals With Depression
- Author
-
M. Soledad Cepeda, Jenna Reps, Anthony G. Sena, and Rachel Ochs-Ross
- Subjects
Depression, Inflammation, Interstitial cystitis, Retrospective cohort study, Risk factors, Diseases of the genitourinary system. Urology, RC870-923
- Abstract
Purpose: To identify risk factors for interstitial cystitis (IC), a chronic bladder disorder that may have a significant detrimental impact on quality of life, in the general population and in individuals with depression. Methods: This was a comparative study using a US claims database. Adults who had records of a visit to the health system in 2010 or later were included. The outcome was the development of IC within 2 years after the index date. The index date for the general population was the first outpatient visit, and for individuals with depression, it was the date of the diagnosis of depression. IC was defined using the concepts of ulcerative and IC. We included all medical conditions present any time prior to the index visit as potential risk factors. Results: The incidence of IC was higher in individuals with depression than in the general population. Of the 3,973,000 subjects from the general population, 2,293 (0.06%) developed IC within 2 years. Of the 249,200 individuals with depression, 320 (0.13%) developed IC. The characteristics of the individuals who developed IC were similar in both populations. Those who developed IC were slightly older, more likely to be women, and had more chronic pain conditions, malaise, and inflammatory disorders than patients without IC. In the general population, subjects who developed IC were more likely to have mood disorders, anxiety, and hypothyroidism. Conclusions: The incidence of IC was higher in individuals with depression. Subjects who developed IC had more chronic pain conditions, depression, malaise, and inflammatory disorders.
- Published
- 2019
8. Inferring disease severity in rheumatoid arthritis using predictive modeling in administrative claims databases.
- Author
-
Urmila Chandran, Jenna Reps, Paul E Stang, and Patrick B Ryan
- Subjects
Medicine, Science
- Abstract
Background: Confounding by disease severity is an issue in pharmacoepidemiology studies of rheumatoid arthritis (RA), due to channeling of sicker patients to certain therapies. To address the issue of limited clinical data for confounder adjustment, a patient-level prediction model to differentiate between patients prescribed and not prescribed advanced therapies was developed as a surrogate for disease severity, using all available data from a US claims database. Methods: Data from adult RA patients were used to build regularized logistic regression models to predict current and future disease severity using a biologic or tofacitinib prescription claim as a surrogate for moderate-to-severe disease. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC); the model was trained and tested in Optum Clinformatics® Extended DataMart (Optum) and additionally validated in three external IBM MarketScan® databases. The model was further validated in the Optum database across a range of patient cohorts. Results: In the Optum database (n = 68,608), the AUC for discriminating RA patients with a prescription claim for a biologic or tofacitinib versus those without in the 90 days following index diagnosis was 0.80. Model AUCs were 0.77 in IBM CCAE (n = 75,579) and IBM MDCD (n = 7,537) and 0.75 in IBM MDCR (n = 36,090). There was little change in the prediction model assessing discrimination 730 days following index diagnosis (prediction model AUC in Optum was 0.79). Conclusions: A prediction model demonstrated good discrimination across multiple claims databases to identify RA patients with a prescription claim for advanced therapies during different time-at-risk periods as proxy for current and future moderate-to-severe disease. This work provides a robust model-derived risk score that can be used as a potential covariate and proxy measure to adjust for confounding by severity in multivariable models in the RA population. An R package to develop the prediction model and risk score is available on an open-source platform for researchers.
- Published
- 2019
9. Treatment resistant depression incidence estimates from studies of health insurance databases depend strongly on the details of the operating definition
- Author
-
Daniel Fife, Jenna Reps, M. Soledad Cepeda, Paul Stang, Margaret Blacketer, and Jaskaran Singh
- Subjects
Evidence-based medicine, Psychiatry, Epidemiology, Health sciences, Clinical psychology, Science (General), Q1-390, Social sciences (General), H1-99
- Abstract
Background: Health services databases provide population-based data that have been used to describe the epidemiology and costs of treatment resistant depression (TRD). This retrospective cohort study estimated TRD incidence and, via sensitivity analyses, assessed the variation of TRD incidence within the range of implementation choices. Methods: In three US databases widely used for observational studies, we defined TRD as failure of two medications as evidenced by their replacement or supplementation by other medications, and set maximum durations (caps) for how long a medication regimen could remain in use and still be eligible to fail. Results: TRD incidence estimates varied approximately 2-fold between the two databases (CCAE, Medicaid) that described socioeconomically different non-elderly populations; for a given cap, they varied 2-fold to 4-fold within each database across the other implementation choices; and if the cap was also allowed to vary, they varied 6-fold or 7-fold within each database. Limitations: The main limitations were typical of studies from health services databases and included the lack of complete (rather than recent) medical histories, the limited amount of clinical information, and the assumption that medication dispensed was consumed as directed. Conclusion: In retrospective cohort studies from health services databases, TRD incidence estimates vary widely depending on the implementation choices. Unless a firm basis for narrowing the range of these choices can be found, or a different analytic approach not dependent on such choices is adopted, TRD incidence and prevalence estimates from such databases will be difficult to compare or interpret.
- Published
- 2018
10. Can machine-learning improve cardiovascular risk prediction using routine clinical data?
- Author
-
Stephen F Weng, Jenna Reps, Joe Kai, Jonathan M Garibaldi, and Nadeem Qureshi
- Subjects
Medicine, Science
- Abstract
BACKGROUND: Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. METHODS: Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10 years. Predictive accuracy was assessed by area under the 'receiver operating curve' (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). FINDINGS: 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. CONCLUSIONS: Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.
- Published
- 2017
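The four accuracy figures quoted for the neural network in the abstract above follow directly from the two counts it reports (4,998 of 7,404 cases and 53,458 of 75,585 non-cases correctly predicted). The short check below reproduces them; the function name is an illustrative choice, and only the four counts come from the abstract.

```python
def confusion_metrics(true_pos, total_cases, true_neg, total_non_cases):
    """Derive sensitivity, specificity, PPV and NPV from the reported counts."""
    false_neg = total_cases - true_pos
    false_pos = total_non_cases - true_neg
    return {
        "sensitivity": true_pos / total_cases,
        "specificity": true_neg / total_non_cases,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Counts as reported in the abstract for the neural network model.
metrics = confusion_metrics(4998, 7404, 53458, 75585)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity: 67.5%
# specificity: 70.7%
# ppv: 18.4%
# npv: 95.7%
```

The low PPV alongside the high NPV reflects how uncommon events were in the evaluated set: 2,406 missed cases against 22,127 false alarms.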
11. Illness Beliefs Predict Mortality in Patients with Diabetic Foot Ulcers.
- Author
-
Kavita Vedhara, Karen Dawe, Jeremy N V Miles, Mark A Wetherell, Nicky Cullum, Colin Dayan, Nicola Drake, Patricia Price, John Tarlton, John Weinman, Andrew Day, Rona Campbell, Jenna Reps, and Daniele Soria
- Subjects
Medicine, Science
- Abstract
BACKGROUND: Patients' illness beliefs have been associated with glycaemic control in diabetes and survival in other conditions. OBJECTIVE: We examined whether illness beliefs independently predicted survival in patients with diabetes and foot ulceration. METHODS: Patients (n=169) were recruited between 2002 and 2007. Data on illness beliefs were collected at baseline. Data on survival were extracted on 1st November 2011. Number of days survived reflected the number of days from date of recruitment to 1st November 2011. RESULTS: Cox regressions examined the predictors of time to death and identified ischemia and identity beliefs (beliefs regarding symptoms associated with foot ulceration) as significant predictors of time to death. CONCLUSIONS: Our data indicate that illness beliefs have a significant independent effect on survival in patients with diabetes and foot ulceration. These findings suggest that illness beliefs could improve our understanding of mortality risk in this patient group and could also be the basis for future therapeutic interventions to improve survival.
- Published
- 2016
12. Comparing stochastic differential equations and agent-based modelling and simulation for early-stage cancer.
- Author
-
Grazziela P Figueredo, Peer-Olaf Siebers, Markus R Owen, Jenna Reps, and Uwe Aickelin
- Subjects
Medicine, Science
- Abstract
There is great potential to be explored regarding the use of agent-based modelling and simulation as an alternative paradigm to investigate early-stage cancer interactions with the immune system. It does not suffer from some limitations of ordinary differential equation models, such as the lack of stochasticity, representation of individual behaviours rather than aggregates and individual memory. In this paper we investigate the potential contribution of agent-based modelling and simulation when contrasted with stochastic versions of ODE models using early-stage cancer examples. We seek answers to the following questions: (1) Does this new stochastic formulation produce similar results to the agent-based version? (2) Can these methods be used interchangeably? (3) Do agent-based model outcomes reveal any benefit when compared to the Gillespie results? To answer these research questions we investigate three well-established mathematical models describing interactions between tumour cells and immune elements. These case studies were re-conceptualised under an agent-based perspective and also converted to the Gillespie algorithm formulation. Our interest in this work, therefore, is to establish a methodological discussion regarding the usability of different simulation approaches, rather than provide further biological insights into the investigated case studies. Our results show that it is possible to obtain equivalent models that implement the same mechanisms; however, the incapacity of the Gillespie algorithm to retain individual memory of past events affects the similarity of some results. Furthermore, the emergent behaviour of ABMS produces extra patterns of behaviour in the system, which were not obtained by the Gillespie algorithm.
- Published
- 2014
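The abstract above contrasts agent-based simulation with the Gillespie algorithm and notes that the latter keeps no memory of individual histories. A minimal sketch of the Gillespie method for a simple birth-death population makes that point concrete: the state is a single count, not a list of individuals. The model, the function name, and the rate values are hypothetical and are not one of the three tumour-immune models studied in the paper.

```python
import random


def gillespie_birth_death(n0, birth_rate, death_rate, t_max, seed=0):
    """Simulate a birth-death process; returns a list of (time, count) points."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while n > 0:
        birth_propensity = birth_rate * n
        death_propensity = death_rate * n
        total = birth_propensity + death_propensity
        if total <= 0:
            break
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which event fires in proportion to its propensity.
        if rng.random() * total < birth_propensity:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory


# Hypothetical parameters: 10 cells, division slightly faster than death.
path = gillespie_birth_death(n0=10, birth_rate=1.0, death_rate=0.9, t_max=5.0)
print(f"{len(path) - 1} events, final count {path[-1][1]}")
```

An agent-based version of the same process would instead hold one object per cell, which is what allows per-individual state such as age or past interactions.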
13. Health-Analytics Data to Evidence Suite (HADES): Open-Source Software for Observational Research.
- Author
-
Martijn J. Schuemie, Jenna Reps, Adam Black, Frank J. DeFalco, Lee Evans, Egill A. Fridgeirsson, James P. Gilbert, Chris Knoll, Martin Lavallee, Gowtham A. Rao, Peter R. Rijnbeek, Katy Sadowski, Anthony G. Sena, Joel N. Swerdel, Ross D. Williams, and Marc A. Suchard
- Published
- 2023
14. Learning Across a Healthcare Data Network to Improve Model Robustness and Evidence Reliability.
- Author
-
Noémie Elhadad, Iñigo Urteaga, Alison Callahan, Jenna Reps, and Patrick B. Ryan
- Published
- 2019
15. Machine Learning and Real-World Data to Predict Lung Cancer Risk in Routine Care
- Author
-
Urmila Chandran, Jenna Reps, Robert Yang, Anil Vachani, Fabien Maldonado, and Iftekhar Kalsekar
- Subjects
Oncology, Epidemiology
- Abstract
Background: This study used machine learning to develop a 3-year lung cancer risk prediction model with large real-world data in a mostly younger population. Methods: Over 4.7 million individuals, aged 45 to 65 years with no history of any cancer or lung cancer screening, diagnostic, or treatment procedures, with an outpatient visit in 2013 were identified in Optum's de-identified Electronic Health Record (EHR) dataset. A least absolute shrinkage and selection operator model was fit using all available data in the 365 days prior. Temporal validation was assessed with recent data. External validation was assessed with data from Mercy Health Systems EHR and Optum's de-identified Clinformatics Data Mart Database. Racial inequities in model discrimination were assessed with xAUCs. Results: The model AUC was 0.76. Top predictors included age, smoking, race, ethnicity, and diagnosis of chronic obstructive pulmonary disease. The model identified a high-risk group with lung cancer incidence 9 times the average cohort incidence, representing 10% of patients with lung cancer. Model performed well temporally and externally, while performance was reduced for Asians and Hispanics. Conclusions: A high-dimensional model trained using big data identified a subset of patients with high lung cancer risk. The model demonstrated transportability to EHR and claims data, while underscoring the need to assess racial disparities when using machine learning methods. Impact: This internally and externally validated real-world data-based lung cancer prediction model is available on an open-source platform for broad sharing and application. Model integration into an EHR system could minimize physician burden by automating identification of high-risk patients.
- Published
- 2022
16. Identifying Candidate Risk Factors for Prescription Drug Side Effects Using Causal Contrast Set Mining.
- Author
-
Jenna Reps, Zhaoyang Guo, Haoyue Zhu, and Uwe Aickelin
- Published
- 2015
17. Investigating distance metric learning in semi-supervised fuzzy c-means clustering.
- Author
-
Daphne Teck Ching Lai, Jonathan M. Garibaldi, and Jenna Reps
- Published
- 2014
18. Personalising Mobile Advertising Based on Users' Installed Apps.
- Author
-
Jenna Reps, Uwe Aickelin, Jonathan M. Garibaldi, and Chris Damski
- Published
- 2014
19. Comparing data-mining algorithms developed for longitudinal observational databases.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2012
20. Discovering sequential patterns in a UK general practice database.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2012
21. Quiet in Class: Classification, Noise and the Dendritic Cell Algorithm.
- Author
-
Feng Gu, Jan Feyereisl, Robert F. Oates, Jenna Reps, Julie Greensmith, and Uwe Aickelin
- Published
- 2011
22. Feasibility and evaluation of a large-scale external validation approach for patient-level prediction in an international data network: validation of models predicting stroke in female patients newly diagnosed with atrial fibrillation
- Author
-
Alison Callahan, Ross D. Williams, Seng Chan You, Evan P. Minty, Jenna Reps, Thomas Falconer, Peter R. Rijnbeek, Patrick B. Ryan, Hong-Seok Lim, Rae Woong Park, and Medical Informatics
- Subjects
Standardization, 020205 medical informatics, Epidemiology, Computer science, Collaborative network, MEDLINE, Health Informatics, 02 engineering and technology, 030204 cardiovascular system & hematology, Machine learning, computer.software_genre, InformationSystems_GENERAL, 03 medical and health sciences, 0302 clinical medicine, Atrial Fibrillation, 0202 electrical engineering, electronic engineering, information engineering, Humans, 030212 general & internal medicine, lcsh:R5-920, Framingham Risk Score, business.industry, Reproducibility of Results, Prognosis, 3. Good health, External validation, Transportability, Stroke, Tree (data structure), Patient-level prediction, Informatics, Feasibility Studies, Observational study, Female, Artificial intelligence, business, Prognostic model, lcsh:Medicine (General), computer, Predictive modelling, Research Article
- Abstract
Background: To demonstrate how the Observational Healthcare Data Science and Informatics (OHDSI) collaborative network and standardization can be utilized to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets. Methods: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. Results: The five existing models were able to be integrated into the OHDSI framework for patient-level prediction and they obtained mean c-statistics ranging from 0.57 to 0.63 across the 6 databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis for females with atrial fibrillation. This was comparable with existing validation studies. The validation network study was run across nine datasets within 60 days once the models were replicated. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. Conclusion: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and a data standardization that enable models to be readily shared across data sites. External validation is necessary to understand the transportability or reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by one independent researcher. In this paper we show it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
- Published
- 2020
23. Development of multivariable models to predict perinatal depression before and after delivery using patient reported survey responses at weeks 4–10 of pregnancy
- Author
-
Marsha A. Wilcox, Kevin Wildenhaus, Marie Leonte, Lauren LaCross, Jenna Reps, and Beth Ann McGee
- Subjects
Depressive Disorder, Pregnancy, medicine.medical_specialty, Depression, business.industry, Obstetrics, Multivariable calculus, Obstetrics and Gynecology, medicine.disease, Depression, Postpartum, medicine, Humans, Female, Patient Reported Outcome Measures, Prospective Studies, business, Perinatal Depression
- Abstract
Background: Perinatal depression is estimated to affect ~12% of pregnancies and is linked to numerous negative outcomes. There is currently no model to predict perinatal depression at multiple time-points during and after pregnancy using variables ascertained early into pregnancy. Methods: We used a prospective cohort design in which 858 participants filled in a baseline self-reported survey at week 4–10 of pregnancy (that included social economics, health history, various psychiatric measures), with follow-up until 3 months after delivery. Our primary outcome was an Edinburgh Postnatal Depression Scale (EPDS) score of 12 or more (a proxy for perinatal depression) assessed during each trimester and again at two time periods after delivery. Five gradient boosting machines were trained to predict the risk of having EPDS score ≥ 12 at each of the five follow-up periods. The predictors consisted of 21 variables from 3 validated psychometric scales. As a sensitivity analysis, we also investigated different predictor sets that contained: i) 17 of the 21 predictors, obtained by including only two of the psychometric scales, and ii) 143 additional social economics and health history predictors, resulting in 164 predictors. Results: We developed five prognostic models: PND-T1 (trimester 1), PND-T2 (trimester 2), PND-T3 (trimester 3), PND-A1 (after delivery 1) and PND-A2 (delayed onset after delivery) that calculate personalised risks while only requiring that women be asked 21 questions from 3 validated psychometric scales at weeks 4–10 of pregnancy. C-statistics (also known as AUC) ranged between 0.69 (95% CI 0.65–0.73) and 0.77 (95% CI 0.74–0.80). At 50% sensitivity the positive predictive value ranged between 30% and 50% across the models, generally identifying groups of patients with double the average risk. Models trained using the 17 predictors and 164 predictors did not improve model performance compared to the models trained using 21 predictors. Conclusions: The five models can predict risk of perinatal depression within each trimester and in two post-natal periods using survey responses as early as week 4 of pregnancy with modest performance. The models need to be externally validated and prospectively tested to ensure generalizability to any pregnant patient.
- Published
- 2022
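The abstract above reports positive predictive value at 50% sensitivity. As a minimal sketch of how such a figure can be derived from predicted risks, the Python below scans thresholds from the highest score downward and stops at the first one that reaches the target sensitivity. The function name `ppv_at_sensitivity` and the toy data are illustrative assumptions, not taken from the study.

```python
def ppv_at_sensitivity(y_true, y_score, target_sensitivity=0.5):
    """Return (threshold, sensitivity, ppv) at the highest threshold whose
    sensitivity reaches the target. Hypothetical helper for illustration."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive outcome")
    # Candidate thresholds: each distinct score, from highest to lowest.
    for threshold in sorted(set(y_score), reverse=True):
        flagged = [y for y, s in zip(y_true, y_score) if s >= threshold]
        true_pos = sum(flagged)
        sensitivity = true_pos / total_pos
        if sensitivity >= target_sensitivity:
            return threshold, sensitivity, true_pos / len(flagged)
    raise ValueError("target sensitivity must be at most 1")


# Toy data (assumed): 10 subjects, 4 with the outcome (prevalence 40%).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
threshold, sensitivity, ppv = ppv_at_sensitivity(y_true, y_score, 0.5)
print(threshold, sensitivity, round(ppv, 2))
```

Comparing the returned PPV with the overall outcome prevalence gives the "multiple of average risk" reading used in the abstract.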
24. Identifying Candidate Risk Factors for Prescription Drug Side Effects using Causal Contrast Set Mining.
- Author
-
Jenna Reps, Zhaoyang Guo, Haoyue Zhu, and Uwe Aickelin
- Published
- 2016
25. Incorporating Spontaneous Reporting System Data to Aid Causal Inference in Longitudinal Healthcare Data.
- Author
-
Jenna Reps and Uwe Aickelin
- Published
- 2015
26. Refining Adverse Drug Reactions using Association Rule Mining for Electronic Healthcare Data.
- Author
-
Jenna Reps, Uwe Aickelin, Jiangang Ma, and Yanchun Zhang
- Published
- 2015
27. Personalising Mobile Advertising Based on Users Installed Apps.
- Author
-
Jenna Reps, Uwe Aickelin, Jonathan M. Garibaldi, and Chris Damski
- Published
- 2015
28. A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data
- Author
-
Chungsoo Kim, Jimyung Park, Martijn J. Schuemie, Clair Blacketer, Cynthia Yang, Sergio Fernandez-Bertolin, Seng Chan You, Jenna Reps, S Khalid, Rae Woong Park, Anthony G. Sena, Marc A. Suchard, Peter R. Rijnbeek, Talita Duarte-Salles, and Medical Informatics
- Subjects
Artificial Intelligence and Image Processing ,COVID19 ,Computer science ,Pipeline (computing) ,observational health data ,Biomedical Engineering ,Decision tree ,Bioengineering ,Health Informatics ,Machine learning ,computer.software_genre ,Article ,Machine Learning ,OHDSI ,Humans ,AdaBoost ,Electrical and Electronic Engineering ,Data quality control ,Pandemics ,prediction modeling ,business.industry ,SARS-CoV-2 ,Standardized approach ,Data harmonization ,COVID-19 ,Risk prediction ,Computer Science Applications ,Random forest ,Phenotypes ,Networking and Information Technology R&D ,Distributed data network ,Logistic Models ,Networking and Information Technology R&D (NITRD) ,Analytics ,Generic health relevance ,Gradient boosting ,Artificial intelligence ,business ,computer ,Medical Informatics ,Software ,Predictive modelling - Abstract
Author(s): Khalid, Sara; Yang, Cynthia; Blacketer, Clair; Duarte-Salles, Talita; Fernandez-Bertolin, Sergio; Kim, Chungsoo; Park, Rae Woong; Park, Jimyung; Schuemie, Martijn J; Sena, Anthony G; Suchard, Marc A; You, Seng Chan; Rijnbeek, Peter R; Reps, Jenna M | Abstract: Background and objective As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. 
When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Conclusion Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
- Published
- 2021
29. Developing Predictive Models to Determine Patients in End-of-Life Care in Administrative Datasets
- Author
-
Joel N. Swerdel, Patrick B. Ryan, Daniel Fife, and Jenna Reps
- Subjects
Adult ,Male ,Databases, Factual ,Calibration (statistics) ,Psychological intervention ,Toxicology ,Logistic regression ,030226 pharmacology & pharmacy ,03 medical and health sciences ,0302 clinical medicine ,Statistics ,Humans ,Medicine ,Pharmacology (medical) ,Original Research Article ,030212 general & internal medicine ,Medical prescription ,Aged ,Aged, 80 and over ,Pharmacology ,Terminal Care ,Receiver operating characteristic ,business.industry ,Middle Aged ,Models, Theoretical ,Confidence interval ,Female ,Observational study ,business ,End-of-life care - Abstract
Introduction In observational studies with mortality endpoints, one needs to consider how to account for subjects whose interventions appear to be part of ‘end-of-life’ care. Objective The objective of this study was to develop a diagnostic predictive model to identify those in end-of-life care at the time of a drug exposure. Methods We used data from four administrative claims datasets from 2000 to 2017. The index date was the date of the first prescription for the last new drug subjects received during their observation period. The outcome of end-of-life care was determined by the presence of one or more codes indicating terminal or hospice care. Models were developed using regularized logistic regression. Internal validation was through examination of the area under the receiver operating characteristic curve (AUC) and through model calibration in a 25% subset of the data held back from model training. External validation was through examination of the AUC after applying the model learned on one dataset to the three other datasets. Results The models showed excellent performance characteristics. Internal validation resulted in AUCs ranging from 0.918 (95% confidence interval [CI] 0.905–0.930) to 0.983 (95% CI 0.978–0.987) for the four different datasets. Calibration results were also very good, with slopes near unity. External validation also produced very good to excellent performance metrics, with AUCs ranging from 0.840 (95% CI 0.834–0.846) to 0.956 (95% CI 0.952–0.960). Conclusion These results show that developing diagnostic predictive models for determining subjects in end-of-life care at the time of a drug treatment is possible and may improve the validity of the risk profile for those treatments. Electronic supplementary material The online version of this article (10.1007/s40264-020-00906-7) contains supplementary material, which is available to authorized users.
- Published
- 2020
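The AUC values quoted in the abstract above can be read as the probability that a randomly chosen subject with the outcome receives a higher predicted risk than a randomly chosen subject without it. The sketch below computes that quantity directly by comparing every outcome/non-outcome pair, counting ties as half. It is a plain-Python illustration on made-up scores, not the study's code, and the pairwise loop is only practical for small samples.

```python
def c_statistic(y_true, y_score):
    """Concordance (AUC) by exhaustive pairwise comparison; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each outcome class")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy example (assumed data): 3 of the 4 outcome/non-outcome pairs are
# ranked correctly, so the c-statistic is 0.75.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random ranking, and a value below 0.5 means the scores rank subjects worse than chance.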
30. Evaluating the impact of covariate lookback times on performance of patient-level prediction models
- Author
-
Jenna Reps and Jill Hardin
- Subjects
Medicine (General) ,Databases, Factual ,Epidemiology ,business.industry ,Calibration (statistics) ,Research ,Area under the curve ,Health Informatics ,Logistic regression ,Feature extraction lookback periods ,Time ,Stroke ,Patient-level prediction ,Logistic Models ,R5-920 ,Lasso (statistics) ,Area Under Curve ,Covariate ,Statistics ,Cohort ,Humans ,Medicine ,Observational study ,business ,Predictive modelling - Abstract
Background The goal of our study is to examine the impact of the lookback length when engineering features to use in developing predictive models using observational healthcare data. Using a longer lookback for feature engineering gives more insight about patients but increases the issue of left-censoring. Methods We used five US observational databases to develop patient-level prediction models. A target cohort of subjects with hypertensive drug exposures and outcome cohorts of subjects with acute (stroke and gastrointestinal bleeding) and chronic outcomes (diabetes and chronic kidney disease) were developed. Candidate predictors that exist on or prior to the target index date were derived within the following lookback periods: 14, 30, 90, 180, 365, 730, and all days prior to index. We predicted the risk of outcomes occurring 1 day until 365 days after index. Ten lasso logistic models for each lookback period were generated to create a distribution of area under the curve (AUC) metrics to evaluate the discriminative performance of the models. Calibration intercept and slope were also calculated. Impact on external validation performance was investigated across five databases. Results The maximum differences in AUCs for the models developed using different lookback periods within a database was Conclusions In general the choice of covariate lookback had only a small impact on discrimination and calibration, with a short lookback (
- Published
- 2021
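The lookback periods listed in the abstract above (14, 30, 90, 180, 365, 730, and all days prior to index) can be illustrated with a small feature-extraction sketch. The code counts a subject's recorded events that fall inside each window ending on the index date. Treating both window ends as inclusive, and the function name `events_in_lookback`, are assumptions made for the example rather than details from the paper.

```python
from datetime import date, timedelta

# Lookback lengths in days; None stands for "all days prior to index".
LOOKBACK_DAYS = [14, 30, 90, 180, 365, 730, None]


def events_in_lookback(event_dates, index_date, lookback_days):
    """Count events on or before index that fall inside the lookback window.

    Both window ends are treated as inclusive (an assumption of this sketch).
    """
    if lookback_days is None:
        return sum(1 for d in event_dates if d <= index_date)
    start = index_date - timedelta(days=lookback_days)
    return sum(1 for d in event_dates if start <= d <= index_date)


# Toy history (assumed): four events before index and one after it.
index_date = date(2020, 1, 1)
history = [
    date(2019, 12, 25),  # 7 days before index
    date(2019, 11, 1),   # 61 days before index
    date(2018, 6, 1),    # 579 days before index
    date(2015, 1, 1),    # more than 730 days before index
    date(2020, 2, 1),    # after index, never counted
]
counts = [events_in_lookback(history, index_date, n) for n in LOOKBACK_DAYS]
print(counts)  # [1, 1, 2, 2, 2, 3, 4]
```

The counts grow with the window, which shows the trade-off named in the abstract: longer lookbacks see more history but require subjects to have that much prior observation.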
31. Comparison of algorithms that detect drug side effects using electronic healthcare databases.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2014
32. Tuning a Multiple Classifier System for Side Effect Discovery using Genetic Algorithms.
- Author
-
Jenna Reps, Uwe Aickelin, and Jonathan M. Garibaldi
- Published
- 2014
33. Signalling Paediatric Side Effects using an Ensemble of Simple Study Designs.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2014
34. Attributes for Causal Inference in Longitudinal Observational Databases.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2014
35. Comparing Stochastic Differential Equations and Agent-Based Modelling and Simulation for Early-stage Cancer.
- Author
-
Grazziela P. Figueredo, Peer-Olaf Siebers, Markus R. Owen, Jenna Reps, and Uwe Aickelin
- Published
- 2014
36. A Novel Semi-Supervised Algorithm for Rare Prescription Side Effect Discovery.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2014
37. Attributes for causal inference in electronic healthcare databases.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2013
38. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models
- Author
-
Chongliang Luo, Md. Nazmul Islam, Natalie E. Sheils, John Buresh, Jenna Reps, Martijn J. Schuemie, Patrick B. Ryan, Mackenzie Edmondson, Rui Duan, Jiayi Tong, Arielle Marks-Anglin, Jiang Bian, Zhaoyi Chen, Talita Duarte-Salles, Sergio Fernández-Bertolín, Thomas Falconer, Chungsoo Kim, Rae Woong Park, Stephen R. Pfohl, Nigam H. Shah, Andrew E. Williams, Hua Xu, Yujia Zhou, Ebbing Lautenbach, Jalpa A. Doshi, Rachel M. Werner, David A. Asch, and Yong Chen
- Subjects
Multidisciplinary ,Databases, Factual ,Linear Models ,General Physics and Astronomy ,COVID-19 ,Humans ,General Chemistry ,General Biochemistry, Genetics and Molecular Biology ,Algorithms ,Confidentiality - Abstract
Linear mixed models are commonly used in healthcare-based association analyses for analyzing multi-site data with heterogeneous site-specific random effects. Due to regulations for protecting patients’ privacy, sensitive individual patient data (IPD) typically cannot be shared across sites. We propose an algorithm for fitting distributed linear mixed models (DLMMs) without sharing IPD across sites. This algorithm achieves results identical to those achieved using pooled IPD from multiple sites (i.e., the same effect size and standard error estimates), hence demonstrating the lossless property. The algorithm requires each site to contribute minimal aggregated data in only one round of communication. We demonstrate the lossless property of the proposed DLMM algorithm by investigating the associations between demographic and clinical characteristics and length of hospital stay in COVID-19 patients using administrative claims from the UnitedHealth Group Clinical Discovery Database. We extend this association study by incorporating 120,609 COVID-19 patients from 11 collaborative data sources worldwide.
- Published
- 2021
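The "lossless" idea in the abstract above, where sites share aggregated data once and the result matches a pooled analysis, can be illustrated on a much simpler model. The sketch below fits an ordinary simple linear regression (one covariate, no random effects) from per-site sums only. This is a deliberate simplification for illustration and is not the DLMM algorithm itself, which also handles site-specific random effects; all names and data here are hypothetical.

```python
def site_summary(xs, ys):
    """Aggregated statistics a site would share instead of patient-level rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def fit_from_summaries(summaries):
    """Least-squares intercept and slope computed from summed site statistics."""
    n = sum(s["n"] for s in summaries)
    sx = sum(s["sx"] for s in summaries)
    sy = sum(s["sy"] for s in summaries)
    sxx = sum(s["sxx"] for s in summaries)
    sxy = sum(s["sxy"] for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two hypothetical sites; each shares five numbers in a single round.
site_a = site_summary([1, 2, 3], [2, 4, 5])
site_b = site_summary([4, 5], [4, 6])
distributed = fit_from_summaries([site_a, site_b])

# Pooled analysis on the combined rows, for comparison.
pooled = fit_from_summaries([site_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])])
print(distributed, pooled)
```

Because the per-site sums add up to the pooled sums, the two fits agree up to floating-point rounding, which is the sense of "lossless" this sketch is meant to convey.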
39. Alpha-1 blockers and susceptibility to COVID-19 in benign prostate hyperplasia patients : an international cohort study
- Author
-
Christian G. Reich, Kristin Kostka, Sergio Fernandez Bertolin, Seng Chan You, Anthony G. Sena, Akihiko Nishimura, Martijn J. Schuemie, Thomas Falconer, James Weaver, Clair Blacketer, Kristine E. Lynch, Michael E. Matheny, Jenna Reps, Patrick B. Ryan, George Hripcsak, Daniel Prieto-Alhambra, Talita Duarte-Salles, Daniel R. Morales, Junqing Xie, Azza Shoaibi, Peter R. Rijnbeek, Mitchell M. Conover, Marc A. Suchard, Nicole L. Pratt, Scott L. DuVall, and María Aragón
- Subjects
medicine.medical_specialty ,Coronavirus disease 2019 (COVID-19) ,COVID19 ,business.industry ,Confounding ,Hazard ratio ,MEDLINE ,Hyperplasia ,medicine.disease ,susceptibility ,Internal medicine ,Alpha-1 blockers ,benign prostate hyperplasia ,Propensity score matching ,medicine ,business ,Veterans Affairs ,Cohort study - Abstract
Alpha-1 blockers, often used to treat benign prostate hyperplasia (BPH), have been hypothesized to prevent COVID-19 complications by minimising cytokine storm release. We conducted a prevalent-user active-comparator cohort study to assess the association between alpha-1 blocker use and risks of three COVID-19 outcomes: diagnosis, hospitalization, and hospitalization requiring intensive services. Our study included 2.6 and 0.46 million users of alpha-1 blockers and of alternative BPH therapy during the period between November 2019 and January 2020, found in electronic health records from Spain (SIDIAP) and the United States (Department of Veterans Affairs, Columbia University Irving Medical Center, IQVIA OpenClaims, Optum DOD, Optum EHR). We estimated hazard ratios using state-of-the-art techniques to minimize potential confounding, including large-scale propensity score matching/stratification and negative control calibration. We found no differential risk for any COVID-19 outcome, pointing to the need for further research on potential COVID-19 therapies.
- Published
- 2021
40. Design Matters in Patient-Level Prediction: Evaluation of a Cohort vs. Case-Control Design When Developing Predictive Models in Observational Healthcare Datasets
- Author
-
Peter R. Rijnbeek, Martijn J. Schuemie, Jenna Reps, and Patrick B. Ryan
- Subjects
Computer engineering. Computer hardware ,Information Systems and Management ,Computer Networks and Communications ,Computer science ,Calibration (statistics) ,Information technology ,Prognostic ,Machine learning ,computer.software_genre ,TK7885-7895 ,Discriminative model ,Receiver operating characteristic ,business.industry ,Cohort model ,Cohort ,QA75.5-76.95 ,Case-control ,Classification ,T58.5-58.64 ,Patient-level prediction ,Hardware and Architecture ,Electronic computers. Computer science ,Observational study ,Artificial intelligence ,Prediction ,business ,computer ,Predictive modelling ,Information Systems ,Cohort study - Abstract
Background The design used to create labelled data for training prediction models from observational healthcare databases (e.g., case-control and cohort) may impact the clinical usefulness. We aim to investigate hypothetical design issues and determine how the design impacts prediction model performance. Aim To empirically investigate differences between models developed using a case-control design and a cohort design. Methods Using a US claims database, we replicated two published prediction models (dementia and type 2 diabetes) which were developed using a case-control design, and trained models for the same prediction questions using cohort designs. We validated each model on data mimicking the point in time the models would be applied in clinical practice. We calculated the models’ discrimination and calibration-in-the-large performances. Results The dementia models obtained area under the receiver operating characteristics of 0.560 and 0.897 for the case-control and cohort designs respectively. The type 2 diabetes models obtained area under the receiver operating characteristics of 0.733 and 0.727 for the case-control and cohort designs respectively. The dementia and diabetes case-control models were both poorly calibrated, whereas the dementia cohort model achieved good calibration. We show that careful construction of a case-control design can lead to comparable discriminative performance as a cohort design, but case-control designs over-represent the outcome class leading to miscalibration. Conclusions Any case-control design can be converted to a cohort design. We recommend that researchers with observational data use the less subjective and generally better calibrated cohort design when extracting labelled data. However, if a carefully constructed case-control design is used, then the model must be prospectively validated using a cohort design for fair evaluation and be recalibrated.
- Published
- 2021
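The abstract above notes that case-control designs over-represent the outcome class, which miscalibrates predicted risks, and that such models need recalibration. One common form of recalibration is an intercept (prior) correction on the log-odds scale, sketched below. It is shown only as an illustration and is not claimed to be the procedure used in the paper; it assumes that sampling depended on outcome status alone, and the prevalence values are made up.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def prior_correct(p, sample_prevalence, target_prevalence):
    """Shift a predicted risk from the training outcome rate to the target rate.

    Only the intercept moves, so the ranking of subjects (and therefore
    discrimination) is unchanged.
    """
    offset = logit(target_prevalence) - logit(sample_prevalence)
    return inv_logit(logit(p) + offset)


# Assumed example: a model trained on 1:1 case-control data (50% outcomes)
# and applied to a population where 2% of subjects have the outcome.
for raw in (0.2, 0.5, 0.8):
    print(raw, round(prior_correct(raw, 0.5, 0.02), 4))
```

In this example a raw prediction of 0.5 maps to 0.02, the assumed population prevalence, which is what calibration-in-the-large requires of an average-risk subject.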
41. Comparing Data-mining Algorithms Developed for Longitudinal Observational Databases.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2013
42. Discovering Sequential Patterns in a UK General Practice Database.
- Author
-
Jenna Reps, Jonathan M. Garibaldi, Uwe Aickelin, Daniele Soria, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2013
43. Investigating the Detection of Adverse Drug Events in a UK General Practice Electronic Health-Care Database.
- Author
-
Jenna Reps, Jan Feyereisl, Jonathan M. Garibaldi, Uwe Aickelin, Jack E. Gibson, and Richard B. Hubbard
- Published
- 2013
44. Quiet in Class: Classification, Noise and the Dendritic Cell Algorithm.
- Author
-
Feng Gu, Jan Feyereisl, Robert F. Oates, Jenna Reps, Julie Greensmith, and Uwe Aickelin
- Published
- 2013
45. Implementation of the COVID-19 Vulnerability Index Across an International Network of Health Care Data Sets: Collaborative External Validation Study
- Author
-
Carlos Areia, Thomas Falconer, Chungsoo Kim, George Hripcsak, Ross D. Williams, Michael E. Matheny, Peter R. Rijnbeek, Daniel R. Morales, Gowtham A. Rao, Sergio Fernandez-Bertolin, Ewout W. Steyerberg, Daniel Prieto-Alhambra, María Aragón, Maria Tereza Fernandes Abrahão, Andrew E. Williams, Kristine E. Lynch, Lin Zhang, Marc A. Suchard, Young Hwa Choi, Paula Casajust, Jitendra Jonnagaddala, Cynthia Yang, Anna Ostropolets, Kristin Kostka, Azza Shoaibi, Scott L. DuVall, Seng Chan You, Rae Woong Park, Siaw-Teng Liaw, Aniek F. Markus, Matthew E. Spotnitz, Christian G. Reich, Fredrik Nyberg, Jenna Reps, Talita Duarte-Salles, Benjamin Skov Kaas-Hansen, Patrick B. Ryan, Medical Informatics, and Public Health
- Subjects
observation ,bias ,Vulnerability index ,Population ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Health Informatics ,C-19 ,transportability ,030230 surgery ,03 medical and health sciences ,0302 clinical medicine ,external validation ,Health Information Management ,Health care ,medicine ,prognostic model ,030212 general & internal medicine ,education ,Statistic ,risk ,validation ,Original Paper ,education.field_of_study ,business.industry ,datasets ,COVID-19 ,modeling ,prediction ,decision-making ,Emergency department ,medicine.disease ,predictive analytics ,Observational study ,Medical emergency ,Model risk ,business ,Predictive modelling ,hospitalization - Abstract
Background SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the “prediction model risk of bias assessment” criteria, and it has not been externally validated. Objective The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. Methods We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. We evaluated the model on two different target populations, 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. Results The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584) and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. 
When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. Conclusions Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that C-19 should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model.
- Published
- 2021
46. Development and validation of patient-level prediction models for adverse outcomes following total knee arthroplasty
- Author
-
M van Speybroeck, Ruth Costello, Daniel Prieto-Alhambra, A Bourke, Thomas Falconer, Antonella Delmestri, Evan P. Minty, Theresa Burkard, William Sproviero, James Weaver, David Culliford, R Williams, Patrick B. Ryan, Daniel R. Morales, Edward Burn, Anthony G. Sena, T Duarte-Salles, Danielle E Robinson, Jennifer C E Lane, Rafael Pinedo-Villanueva, Albert Prats-Uribe, Jenna Reps, Victoria Y Strauss, Spyros Kolovos, Peter R. Rijnbeek, H Morgan-Stewart, Belay Birlie, Dahai Yu, H. Ying, C O'Leary, Stephen R. Pfohl, and L John
- Subjects
medicine.medical_specialty ,Evidence-based practice ,business.industry ,Psychological intervention ,030204 cardiovascular system & hematology ,Logistic regression ,3. Good health ,Total knee replacement ,Treatment ,03 medical and health sciences ,0302 clinical medicine ,Prediction model ,Sample size determination ,Adverse events ,Emergency medicine ,media_common.cataloged_instance ,Medicine ,030212 general & internal medicine ,European union ,Adverse effect ,Risk assessment ,business ,Predictive modelling ,media_common - Abstract
Background Elective total knee replacement (TKR) is a safe and cost-effective surgical procedure for treating severe knee osteoarthritis (OA). Although complications following surgery are rare, prediction tools could help identify those patients who are at particularly high risk who could then be targeted with preventative interventions. We aimed to develop a simple model to help inform treatment choices. Methods We trained and externally validated adverse event prediction models for patients with TKR using electronic health records (EHR) and claims data from the US (OPTUM, CCAE, MDCR, and MDCD) and general practice data in the UK (IQVIA Medical Research Database [IMRD], incorporating data from The Health Improvement Network [THIN], a Cegedim database). The target population consisted of patients undergoing a primary TKR, aged ≥40 years and registered in any of the contributing data sources for ≥1 year before surgery. LASSO logistic regression models were developed for four adverse outcomes: post-operative (90-day) mortality, venous thromboembolism (VTE), readmission, and long-term (5-year) revision surgery. A second model was developed with a reduced feature set to increase interpretability and usability. Findings A total of 508,082 patients were included, with sample size per data source ranging from 1,853 to 158,549 patients. Overall, 90-day mortality, VTE, and readmission prevalence occurred in a range of 0.20%-0.32%, 1.7%-3.0% and 2.2%-4.8%, respectively. Five-year revision surgery was observed in 1.5%-3.1% of patients. The full model predicting 90-day mortality yielded AUROC of 0.78 when trained in OPTUM and yielded an AUROC of 0.70 when externally validated on THIN. We then developed a 12-variable model which achieved internal AUROC of 0.77 and external AUROC of 0.71 in THIN. 
The discriminative performances of the models predicting 90-day VTE, readmission, and 5-year revision were consistently poor across the datasets (AUROC Interpretation We developed and externally validated a simple prediction model based on sex, age, and 10 comorbidities that can identify patients at high risk of short-term mortality following TKR. Our model had a greater discriminative ability than the Charlson Comorbidity Index in predicting 90-day mortality. The 12-feature mortality model is easily implemented and the performance suggests it could be used to inform evidence based shared decision-making prior to surgery and for appropriate precautions to be taken for those at high risk. The other outcomes examined had low performance. Funding This activity under the European Health Data & Evidence Network (EHDEN) has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. The sponsor of the study did not have any involvement in the writing of the manuscript or the decision to submit it for publication. The research was supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). DPA is funded by a National Institute for Health Research Clinician Scientist award (CS-2013-13-012). TDS is funded by the Department of Health of the Generalitat de Catalunya under the Strategic Plan for Research and Innovation in Health (PERIS; SLT002/16/00308). The views expressed in this publication are those of the authors and not those of the NHS, the National Institute for Health Research or the Department of Health. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication. Key Points Question Is it possible to predict adverse events following total knee replacement? 
Findings Mortality was the only adverse event studied that we were able to predict with adequate performance. We produced a 12 variable prediction model for 90-day post-operative mortality that achieved an AUROC of 0.77 on internal test validation (Optum) and 0.71 when externally validated in THIN. The model also showed adequate calibration. Meaning Patients can now be presented with an accurate risk assessment for short term mortality such that they are well-informed before the decision for surgery is taken. Importance Total Knee Replacement is generally a safe, effective procedure that is performed on thousands of patients each year. However, a small number of those patients will experience adverse events. Due to the surgery’s elective nature, a well calibrated, high performing risk model could pre-emptively inform the patient and clinician decision making process and help to guide preventative treatment.
- Published
- 2020
47. Improving visual communication of discriminative accuracy for predictive models: the probability threshold plot
- Author
-
Stephen S. Johnston, Stephen Fortin, Jenna Reps, Paul Coplan, and Iftekhar Kalsekar
- Subjects
Receiver operating characteristic ,AcademicSubjects/SCI01060 ,business.industry ,Computer science ,discriminative accuracy ,Health Informatics ,Pattern recognition ,Predictive analytics ,Medical decision making ,Plot (graphics) ,predictive analytics ,Discriminative model ,Range (statistics) ,Visual communication ,Artificial intelligence ,Sensitivity (control systems) ,receiver operating characteristic curve ,AcademicSubjects/SCI01530 ,business ,Brief Communications ,AcademicSubjects/MED00010 - Abstract
Objectives To propose a visual display, the probability threshold plot (PTP), that transparently communicates a predictive model’s measures of discriminative accuracy along the range of model-based predicted probabilities (Pt). Materials and Methods We illustrate the PTP by replicating a previously published and validated machine learning-based model to predict antihyperglycemic medication cessation within 1–2 years following metabolic surgery. The visual characteristics of the PTPs for each model were compared to receiver operating characteristic (ROC) curves. Results A total of 18 887 patients were included for analysis. Whereas during testing each predictive model had nearly identical ROC curves and corresponding area under the curve values (0.672 and 0.673), the visual characteristics of the PTPs revealed substantive between-model differences in sensitivity, specificity, PPV, and NPV across the range of Pt. Discussion and Conclusions The PTP provides improved visual display of a predictive model’s discriminative accuracy, which can enhance the practical application of predictive models for medical decision making.
- Published
- 2020
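The probability threshold plot described above displays sensitivity, specificity, PPV, and NPV across the range of predicted probabilities. The sketch below computes the table of values such a plot would draw, one row per threshold, leaving the actual plotting aside. The function name, the rule that a prediction at or above the threshold counts as positive, and the toy data are assumptions for illustration only.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def threshold_metrics(y_true, y_prob, thresholds):
    """Sensitivity, specificity, PPV and NPV at each probability threshold."""
    rows = []
    for t in thresholds:
        tp = fp = tn = fn = 0
        for y, p in zip(y_true, y_prob):
            predicted_positive = p >= t  # assumed classification rule
            if predicted_positive and y == 1:
                tp += 1
            elif predicted_positive:
                fp += 1
            elif y == 1:
                fn += 1
            else:
                tn += 1
        rows.append({
            "threshold": t,
            "sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp),
            "npv": _ratio(tn, tn + fn),
        })
    return rows


# Toy data (assumed): six subjects, three with the outcome.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
for row in threshold_metrics(y_true, y_prob, [0.5, 0.7]):
    print(row)
```

Plotting each metric against the threshold column would give curves of the kind the abstract contrasts with a single ROC curve.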
48. Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data
- Author
-
Yong Chen, Jenna Reps, Natalie E. Sheils, Rui Duan, Islam Mn, John Buresh, Martijn J. Schuemie, Mackenzie J Edmondson, Jiayi Tong, and Chongliang Luo
- Subjects
Mixed model ,Lossless compression ,Property (programming) ,Computer science ,Association (object-oriented programming) ,Patient data ,Data mining ,Healthcare data ,computer.software_genre ,Random effects model ,computer ,Generalized linear mixed model - Abstract
Linear mixed models (LMMs) are commonly used in many areas including epidemiology for analyzing multi-site data with heterogeneous site-specific random effects. However, because of regulations protecting patients’ privacy, sensitive individual patient data (IPD) usually cannot be shared across sites. In this paper we propose a novel algorithm for distributed linear mixed models (DLMMs). Our proposed DLMM algorithm can achieve exactly the same results as if we had pooled IPD from all sites, hence the lossless property. The DLMM algorithm requires each site to contribute some aggregated data (AD) in only one iteration. We apply the proposed DLMM algorithm to analyze the association of length of stay of COVID-19 hospitalization with demographic and clinical characteristics using the administrative claims database from the UnitedHealth Group Clinical Research Database.
- Published
- 2020
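The lossless idea can be illustrated in a much simpler setting than the one the paper addresses. The sketch below is an assumption-laden simplification, not the DLMM algorithm: it fits ordinary least squares with one covariate and no random effects, and the names `site_summary` and `fit_from_summaries` are hypothetical. Each site shares only five aggregated numbers once, and the combined fit equals the fit on pooled patient-level data.

```python
from typing import Sequence, Tuple

# Aggregated data one site shares: n, sum(x), sum(y), sum(x*x), sum(x*y).
Summary = Tuple[int, float, float, float, float]


def site_summary(x: Sequence[float], y: Sequence[float]) -> Summary:
    """Reduce one site's patient-level data to sufficient statistics."""
    return (
        len(x),
        sum(x),
        sum(y),
        sum(a * a for a in x),
        sum(a * b for a, b in zip(x, y)),
    )


def fit_from_summaries(summaries: Sequence[Summary]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x using only the summed site summaries."""
    n = sum(s[0] for s in summaries)
    sx = sum(s[1] for s in summaries)
    sy = sum(s[2] for s in summaries)
    sxx = sum(s[3] for s in summaries)
    sxy = sum(s[4] for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    # Toy data for two sites, invented for illustration.
    site_a = ([0, 1, 2], [1, 3, 5])
    site_b = ([3, 4], [7, 10])

    distributed = fit_from_summaries(
        [site_summary(*site_a), site_summary(*site_b)]
    )
    pooled = fit_from_summaries(
        [site_summary(site_a[0] + site_b[0], site_a[1] + site_b[1])]
    )
    print(distributed, pooled)
```

The summaries add across sites because the least-squares solution depends on the data only through these sums. Extending this to site-specific random effects is the paper's contribution and is not attempted here.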
49. Risk of hydroxychloroquine alone and in combination with azithromycin in the treatment of rheumatoid arthritis: a multinational, retrospective study
- Author
-
Jennifer C E Lane, James Weaver, Kristin Kostka, Talita Duarte-Salles, Maria Tereza F Abrahao, Heba Alghoul, Osaid Alser, Thamir M Alshammari, Patricia Biedermann, Juan M Banda, Edward Burn, Paula Casajust, Mitchell M Conover, Aedin C Culhane, Alexander Davydov, Scott L DuVall, Dmitry Dymshyts, Sergio Fernandez-Bertolin, Kristina Fišter, Jill Hardin, Laura Hester, George Hripcsak, Benjamin Skov Kaas-Hansen, Seamus Kent, Sajan Khosla, Spyros Kolovos, Christophe G Lambert, Johan van der Lei, Kristine E Lynch, Rupa Makadia, Andrea V Margulis, Michael E Matheny, Paras Mehta, Daniel R Morales, Henry Morgan-Stewart, Mees Mosseveld, Danielle Newby, Fredrik Nyberg, Anna Ostropolets, Rae Woong Park, Albert Prats-Uribe, Gowtham A Rao, Christian Reich, Jenna Reps, Peter Rijnbeek, Selva Muthu Kumaran Sathappan, Martijn Schuemie, Sarah Seager, Anthony G Sena, Azza Shoaibi, Matthew Spotnitz, Marc A Suchard, Carmen O Torre, David Vizcaya, Haini Wen, Marcel de Wilde, Junqing Xie, Seng Chan You, Lin Zhang, Oleg Zhuk, Patrick Ryan, Daniel Prieto-Alhambra, and OHDSI-COVID-19 consortium
- Subjects
Sulfasalazine ,Adverse events ,COVID-19 ,Pneumonia ,Azithromycin ,Rheumatoid arthritis ,Safety ,Hydroxychloroquine - Abstract
Background Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the risk associated with its use in routine care in patients with rheumatoid arthritis. Methods In this multinational, retrospective study, new user cohort studies in patients with rheumatoid arthritis aged 18 years or older and initiating hydroxychloroquine were compared with those initiating sulfasalazine and followed up over 30 days, with 16 severe adverse events studied. Self-controlled case series were done to further establish safety in wider populations, and included all users of hydroxychloroquine regardless of rheumatoid arthritis status or indication. Separately, severe adverse events associated with hydroxychloroquine plus azithromycin (compared with hydroxychloroquine plus amoxicillin) were studied. Data comprised 14 sources of claims data or electronic medical records from Germany, Japan, the Netherlands, Spain, the UK, and the USA. Propensity score stratification and calibration using negative control outcomes were used to address confounding. Cox models were fitted to estimate calibrated hazard ratios (HRs) according to drug use. Estimates were pooled where the I² value was less than 0·4. Findings The study included 956 374 users of hydroxychloroquine, 310 350 users of sulfasalazine, 323 122 users of hydroxychloroquine plus azithromycin, and 351 956 users of hydroxychloroquine plus amoxicillin. No excess risk of severe adverse events was identified when 30-day hydroxychloroquine and sulfasalazine use were compared. Self-controlled case series confirmed these findings. 
However, long-term use of hydroxychloroquine appeared to be associated with increased cardiovascular mortality (calibrated HR 1·65 [95% CI 1·12–2·44]). Addition of azithromycin appeared to be associated with an increased risk of 30-day cardiovascular mortality (calibrated HR 2·19 [95% CI 1·22–3·95]), chest pain or angina (1·15 [1·05–1·26]), and heart failure (1·22 [1·02–1·45]). Interpretation Hydroxychloroquine treatment appears to have no increased risk in the short term among patients with rheumatoid arthritis, but in the long term it appears to be associated with excess cardiovascular mortality. The addition of azithromycin increases the risk of heart failure and cardiovascular mortality even in the short term. We call for careful consideration of the benefit–risk trade-off when counselling those on hydroxychloroquine treatment.
- Published
- 2020
50. To Include, or Not Include, that is the Question: An Empirical Analysis of Dealing with Patients who are Lost to Follow-up when Developing Prognostic Models Using a Cohort Design
- Author
-
Peter R. Rijnbeek, Jenna Reps, Patrick B. Ryan, Alana Cuthbert, Martijn J. Schuemie, and Nicole L. Pratt
- Subjects
medicine.medical_specialty ,business.industry ,medicine ,Lost to follow-up ,Intensive care medicine ,business ,Prognostic models ,Cohort study - Abstract
Background: Researchers developing prediction models are faced with numerous design choices that may impact model performance. One of the main decisions is how to include patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation investigating the impact of this decision. In addition, we aim to provide guidelines for how to deal with loss to follow-up. Methods: We generate a synthetic dataset with complete follow-up and simulate loss to follow-up based either on random selection or on selection based on comorbidity. We investigate four simple strategies for developing models using data containing some patients with loss to follow-up. Three strategies employ a binary classifier with data that: i) include all patients (including those lost to follow-up), ii) exclude all patients lost to follow-up or iii) only exclude patients lost to follow-up who do not have the outcome before being lost to follow-up. The fourth strategy uses a survival model with data that include all patients. In addition to our synthetic data study, we empirically evaluate the discrimination and calibration performance of these strategies across 21 prediction problems using real-world data. Results: The synthetic data study results show that excluding patients lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of how to address it had negligible impact on the model performance. Our empirical results showed that the four design choices investigated to deal with loss to follow-up resulted in comparable performance when the time-at-risk was 1 year, but demonstrated differential bias when we looked at a 3-year time-at-risk. Removing patients who are lost to follow-up before the outcome but keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided. 
Conclusion: Based on this study we therefore recommend i) developing models using data that include patients who are lost to follow-up and ii) evaluating the discrimination and calibration of models twice: on a test set including patients lost to follow-up and on a test set excluding patients lost to follow-up.
- Published
- 2020
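The three binary-classifier strategies in the abstract above amount to different cohort filters. The sketch below is illustrative only: the `Patient` fields, the strategy names, and the function `select_cohort` are assumptions, not the authors' code, and the fourth (survival model) strategy is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Patient:
    patient_id: int
    lost_to_follow_up: bool  # left observation before the time-at-risk ended
    outcome_observed: bool   # outcome recorded while still under observation


def select_cohort(patients: Sequence[Patient], strategy: str) -> List[Patient]:
    """Return the training cohort under one of three hypothetical strategy names."""
    if strategy == "include_all":
        # i) keep everyone, including patients lost to follow-up
        return list(patients)
    if strategy == "exclude_all_lost":
        # ii) drop every patient who was lost to follow-up
        return [p for p in patients if not p.lost_to_follow_up]
    if strategy == "exclude_lost_without_outcome":
        # iii) drop only lost patients with no outcome before the loss
        return [
            p for p in patients
            if not p.lost_to_follow_up or p.outcome_observed
        ]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Toy cohort, invented for illustration.
    cohort = [
        Patient(1, lost_to_follow_up=False, outcome_observed=False),
        Patient(2, lost_to_follow_up=False, outcome_observed=True),
        Patient(3, lost_to_follow_up=True, outcome_observed=False),
        Patient(4, lost_to_follow_up=True, outcome_observed=True),
    ]
    for name in ("include_all", "exclude_all_lost", "exclude_lost_without_outcome"):
        print(name, [p.patient_id for p in select_cohort(cohort, name)])
```

In the toy cohort, strategy iii keeps patient 4 but drops patient 3, which enriches the training data for outcomes among lost patients. This is the pattern the abstract reports can bias a model.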