98 results for "Computable phenotype"
Search Results
2. Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data.
- Author
-
He, Xing, Wei, Ruoqi, Huang, Yu, Chen, Zhaoyi, Lyu, Tianchen, Bost, Sarah, Tong, Jiayi, Li, Lu, Zhou, Yujia, Li, Zhao, Guo, Jingchuan, Tang, Huilin, Wang, Fei, DeKosky, Steven, Xu, Hua, Chen, Yong, Zhang, Rui, Xu, Jie, Guo, Yi, and Wu, Yonghui
- Subjects
ELECTRONIC health records, ALZHEIMER'S patients, ALZHEIMER'S disease, PHENOTYPES, DATA recorders & recording
- Abstract
INTRODUCTION: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate, computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data. METHODS: We used EHRs from the University of Florida Health (UFHealth) system and created rule‐based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UTHealth) and the University of Minnesota (UMN). RESULTS: Our best‐performing CP was "patient has at least 2 AD diagnoses and AD‐related keywords in AD encounters," with F1‐scores of 0.817 at UF, 0.961 at UTHealth, and 0.623 at UMN. DISCUSSION: We developed and validated rule‐based CPs for AD identification with good performance, which will be crucial for studies that aim to use real‐world data like EHRs. Highlights: Developed a computable phenotype (CP) to identify Alzheimer's disease (AD) patients using EHR data. Utilized both structured and unstructured EHR data to enhance CP accuracy. Achieved a high F1‐score of 0.817 at UFHealth, and 0.961 and 0.623 at UTHealth and UMN. Validated the CP across different demographics, ensuring robustness and fairness. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
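The best‐performing rule in entry 2 ("at least 2 AD diagnoses and AD‐related keywords in AD encounters") can be illustrated with a minimal pandas sketch. The table layouts, ICD‐10 code set, and keyword list below are assumptions made for illustration, not the authors' actual specification.

```python
import pandas as pd

# Assumed inputs: `diagnoses` has one row per encounter-level diagnosis
# (patient_id, encounter_id, icd10); `notes` has one row per clinical note
# (patient_id, encounter_id, note_text). Code and keyword lists are illustrative.
AD_CODES = {"G30.0", "G30.1", "G30.8", "G30.9"}
AD_KEYWORDS = ("alzheimer", "donepezil", "memantine")

def flag_ad_patients(diagnoses: pd.DataFrame, notes: pd.DataFrame) -> pd.Index:
    """Patients with >=2 AD-coded encounters AND an AD keyword in a note
    attached to an AD-coded encounter."""
    ad_dx = diagnoses[diagnoses["icd10"].isin(AD_CODES)]
    enough_dx = ad_dx.groupby("patient_id")["encounter_id"].nunique() >= 2

    ad_notes = notes[notes["encounter_id"].isin(set(ad_dx["encounter_id"]))]
    has_keyword = (ad_notes["note_text"].str.lower()
                   .str.contains("|".join(AD_KEYWORDS))
                   .groupby(ad_notes["patient_id"]).any())

    both = (enough_dx.to_frame("dx").join(has_keyword.to_frame("kw"), how="outer")
            .fillna(False).astype(bool))
    return both.index[both["dx"] & both["kw"]]
```

A real implementation would need the authors' curated code lists and keyword handling; this sketch only shows the shape of the rule, not its tuning.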
3. Development and validation of a computable phenotype for Turner syndrome utilizing electronic health records from a national pediatric network.
- Author
-
Huang, Sarah D., Bamba, Vaneeta, Bothwell, Samantha, Fechner, Patricia Y., Furniss, Anna, Ikomi, Chijioke, Nahata, Leena, Nokoff, Natalie J., Pyle, Laura, Seyoum, Helina, and Davis, Shanlee M.
- Abstract
Turner syndrome (TS) is a genetic condition occurring in ~1 in 2000 females, characterized by the complete or partial absence of the second sex chromosome. TS research faces challenges similar to those of many other rare pediatric conditions, with homogeneous, single‐center, underpowered studies. Secondary data analyses utilizing electronic health record (EHR) data have the potential to address these limitations; however, an algorithm to accurately identify TS cases in EHR data is needed. We developed a computable phenotype to identify patients with TS using PEDSnet, a pediatric research network. This computable phenotype was validated through chart review; true positives and negatives and false positives and negatives were used to assess accuracy at both primary and external validation sites. The optimal algorithm consisted of the following criteria: female sex, ≥1 outpatient encounter, and ≥3 encounters with a diagnosis code that maps to TS, yielding an average sensitivity of 0.97, specificity of 0.88, and C‐statistic of 0.93 across all sites. The accuracy of any estradiol prescriptions yielded an average C‐statistic of 0.91 across sites and 0.80 for transdermal and oral formulations separately. PEDSnet and computable phenotyping are powerful tools in providing large, diverse samples to pragmatically study rare pediatric conditions like TS. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
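Chart-review validation of the kind described in entry 3 reduces to a two-by-two confusion matrix of algorithm calls against chart-confirmed status. A minimal helper for the reported metrics follows; the counts in the example call are placeholders, not the study's data.

```python
from dataclasses import dataclass

@dataclass
class Validation:
    tp: int  # algorithm-positive, confirmed by chart review
    fp: int  # algorithm-positive, not confirmed
    tn: int  # algorithm-negative, correctly excluded
    fn: int  # algorithm-negative, missed case

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

# Placeholder counts for illustration only.
v = Validation(tp=97, fp=12, tn=88, fn=3)
print(f"sens={v.sensitivity:.2f} spec={v.specificity:.2f} "
      f"ppv={v.ppv:.2f} npv={v.npv:.2f}")
```

For a single binary rule applied to a validation sample, the C-statistic equals (sensitivity + specificity) / 2, which is how a yes/no algorithm can be compared with the prescription-based variants mentioned in the abstract.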
4. Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data
- Author
-
Xing He, Ruoqi Wei, Yu Huang, Zhaoyi Chen, Tianchen Lyu, Sarah Bost, Jiayi Tong, Lu Li, Yujia Zhou, Zhao Li, Jingchuan Guo, Huilin Tang, Fei Wang, Steven DeKosky, Hua Xu, Yong Chen, Rui Zhang, Jie Xu, Yi Guo, Yonghui Wu, and Jiang Bian
- Subjects
Alzheimer's disease, computable phenotype, electronic health record, Neurology. Diseases of the nervous system, RC346-429, Geriatrics, RC952-954.6
- Abstract
INTRODUCTION: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate, computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data. METHODS: We used EHRs from the University of Florida Health (UFHealth) system and created rule‐based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UTHealth) and the University of Minnesota (UMN). RESULTS: Our best‐performing CP was “patient has at least 2 AD diagnoses and AD‐related keywords in AD encounters,” with F1‐scores of 0.817 at UF, 0.961 at UTHealth, and 0.623 at UMN. DISCUSSION: We developed and validated rule‐based CPs for AD identification with good performance, which will be crucial for studies that aim to use real‐world data like EHRs. Highlights: Developed a computable phenotype (CP) to identify Alzheimer's disease (AD) patients using EHR data. Utilized both structured and unstructured EHR data to enhance CP accuracy. Achieved a high F1‐score of 0.817 at UFHealth, and 0.961 and 0.623 at UTHealth and UMN. Validated the CP across different demographics, ensuring robustness and fairness.
- Published
- 2024
- Full Text
- View/download PDF
5. A Computable Phenotype Algorithm for Postvaccination Myocarditis/Pericarditis Detection Using Real-World Data: Validation Study.
- Author
-
Deady, Matthew, Duncan, Raymond, Sonesen, Matthew, Estiandan, Renier, Stimpert, Kelly, Cho, Sylvia, Beers, Jeffrey, Goodness, Brian, Jones, Lance Daniel, Forshee, Richard, Anderson, Steven A, and Ezzeldin, Hussein
- Subjects
MEDICAL personnel, EMERGENCY use authorization, VACCINE safety, ELECTRONIC health records, MEDICAL records
- Abstract
Background: Adverse events (AEs) associated with vaccination have traditionally been evaluated by epidemiological studies. More recently, they have gained attention due to the emergency use authorization of several COVID-19 vaccines. As part of its responsibility to conduct postmarket surveillance, the US Food and Drug Administration continues to monitor several AEs of interest to ensure the safety of vaccines, including those for COVID-19. Objective: This study is part of the Biologics Effectiveness and Safety Initiative, which aims to improve the US Food and Drug Administration's postmarket surveillance capabilities while minimizing the burden of collecting clinical data on suspected postvaccination AEs. The objective of this study was to enhance active surveillance efforts through a pilot platform that can receive automatically reported AE cases through a health care data exchange. Methods: We detected cases by sharing and applying computable phenotype algorithms to real-world data in health care providers' electronic health records databases. Using the fast healthcare interoperability resources standard for secure data transmission, we implemented a computable phenotype algorithm on a new health care system. The study focused on the algorithm's positive predictive value, validated through clinical records, assessing both the time required for implementation and the accuracy of AE detection. Results: The algorithm required 200-250 hours to implement and optimize. Of the 6,574,420 clinical encounters across 694,151 patients, 30 cases were identified as potential myocarditis/pericarditis. Of these, 26 cases were retrievable, and 24 underwent clinical validation. In total, 14 cases were confirmed as definite or probable myocarditis/pericarditis, yielding a positive predictive value of 58.3% (95% CI 37.3%-76.9%). These findings underscore the algorithm's capability for real-time detection of AEs, though they also highlight variability in performance across different health care systems. Conclusions: The study advocates for the ongoing refinement and application of distributed computable phenotype algorithms to enhance AE detection capabilities. These tools are crucial for comprehensive postmarket surveillance and improved vaccine safety monitoring. The outcomes suggest the need for further optimization to achieve more consistent results across diverse health care settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
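The positive predictive value reported in entry 5 is the proportion of algorithm-flagged cases confirmed on clinical review (14 of 24). A quick sketch of the point estimate with a Wilson score interval follows; the study's reported 37.3%-76.9% interval was likely computed with a different method (for example, an exact binomial interval), so the bounds here will differ slightly.

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% CI for a proportion (one common choice of interval)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

confirmed, reviewed = 14, 24          # figures quoted in the abstract above
ppv = confirmed / reviewed
lo, hi = wilson_ci(confirmed, reviewed)
print(f"PPV = {ppv:.1%}  (Wilson 95% CI {lo:.1%} to {hi:.1%})")
```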
6. Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data
- Author
-
Klann, Jeffrey G, Estiri, Hossein, Weber, Griffin M, Moal, Bertrand, Avillach, Paul, Hong, Chuan, Tan, Amelia LM, Beaulieu-Jones, Brett K, Castro, Victor, Maulhardt, Thomas, Geva, Alon, Malovini, Alberto, South, Andrew M, Visweswaran, Shyam, Morris, Michele, Samayamuthu, Malarkodi J, Omenn, Gilbert S, Ngiam, Kee Yuan, Mandl, Kenneth D, Boeker, Martin, Olson, Karen L, Mowery, Danielle L, Follett, Robert W, Hanauer, David A, Bellazzi, Riccardo, Moore, Jason H, Loh, Ne-Hooi Will, Bell, Douglas S, Wagholikar, Kavishwar B, Chiovato, Luca, Tibollo, Valentina, Rieg, Siegbert, Li, Anthony LLJ, Jouhet, Vianney, Schriver, Emily, Xia, Zongqi, Hutch, Meghan, Luo, Yuan, Kohane, Isaac S, the Consortium for Clinical Characterization of COVID-19 by EHR (4CE), Brat, Gabriel A, and Murphy, Shawn N
- Subjects
Health Services and Systems, Health Sciences, Emerging Infectious Diseases, Infectious Diseases, Patient Safety, Machine Learning and Artificial Intelligence, Networking and Information Technology R&D (NITRD), Coronaviruses, Good Health and Well Being, COVID-19, Electronic Health Records, Hospitalization, Humans, Machine Learning, Prognosis, ROC Curve, Sensitivity and Specificity, Severity of Illness Index, novel coronavirus, disease severity, computable phenotype, medical informatics, data networking, data interoperability, Consortium for Clinical Characterization of COVID-19 by EHR (4CE), Information and Computing Sciences, Engineering, Medical and Health Sciences, Medical Informatics, Biomedical and clinical sciences, Health sciences, Information and computing sciences
- Abstract
Objective: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. Materials and Methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. Results: The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. Conclusions: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
- Published
- 2021
7. Development and Evaluation of a Rules-based Algorithm for Primary Open-Angle Glaucoma in the VA Million Veteran Program.
- Author
-
Nealon, Cari L., Halladay, Christopher W., Kinzy, Tyler G., Simpson, Piana, Canania, Rachael L., Anthony, Scott A., Roncone, David P., Sawicki Rogers, Lea R., Leber, Jenna N., Dougherty, Jacquelyn M., Sullivan, Jack M., Wu, Wen-Chih, Greenberg, Paul B., Iyengar, Sudha K., Crawford, Dana C., Peachey, Neal S., and Cooke Bailey, Jessica N.
- Subjects
- *OPEN-angle glaucoma, *VETERANS, *ELECTRONIC health records, *ALGORITHMS, *AFRICAN Americans
- Abstract
The availability of electronic health record (EHR)-linked biobank data for research presents opportunities to better understand complex ocular diseases. Developing accurate computable phenotypes remains difficult for ocular diseases whose gold-standard diagnosis includes imaging, because imaging data are inaccessible in most biobank-linked EHRs. The objective of this study was to develop and validate a computable phenotype to identify primary open-angle glaucoma (POAG) through accessing the Department of Veterans Affairs (VA) Computerized Patient Record System (CPRS) and Million Veteran Program (MVP) biobank. Accessing CPRS clinical ophthalmology data from VA Medical Center Eye Clinic (VAMCEC) patients, we developed and iteratively refined POAG case and control algorithms based on clinical, prescription, and structured diagnosis data (ICD-CM codes). Refinement was performed via detailed chart review, initially at a single VAMCEC (n = 200) and validated at two additional VAMCECs (n = 100 each). Positive and negative predictive values (PPV, NPV) were computed as the proportion of CPRS patients correctly classified with POAG or without POAG, respectively, by the algorithms, validated by ophthalmologists and optometrists with access to gold-standard clinical diagnosis data. The final algorithms performed better than previously reported approaches in assuring the accuracy and reproducibility of POAG classification (PPV >83% and NPV >97%) with consistent performance in Black or African American and in White Veterans. Applied to the MVP to identify cases and controls, genetic analysis of a known POAG-associated locus further validated the algorithms. We conclude that ours is a viable approach to use combined EHR-genetic data to study patients with complex diseases that require imaging confirmation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Capabilities and consequences of data mapping in emergent health scenarios: Using a multi-site COVID-19 research data set as an example
- Author
-
Kothari, Shikha Yashwant
- Subjects
Information technology ,Information science ,Medicine ,CDM ,computable phenotype ,Data ,healthcare ,infrastructure ,OMOP - Abstract
During the Coronavirus Disease 2019 (COVID-19) pandemic, a public health emergency (PHE) was declared by the United States (U.S.) government, reducing the number of in-person clinic visits and increasing telemedicine utilization. Healthcare reimbursement guidelines evolved on an ongoing basis, and a lack of standardization in procedure coding for telemedicine visits created confusion amongst providers. This thesis focuses on a standardized, multi-site data repository, the University of California (UC) COVID-19 Research Dataset (UC CORDS), and uses it as an example to review the downstream consequences of ad-hoc data mapping of new services such as telemedicine visits to formalized coding systems during the COVID-19 pandemic. The findings are then translated to recommendations for creating best practices to combat challenges associated with building computable phenotypes for complex multi-site data in emergent health scenarios. Included patients had a COVID-19 test result mapping to the designated LOINC codes between Feb 2020 and Feb 2021. My study results reflect the lack of standardization in standard vocabulary naming conventions and concept mapping for telehealth. This makes it difficult for researchers to find telehealth-specific data from CDM datasets like UC CORDS, which only capture data mapped to standard vocabularies. My journey through this master's thesis also highlights the multiple data access, data fluency, and data management challenges that clinical researchers face with complex healthcare datasets such as UC CORDS. In conclusion, although telemedicine has been considered beneficial for several years, the COVID-19 pandemic offered the best opportunity to improve telemedicine services and fully integrate them into healthcare reimbursement workflows and healthcare information systems. Based on the outcomes of this study, there is still room for process improvement in regard to handling the needs of data capture for new services in emergency scenarios, and healthcare institutions should involve multiple key stakeholders at an earlier stage when developing and implementing a digital infrastructure.
- Published
- 2023
9. Identification of patients with drug‐resistant epilepsy in electronic medical record data using the Observational Medical Outcomes Partnership Common Data Model.
- Author
-
Castano, Victor G., Spotnitz, Matthew, Waldman, Genna J., Joiner, Evan F., Choi, Hyunmi, Ostropolets, Anna, Natarajan, Karthik, McKhann, Guy M., Ottman, Ruth, Neugut, Alfred I., Hripcsak, George, and Youngerman, Brett E.
- Subjects
- *ELECTRONIC health records, *PEOPLE with epilepsy, *MEDICAL partnership, *DATA modeling, *DATA recorders & recording
- Abstract
Objective: More than one third of appropriately treated patients with epilepsy have continued seizures despite two or more medication trials, meeting criteria for drug‐resistant epilepsy (DRE). Accurate and reliable identification of patients with DRE in observational data would enable large‐scale, real‐world comparative effectiveness research and improve access to specialized epilepsy care. In the present study, we aim to develop and compare the performance of computable phenotypes for DRE using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Methods: We randomly sampled 600 patients from our academic medical center's electronic health record (EHR)‐derived OMOP database meeting previously validated criteria for epilepsy (January 2015–August 2021). Two reviewers manually classified patients as having DRE, drug‐responsive epilepsy, undefined drug responsiveness, or no epilepsy as of the last EHR encounter in the study period based on consensus definitions. Demographic characteristics and codes for diagnoses, antiseizure medications (ASMs), and procedures were tested for association with DRE. Algorithms combining permutations of these factors were applied to calculate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for DRE. The F1 score was used to compare overall performance. Results: Among 412 patients with source record‐confirmed epilepsy, 62 (15.0%) had DRE, 163 (39.6%) had drug‐responsive epilepsy, 124 (30.0%) had undefined drug responsiveness, and 63 (15.3%) had insufficient records. The best‐performing phenotype for DRE in terms of the F1 score was the presence of ≥1 intractable epilepsy code and ≥2 unique non‐gabapentinoid ASM exposures each with a ≥90‐day drug era (sensitivity = 0.661, specificity = 0.937, PPV = 0.594, NPV = 0.952, F1 score = 0.626). Several phenotypes achieved higher sensitivity at the expense of specificity and vice versa. Significance: OMOP algorithms can identify DRE in EHR‐derived data with varying tradeoffs between sensitivity and specificity. These computable phenotypes can be applied across the largest international network of standardized clinical databases for further validation, reproducible observational research, and improving access to appropriate care. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
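Entry 9's best‐performing rule (≥1 intractable epilepsy code plus ≥2 unique non‐gabapentinoid antiseizure medications, each with a drug era of ≥90 days) can be sketched against OMOP-style tables. The table and column names, the code set, and the ingredient list below are illustrative assumptions, not the study's value sets.

```python
import pandas as pd

# Assumed inputs: `conditions` (patient_id, code) and `drug_eras`
# (patient_id, ingredient, era_start, era_end), where drug_eras is already
# restricted to antiseizure medications and the era columns are datetimes.
INTRACTABLE_CODES = {"G40.011", "G40.111"}    # illustrative, not the study's code list
GABAPENTINOIDS = {"gabapentin", "pregabalin"}

def flag_dre(conditions: pd.DataFrame, drug_eras: pd.DataFrame) -> pd.Index:
    """>=1 intractable-epilepsy code AND >=2 distinct non-gabapentinoid
    antiseizure medications, each with a >=90-day drug era."""
    has_intractable = (conditions[conditions["code"].isin(INTRACTABLE_CODES)]
                       .groupby("patient_id").size() >= 1)

    eras = drug_eras.assign(
        era_days=(drug_eras["era_end"] - drug_eras["era_start"]).dt.days)
    qualifying = eras[(~eras["ingredient"].isin(GABAPENTINOIDS))
                      & (eras["era_days"] >= 90)]
    enough_asms = qualifying.groupby("patient_id")["ingredient"].nunique() >= 2

    both = (has_intractable.to_frame("dx").join(enough_asms.to_frame("rx"), how="outer")
            .fillna(False).astype(bool))
    return both.index[both["dx"] & both["rx"]]
```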
10. Computable Phenotype of a Crohn's Disease Natural History Model.
- Author
-
Kurowski, Jacob A., Achkar, Jean-Paul, Sugano, David, Milinovich, Alex, Ji, Xinge, Bauman, Janine, Griffin, Keyonna R., and Kattan, Michael W.
- Abstract
Background: Analytic tools to study important clinical issues in complex, chronic diseases such as Crohn's disease (CD) include randomized trials, claims database studies, or small longitudinal epidemiologic cohorts. Using natural language processing (NLP), we sought to define the computable phenotype health state of pediatric and adult CD and develop patient-level longitudinal histories for health outcomes. Methods: We defined 6 health states for CD using a subjective symptom-based assessment (symptomatic/asymptomatic) and an objective disease state assessment (active/inactive/no testing). The gold standard for the 6 health states was derived using an iterative process during review by our CD experts. We calculated the transition probabilities to estimate the time to transitions between the various health states using nonparametric Kaplan-Meier estimation and a Markov model. Finally, we determined a standard utility measure from clinical patients assigned to different health states. Results: The NLP computable phenotype health state model correctly ascertained the objective test results and symptoms 96% and 85% of the time, respectively, based on a blinded chart evaluation. In our model, >25% of patients who begin as asymptomatic/active transition to symptomatic/active over the following year. For both adult and pediatric CD health states, the utility assessments of a symptomatic/inactive health state closely resembled those of a symptomatic/active health state. Conclusions: Our methodology for a computable phenotype health state demonstrates the application of real-world data to define progression and optimal management of a chronic disease such as CD. The application of the model has the potential to lead to a better understanding of the true impact of a therapeutic intervention and can provide long-term cost-effectiveness analyses for a new therapy. Highlights: Using natural language processing, we defined the computable phenotype health state of Crohn's disease and developed patient-level longitudinal histories for health outcomes. Our methodology demonstrates the application of real-world data to define the progression of a chronic disease. The application of the model has the potential to provide better understanding of the true impact of a new therapy. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
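Entry 10's transition probabilities between the six symptom/disease-activity health states can be estimated from longitudinal state assignments. The sketch below uses a simple discrete-time counting estimator over consecutive observations; the study itself used nonparametric Kaplan-Meier estimation and a Markov model, so this is only a schematic, and the column names are assumptions.

```python
import pandas as pd

STATES = ["symptomatic/active", "symptomatic/inactive", "symptomatic/no testing",
          "asymptomatic/active", "asymptomatic/inactive", "asymptomatic/no testing"]

def transition_matrix(obs: pd.DataFrame) -> pd.DataFrame:
    """Estimate a discrete-time transition matrix from per-patient health-state
    observations (columns: patient_id, date, state)."""
    obs = obs.sort_values(["patient_id", "date"])
    obs["next_state"] = obs.groupby("patient_id")["state"].shift(-1)
    pairs = obs.dropna(subset=["next_state"])
    counts = pd.crosstab(pairs["state"], pairs["next_state"]).reindex(
        index=STATES, columns=STATES, fill_value=0)
    return counts.div(counts.sum(axis=1).replace(0, 1), axis=0)  # row-normalize
```

The ">25% of patients who begin as asymptomatic/active transition to symptomatic/active over the following year" figure corresponds to one cell of such a matrix estimated over a one-year horizon.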
11. External validation of an opioid misuse machine learning classifier in hospitalized adult patients
- Author
-
Majid Afshar, Brihat Sharma, Sameer Bhalla, Hale M. Thompson, Dmitriy Dligach, Randy A. Boley, Ekta Kishen, Alan Simmons, Kathryn Perticone, and Niranjan S. Karnik
- Subjects
Opioid misuse, Heroin, Opioid use disorder, Natural language processing, Machine learning, Computable phenotype, Medicine (General), R5-920, Social pathology. Social and public welfare. Criminology, HV1-9960
- Abstract
Background: Opioid misuse screening in hospitals is resource-intensive and rarely done. Many hospitalized patients are never offered opioid treatment. An automated approach leveraging routinely captured electronic health record (EHR) data may be easier for hospitals to institute. We previously derived and internally validated an opioid classifier in a separate hospital setting. The aim is to externally validate our previously published and open-source machine-learning classifier at a different hospital for identifying cases of opioid misuse. Methods: An observational cohort of 56,227 adult hospitalizations was examined between October 2017 and December 2019 during a hospital-wide substance use screening program with manual screening. Manually completed Drug Abuse Screening Test served as the reference standard to validate a convolutional neural network (CNN) classifier with coded word embedding features from the clinical notes of the EHR. The opioid classifier utilized all notes in the EHR and sensitivity analysis was also performed on the first 24 h of notes. Calibration was performed to account for the lower prevalence than in the original cohort. Results: Manual screening for substance misuse was completed in 67.8% (n = 56,227) with 1.1% (n = 628) identified with opioid misuse. The data for external validation included 2,482,900 notes with 67,969 unique clinical concept features. The opioid classifier had an AUC of 0.99 (95% CI 0.99–0.99) across the encounter and 0.98 (95% CI 0.98–0.99) using only the first 24 h of notes. In the calibrated classifier, the sensitivity and positive predictive value were 0.81 (95% CI 0.77–0.84) and 0.72 (95% CI 0.68–0.75). For the first 24 h, they were 0.75 (95% CI 0.71–0.78) and 0.61 (95% CI 0.57–0.64). Conclusions: Our opioid misuse classifier had good discrimination during external validation. Our model may provide a comprehensive and automated approach to opioid misuse identification that augments current workflows and overcomes manual screening barriers.
- Published
- 2021
- Full Text
- View/download PDF
12. Ensuring equitable, inclusive and meaningful gender identity- and sexual orientation-related data collection in the healthcare sector: insights from a critical, pragmatic systematic review of the literature.
- Author
-
Bragazzi, Nicola Luigi, Khamisy-Farah, Rola, and Converti, Manlio
- Subjects
- *SEXUAL orientation, *HEALTH care industry, *ONLINE information services, *SYSTEMATIC reviews, *SELF-evaluation, *NATURAL language processing, *HUMAN sexuality, *ACQUISITION of data, *ARTIFICIAL intelligence, *GENDER identity, *LGBTQ+ people, *SEX customs, *SEXUAL orientation identity, *SOCIODEMOGRAPHIC factors, *MEDLINE, *RESPECT, *SOCIAL integration, *DATA mining
- Abstract
In several countries, no gender identity- and sexual orientation-related data are routinely collected, other than for specific health or administrative/social purposes. Implementing and ensuring equitable and inclusive socio-demographic data collection is of paramount importance, given that the LGBTI community suffers from a disproportionate burden in terms of both communicable and non-communicable diseases. To the best of the authors' knowledge, there exists no systematic review addressing the methods that can be implemented in capturing gender identity- and sexual orientation-related data in the healthcare sector. A systematic literature review was conducted to fill this gap in knowledge. Twenty-three articles were retained and analysed: two focussed on self-reported data, two on structured/semi-structured data, seven on text-mining, natural language processing, and other emerging artificial intelligence-based techniques, two on challenges in capturing sexual and gender-diverse populations, eight on the willingness to disclose gender identity and sexual orientation, and, finally, two on integrating structured and unstructured data. Our systematic literature review found that, despite the importance of collecting gender identity- and sexual orientation-related data and its increasing societal acceptance from the LGBTI community, several issues have yet to be addressed. Transgender and non-binary identities, as well as intersex individuals, often remain invisible and marginalized. In recent decades, there has been an increasing adoption of structured data. However, exploiting unstructured data seems to outperform structured data alone in identifying LGBTI members, especially when structured and unstructured data are integrated. Self-declared/self-perceived/self-disclosed definitions, while being respectful of one's perception, may not be completely aligned with sexual behaviours and activities. Incorporating different levels of information (biological, socio-demographic, behavioural, and clinical) would enable overcoming this pitfall. A shift from a rigid/static nomenclature towards a more nuanced, dynamic, 'fuzzy' concept of a 'computable phenotype' has been proposed in the literature to capture the complexity of sexual identities and trajectories. On the other hand, excessive fragmentation has to be avoided, considering that: (i) a full list of options including all gender identities and sexual orientations will never be available; (ii) these options should be easily understood by the general population; and (iii) these options should be consistent in such a way that they can be compared among various studies and surveys. Only in this way can data collection be clinically meaningful: that is to say, able to impact clinical outcomes at the individual and population level, and to promote further research in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients
- Author
-
Brihat Sharma, Dmitriy Dligach, Kristin Swope, Elizabeth Salisbury-Afshar, Niranjan S. Karnik, Cara Joyce, and Majid Afshar
- Subjects
Opioid misuse, Heroin, Opioid use disorder, Natural language processing, Machine learning, Computable phenotype, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use-case for an opioid misuse classifier. Methods: An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI codes, character-based, and n-gram features. Models applied were machine learning with neural network and logistic regression as well as expert consensus with a rule-based model for opioid misuse. The areas under the receiver operating characteristic curve (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration. Results: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top weighted CUI codes in logistic regression included the related terms ‘Heroin’ and ‘Victim of abuse’. Conclusions: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
- Published
- 2020
- Full Text
- View/download PDF
14. Applying computable phenotypes within a common data model to identify heart failure patients for an implantable cardiac device registry
- Author
-
Jove Graham, Andy Iverson, Joao Monteiro, Katherine Weiner, Kara Southall, Katherine Schiller, Mudit Gupta, and Edgar P. Simard
- Subjects
Comorbidities heart failure, Clinical trial, Registry, Electronic health record, Computable phenotype, Common data model, Diseases of the circulatory (Cardiovascular) system, RC666-701
- Abstract
Background: Existing data in electronic health records (EHRs) could be used more extensively to better leverage real-world data for clinical studies, but only if standard, reliable processes are developed. Numerous computable phenotypes have been validated against manual chart review, and common data models (CDMs) exist to aid implementation of such phenotypes across platforms and sites. Our objective was to measure consistency between data that had previously been manually collected for an implantable cardiac device registry and CDM-based phenotypes for the condition of heart failure (HF). Methods: Patients enrolled in an implantable cardiac device registry at two hospitals from 2013 to 2018 contributed to this analysis, wherein registry data were compared to PCORnet CDM-formatted EHR data. Seven different phenotype algorithms were used to search for the presence of HF and compare the results with the registry. Sensitivity, specificity, predictive value, and congruence were calculated for each phenotype. Results: In the registry, 176 of 319 (55%) patients had a history of HF, compared with different phenotypes estimating between 96 (30%) and 188 (59%). The least-restrictive phenotypes (any diagnosis) had high sensitivity and specificity (90%/80%), but more restrictive phenotypes had higher specificity (e.g., code present in problem list, 94%). Differences were observed using time-based criteria (e.g., days between visit diagnoses) and between participating hospitals. Conclusions: Consistency between manually collected registry data and CDM-based phenotypes for history of HF was high overall, but use of different phenotypes impacted sensitivity and specificity, and results may differ depending on the medical condition of interest.
- Published
- 2022
- Full Text
- View/download PDF
15. Identification of Incident Atrial Fibrillation From Electronic Medical Records
- Author
-
Alanna M. Chamberlain, Véronique L. Roger, Peter A. Noseworthy, Lin Y. Chen, Susan A. Weston, Ruoxiang Jiang, and Alvaro Alonso
- Subjects
atrial fibrillation, computable phenotype, electronic medical records, Diseases of the circulatory (Cardiovascular) system, RC666-701
- Abstract
Background: Electronic medical records are increasingly used to identify disease cohorts; however, computable phenotypes using electronic medical record data are often unable to distinguish between prevalent and incident cases. Methods and Results: We identified all Olmsted County, Minnesota residents aged ≥18 with a first‐ever International Classification of Diseases, Ninth Revision (ICD‐9) diagnostic code for atrial fibrillation or atrial flutter from 2000 to 2014 (N=6177), and a random sample with an International Classification of Diseases, Tenth Revision (ICD‐10) code from 2016 to 2018 (N=200). Trained nurse abstractors reviewed all medical records to validate the events and ascertain the date of onset (incidence date). Various algorithms based on number and types of codes (inpatient/outpatient), medications, and procedures were evaluated. Positive predictive value (PPV) and sensitivity of the algorithms were calculated. The lowest PPV was observed for 1 code (64.4%), and the highest PPV was observed for 2 codes (any type) >7 days apart but within 1 year (71.6%). Requiring either 1 inpatient or 2 outpatient codes separated by >7 days but within 1 year had the best balance between PPV (69.9%) and sensitivity (95.5%). PPVs were slightly higher using ICD‐10 codes. Requiring an anticoagulant or antiarrhythmic prescription or electrical cardioversion in addition to diagnostic code(s) modestly improved the PPVs at the expense of large reductions in sensitivity. Conclusions: We developed simple, exportable, computable phenotypes for atrial fibrillation using structured electronic medical record data. However, use of diagnostic codes to identify incident atrial fibrillation is prone to some misclassification. Further study is warranted to determine whether more complex phenotypes, including unstructured data sources or using machine learning techniques, may improve the accuracy of identifying incident atrial fibrillation. (An illustrative code sketch follows this record.)
- Published
- 2022
- Full Text
- View/download PDF
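The best-balanced rule in entry 15 (either one inpatient code, or two outpatient codes more than 7 days apart but within 1 year) can be expressed per patient in a few lines. The code list and column names below are illustrative assumptions, not the study's definitions.

```python
import pandas as pd

# Illustrative ICD-9/ICD-10 stems for atrial fibrillation/flutter, not the study's full list.
AF_CODES = ("427.31", "427.32", "I48")

def meets_af_rule(dx: pd.DataFrame) -> bool:
    """One patient's AF-coded encounters (columns: date, setting)."""
    if (dx["setting"] == "inpatient").any():
        return True
    out = (dx.loc[dx["setting"] == "outpatient", "date"]
           .sort_values().reset_index(drop=True))
    for i in range(len(out)):
        for j in range(i + 1, len(out)):
            gap = (out[j] - out[i]).days
            if 7 < gap <= 365:
                return True
    return False

def flag_af(diagnoses: pd.DataFrame) -> pd.Series:
    """diagnoses columns: patient_id, code, date (datetime), setting."""
    af = diagnoses[diagnoses["code"].astype(str).str.startswith(AF_CODES)]
    return af.groupby("patient_id").apply(meets_af_rule)
```

As the abstract notes, such a rule still misclassifies some prevalent disease as incident, which is why the chart-reviewed incidence date served as the reference standard.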
16. Development and Evaluation of Computable Phenotypes in Pediatric Epilepsy: 3 Cases.
- Author
-
Pan, Sabrina, Wu, Alan, Weiner, Mark, and M Grinspan, Zachary
- Subjects
- *EPILEPSY, *CEREBRAL anoxia-ischemia, *MAGNETIC resonance imaging, *ELECTRONIC health records, *CHILDHOOD epilepsy, *DIAGNOSIS of epilepsy, *PHENOTYPES
- Abstract
Introduction: Computable phenotypes allow identification of well-defined patient cohorts from electronic health record data. Little is known about the accuracy of diagnostic codes for important clinical concepts in pediatric epilepsy, such as (1) risk factors like neonatal hypoxic-ischemic encephalopathy; (2) clinical concepts like treatment resistance; and (3) syndromes like juvenile myoclonic epilepsy. We developed and evaluated the performance of computable phenotypes for these examples using electronic health record data at one center. Methods: We identified gold standard cohorts for neonatal hypoxic-ischemic encephalopathy, pediatric treatment-resistant epilepsy, and juvenile myoclonic epilepsy via existing registries and review of clinical notes. From the electronic health record, we extracted diagnostic and procedure codes for all children with a diagnosis of epilepsy and seizures. We used these codes to develop computable phenotypes and evaluated them by sensitivity, positive predictive value, and the F-measure. Results: For neonatal hypoxic-ischemic encephalopathy, the best-performing computable phenotype (HIE ICD-9/10 and [brain magnetic resonance imaging (MRI) or electroencephalography (EEG) within 120 days of life] and absence of commonly miscoded conditions) had high sensitivity (95.7%, 95% confidence interval [CI] 85-99), positive predictive value (100%, 95% CI 95-100), and F-measure (0.98). For treatment-resistant epilepsy, the best-performing computable phenotype (3 or more antiseizure medicines in the last 2 years or treatment-resistant ICD-10) had a sensitivity of 86.9% (95% CI 79-93), positive predictive value of 69.6% (95% CI 60-79), and F-measure of 0.77. For juvenile myoclonic epilepsy, the best-performing computable phenotype (JME ICD-10) had poor sensitivity (52%, 95% CI 43-60) but high positive predictive value (90.4%, 95% CI 81-96); the F-measure was 0.66. Conclusion: The variable accuracy of our computable phenotypes (hypoxic-ischemic encephalopathy high, treatment resistance medium, and juvenile myoclonic epilepsy low) demonstrates the heterogeneity of success using administrative data to identify cohorts important for pediatric epilepsy research. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
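Entry 16's best-performing hypoxic-ischemic encephalopathy (HIE) phenotype combines an HIE diagnosis code with a brain MRI or EEG in the first 120 days of life and excludes commonly miscoded conditions. A minimal sketch follows; the code sets, procedure labels, and exclusion list are placeholders for illustration rather than the study's definitions.

```python
import pandas as pd

HIE_CODES = {"768.7", "P91.60", "P91.61", "P91.62", "P91.63"}  # illustrative ICD-9/10 codes
NEURO_PROCS = {"brain MRI", "EEG"}                              # assumed procedure labels
EXCLUDED_CODES = {"P91.811"}                                    # assumed "commonly miscoded" exclusions

def flag_hie(patients: pd.DataFrame, dx: pd.DataFrame, procs: pd.DataFrame) -> pd.Index:
    """patients: patient_id, birth_date; dx: patient_id, code;
    procs: patient_id, proc, date (datetime columns assumed)."""
    has_hie = set(dx.loc[dx["code"].isin(HIE_CODES), "patient_id"])
    excluded = set(dx.loc[dx["code"].isin(EXCLUDED_CODES), "patient_id"])

    p = procs.merge(patients, on="patient_id")
    age_days = (p["date"] - p["birth_date"]).dt.days
    early_neuro = set(p.loc[p["proc"].isin(NEURO_PROCS) & (age_days <= 120), "patient_id"])

    return pd.Index(sorted((has_hie & early_neuro) - excluded))
```

The treatment-resistance phenotype in the same entry (3 or more antiseizure medicines in the last 2 years, or an intractable-epilepsy code) follows the same counting pattern as the drug-era sketch under entry 9.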
17. Phenotyping COVID-19 Patients by Ventilation Therapy: Data Quality Challenges and Cohort Characterization.
- Author
-
ESSAY, Patrick, MOSIER, Jarrod, and SUBBIAN, Vignesh
- Abstract
The COVID-19 pandemic introduced unique challenges for treating acute respiratory failure patients and highlighted the need for reliable phenotyping of patients using retrospective electronic health record data. In this study, we applied a rule-based phenotyping algorithm to classify COVID-19 patients requiring ventilatory support. We analyzed patient outcomes of the different phenotypes based on type and sequence of ventilation therapy. Invasive mechanical ventilation, noninvasive positive pressure ventilation, and high-flow nasal insufflation were the three therapies used to phenotype patients, leading to a total of seven subgroups: patients treated with a single therapy (3), patients treated with either form of noninvasive ventilation and subsequently requiring intubation (2), and patients initially intubated and then weaned onto a noninvasive therapy (2). In addition to summary statistics for each phenotype, we highlight data quality challenges and the importance of mapping to standard terminologies. This work illustrates the potential impact of accurate phenotyping on patient-level and system-level outcomes, including appropriate resource allocation under resource-constrained circumstances. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
18. External validation of an opioid misuse machine learning classifier in hospitalized adult patients.
- Author
-
Afshar, Majid, Sharma, Brihat, Bhalla, Sameer, Thompson, Hale M., Dligach, Dmitriy, Boley, Randy A., Kishen, Ekta, Simmons, Alan, Perticone, Kathryn, and Karnik, Niranjan S.
- Subjects
OPIOID abuse, DRUG use testing, MACHINE learning, SUBSTANCE abuse, HOSPITAL patients
- Abstract
Background: Opioid misuse screening in hospitals is resource-intensive and rarely done. Many hospitalized patients are never offered opioid treatment. An automated approach leveraging routinely captured electronic health record (EHR) data may be easier for hospitals to institute. We previously derived and internally validated an opioid classifier in a separate hospital setting. The aim is to externally validate our previously published and open-source machine-learning classifier at a different hospital for identifying cases of opioid misuse. Methods: An observational cohort of 56,227 adult hospitalizations was examined between October 2017 and December 2019 during a hospital-wide substance use screening program with manual screening. Manually completed Drug Abuse Screening Test served as the reference standard to validate a convolutional neural network (CNN) classifier with coded word embedding features from the clinical notes of the EHR. The opioid classifier utilized all notes in the EHR and sensitivity analysis was also performed on the first 24 h of notes. Calibration was performed to account for the lower prevalence than in the original cohort. Results: Manual screening for substance misuse was completed in 67.8% (n = 56,227) with 1.1% (n = 628) identified with opioid misuse. The data for external validation included 2,482,900 notes with 67,969 unique clinical concept features. The opioid classifier had an AUC of 0.99 (95% CI 0.99–0.99) across the encounter and 0.98 (95% CI 0.98–0.99) using only the first 24 h of notes. In the calibrated classifier, the sensitivity and positive predictive value were 0.81 (95% CI 0.77–0.84) and 0.72 (95% CI 0.68–0.75). For the first 24 h, they were 0.75 (95% CI 0.71–0.78) and 0.61 (95% CI 0.57–0.64). Conclusions: Our opioid misuse classifier had good discrimination during external validation. Our model may provide a comprehensive and automated approach to opioid misuse identification that augments current workflows and overcomes manual screening barriers. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
19. Optimizing Identification of People Living with HIV from Electronic Medical Records: Computable Phenotype Development and Validation.
- Author
-
Liu, Yiyang, Siddiqi, Khairul A., Cook, Robert L., Bian, Jiang, Squires, Patrick J., Shenkman, Elizabeth A., Prosperi, Mattia, and Jayaweera, Dushyantha T.
- Abstract
Background: Electronic health record (EHR)-based computable phenotype algorithms allow researchers to efficiently identify a large virtual cohort of Human Immunodeficiency Virus (HIV) patients. Building upon existing algorithms, we refined, improved, and validated an HIV phenotype algorithm using data from the OneFlorida Data Trust, a repository of linked claims data and EHRs from its clinical partners, which provide care to over 15 million patients across all 67 counties in Florida. Methods: Our computable phenotype examined information from multiple EHR domains, including clinical encounters with diagnoses, prescription medications, and laboratory tests. To identify an HIV case, the algorithm requires the patient to have at least one diagnostic code for HIV and meet one of the following criteria: have 1+ positive HIV laboratory results, have been prescribed HIV medications, or have 3+ visits with HIV diagnostic codes. The computable phenotype was validated against a subset of clinical notes. Results: Among the 15+ million patients from OneFlorida, we identified 61,313 patients with a confirmed HIV diagnosis. Among them, 8.05% met all four inclusion criteria, 69.7% met the 3+ HIV encounters criterion in addition to having an HIV diagnostic code, and 8.1% met all criteria except for having positive laboratory results. Our algorithm achieved higher sensitivity (98.9%) and comparable specificity (97.6%) relative to existing algorithms (77-83% sensitivity, 86-100% specificity). The mean age of the sample was 42.7 years; 58% were male, and about half were Black or African American. Patients' average follow-up period (the time between the first and last encounter in the EHRs) was approximately 4.6 years. The median numbers of all encounters and HIV-related encounters were 79 and 21, respectively. Conclusion: By leveraging EHR data from multiple clinical partners and domains, with a considerably diverse population, our algorithm allows more flexible criteria for identifying patients with incomplete laboratory test results and medication prescribing history compared with prior studies. [ABSTRACT FROM AUTHOR] (An illustrative code sketch follows this record.)
- Published
- 2021
- Full Text
- View/download PDF
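Entry 19's definition (at least one HIV diagnosis code, plus at least one of: a positive HIV laboratory result, an HIV medication, or 3+ visits with HIV diagnostic codes) is an AND-over-OR rule that is simple to apply once per-patient summaries exist. The input layout below is an assumption for illustration.

```python
import pandas as pd

def flag_hiv(dx_counts: pd.Series, pos_lab: pd.Series, on_arv: pd.Series) -> pd.Series:
    """All inputs indexed by patient_id:
    dx_counts: number of encounters with an HIV diagnosis code,
    pos_lab:   True if any positive HIV laboratory result,
    on_arv:    True if any HIV medication (e.g., antiretroviral) prescription.
    """
    frame = (pd.DataFrame({"dx": dx_counts, "lab": pos_lab, "arv": on_arv})
             .fillna({"dx": 0, "lab": False, "arv": False})
             .astype({"dx": int, "lab": bool, "arv": bool}))
    has_code = frame["dx"] >= 1
    supporting = frame["lab"] | frame["arv"] | (frame["dx"] >= 3)
    return has_code & supporting
```

The OR over supporting evidence is what lets the algorithm keep patients with incomplete laboratory or prescribing history, which the abstract credits for its higher sensitivity relative to stricter prior algorithms.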
20. Development of Interoperable Computable Phenotype Algorithms for Adverse Events of Special Interest to Be Used for Biologics Safety Surveillance: Validation Study.
- Author
-
Holdefer AA, Pizarro J, Saunders-Hastings P, Beers J, Sang A, Hettinger AZ, Blumenthal J, Martinez E, Jones LD, Deady M, Ezzeldin H, and Anderson SA
- Subjects
- Humans, United States epidemiology, Biological Products adverse effects, United States Food and Drug Administration, Adverse Drug Reaction Reporting Systems statistics & numerical data, Drug-Related Side Effects and Adverse Reactions epidemiology, Product Surveillance, Postmarketing methods, Product Surveillance, Postmarketing statistics & numerical data, COVID-19 prevention & control, COVID-19 epidemiology, Algorithms, Phenotype
- Abstract
Background: Adverse events associated with vaccination have been evaluated by epidemiological studies and more recently have gained additional attention with the emergency use authorization of several COVID-19 vaccines. As part of its responsibility to conduct postmarket surveillance, the US Food and Drug Administration continues to monitor several adverse events of special interest (AESIs) to ensure vaccine safety, including for COVID-19. Objective: This study is part of the Biologics Effectiveness and Safety Initiative, which aims to improve the Food and Drug Administration's postmarket surveillance capabilities while minimizing public burden. This study aimed to enhance active surveillance efforts through a rules-based, computable phenotype algorithm to identify 5 AESIs being monitored by the Centers for Disease Control and Prevention for COVID-19 or other vaccines: anaphylaxis, Guillain-Barré syndrome, myocarditis/pericarditis, thrombosis with thrombocytopenia syndrome, and febrile seizure. This study examined whether these phenotypes have sufficiently high positive predictive value (PPV) to ensure that the cases selected for surveillance are reasonably likely to be a postbiologic adverse event. This allows patient privacy and security concerns around sharing data for patients who had nonadverse events to be properly accounted for when evaluating the cost-benefit aspect of our approach. Methods: AESI phenotype algorithms were developed to apply to electronic health record data at health provider organizations across the country by querying for standard and interoperable codes. The codes queried in the rules represent symptoms, diagnoses, or treatments of the AESI sourced from published case definitions and input from clinicians. To validate the performance of the algorithms, we applied them to electronic health record data from a US academic health system and provided a sample of cases for clinicians to evaluate. Performance was assessed using PPV. Results: With a PPV of 93.3%, our anaphylaxis algorithm performed the best. The PPVs for our febrile seizure, myocarditis/pericarditis, thrombosis with thrombocytopenia syndrome, and Guillain-Barré syndrome algorithms were 89%, 83.5%, 70.2%, and 47.2%, respectively. Conclusions: Given our algorithm design and performance, our results support continued research into using interoperable algorithms for widespread AESI postmarket detection.
- Published
- 2024
- Full Text
- View/download PDF
21. TICS-M scores in an oldest-old normative cohort identified by computable phenotype.
- Author
-
Ying G, Perez-Lao A, Adrien T, Maraganore D, Marra D, and Smith G
- Abstract
Objective: To (1) examine the distribution of Telephone Interview for Cognitive Status modified (TICS-m) scores in oldest-old individuals (age 85 and above) identified as cognitively healthy by a previously validated electronic health records-based computable phenotype (CP) and (2) compare different cutoff scores for cognitive impairment in this population. Method: The CP identified 24,024 persons; 470 were contacted, and 252 consented and completed the assessment. Associations of TICS-m scores with age, sex, and educational categories (<10 years, 11-15 years, and >16 years) were examined. The number of participants classified as impaired was examined using commonly applied cutoff scores (27-31). Results: TICS-m scores ranged from 18 to 44, with a mean of 32.6 (SD = 4.7), among adults aged 85-99 years. A linear regression model including (range-restricted) age, education, and sex showed beta estimates comparable to previous findings. Different cutoff scores (27 to 31) yielded slightly lower proportions of participants meeting criteria for MCI and dementia than studies of younger elderly samples using traditional recruitment methods. Conclusions: The use of a validated computable phenotype to identify a normative cohort generated a normative distribution for the TICS-m consistent with prior findings from more effortful approaches to cohort identification and established expected TICS-m performance in the oldest-old population.
- Published
- 2024
- Full Text
- View/download PDF
22. Developing a computable phenotype for glioblastoma.
- Author
-
Yan S, Melnick K, He X, Lyu T, Moor RSF, Still MEH, Mitchell DA, Shenkman EA, Wang H, Guo Y, Bian J, and Ghiaseddin AP
- Subjects
- Humans, Algorithms, Female, Glioblastoma pathology, Glioblastoma diagnosis, Brain Neoplasms pathology, Brain Neoplasms diagnosis, Phenotype, Electronic Health Records
- Abstract
Background: Glioblastoma is the most common malignant brain tumor, and thus it is important to be able to identify patients with this diagnosis for population studies. However, this can be challenging as diagnostic codes are nonspecific. The aim of this study was to create a computable phenotype (CP) for glioblastoma multiforme (GBM) from structured and unstructured data to identify patients with this condition in a large electronic health record (EHR). Methods: We used the University of Florida (UF) Health Integrated Data Repository, a centralized clinical data warehouse that stores clinical and research data from various sources within the UF Health system, including the EHR system. We performed multiple iterations to refine the GBM-relevant diagnosis codes, procedure codes, medication codes, and keywords through manual chart review of patient data. We then evaluated the performance of the various proposed CPs constructed from the relevant codes and keywords. Results: We underwent six rounds of manual chart reviews to refine the CP elements. The final CP algorithm for identifying GBM patients was selected based on the best F1-score. Overall, the CP rule "if the patient had at least 1 relevant diagnosis code and at least 1 relevant keyword" demonstrated the highest F1-score using both structured and unstructured data. Thus, it was selected as the best-performing CP rule. Conclusions: We developed and validated a CP algorithm for identifying patients with GBM using both structured and unstructured EHR data from a large tertiary care center. The final algorithm achieved an F1-score of 0.817, indicating high performance, which minimizes possible biases from misclassification errors.
- Published
- 2024
- Full Text
- View/download PDF
23. Development and evaluation of an EHR‐based computable phenotype for identification of pediatric Crohn's disease patients in a National Pediatric Learning Health System
- Author
-
Ritu Khare, Michael D. Kappelman, Charles Samson, Jennifer Pyrzanowski, Rahul A. Darwar, Christopher B. Forrest, Charles C. Bailey, Peter Margolis, Amanda Dempsey, and the PEDSnet Computable Phenotype Working Group
- Subjects
computable phenotype, Crohn's disease, electronic health records, PEDSnet, Medicine (General), R5-920, Public aspects of medicine, RA1-1270
- Abstract
Objectives: To develop and evaluate the classification accuracy of a computable phenotype for pediatric Crohn's disease using electronic health record data from PEDSnet, a large, multi‐institutional research network and Learning Health System. Study Design: Using clinician and informatician input, algorithms were developed using combinations of diagnostic and medication data drawn from the PEDSnet clinical dataset, which is comprised of 5.6 million children from eight U.S. academic children's health systems. Six test algorithms (four cases, two non‐cases) that combined use of specific medications for Crohn's disease plus the presence of Crohn's diagnosis were initially tested against the entire PEDSnet dataset. From these, three were selected for performance assessment using manual chart review (primary case algorithm, n = 360, primary non‐case algorithm, n = 360, and alternative case algorithm, n = 80). Non‐cases were patients having gastrointestinal diagnoses other than inflammatory bowel disease. Sensitivity, specificity, and positive predictive value (PPV) were assessed for the primary case and primary non‐case algorithms. Results: Of the six algorithms tested, the least restrictive algorithm requiring just ≥1 Crohn's diagnosis code yielded 11 950 cases across PEDSnet (prevalence 21/10 000). The most restrictive algorithm requiring ≥3 Crohn's disease diagnoses plus at least one medication yielded 7868 patients (prevalence 14/10 000). The most restrictive algorithm had the highest PPV (95%) and high sensitivity (91%) and specificity (94%). False positives were due primarily to a diagnosis reversal (from Crohn's disease to ulcerative colitis) or having a diagnosis of "indeterminate colitis." False negatives were rare. Conclusions: Using diagnosis codes and medications available from PEDSnet, we developed a computable phenotype for pediatric Crohn's disease that had high specificity, sensitivity, and predictive value. This process will be of use for developing computable phenotypes for other pediatric diseases, to facilitate cohort identification for retrospective and prospective studies, and to optimize clinical care through the PEDSnet Learning Health System.
- Published
- 2020
- Full Text
- View/download PDF
24. Claims‐Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine‐Learning Approaches
- Author
-
Mei‐Sing Ong, Jeffrey G. Klann, Kueiyu Joshua Lin, Bradley A. Maron, Shawn N. Murphy, Marc D. Natter, and Kenneth D. Mandl
- Subjects
computable phenotype, machine learning, pulmonary hypertension, Diseases of the circulatory (Cardiovascular) system, RC666-701
- Abstract
Background: Real‐world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts—a crucial first step underpinning the validity of research results—remains a challenge. We developed and evaluated claims‐based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state‐of‐the‐art machine‐learning approaches. Methods and Results: We analyzed an electronic health record‐Medicare linked database from two large academic tertiary care hospitals (years 2007–2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients’ demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules and machine‐learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The most optimal rule‐based algorithm—having ≥3 PH‐related healthcare encounters and having undergone right heart catheterization—attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine‐learning algorithms outperformed the most optimal rule‐based algorithm (P
- Published
- 2020
- Full Text
- View/download PDF
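For orientation, the conventional decision rule reported in the record above (≥3 PH-related healthcare encounters plus evidence of right heart catheterization) can be sketched over a generic claims table. The code lists and column names below are placeholders, not the study's actual specification.

```python
# A minimal sketch of the rule-based definition described above. Code lists and the
# claims-table layout are illustrative assumptions.
import pandas as pd

PH_DX_CODES = {"416.0", "416.8", "I27.0", "I27.2"}   # example pulmonary hypertension ICD codes
RHC_PROC_CODES = {"93451", "93456"}                  # example right heart catheterization CPT codes

def flag_ph_cases(claims: pd.DataFrame) -> pd.Series:
    """claims: one row per claim line with patient_id, claim_id, dx_code, proc_code."""
    ph_encounters = (
        claims[claims["dx_code"].isin(PH_DX_CODES)]
        .groupby("patient_id")["claim_id"].nunique()
    )
    had_rhc = claims[claims["proc_code"].isin(RHC_PROC_CODES)].groupby("patient_id").size() > 0
    patients = claims["patient_id"].unique()
    return pd.Series(
        [(ph_encounters.get(p, 0) >= 3) and bool(had_rhc.get(p, False)) for p in patients],
        index=patients, name="ph_case",
    )

# Toy usage: patient 1 meets the rule, patient 2 does not.
claims = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2],
    "claim_id":   ["a", "b", "c", "d", "e"],
    "dx_code":    ["I27.0", "I27.0", "I27.2", None, "I27.0"],
    "proc_code":  [None, None, None, "93451", None],
})
print(flag_ph_cases(claims))
```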
25. Accuracy of Asthma Computable Phenotypes to Identify Pediatric Asthma at an Academic Institution.
- Author
-
Ross, Mindy K., Zheng, Henry, Zhu, Bing, Lao, Ailina, Hong, Hyejin, Natesan, Alamelu, Radparvar, Melina, and Bui, Alex A.T.
- Published
- 2020
- Full Text
- View/download PDF
26. Development and evaluation of an EHR‐based computable phenotype for identification of pediatric Crohn's disease patients in a National Pediatric Learning Health System.
- Author
-
Khare, Ritu, Kappelman, Michael D., Samson, Charles, Pyrzanowski, Jennifer, Darwar, Rahul A., Forrest, Christopher B., Bailey, Charles C., Margolis, Peter, and Dempsey, Amanda
- Subjects
CROHN'S disease ,INFLAMMATORY bowel diseases ,ELECTRONIC health records ,INSTRUCTIONAL systems ,ULCERATIVE colitis - Abstract
Objectives: To develop and evaluate the classification accuracy of a computable phenotype for pediatric Crohn's disease using electronic health record data from PEDSnet, a large, multi‐institutional research network and Learning Health System. Study Design: Using clinician and informatician input, algorithms were developed using combinations of diagnostic and medication data drawn from the PEDSnet clinical dataset which is comprised of 5.6 million children from eight U.S. academic children's health systems. Six test algorithms (four cases, two non‐cases) that combined use of specific medications for Crohn's disease plus the presence of Crohn's diagnosis were initially tested against the entire PEDSnet dataset. From these, three were selected for performance assessment using manual chart review (primary case algorithm, n = 360, primary non‐case algorithm, n = 360, and alternative case algorithm, n = 80). Non‐cases were patients having gastrointestinal diagnoses other than inflammatory bowel disease. Sensitivity, specificity, and positive predictive value (PPV) were assessed for the primary case and primary non‐case algorithms. Results: Of the six algorithms tested, the least restrictive algorithm requiring just ≥1 Crohn's diagnosis code yielded 11 950 cases across PEDSnet (prevalence 21/10 000). The most restrictive algorithm requiring ≥3 Crohn's disease diagnoses plus at least one medication yielded 7868 patients (prevalence 14/10 000). The most restrictive algorithm had the highest PPV (95%) and high sensitivity (91%) and specificity (94%). False positives were due primarily to a diagnosis reversal (from Crohn's disease to ulcerative colitis) or having a diagnosis of "indeterminate colitis." False negatives were rare. Conclusions: Using diagnosis codes and medications available from PEDSnet, we developed a computable phenotype for pediatric Crohn's disease that had high specificity, sensitivity and predictive value. This process will be of use for developing computable phenotypes for other pediatric diseases, to facilitate cohort identification for retrospective and prospective studies, and to optimize clinical care through the PEDSnet Learning Health System. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
27. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients.
- Author
-
Sharma, Brihat, Dligach, Dmitriy, Swope, Kristin, Salisbury-Afshar, Elizabeth, Karnik, Niranjan S., Joyce, Cara, and Afshar, Majid
- Subjects
ARTIFICIAL neural networks ,RECEIVER operating characteristic curves ,MACHINE learning ,MEDICAL databases ,HEROIN ,OPIOIDS ,HOSPITAL patients ,ELECTRONIC health records ,NATURAL language processing - Abstract
Background: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) Mapping the corpus of documents to standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System) thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use-case for an opioid misuse classifier.Methods: An observational cohort sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI codes, character-based, and n-gram features. Models applied were machine learning with neural network and logistic regression as well as expert consensus with a rule-based model for opioid misuse. The area under the receiver operating characteristic curves (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration.Results: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top weighted CUI codes in logistic regression has the related terms 'Heroin' and 'Victim of abuse'.Conclusions: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns. [ABSTRACT FROM AUTHOR]- Published
- 2020
- Full Text
- View/download PDF
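The central idea of the record above is that notes can be reduced to UMLS concept unique identifiers (CUIs) so that no protected health information reaches the model. A toy sketch of that representation with scikit-learn follows; the CUI strings, labels, and pipeline are placeholders for illustration, not the authors' released classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each document is a space-separated string of UMLS CUIs rather than raw, PHI-bearing text.
# The CUI strings and labels below are placeholders for illustration only.
docs = [
    "C0011892 C0027796 C0032961",
    "C0011892 C0032961 C0600241",
    "C0027796 C0032961 C0011892",
    "C0600241 C0011892 C0600241",
]
labels = [0, 1, 0, 1]   # 1 = opioid misuse per the reference-standard annotation

model = make_pipeline(
    CountVectorizer(token_pattern=r"C\d{7}", lowercase=False),   # tokens are CUI codes, not words
    LogisticRegression(max_iter=1000),
)
model.fit(docs, labels)
print(model.predict_proba(["C0011892 C0600241"])[:, 1])   # probability of misuse for a new note
```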
28. Computable Phenotypes: Standardized Ways to Classify People Using Electronic Health Record Data.
- Author
-
Verchinina, Lilia, Ferguson, Lisa, Flynn, Allen, Wichorek, Michelle, and Markel, Dorene
- Abstract
Computable phenotypes (CPs) are an increasingly important structured and reproducible method of using electronic health record data to classify people. CPs have the potential to provide important benefits to health information management (HIM) professionals in their everyday work. A CP is a precise algorithm, including inclusion and exclusion criteria, that can be used to identify a cohort of patients with a specific set of observable and measurable traits. With the use of CPs, a series of technical steps can be taken to automatically identify people with specific traits, such as people with a particular disease or condition. CPs were first used outside of the HIM domain for clinical trials and network-based research. Because CPs are becoming more easily shareable, they have the potential to be used by HIM professionals to help improve coding, reporting, management, sharing, and reuse of clinical information. [ABSTRACT FROM AUTHOR]
- Published
- 2018
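As a toy illustration of the definition given above, a computable phenotype can be written down as explicit, machine-applicable inclusion and exclusion criteria. The codes and threshold below are placeholders rather than a validated definition.

```python
from dataclasses import dataclass

@dataclass
class CodedPhenotype:
    include_codes: set
    exclude_codes: set
    min_code_count: int = 1

    def matches(self, patient_codes: list) -> bool:
        """Apply the inclusion/exclusion criteria to one patient's list of coded diagnoses."""
        hits = sum(code in self.include_codes for code in patient_codes)
        excluded = any(code in self.exclude_codes for code in patient_codes)
        return hits >= self.min_code_count and not excluded

# Placeholder criteria: at least two qualifying diagnoses and no disqualifying diagnosis.
example = CodedPhenotype(include_codes={"E11.9"}, exclude_codes={"E10.9"}, min_code_count=2)
print(example.matches(["E11.9", "E11.9", "I10"]))   # True
```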
29. Sharing and Reusing Computable Phenotype Definitions.
- Author
-
Visweswaran S, Zhang LY, Bui K, Sadhu EM, Samayamuthu MJ, and Morris MM
- Abstract
Background: A scalable approach for the sharing and reuse of human-readable and computer-executable phenotype definitions can facilitate the reuse of electronic health records for cohort identification and research studies., Description: We developed a tool called Sharephe for the Informatics for Integrating Biology and the Bedside (i2b2) platform. Sharephe consists of a plugin for i2b2 and a cloud-based searchable repository of computable phenotypes, has the functionality to import to and export from the repository, and has the ability to link to supporting metadata., Discussion: The i2b2 platform enables researchers to create, evaluate, and implement phenotypes without knowing complex query languages. In an initial evaluation, two sites on the Evolve to Next-Gen ACT (ENACT) network used Sharephe to successfully create, share, and reuse phenotypes., Conclusion: The combination of a cloud-based computable repository and an i2b2 plugin for accessing the repository enables investigators to store and retrieve phenotypes from anywhere and at any time and to collaborate across sites in a research network.
- Published
- 2023
- Full Text
- View/download PDF
30. Trends and opportunities in computable clinical phenotyping: A scoping review.
- Author
-
He, Ting, Belouali, Anas, Patricoski, Jessica, Lehmann, Harold, Ball, Robert, Anagnostou, Valsamo, Kreimeyer, Kory, and Botsis, Taxiarchis
- Abstract
[Display omitted] Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1 % (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4 % (N = 77) of all studies, however, only 25.9 % (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. Comparing Natural Language Processing and Structured Medical Data to Develop a Computable Phenotype for Patients Hospitalized Due to COVID-19: Retrospective Analysis.
- Author
-
Chang F, Krishnan J, Hurst JH, Yarrington ME, Anderson DJ, O'Brien EC, and Goldstein BA
- Abstract
Background: Throughout the COVID-19 pandemic, many hospitals conducted routine testing of hospitalized patients for SARS-CoV-2 infection upon admission. Some of these patients are admitted for reasons unrelated to COVID-19 and incidentally test positive for the virus. Because COVID-19-related hospitalizations have become a critical public health indicator, it is important to identify patients who are hospitalized because of COVID-19 as opposed to those who are admitted for other indications., Objective: We compared the performance of different computable phenotype definitions for COVID-19 hospitalizations that use different types of data from electronic health records (EHRs), including structured EHR data elements, clinical notes, or a combination of both data types., Methods: We conducted a retrospective data analysis, using clinician chart review-based validation at a large academic medical center. We reviewed and analyzed the charts of 586 hospitalized individuals who tested positive for SARS-CoV-2 in January 2022. We used LASSO (least absolute shrinkage and selection operator) regression and random forests to fit classification algorithms that incorporated structured EHR data elements, clinical notes, or a combination of structured data and clinical notes. We used natural language processing to incorporate data from clinical notes. The performance of each model was evaluated based on the area under the receiver operator characteristic curve (AUROC) and an associated decision rule based on sensitivity and positive predictive value. We also identified top words and clinical indicators of COVID-19-specific hospitalization and assessed the impact of different phenotyping strategies on estimated hospital outcome metrics., Results: Based on a chart review, 38.2% (224/586) of patients were determined to have been hospitalized for reasons other than COVID-19, despite having tested positive for SARS-CoV-2. A computable phenotype that used clinical notes had significantly better discrimination than one that used structured EHR data elements (AUROC: 0.894 vs 0.841; P <.001) and performed similarly to a model that combined clinical notes with structured data elements (AUROC: 0.894 vs 0.893; P=.91). Assessments of hospital outcome metrics significantly differed based on whether the population included all hospitalized patients who tested positive for SARS-CoV-2 or those who were determined to have been hospitalized due to COVID-19., Conclusions: These findings highlight the importance of cause-specific phenotyping for COVID-19 hospitalizations. More generally, this work demonstrates the utility of natural language processing approaches for deriving information related to patient hospitalizations in cases where there may be multiple conditions that could serve as the primary indication for hospitalization., (© Feier Chang, Jay Krishnan, Jillian H Hurst, Michael E Yarrington, Deverick J Anderson, Emily C O'Brien, Benjamin A Goldstein. Originally published in JMIR Medical Informatics (https://medinform.jmir.org).)
- Published
- 2023
- Full Text
- View/download PDF
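The comparison described above can be sketched as two models scored on the same held-out labels, one using structured elements and one using note text. Everything below is synthetic toy data; the study itself used chart-reviewed labels with LASSO and random forests.

```python
# Illustrative comparison in the spirit of the study above: one model on structured
# elements, one on note text, compared by AUROC. Features, notes, and labels are synthetic.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                              # 1 = admitted because of COVID-19
structured = np.column_stack([y + rng.normal(0, 1.0, n),    # e.g., an informative vital-sign proxy
                              rng.normal(0, 1.0, n)])        # e.g., an uninformative lab value
notes = ["dyspnea hypoxia pneumonia" if label and rng.random() < 0.8 else "elective admission"
         for label in y]

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0, stratify=y)

struct_model = LogisticRegression().fit(structured[idx_train], y[idx_train])
auc_struct = roc_auc_score(y[idx_test], struct_model.predict_proba(structured[idx_test])[:, 1])

vec = TfidfVectorizer()
X_notes = vec.fit_transform([notes[i] for i in idx_train])
notes_model = LogisticRegression().fit(X_notes, y[idx_train])
auc_notes = roc_auc_score(
    y[idx_test], notes_model.predict_proba(vec.transform([notes[i] for i in idx_test]))[:, 1]
)

print(f"structured AUROC={auc_struct:.2f}, notes AUROC={auc_notes:.2f}")
```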
32. Development and Validation of a Computable Phenotype for Turner Syndrome Utilizing Electronic Health Records from a National Pediatric Network.
- Author
-
Huang SD, Bamba V, Bothwell S, Fechner PY, Furniss A, Ikomi C, Nahata L, Nokoff NJ, Pyle L, Seyoum H, and Davis SM
- Abstract
Turner syndrome (TS) is a genetic condition occurring in ~1 in 2,000 females characterized by the complete or partial absence of the second sex chromosome. TS research faces similar challenges to many other pediatric rare disease conditions, with homogenous, single-center, underpowered studies. Secondary data analyses utilizing Electronic Health Record (EHR) have the potential to address these limitations, however, an algorithm to accurately identify TS cases in EHR data is needed. We developed a computable phenotype to identify patients with TS using PEDSnet, a pediatric research network. This computable phenotype was validated through chart review; true positives and negatives and false positives and negatives were used to assess accuracy at both primary and external validation sites. The optimal algorithm consisted of the following criteria: female sex, ≥1 outpatient encounter, and ≥3 encounters with a diagnosis code that maps to TS, yielding average sensitivity 0.97, specificity 0.88, and C-statistic 0.93 across all sites. The accuracy of any estradiol prescriptions yielded an average C-statistic of 0.91 across sites and 0.80 for transdermal and oral formulations separately. PEDSnet and computable phenotyping are powerful tools in providing large, diverse samples to pragmatically study rare pediatric conditions like TS., Competing Interests: CONFLICT OF INTEREST STATEMENT SMD and PYF are site investigators for a clinical trial of growth hormone in Turner syndrome sponsored by Ascendis Pharma. NJN is a consultant for Neurocrine Biosciences, Inc. and Ionis Pharmaceuticals. CI is a site investigator for clinical trial of growth hormone in children with growth hormone deficiency and Turner syndrome sponsored by Novo Nordisk, and treatment of type 2 diabetes in children sponsored by Eli Lilly. All other authors have no conflicts of interest to declare.
- Published
- 2023
- Full Text
- View/download PDF
33. Single-reviewer electronic phenotyping validation in operational settings: Comparison of strategies and recommendations.
- Author
-
Kukhareva, Polina, Staes, Catherine, Warner, Phillip, Shields, David E., Kawamoto, Kensaku, Noonan, Kevin W., Mueller, Heather L., and Weeks, Howard
- Abstract
Objective: Develop evidence-based recommendations for single-reviewer validation of electronic phenotyping results in operational settings.Material and Methods: We conducted a randomized controlled study to evaluate whether electronic phenotyping results should be used to support manual chart review during single-reviewer electronic phenotyping validation (N=3104). We evaluated the accuracy, duration and cost of manual chart review with and without the availability of electronic phenotyping results, including relevant patient-specific details. The cost of identification of an erroneous electronic phenotyping result was calculated based on the personnel time required for the initial chart review and subsequent adjudication of discrepancies between manual chart review results and electronic phenotype determinations.Results: Providing electronic phenotyping results (vs not providing those results) was associated with improved overall accuracy of manual chart review (98.90% vs 92.46%, p<0.001), decreased review duration per test case (62.43 vs 76.78s, p<0.001), and insignificantly reduced estimated marginal costs of identification of an erroneous electronic phenotyping result ($48.54 vs $63.56, p=0.16). The agreement between chart review and electronic phenotyping results was higher when the phenotyping results were provided (Cohen's kappa 0.98 vs 0.88, p<0.001). As a result, while accuracy improved when initial electronic phenotyping results were correct (99.74% vs 92.67%, N=3049, p<0.001), there was a trend towards decreased accuracy when initial electronic phenotyping results were erroneous (56.67% vs 80.00%, N=55, p=0.07). Electronic phenotyping results provided the greatest benefit for the accurate identification of rare exclusion criteria.Discussion: Single-reviewer chart review of electronic phenotyping can be conducted more accurately, quickly, and at lower cost when supported by electronic phenotyping results. However, human reviewers tend to agree with electronic phenotyping results even when those results are wrong. Thus, the value of providing electronic phenotyping results depends on the accuracy of the underlying electronic phenotyping algorithm.Conclusion: We recommend using a mix of phenotyping validation strategies, with the balance of strategies based on the anticipated electronic phenotyping error rate, the tolerance for missed electronic phenotyping errors, as well as the expertise, cost, and availability of personnel involved in chart review and discrepancy adjudication. [ABSTRACT FROM AUTHOR]- Published
- 2017
- Full Text
- View/download PDF
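The agreement statistics cited above (percent agreement and Cohen's kappa between chart review and electronic phenotyping results) reduce to a per-case comparison of two label vectors. A small sketch with made-up labels:

```python
# Per-case comparison of a reviewer's determination against the electronic phenotyping
# result, reporting raw agreement and Cohen's kappa. The labels below are illustrative.
from sklearn.metrics import accuracy_score, cohen_kappa_score

algorithm = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]   # electronic phenotyping result per test case
reviewer  = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]   # single-reviewer chart review result

print("agreement:", accuracy_score(reviewer, algorithm))
print("Cohen's kappa:", round(cohen_kappa_score(reviewer, algorithm), 3))
```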
34. Machine learning in data abstraction: A computable phenotype for sepsis and septic shock diagnosis in the intensive care unit
- Author
-
Timothy J. Weister, Laura Piccolo Serafim, Danette L. Bruns, Prabij Dhungana, Nathan J. Smischney, Arnaldo Lopez Ruiz, and Rahul Kashyap
- Subjects
Data abstraction ,business.industry ,Septic shock ,Computable phenotype ,030208 emergency & critical care medicine ,medicine.disease ,Machine learning ,computer.software_genre ,Intensive care unit ,law.invention ,Sepsis ,Critical care ,03 medical and health sciences ,0302 clinical medicine ,030228 respiratory system ,law ,medicine ,Retrospective Cohort Study ,Artificial intelligence ,business ,computer - Abstract
BACKGROUND With the recent change in the definition (Sepsis-3 Definition) of sepsis and septic shock, an electronic search algorithm was required to identify the cases for data automation. This supervised machine learning method would help screen a large amount of electronic medical records (EMR) for efficient research purposes. AIM To develop and validate a computable phenotype via supervised machine learning method for retrospectively identifying sepsis and septic shock in critical care patients. METHODS A supervised machine learning method was developed based on culture orders, Sequential Organ Failure Assessment (SOFA) scores, serum lactate levels and vasopressor use in the intensive care units (ICUs). The computable phenotype was derived from a retrospective analysis of a random cohort of 100 patients admitted to the medical ICU. This was then validated in an independent cohort of 100 patients. We compared the results from computable phenotype to a gold standard by manual review of EMR by 2 blinded reviewers. Disagreement was resolved by a critical care clinician. A SOFA score ≥ 2 during the ICU stay with a culture 72 h before or after the time of admission was identified. Sepsis versions as V1 was defined as blood cultures with SOFA ≥ 2 and Sepsis V2 was defined as any culture with SOFA score ≥ 2. A serum lactate level ≥ 2 mmol/L from 24 h before admission till their stay in the ICU and vasopressor use with Sepsis-1 and-2 were identified as Septic Shock-V1 and-V2 respectively. RESULTS In the derivation subset of 100 random patients, the final machine learning strategy achieved a sensitivity-specificity of 100% and 84% for Sepsis-1, 100% and 95% for Sepsis-2, 78% and 80% for Septic Shock-1, and 80% and 90% for Septic Shock-2. An overall percent of agreement between two blinded reviewers had a k = 0.86 and 0.90 for Sepsis 2 and Septic shock 2 respectively. In validation of the algorithm through a separate 100 random patient subset, the reported sensitivity and specificity for all 4 diagnoses were 100%-100% each. CONCLUSION Supervised machine learning for identification of sepsis and septic shock is reliable and an efficient alternative to manual chart review.
- Published
- 2019
- Full Text
- View/download PDF
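A hedged sketch of the rule structure described above, in the record's "Sepsis V2"/"Septic Shock V2" sense (any culture order within 72 hours of ICU admission plus SOFA ≥ 2, with septic shock additionally requiring lactate ≥ 2 mmol/L and vasopressor use). The function signature, table layout, and simplified lactate window are assumptions for illustration, not the validated algorithm.

```python
import pandas as pd

def classify_patient(admit_time, cultures, sofa_scores, lactates, on_vasopressor) -> str:
    """cultures: culture-order timestamps; sofa_scores/lactates: values recorded during the stay."""
    culture_near_admission = any(
        abs((t - admit_time).total_seconds()) <= 72 * 3600 for t in cultures
    )
    sepsis = culture_near_admission and max(sofa_scores, default=0) >= 2
    shock = sepsis and max(lactates, default=0.0) >= 2.0 and on_vasopressor
    return "septic shock" if shock else ("sepsis" if sepsis else "neither")

admit = pd.Timestamp("2024-01-01 08:00")
print(classify_patient(
    admit_time=admit,
    cultures=[pd.Timestamp("2024-01-01 10:30")],
    sofa_scores=[1, 3],
    lactates=[2.4],
    on_vasopressor=True,
))   # -> "septic shock"
```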
35. The relative risk of bleeding after medical hospitalization: the medical inpatient thrombosis and hemorrhage study.
- Author
-
Gergi M, Wilkinson K, Koh I, Munger J, Al-Samkari H, Smith NL, Roetker NS, Plante TB, Cushman M, Repp AB, Holmes CE, and Zakai NA
- Subjects
- Humans, United States, Risk, Cohort Studies, Hemorrhage, Hospitalization, Inpatients, Thrombosis
- Abstract
Background: Clinically relevant bleeding risk in discharged medical patients is underestimated and leads to rehospitalization, morbidity, and mortality. Studies assessing this risk are lacking., Objective: The aim of this study was to develop and validate a computable phenotype for clinically relevant bleeding using electronic health record (EHR) data and quantify the relative and absolute risks of this bleeding after medical hospitalization., Methods: We conducted an observational cohort study of people receiving their primary care at sites affiliated with an academic medical center in northwest Vermont, United States. We developed a computable phenotype using EHR data (diagnosis codes, procedure codes, laboratory, and transfusion data) and validated it by manual chart review. Cox proportional hazard models with hospitalization modeled as a time-varying covariate were used to estimate clinically relevant bleeding risk., Results: The computable phenotype had a positive predictive value of 80% and a negative predictive value of 99%. The bleeding rate in individuals with no medical hospitalizations in the past 3 months was 2.9 per 1000 person-years versus 98.9 per 1000 person-years in those who were discharged in the past 3 months. This translates into a hazard ratio (95% CI) of clinically relevant bleeding of 22.9 (18.9, 27.7), 13.0 (10.0, 16.9), and 6.8 (4.7, 9.8) over the first, second, and third months after discharge, respectively., Conclusion: We developed and validated a computable phenotype for clinically relevant bleeding and determined its relative and absolute risk in the 3 months after medical hospitalization discharge. The high rates of bleeding observed underscore the clinical importance of capturing and further studying bleeding after medical discharge., (Copyright © 2022 International Society on Thrombosis and Haemostasis. Published by Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
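The modelling step described above, recent hospitalization as a time-varying covariate in a Cox model, can be sketched with the lifelines package on long-format intervals. The data frame below is a toy; the column names and penalizer setting are assumptions for illustration, not the study's actual specification.

```python
# Minimal sketch (not the study's actual model): Cox regression with a time-varying
# "recently hospitalized" covariate, using lifelines. All values below are toy data.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

long_format = pd.DataFrame({
    "id":          [1, 1, 2, 3, 4, 5, 5, 6],
    "start":       [0, 5, 0, 0, 0, 0, 4, 0],
    "stop":        [5, 10, 10, 3, 8, 4, 9, 12],
    "recent_hosp": [0, 1, 0, 1, 0, 0, 1, 0],   # discharged from hospital within the prior 3 months
    "bleed":       [0, 1, 0, 1, 1, 0, 0, 0],   # clinically relevant bleeding event in the interval
})

ctv = CoxTimeVaryingFitter(penalizer=0.1)       # a small penalty keeps the toy fit stable
ctv.fit(long_format, id_col="id", event_col="bleed", start_col="start", stop_col="stop")
ctv.print_summary()
```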
36. Validation of an Internationally Derived Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data
- Author
-
Ne Hooi Will Loh, Shawn N. Murphy, Bertrand Moal, Siegbert Rieg, Kenneth D. Mandl, Yuan Luo, Douglas S. Bell, Riccardo Bellazzi, Martin Boeker, Michele I. Morris, Alberto Malovini, Thomas Maulhardt, Victor Castro, Valentina Tibollo, Hossein Estiri, Shyam Visweswaran, Kavishwar B. Wagholikar, Vianney Jouhet, Anthony L L J Li, Amelia L.M. Tan, Kee Yuan Ngiam, Chuan Hong, Alon Geva, Andrew M South, Emily Schriver, Gabriel A. Brat, Griffin M. Weber, Danielle L. Mowery, David A. Hanauer, Meghan R Hutch, Zongqi Xia, Jason H. Moore, Robert W Follett, Jeffrey G. Klann, Malarkodi J Samayamuthu, Gilbert S. Omenn, Luca Chiovato, Karen L. Olson, Paul Avillach, Brett K. Beaulieu-Jones, Isaac S. Kohane, Bordeaux population health (BPH), and Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM)
- Subjects
medicine.medical_specialty ,Coronavirus disease 2019 (COVID-19) ,AcademicSubjects/SCI01060 ,High variability ,novel coronavirus ,Health Informatics ,Research and Applications ,01 natural sciences ,Health informatics ,Sensitivity and Specificity ,Severity of Illness Index ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Electronic health record ,data interoperability ,Chart review ,Medicine ,medical informatics ,Electronic Health Records ,Humans ,030212 general & internal medicine ,0101 mathematics ,AcademicSubjects/MED00580 ,business.industry ,010102 general mathematics ,COVID-19 ,Prognosis ,Phenotype ,3. Good health ,Icu admission ,Hospitalization ,computable phenotype ,data networking ,ROC Curve ,Analytics ,Emergency medicine ,[SDV.SPEE]Life Sciences [q-bio]/Santé publique et épidémiologie ,disease severity ,AcademicSubjects/SCI01530 ,business - Abstract
Objective The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. Materials and Methods Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. Results The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability—up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. Discussion We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. Conclusions We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
- Published
- 2021
- Full Text
- View/download PDF
37. Desiderata for the development of next-generation electronic health record phenotype libraries
- Author
-
Luke V. Rasmussen, Vasa Curcin, Martin Chapman, Emily Jefferson, Daniel Thayer, Shahzad Mumtaz, Jennifer A. Pacheco, Spiros Denaxas, Chuang Gao, Helen Parkinson, Rachel Richesson, Georgios V. Gkoutos, and Andreas Karwath
- Subjects
Computer science ,AcademicSubjects/SCI02254 ,Best practice ,media_common.quotation_subject ,Reproducibility of Results ,Health Informatics ,Review ,phenotype library ,Data science ,Phenotype ,Field (computer science) ,Computer Science Applications ,Software portability ,electronic health records ,computable phenotype ,Phenomics ,EHR-based phenotyping ,Humans ,AcademicSubjects/SCI00960 ,Quality (business) ,Set (psychology) ,Host (network) ,media_common - Abstract
Background High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. Methods A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. Results We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. Conclusions There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
- Published
- 2021
38. An Electronic Search Algorithm for Early Disseminated Intravascular Coagulopathy Diagnosis in the Intensive Care Unit: A Derivation and Validation Study
- Author
-
Rahul Kashyap, Naseema Gangat, Timothy J. Weister, and Tabinda Jawaid
- Subjects
Pediatrics ,medicine.medical_specialty ,030204 cardiovascular system & hematology ,intensive care unit ,coagulopathy ,law.invention ,03 medical and health sciences ,0302 clinical medicine ,Anesthesiology ,law ,Search algorithm ,Intensive care ,medicine ,Coagulopathy ,Derivation ,disseminated intravascular coagulopathy ,automated algorithm ,business.industry ,Medical record ,General Engineering ,Hematology ,medicine.disease ,Intensive care unit ,computable phenotype ,Cohort ,Diagnosis code ,business ,030217 neurology & neurosurgery - Abstract
Aim We aim to create and validate an electronic search algorithm for accurate detection of disseminated intravascular coagulopathy (DIC) from medical records. Methods Patients with DIC in Mayo Clinic's intensive care units (ICUs) from Jan 1, 2007, to May 4, 2018, were included in the study. An algorithm was developed based on clinical notes and ICD diagnosis codes. A cohort of 50 patients was included with DIC diagnosis, its variations, and no diagnosis of DIC. Then, the next set of 50 patients was used to refine the algorithm. Results were compared with a manual reviewer and the disagreements were resolved by the third reviewer. The same process was repeated with 'revised clinical note search' for the first and second derivation cohort with additional exclusion terms. The obtained sensitivity and specificity were reported. The generated algorithm was applied to another set of 50 patients for validation. Results In the first derivation cohort- DIC search by clinical notes and diagnosis codes had 92% sensitivity and 100% specificity. Sensitivity dropped to 71% in the second cohort although specificity remains the same. Therefore, the algorithm was refined to clinical notes search only. The revised search was reapplied to first and second derivation cohorts and results obtained for the first derivation were the same but 91.3% sensitive and 100% specific for the second derivation. The search was locked and applied in the validation cohort with 95.8% sensitivity and 100% specificity, respectively. Conclusion The revised clinical note based electronic search algorithm was found to be highly sensitive and specific for DIC during the corresponding ICU duration.
- Published
- 2020
- Full Text
- View/download PDF
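A clinical-note search of the kind described above can be sketched as keyword matching with exclusion (negation) terms applied to the surrounding context. The term lists below are illustrative, not the validated search strategy.

```python
# Hedged sketch: flag notes mentioning DIC terms unless an exclusion phrase appears
# in the preceding context. Term lists are placeholders.
import re

INCLUDE_TERMS = [r"disseminated intravascular coagulopathy", r"\bDIC\b"]
EXCLUDE_TERMS = [r"no evidence of", r"rule out", r"ruled out", r"negative for"]

def note_flags_dic(note: str) -> bool:
    text = note.lower()
    for pattern in INCLUDE_TERMS:
        for match in re.finditer(pattern.lower(), text):
            window = text[max(0, match.start() - 40): match.start()]   # preceding context
            if not any(re.search(excl, window) for excl in EXCLUDE_TERMS):
                return True
    return False

print(note_flags_dic("Labs concerning for disseminated intravascular coagulopathy."))  # True
print(note_flags_dic("No evidence of DIC on today's labs."))                            # False
```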
39. Claims‐Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine‐Learning Approaches
- Author
-
Bradley A. Maron, Shawn N. Murphy, Mei-Sing Ong, Kenneth D. Mandl, Kueiyu Joshua Lin, Jeffrey G. Klann, and Marc D. Natter
- Subjects
Male ,Epidemiology ,Hypertension, Pulmonary ,030204 cardiovascular system & hematology ,Machine learning ,computer.software_genre ,Decision Support Techniques ,Machine Learning ,Insurance Claim Review ,03 medical and health sciences ,0302 clinical medicine ,pulmonary hypertension ,Humans ,Medicine ,Sensitivity (control systems) ,Medical diagnosis ,Original Research ,Aged ,Receiver operating characteristic ,business.industry ,Gold standard (test) ,Decision rule ,Random forest ,Identification (information) ,computable phenotype ,030228 respiratory system ,Female ,Gradient boosting ,Artificial intelligence ,Cardiology and Cardiovascular Medicine ,business ,computer ,Algorithm ,Algorithms - Abstract
Background: Real‐world healthcare data are an important resource for epidemiologic research. However, accurate identification of patient cohorts—a crucial first step underpinning the validity of research results—remains a challenge. We developed and evaluated claims‐based case ascertainment algorithms for pulmonary hypertension (PH), comparing conventional decision rules with state‐of‐the‐art machine‐learning approaches. Methods and Results: We analyzed an electronic health record‐Medicare linked database from two large academic tertiary care hospitals (years 2007–2013). Electronic health record charts were reviewed to form a gold standard cohort of patients with (n=386) and without PH (n=164). Using health encounter data captured in Medicare claims (including patients' demographics, diagnoses, medications, and procedures), we developed and compared 2 approaches for identifying patients with PH: decision rules and machine‐learning algorithms using penalized lasso regression, random forest, and gradient boosting machine. The most optimal rule‐based algorithm—having ≥3 PH‐related healthcare encounters and having undergone right heart catheterization—attained an area under the receiver operating characteristic curve of 0.64 (sensitivity, 0.75; specificity, 0.48). All 3 machine‐learning algorithms outperformed the most optimal rule‐based algorithm (P…). Conclusions: Research‐grade case identification algorithms for PH can be derived and rigorously validated using machine‐learning algorithms. Simple decision rules commonly applied in published literature performed poorly; more complex rule‐based algorithms may potentially address the limitation of this approach. PH research using claims data would be considerably strengthened through the use of validated algorithms for cohort ascertainment.
- Published
- 2020
- Full Text
- View/download PDF
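The machine-learning comparison in the record above (penalized lasso regression, random forest, and gradient boosting, each judged by AUROC) can be sketched with scikit-learn. The feature matrix below is synthetic and merely stands in for claims-derived predictors.

```python
# Illustrative model comparison by cross-validated AUROC; synthetic features only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=550, n_features=40, n_informative=8, random_state=0)

models = {
    "lasso logistic": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUROC = {auc:.3f}")
```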
40. An Iterative Process for Identifying Pediatric Patients With Type 1 Diabetes: Retrospective Observational Study
- Author
-
Heather L. Morris, William T. Donahoo, Brittany S. Bruggeman, Chelsea Zimmerman, Victor W. Zhong, Desmond A. Schatz, and Paul Hiers
- Subjects
medicine.medical_specialty ,endocrine system diseases ,type 1 diabetes ,Computer applications to medicine. Medical informatics ,Population ,R858-859.7 ,030209 endocrinology & metabolism ,Health Informatics ,Type 2 diabetes ,03 medical and health sciences ,0302 clinical medicine ,Health Information Management ,Diabetes mellitus ,medicine ,030212 general & internal medicine ,Intensive care medicine ,education ,Type 1 diabetes ,education.field_of_study ,Original Paper ,business.industry ,Incidence (epidemiology) ,Medical record ,nutritional and metabolic diseases ,Retrospective cohort study ,electronic health record ,medicine.disease ,Clinical research ,computable phenotype ,pediatric ,business - Abstract
Background The incidence of both type 1 diabetes (T1DM) and type 2 diabetes (T2DM) in children and youth is increasing. However, the current approach for identifying pediatric diabetes and separating by type is costly, because it requires substantial manual efforts. Objective The purpose of this study was to develop a computable phenotype for accurately and efficiently identifying diabetes and separating T1DM from T2DM in pediatric patients. Methods This retrospective study utilized a data set from the University of Florida Health Integrated Data Repository to identify 300 patients aged 18 or younger with T1DM, T2DM, or that were healthy based on a developed computable phenotype. Three endocrinology residents/fellows manually reviewed medical records of all probable cases to validate diabetes status and type. This refined computable phenotype was then used to identify all cases of T1DM and T2DM in the OneFlorida Clinical Research Consortium. Results A total of 295 electronic health records were manually reviewed; of these, 128 cases were found to have T1DM, 35 T2DM, and 132 no diagnosis. The positive predictive value was 94.7%, the sensitivity was 96.9%, specificity was 95.8%, and the negative predictive value was 97.6%. Overall, the computable phenotype was found to be an accurate and sensitive method to pinpoint pediatric patients with T1DM. Conclusions We developed a computable phenotype for identifying T1DM correctly and efficiently. The computable phenotype that was developed will enable researchers to identify a population accurately and cost-effectively. As such, this will vastly improve the ease of identifying patients for future intervention studies.
- Published
- 2020
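The validation metrics reported above follow directly from a 2×2 confusion matrix. A short sketch with placeholder counts (not the study's actual chart-review tallies):

```python
# Positive predictive value, sensitivity, specificity, and negative predictive value
# from true/false positive and negative counts. The counts below are placeholders.
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "PPV": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "NPV": tn / (tn + fn),
    }

print(diagnostic_metrics(tp=124, fp=7, fn=4, tn=160))
```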
41. Validation of a claims-based algorithm identifying eligible study subjects in the ADAPTABLE pragmatic clinical trial
- Author
-
W. Schuyler Jones, Jade Dinh, Amanda Marshall, Rebecca Merkh, Ezra Fishman, Holly Robertson, John Barron, and Kevin Haynes
- Subjects
medicine.medical_treatment ,Computable phenotype ,Acute myocardial infarction ,030204 cardiovascular system & hematology ,Article ,Coronary artery disease ,03 medical and health sciences ,0302 clinical medicine ,medicine ,030212 general & internal medicine ,Myocardial infarction ,Real-world evidence ,Pharmacology ,lcsh:R5-920 ,Aspirin ,business.industry ,Medical record ,Percutaneous coronary intervention ,General Medicine ,Cardiovascular disease ,medicine.disease ,Confidence interval ,Clinical trial ,Conventional PCI ,Current Procedural Terminology ,lcsh:Medicine (General) ,business ,Algorithm - Abstract
Objective: Validate an algorithm that uses administrative claims data to identify eligible study subjects for the ADAPTABLE (Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-Term Effectiveness) pragmatic clinical trial (PCT). Materials and methods: This study used medical records from a random sample of patients identified as eligible for the ADAPTABLE trial. The inclusion criteria for ADAPTABLE were a history of acute myocardial infarction (AMI) or percutaneous coronary intervention (PCI) or coronary artery bypass grafting (CABG), or other coronary artery disease (CAD), plus at least one of several risk-enrichment factors. Exclusion criteria included a history of bleeding disorders or aspirin allergy. Using a claims-based algorithm, based on International Classification of Diseases, 9th Edition, Clinical Modification (ICD-9-CM) and 10th Edition (ICD-10) codes and Current Procedural Terminology (CPT) codes, we identified patients eligible for the PCT. The primary outcome was the positive predictive value (PPV) of the identification algorithm: the proportion of sampled patients whose medical records confirmed their ADAPTABLE study eligibility. Exact 95% confidence limits for binomial random variables were calculated for the PPV estimates. Results: Of the 185 patients whose medical records were reviewed, 168 (90.8%; 95% Confidence Interval: 85.7%, 94.6%) were confirmed study eligible. This proportion did not differ between patients identified with codes for AMI and patients identified with codes for PCI or CABG. Conclusion: The estimated PPV was similar to those in claims-based identification of drug safety surveillance events, indicating that administrative claims data can accurately identify study-eligible subjects for pragmatic clinical trials. Keywords: Cardiovascular disease, Aspirin, Real-world evidence, Acute myocardial infarction, Computable phenotype
- Published
- 2018
- Full Text
- View/download PDF
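The exact binomial confidence limits quoted above can be reproduced with the Clopper–Pearson method. The sketch below uses the record's reported counts (168 confirmed eligible of 185 charts reviewed); the helper function is an assumption for illustration, not code from the study.

```python
# Exact (Clopper-Pearson) binomial confidence interval for a proportion.
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

lo, hi = clopper_pearson(168, 185)
print(f"PPV = {168/185:.1%}, 95% CI ({lo:.1%}, {hi:.1%})")   # approximately 90.8% (85.7%, 94.6%)
```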
42. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients
- Author
-
Dmitriy Dligach, Majid Afshar, Cara Joyce, Elizabeth Salisbury-Afshar, Kristin Swope, Niranjan S. Karnik, and Brihat Sharma
- Subjects
FOS: Computer and information sciences ,Adult ,Vocabulary ,020205 medical informatics ,Computer science ,media_common.quotation_subject ,Health Informatics ,Computable phenotype ,02 engineering and technology ,lcsh:Computer applications to medicine. Medical informatics ,Machine learning ,computer.software_genre ,Logistic regression ,Convolutional neural network ,Medical Records ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Opioid misuse ,0202 electrical engineering, electronic engineering, information engineering ,Electronic Health Records ,Humans ,030212 general & internal medicine ,media_common ,Protected health information ,Inpatients ,Artificial neural network ,Receiver operating characteristic ,business.industry ,Health Policy ,Natural language processing ,Unified Medical Language System ,Opioid-Related Disorders ,Computer Science Applications ,Heroin ,Opioid use disorder ,lcsh:R858-859.7 ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Information Systems ,Research Article - Abstract
Background Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) Mapping the corpus of documents to standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System) thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use-case for an opioid misuse classifier. Methods An observational cohort sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI codes, character-based, and n-gram features. Models applied were machine learning with neural network and logistic regression as well as expert consensus with a rule-based model for opioid misuse. The area under the receiver operating characteristic curves (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration. Results Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top weighted CUI codes in logistic regression has the related terms ‘Heroin’ and ‘Victim of abuse’. Conclusions We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
- Published
- 2019
43. Validation of an Electronic Phenotyping Algorithm for Patients With Acute Respiratory Failure.
- Author
-
Essay P, Fisher JM, Mosier JM, and Subbian V
- Abstract
Acute respiratory failure is a common reason for ICU admission and imposes significant strain on patients and the healthcare system. Noninvasive positive-pressure ventilation and high-flow nasal oxygen are increasingly used as an alternative to invasive mechanical ventilation to treat acute respiratory failure. As such, there is a need to accurately cohort patients using large, routinely collected, clinical data to better understand utilization patterns and patient outcomes. The primary objective of this retrospective observational study was to externally validate our computable phenotyping algorithm for patients with acute respiratory failure requiring various sequences of respiratory support in real-world data from a large healthcare delivery network., Design: This is a cross-sectional observational study to validate our algorithm for phenotyping acute respiratory patients by method of respiratory support. We randomly selected 5% ( n = 4,319) from each phenotype for manual validation. We calculated the algorithm performance and generated summary statistics for each phenotype and a priori defined clinical subgroups., Setting: Data were extracted from a clinical data warehouse containing electronic health record data from 46 ICUs in the southwest United States., Patients: All adult (≥ 18 yr) patient records requiring any type of oxygen therapy or mechanical ventilation between November 1, 2013, and September 30, 2020, were extracted for the study., Interventions: None., Measurements and Main Results: Micro- and macroaveraged multiclass specificities of the algorithm were 0.902 and 0.896, respectively. Sensitivity and specificity of phenotypes individually were greater than 0.90 for all phenotypes except for those patients extubated from invasive to noninvasive ventilation. We successfully created clinical subgroups of common illnesses requiring ventilatory support and provide high-level comparison of outcomes., Conclusions: The electronic phenotyping algorithm is robust and provides a necessary tool for retrospective research for characterizing patients with acute respiratory failure across modalities of respiratory support., Competing Interests: Drs. Fisher, Mosier, and Subbian received grant support for this work by the Emergency Medicine Foundation. Dr. Essay has disclosed that he does not have any potential conflicts of interest., (Copyright © 2022 The Authors. Published by Wolters Kluwer Health, Inc. on behalf of the Society of Critical Care Medicine.)
- Published
- 2022
- Full Text
- View/download PDF
44. Challenges in replicating secondary analysis of electronic health records data with multiple computable phenotypes: A case study on methicillin-resistant Staphylococcus aureus bacteremia infections.
- Author
-
Jun, Inyoung, Rich, Shannan N., Chen, Zhaoyi, Bian, Jiang, and Prosperi, Mattia
- Abstract
Background: Replication of prediction modeling using electronic health records (EHR) is challenging because of the necessity to compute phenotypes including study cohort, outcomes, and covariates. However, some phenotypes may not be easily replicated across EHR data sources due to a variety of reasons such as the lack of gold standard definitions and documentation variations across systems, which may lead to measurement error and potential bias. Methicillin-resistant Staphylococcus aureus (MRSA) infections are responsible for high mortality worldwide. With limited treatment options for the infection, the ability to predict MRSA outcome is of interest. However, replicating these MRSA outcome prediction models using EHR data is problematic due to the lack of well-defined computable phenotypes for many of the predictors as well as study inclusion and outcome criteria.Objective: In this study, we aimed to evaluate a prediction model for 30-day mortality after MRSA bacteremia infection diagnosis with reduced vancomycin susceptibility (MRSA-RVS) considering multiple computable phenotypes using EHR data.Methods: We used EHR data from a large academic health center in the United States to replicate the original study conducted in Taiwan. We derived multiple computable phenotypes of risk factors and predictors used in the original study, reported stratified descriptive statistics, and assessed the performance of the prediction model.Results: In our replication study, it was possible to (re)compute most of the original variables. Nevertheless, for certain variables, their computable phenotypes can only be approximated by proxy with structured EHR data items, especially the composite clinical indices such as the Pitt bacteremia score. Even computable phenotype for the outcome variable was subject to variation on the basis of the admission/discharge windows. The replicated prediction model exhibited only a mild discriminatory ability.Conclusion: Despite the rich information in EHR data, replication of prediction models involving complex predictors is still challenging, often due to the limited availability of validated computable phenotypes. On the other hand, it is often possible to derive proxy computable phenotypes that can be further validated and calibrated. [ABSTRACT FROM AUTHOR]- Published
- 2021
- Full Text
- View/download PDF
45. Rule-Based Cohort Definitions for Acute Respiratory Failure: Electronic Phenotyping Algorithm
- Author
-
Vignesh Subbian, Jarrod Mosier, and Patrick Essay
- Subjects
Telemedicine ,intensive care units ,medicine.medical_treatment ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Health Informatics ,law.invention ,03 medical and health sciences ,0302 clinical medicine ,Health Information Management ,law ,medicine ,Intubation ,030212 general & internal medicine ,Medical diagnosis ,Respiratory system ,Mechanical ventilation ,Original Paper ,critical care informatics ,business.industry ,electronic health record ,respiratory ,Intensive care unit ,computable phenotype ,030228 respiratory system ,Cohort ,Breathing ,telemedicine ,business ,Algorithm - Abstract
Background Acute respiratory failure is generally treated with invasive mechanical ventilation or noninvasive respiratory support strategies. The efficacies of the various strategies are not fully understood. There is a need for accurate therapy-based phenotyping for secondary analyses of electronic health record data to answer research questions regarding respiratory management and outcomes with each strategy. Objective The objective of this study was to address knowledge gaps related to ventilation therapy strategies across diverse patient populations by developing an algorithm for accurate identification of patients with acute respiratory failure. To accomplish this objective, our goal was to develop rule-based computable phenotypes for patients with acute respiratory failure using remotely monitored intensive care unit (tele-ICU) data. This approach permits analyses by ventilation strategy across broad patient populations of interest with the ability to sub-phenotype as research questions require. Methods Tele-ICU data from ≥200 hospitals were used to create a rule-based algorithm for phenotyping patients with acute respiratory failure, defined as an adult patient requiring invasive mechanical ventilation or a noninvasive strategy. The dataset spans a wide range of hospitals and ICU types across all US regions. Structured clinical data, including ventilation therapy start and stop times, medication records, and nurse and respiratory therapy charts, were used to define clinical phenotypes. All adult patients of any diagnoses with record of ventilation therapy were included. Patients were categorized by ventilation type, and analysis of event sequences using record timestamps defined each phenotype. Manual validation was performed on 5% of patients in each phenotype. Results We developed 7 phenotypes: (0) invasive mechanical ventilation, (1) noninvasive positive-pressure ventilation, (2) high-flow nasal insufflation, (3) noninvasive positive-pressure ventilation subsequently requiring intubation, (4) high-flow nasal insufflation subsequently requiring intubation, (5) invasive mechanical ventilation with extubation to noninvasive positive-pressure ventilation, and (6) invasive mechanical ventilation with extubation to high-flow nasal insufflation. A total of 27,734 patients met our phenotype criteria and were categorized into these ventilation subgroups. Manual validation of a random selection of 5% of records from each phenotype resulted in a total accuracy of 88% and a precision and recall of 0.8789 and 0.8785, respectively, across all phenotypes. Individual phenotype validation showed that the algorithm categorizes patients particularly well but has challenges with patients that require ≥2 management strategies. Conclusions Our proposed computable phenotyping algorithm for patients with acute respiratory failure effectively identifies patients for therapy-focused research regardless of admission diagnosis or comorbidities and allows for management strategy comparisons across populations of interest.
- Published
- 2020
- Full Text
- View/download PDF
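The event-sequence logic described above can be sketched by ordering a patient's respiratory support therapies by start time and mapping the collapsed sequence to a phenotype label. The labels and mapping below are a simplification for illustration, not the full tele-ICU rule set.

```python
# Simplified sequence-based phenotyping: map the ordered sequence of therapies to a label.
PHENOTYPES = {
    ("IMV",): 0,                 # invasive mechanical ventilation only
    ("NIPPV",): 1,               # noninvasive positive-pressure ventilation only
    ("HFNI",): 2,                # high-flow nasal insufflation only
    ("NIPPV", "IMV"): 3,         # noninvasive ventilation subsequently requiring intubation
    ("HFNI", "IMV"): 4,          # high-flow subsequently requiring intubation
    ("IMV", "NIPPV"): 5,         # extubation to noninvasive ventilation
    ("IMV", "HFNI"): 6,          # extubation to high-flow
}

def phenotype(events):
    """events: list of (start_timestamp, therapy) tuples for one ICU stay."""
    ordered = [therapy for _, therapy in sorted(events)]
    collapsed = [t for i, t in enumerate(ordered) if i == 0 or t != ordered[i - 1]]
    return PHENOTYPES.get(tuple(collapsed))     # None if the sequence would need manual review

print(phenotype([(3, "IMV"), (1, "NIPPV")]))    # -> 3
```

Sequences outside the mapping fall through to None, mirroring the kind of cases the record notes are hardest to categorize (patients requiring two or more management strategies).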
46. Optimizing the Electronic Health Record for Clinical Research: Has the Time Come?
- Author
-
Rheault MN
- Subjects
- Phenotype, Algorithms, Electronic Health Records
- Abstract
Competing Interests: M. Rheault reports having consultancy agreements with Visterra; reports receiving research funding from Chinook, Reata, Sanofi, and Travere; and reports being a scientific advisor or member of the Alport Syndrome Foundation Medical Advisory Board; NephJC (501c3) Board of Directors; Peds Nephrology Research Consortium (501c3) Steering Committee, and Women In Nephrology (501c3).
- Published
- 2021
- Full Text
- View/download PDF
47. Validating a Computable Phenotype for Nephrotic Syndrome in Children and Adults Using PCORnet Data.
- Author
-
Oliverio AL, Marchel D, Troost JP, Ayoub I, Almaani S, Greco J, Tran CL, Denburg MR, Matheny M, Dorn C, Massengill SF, Desmond H, Gipson DS, and Mariani LH
- Subjects
- Electronic Health Records, Female, Humans, International Classification of Diseases, Male, Natural Language Processing, Phenotype, United States, Nephrotic Syndrome diagnosis
- Abstract
Background: Primary nephrotic syndromes are rare diseases, which can impede accrual of adequate sample sizes for observational patient-oriented research and clinical trial enrollment. A computable phenotype may be powerful in identifying patients with these diseases for research across multiple institutions., Methods: A comprehensive algorithm of inclusion and exclusion ICD-9 and ICD-10 codes was developed to identify patients with primary nephrotic syndrome. The algorithm was executed against the PCORnet Common Data Model (CDM) at three institutions for the period January 1, 2009 to January 1, 2018. At each institution, a random selection of 50 cases and 50 noncases (individuals not meeting case criteria who were seen within the same calendar year and within 5 years of age of a case) was reviewed by a nephrologist, for a total of 150 cases and 150 noncases. The classification accuracy (sensitivity, specificity, positive and negative predictive value, F1 score) of the computable phenotype was determined., Results: The algorithm identified a total of 2,708 patients with nephrotic syndrome from 4,305,092 distinct patients in the CDM at all sites from 2009 to 2018. For all sites, the sensitivity, specificity, and area under the curve of the algorithm were 99% (95% CI, 97% to 99%), 79% (95% CI, 74% to 85%), and 0.9 (0.84 to 0.97), respectively. The most common causes of false-positive classification were secondary focal segmental glomerulosclerosis (FSGS; nine out of 39) and lupus nephritis (nine out of 39)., Conclusion: This computable phenotype showed good classification accuracy in identifying both children and adults with primary nephrotic syndrome using only ICD-9 and ICD-10 codes, which are available across institutions in the United States. This may facilitate future screening and enrollment for research studies and enable comparative effectiveness research. Further refinements to the algorithm, including use of laboratory data or addition of natural language processing, may help better distinguish primary from secondary causes of nephrotic syndrome. An illustrative sketch of this type of code-list-based query appears after this record., Competing Interests: C. Tran reports serving as a Review Editor for Frontiers in Pediatrics and on the Editorial Board of Pediatric Nephrology. D. Gipson reports having consultancy agreements, through the University of Michigan, with AstraZeneca, Boehringer Ingelheim, Roche/Genentech, and Vertex Pharmaceuticals; reports receiving research funding, through the University of Michigan, from Atrium Health Medical Foundation, Boehringer Ingelheim, Centers for Disease Control, Food and Drug Administration, Goldfinch Bio, National Institutes of Health, Novartis, Reata, and Travere; and reports being a scientific advisor or member of the American Society of Pediatric Nephrology, American Society of Nephrology, and International Society of Pediatric Nephrology. H. Desmond reports receiving research funding through a percentage of salary from the University of Michigan, funded by Boehringer Ingelheim. I. Ayoub reports receiving honoraria from the American College of Rheumatology; reports being a scientific advisor or member of the Journal of Clinical Nephrology (Editorial Board) and the Lupus Foundation of America (advisory board). L. Mariani reports having consultancy agreements with, and receiving honoraria from, Calliditas Therapeutics Advisory Board, CKD Advisory Committee, Reata Pharmaceuticals, and Travere Therapeutics Advisory Board; reports receiving research funding from Boehringer Ingelheim; reports receiving honoraria from the American Society of Nephrology Board Review Course and Update; and reports being a scientific advisor or member of Calliditas Therapeutics, Reata Pharmaceuticals, and Travere Therapeutics. M. Denburg reports having consultancy agreements with Trisalus Life Sciences (spouse); reports having an ownership interest in In-Bore LLC (spouse) and Precision Guided Interventions LLC (spouse); reports receiving research funding from Mallinckrodt; reports being a scientific advisor or member of the NKF Delaware Valley Medical Advisory Board and the Trisalus Life Sciences Scientific Advisory Board (spouse); and reports other interests/relationships with the American Society of Pediatric Nephrology Research and Program Committees and the National Kidney Foundation Pediatric Education Planning Committee. M. Matheny reports having consultancy agreements with the National Institutes of Health-Veterans Affairs-Department of Defense Pain Management Grant Consortium (PMC3); and reports being a scientific advisor or member of the Scientific Merit Review Board Study Section, VA Health Services Research and Development (HSR&D), Informatics and Methods Section, the Steering Committee of the Indianapolis VA HSR&D Center of Innovation, and the Steering Committee of the VA HSR&D VA Information Resource Center. S. Almaani reports having consultancy agreements with Aurinia Pharmaceuticals and Kezar Life Sciences; reports receiving research funding from Gilead Sciences; reports being a scientific advisor or member of the Clinical Nephrology Editorial Board; and reports being on a speakers bureau for Aurinia Pharmaceuticals. S. Massengill reports having consultancy agreements with Guidepoint Group; reports being a scientific advisor or member of the Editorial Board of the online journal Glomerular Diseases (Karger Publishers). All remaining authors have nothing to disclose., (Copyright © 2021 by the American Society of Nephrology.)
- Published
- 2021
- Full Text
- View/download PDF
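The record above (no. 47) describes an inclusion/exclusion ICD-code algorithm run against the PCORnet Common Data Model. The Python sketch below shows, under stated assumptions, how such a code-list filter might be expressed against a CDM-style DIAGNOSIS table; the code lists, the "any exclusion code removes the patient" rule, and the date handling are illustrative placeholders, not the authors' published algorithm.

import pandas as pd

# Illustrative sketch only. Column names follow PCORnet CDM conventions
# (PATID, DX, ADMIT_DATE), but the code lists and the simple exclusion rule
# are assumptions made for illustration, not the published algorithm.
INCLUSION_CODES = {"581.1", "581.3", "N04.0", "N04.1"}  # placeholder nephrotic syndrome codes
EXCLUSION_CODES = {"710.0", "M32.14"}                   # placeholder secondary-cause codes

def identify_cases(diagnosis: pd.DataFrame) -> set:
    """Return PATIDs with at least one inclusion code and no exclusion code
    recorded between January 1, 2009 and January 1, 2018."""
    in_window = diagnosis[
        (diagnosis["ADMIT_DATE"] >= "2009-01-01")
        & (diagnosis["ADMIT_DATE"] < "2018-01-01")
    ]
    included = set(in_window.loc[in_window["DX"].isin(INCLUSION_CODES), "PATID"])
    excluded = set(in_window.loc[in_window["DX"].isin(EXCLUSION_CODES), "PATID"])
    return included - excluded

Chart review of sampled cases and noncases, as the authors performed, would still be needed to estimate the sensitivity and specificity of any such filter.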
48. A Computable Phenotype for Autosomal Dominant Polycystic Kidney Disease.
- Author
-
Kalot MA, El Alayli A, Al Khatib M, Husainat N, McGreal K, Jalal DI, Yu ASL, and Mustafa RA
- Subjects
- Algorithms, Data Collection, Humans, International Classification of Diseases, Phenotype, Polycystic Kidney, Autosomal Dominant diagnosis
- Abstract
Background: A computable phenotype is an algorithm used to identify a group of patients within an electronic medical record system. Developing a computable phenotype that can accurately identify patients with autosomal dominant polycystic kidney disease (ADPKD) will assist researchers in defining patients eligible to participate in clinical trials and other studies. Our objective was to assess the accuracy of a computable phenotype using International Classification of Diseases, 9th and 10th Revision (ICD-9/10) codes to identify patients with ADPKD., Methods: We reviewed four random samples of approximately 250 patients, selected on the basis of ICD-9/10 codes, from the EHR of the Kansas University Medical Center database: patients followed in nephrology clinics who had ICD-9/10 codes for ADPKD (Neph+), patients seen in nephrology clinics without ICD codes for ADPKD (Neph-), patients not followed in nephrology clinics who had ICD codes for ADPKD (No Neph+), and patients not seen in nephrology clinics and without ICD codes for ADPKD (No Neph-). We reviewed the charts and determined ADPKD status on the basis of internationally accepted diagnostic criteria for ADPKD., Results: The computable phenotype to identify patients with ADPKD who attended nephrology clinics had a sensitivity of 99% (95% confidence interval [95% CI], 96.4 to 99.7) and a specificity of 84% (95% CI, 79.5 to 88.1). For those who did not attend nephrology clinics, the sensitivity was 97% (95% CI, 93.3 to 99.0) and the specificity was 82% (95% CI, 77.4 to 86.1)., Conclusion: A computable phenotype using ICD-9/10 codes can correctly identify most patients with ADPKD and can be used by researchers to screen health care records for cohorts of patients with ADPKD with acceptable accuracy. An illustrative sketch of computing such accuracy measures from chart-review counts appears after this record., Competing Interests: A. Yu reports having consultancy agreements with Calico, Otsuka, Navitor, and Regulus Therapeutics; reports having an ownership interest in Amgen Corp., Gilead Sciences, and Prothena; reports receiving honoraria from Elsevier and Wolters Kluwer; reports being a scientific advisor or member of the Otsuka Advisory Board; and reports having other interests/relationships with the Jared Grantham Kidney Institute, which receives royalties from Otsuka for tolvaptan, and The University of Kansas Medical Center. D. Jalal reports receiving research funding from AstraZeneca and Corvidia; reports receiving honoraria from the Kansas Idea Network of Biomedical Research Excellence (K-INBRE) and Reata; and reports being a scientific advisor or member of Reata. K. McGreal reports receiving research funding, through the Medical Center, as a sub-investigator on the Sanofi Staged PKD trial and as principal investigator on the Reata FALCON study. R. Mustafa reports being a scientific advisor to or member of The American College of Physicians Clinical Guideline Committee, The Canadian Society of Nephrology Clinical Practice Guidelines Committee, and The GRADE guidance group; and reports having other interests/relationships with the advisory board for the renal round table and the National Kidney Foundation Midwest. All remaining authors have nothing to disclose., (Copyright © 2021 by the American Society of Nephrology.)
- Published
- 2021
- Full Text
- View/download PDF
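Record 48 validates an ICD-9/10-based phenotype against chart review and reports sensitivity and specificity with 95% confidence intervals. The short Python sketch below shows one common way to compute such estimates from chart-review counts using a normal-approximation interval; the counts are hypothetical placeholders, not data from the study.

from math import sqrt

def proportion_with_ci(successes: int, total: int, z: float = 1.96) -> tuple:
    """Point estimate and normal-approximation (Wald) 95% CI for a proportion."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical chart-review tallies: phenotype call vs. chart-confirmed ADPKD.
tp, fn = 240, 3    # chart-confirmed ADPKD: flagged / missed by the phenotype
tn, fp = 210, 40   # no ADPKD on chart review: correctly not flagged / falsely flagged

sens, sens_lo, sens_hi = proportion_with_ci(tp, tp + fn)
spec, spec_lo, spec_hi = proportion_with_ci(tn, tn + fp)
print(f"sensitivity {sens:.1%} (95% CI {sens_lo:.1%} to {sens_hi:.1%})")
print(f"specificity {spec:.1%} (95% CI {spec_lo:.1%} to {spec_hi:.1%})")

Exact (Clopper-Pearson) or Wilson intervals are often preferred when counts are small; the intervals reported in the record may have been computed differently.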
49. Desiderata for the development of next-generation electronic health record phenotype libraries.
- Author
-
Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, and Curcin V
- Subjects
- Humans, Phenotype, Reproducibility of Results, Electronic Health Records
- Abstract
Background: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that can ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling., Methods: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices., Results: We present 14 library desiderata that promote high-quality phenotype definitions in the areas of modelling, logging, validation, and sharing and warehousing., Conclusions: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be used more effectively in medical domains. An illustrative sketch of a phenotype-library metadata record appears after this record., (© The Author(s) 2021. Published by Oxford University Press GigaScience.)
- Published
- 2021
- Full Text
- View/download PDF
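Record 49 argues that phenotype libraries should support modelling, logging, validation, and sharing/warehousing of definitions. As a loose illustration of the kind of metadata a library entry might carry to serve those areas, here is a small Python data structure; the field names and defaults are assumptions for illustration, not the paper's specification or its 14 desiderata.

from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

@dataclass
class PhenotypeLibraryEntry:
    # Modelling: what the definition is and how its logic is expressed
    name: str
    version: str
    logic_description: str                                           # e.g., ">=2 qualifying diagnosis codes"
    code_lists: Dict[str, List[str]] = field(default_factory=dict)   # vocabulary -> codes

    # Logging: provenance of the definition and its changes
    authors: List[str] = field(default_factory=list)
    created: date = field(default_factory=date.today)
    change_log: List[str] = field(default_factory=list)

    # Validation: evidence that the definition performs as claimed
    validation_sites: List[str] = field(default_factory=list)
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None

    # Sharing and warehousing: how others can discover and reuse it
    license: str = "CC-BY-4.0"
    repository_url: Optional[str] = None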
50. An Iterative Process for Identifying Pediatric Patients With Type 1 Diabetes: Retrospective Observational Study.
- Author
-
Morris HL, Donahoo WT, Bruggeman B, Zimmerman C, Hiers P, Zhong VW, and Schatz D
- Abstract
Background: The incidence of both type 1 diabetes (T1DM) and type 2 diabetes (T2DM) in children and youth is increasing. However, the current approach for identifying pediatric diabetes and separating cases by type is costly because it requires substantial manual effort., Objective: The purpose of this study was to develop a computable phenotype for accurately and efficiently identifying diabetes and separating T1DM from T2DM in pediatric patients., Methods: This retrospective study utilized a data set from the University of Florida Health Integrated Data Repository to identify 300 patients aged 18 years or younger who had T1DM, had T2DM, or were healthy, based on a developed computable phenotype. Three endocrinology residents/fellows manually reviewed the medical records of all probable cases to validate diabetes status and type. The refined computable phenotype was then used to identify all cases of T1DM and T2DM in the OneFlorida Clinical Research Consortium., Results: A total of 295 electronic health records were manually reviewed; of these, 128 cases were found to have T1DM, 35 T2DM, and 132 no diabetes diagnosis. The positive predictive value was 94.7%, the sensitivity was 96.9%, the specificity was 95.8%, and the negative predictive value was 97.6%. Overall, the computable phenotype was found to be an accurate and sensitive method for identifying pediatric patients with T1DM., Conclusions: We developed a computable phenotype for identifying T1DM accurately and efficiently. This computable phenotype will enable researchers to identify a study population accurately and cost-effectively, vastly improving the ease of identifying patients for future intervention studies. A generic illustrative sketch of a code-based first pass at separating diabetes types appears after this record., (©Heather Lynne Morris, William Troy Donahoo, Brittany Bruggeman, Chelsea Zimmerman, Paul Hiers, Victor W Zhong, Desmond Schatz. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.09.2020.)
- Published
- 2020
- Full Text
- View/download PDF
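Record 50 describes an iteratively refined computable phenotype that separates pediatric T1DM from T2DM. The Python sketch below is a generic first pass that splits candidates by ICD-10 code family (E10.* for type 1, E11.* for type 2) and age; it is not the authors' phenotype, which was refined against manual chart review and may rely on additional criteria such as laboratory results or medications.

import pandas as pd

def first_pass_diabetes_type(diagnoses: pd.DataFrame) -> pd.DataFrame:
    """Assumed layout: columns patient_id, age_at_encounter, icd10_code (illustrative)."""
    peds = diagnoses[diagnoses["age_at_encounter"] <= 18]
    counts = (
        peds.assign(
            t1dm_codes=peds["icd10_code"].str.startswith("E10"),
            t2dm_codes=peds["icd10_code"].str.startswith("E11"),
        )
        .groupby("patient_id")[["t1dm_codes", "t2dm_codes"]]
        .sum()
    )

    def label(row):
        # Majority rule; ties and patients with no diabetes codes go to review.
        if row["t1dm_codes"] > row["t2dm_codes"]:
            return "probable T1DM"
        if row["t2dm_codes"] > row["t1dm_codes"]:
            return "probable T2DM"
        return "needs manual review"

    counts["label"] = counts.apply(label, axis=1)
    return counts

In practice, the candidate labels from a rule like this would be sampled for chart review, the predictive values estimated, and the rule refined, in the iterative manner the record describes.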