77 results on '"Dligach D"'
Search Results
2. External Validation of an Acute Respiratory Distress Syndrome Prediction Model Using Clinical Text
- Author
-
Su, X., Mayampurath, A., Churpek, M., Shah, S., Patel, B., Dligach, D., and Afshar, M.
- Published
- 2020
3. Phenotypic Clusters Derived from Clinical Notes of Patients with Respiratory Failure
- Author
-
Sharma, B., Dligach, D., Joyce, C., Littleton, S.W., and Afshar, M.
- Published
- 2019
4. Towards comprehensive syntactic and semantic annotations of the clinical narrative
- Author
-
Albright, D, Lanfranchi, A, Fredriksen, A, Styler, WF, Warner, C, Hwang, JD, Choi, JD, Dligach, D, Nielsen, RD, Martin, J, Ward, W, Palmer, M, and Savova, GK
- Subjects
Narration, Linguistics, Medical and Health Sciences, Semantics, Gold Standard Annotations, UMLS, Treebank, Engineering, Information and Computing Sciences, cTAKES, Humans, Electronic Health Records, Medical Informatics, Natural Language Processing, Propbank
- Abstract
Objective: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components. Methods: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed. Results: The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891-0.931), NE (0.697-0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations. Conclusions: This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
- Published
- 2013
5. A common type system for clinical natural language processing.
- Author
-
Wu, ST, Kaggal, VC, Dligach, D, Masanz, JJ, Chen, P, Becker, L, Chapman, WW, Savova, GK, Liu, H, and Chute, CG
- Abstract
BACKGROUND: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. RESULTS: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. CONCLUSIONS: We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
- Published
- 2013
6. Improving Verb Sense Disambiguation with Automatically Retrieved Semantic Knowledge.
- Author
-
Dligach, D. and Palmer, M.
- Published
- 2008
7. Towards Large-scale High-Performance English Verb Sense Disambiguation by Using Linguistically Motivated Features.
- Author
-
Chen, J., Dligach, D., and Palmer, M.
- Published
- 2007
8. The Addition of United States Census-Tract Data Does Not Improve the Prediction of Substance Misuse
- Author
-
To, D., Joyce, C., Kulshrestha, S., Sharma, B., Dligach, D., Churpek, M., and Afshar, M.
9. A Computable Phenotype for Acute Respiratory Distress Syndrome Using Natural Language Processing and Machine Learning
- Author
-
Afshar, M., Joyce, C., Oakey, A., Formanek, P., Yang, P., Churpek, M., Cooper, R. S., Zelisko, S., Price, R., and Dligach, D.
10. A common type system for clinical natural language processing
- Author
-
Wu Stephen T, Kaggal Vinod C, Dligach Dmitriy, Masanz James J, Chen Pei, Becker Lee, Chapman Wendy W, Savova Guergana K, Liu Hongfang, and Chute Christopher G
- Subjects
Natural Language Processing, Standards and interoperability, Clinical information extraction, Clinical Element Models, Common type system, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: One challenge in reusing clinical data stored in electronic medical records is that these data are heterogenous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text to a standard representation that is comparable and interoperable. Information may be processed and shared when a type system specifies the allowable data structures. Therefore, we aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings. Results: We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later. Conclusions: We have created a type system that targets deep semantics, thereby allowing for NLP systems to encapsulate knowledge from text and share it alongside heterogenous clinical data sources. Rather than surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
- Published
- 2013
11. Outcomes and Cost-Effectiveness of an EHR-Embedded AI Screener for Identifying Hospitalized Adults at Risk for Opioid Use Disorder.
- Author
-
Afshar M, Resnik F, Joyce C, Oguss M, Dligach D, Burnside E, Sullivan A, Churpek M, Patterson B, Salisbury-Afshar E, Liao F, Brown R, and Mundt M
- Abstract
Hospitalized adults with opioid use disorder (OUD) are at high risk for adverse events and rehospitalizations. This pre-post quasi-experimental study evaluated whether an AI-driven OUD screener embedded in the electronic health record (EHR) was non-inferior to usual care in identifying patients for Addiction Medicine consults, aiming to provide a similarly effective but more scalable alternative to human-led ad hoc consultations. The AI screener analyzed EHR notes in real time with a convolutional neural network to identify patients at risk and recommend consultation. The primary outcome was the proportion of patients receiving consults, comparing a 16-month pre-intervention period to an 8-month post-intervention period with the AI screener. Consults did not change between periods (1.35% vs 1.51%, p < 0.001 for non-inferiority). The AI screener was associated with a reduction in 30-day readmissions (OR: 0.53, 95% CI: 0.30-0.91, p = 0.02) with an incremental cost of $6,801 per readmission avoided, demonstrating its potential as a scalable, cost-effective solution for OUD care. ClinicalTrials.gov ID: NCT05745480. Competing Interests: All authors have no conflicts of interest.
- Published
- 2024
12. Family history as the strongest predictor of aortic and peripheral aneurysms in patients with intracranial aneurysms.
- Author
-
Lai PMR, Akama-Garren E, Can A, Tirado SR, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Humans, Male, Female, Middle Aged, Retrospective Studies, Aged, Risk Factors, Adult, Aortic Aneurysm epidemiology, Aortic Aneurysm genetics, Aortic Aneurysm diagnostic imaging, Aged, 80 and over, Intracranial Aneurysm epidemiology, Intracranial Aneurysm complications
- Abstract
Objective: Intracranial aneurysms (IA) and aortic aneurysms (AA) are both abnormal dilations of arteries with familial predisposition and have been proposed to share co-prevalence and pathophysiology. Associations of IA and non-aortic peripheral aneurysms are less well-studied. The goal of the study was to understand the patterns of aortic and peripheral (extracranial) aneurysms in patients with IA, and risk factors associated with the development of these aneurysms. Methods: 4701 patients were included in our retrospective analysis of all patients with intracranial aneurysms at our institution over the past 26 years. Patient demographics, comorbidities, and aneurysmal locations were analyzed. Univariate and multivariate analyses were performed to study associations with and without extracranial aneurysms. Results: A total of 3.4% of patients (161 of 4701) with IA had at least one extracranial aneurysm. 2.8% had thoracic or abdominal aortic aneurysms. Age, male sex, hypertension, coronary artery disease, history of ischemic cerebral infarction, connective tissue disease, and family history of extracranial aneurysms in a first-degree relative were associated with the presence of extracranial aneurysms and a higher number of extracranial aneurysms. In addition, family history of extracranial aneurysms in a second-degree relative is associated with the presence of extracranial aneurysms and atrial fibrillation is associated with a higher number of extracranial aneurysms. Conclusion: Significant comorbidities are associated with extracranial aneurysms in patients with IA. Family history of extracranial aneurysms has the strongest association and suggests that IA patients with a family history of extracranial aneurysms may benefit from screening. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 Elsevier Ltd. All rights reserved.)
- Published
- 2024
13. The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use.
- Author
-
Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, Demner-Fushman D, Dligach D, Daneshjou R, Fernandes C, Hansen LH, Landman A, Lehmann L, McCoy LG, Miller T, Moreno A, Munch N, Restrepo D, Savova G, Umeton R, Gichoya JW, Collins GS, Moons KGM, Celi LA, and Bitterman DS
- Abstract
Large Language Models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present TRIPOD-LLM, an extension of the TRIPOD+AI statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight, and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility, and clinical applicability of LLM research in healthcare through comprehensive reporting. Conflicts of interest: DSB: Editorial, unrelated to this work: Associate Editor of Radiation Oncology, HemOnc.org (no financial compensation); Research funding, unrelated to this work: American Association for Cancer Research; Advisory and consulting, unrelated to this work: MercurialAI. DDF: Editorial, unrelated to this work: Associate Editor of JAMIA, Editorial Board of Scientific Data, Nature; Funding, unrelated to this work: the intramural research program at the U.S. National Library of Medicine, National Institutes of Health. JWG: Editorial, unrelated to this work: Editorial Board of Radiology: Artificial Intelligence, British Journal of Radiology AI journal and NEJM AI. All other authors declare no conflicts of interest.
- Published
- 2024
14. LCD Benchmark: Long Clinical Document Benchmark on Mortality Prediction for Language Models.
- Author
-
Yoon W, Chen S, Gao Y, Zhao Z, Dligach D, Bitterman DS, Afshar M, and Miller T
- Abstract
Objective: The application of Natural Language Processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a certain domain, the role of benchmark datasets is crucial as benchmark datasets not only guide the selection of best-performing models but also enable the assessment of the reliability of the generated outputs. Despite the recent availability of language models (LMs) capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent. Materials and Methods: To address this issue, we propose LCD benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of MIMIC-IV and statewide death data. We evaluated this benchmark dataset using baseline models, from bag-of-words and CNN to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. Results and Discussion: Baseline models showed 28.9% for best-performing supervised models and 32.2% for GPT-4 in F1-metrics. Notes in our dataset have a median word count of 1687. Our analysis of the model outputs showed that our dataset is challenging for both models and human experts, but the models can find meaningful signals from the text. Conclusion: We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text. The benchmark dataset is available at https://github.com/Machine-Learning-for-Medical-Language/long-clinical-doc.
- Published
- 2024
15. Automated stratification of trauma injury severity across multiple body regions using multi-modal, multi-class machine learning models.
- Author
-
Gao J, Chen G, O'Rourke AP, Caskey J, Carey KA, Oguss M, Stey A, Dligach D, Miller T, Mayampurath A, Churpek MM, and Afshar M
- Subjects
- Humans, Injury Severity Score, Registries, Trauma Severity Indices, Natural Language Processing, Machine Learning, Electronic Health Records, Wounds and Injuries classification
- Abstract
Objective: The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data. Materials and Methods: Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted. Results: Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries. Discussion: The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative. Conclusions: Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations. (© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2024
16. Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses.
- Author
-
Croxford E, Gao Y, Patterson B, To D, Tesch S, Dligach D, Mayampurath A, Churpek MM, and Afshar M
- Abstract
In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook generative task complexities. This work aimed to examine the current state of automated evaluation metrics in NLG in healthcare. To have a robust and well-validated baseline with which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Employing ChatGPT-3.5-turbo generative output, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, which draws on the Unified Medical Language System (UMLS), showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly focusing on refining the SapBERT score for improved assessments.
- Published
- 2024
17. Development and external validation of multimodal postoperative acute kidney injury risk machine learning models.
- Author
-
Karway GK, Koyner JL, Caskey J, Spicer AB, Carey KA, Gilbert ER, Dligach D, Mayampurath A, Afshar M, and Churpek MM
- Abstract
Objectives: To develop and externally validate machine learning models using structured and unstructured electronic health record data to predict postoperative acute kidney injury (AKI) across inpatient settings. Materials and Methods: Data for adult postoperative admissions to the Loyola University Medical Center (2009-2017) were used for model development and admissions to the University of Wisconsin-Madison (2009-2020) were used for validation. Structured features included demographics, vital signs, laboratory results, and nurse-documented scores. Unstructured text from clinical notes was converted into concept unique identifiers (CUIs) using the clinical Text Analysis and Knowledge Extraction System. The primary outcome was the development of Kidney Disease: Improving Global Outcomes stage 2 AKI within 7 days after leaving the operating room. We derived unimodal extreme gradient boosting machines (XGBoost) and elastic net logistic regression (GLMNET) models using structured-only data and multimodal models combining structured data with CUI features. Model comparison was performed using the area under the receiver operating characteristic curve (AUROC), with Delong's test for statistical differences. Results: The study cohort included 138 389 adult patient admissions (mean [SD] age 58 [16] years; 11 506 [8%] African-American; and 70 826 [51%] female) across the 2 sites. Of those, 2959 (2.1%) developed stage 2 AKI or higher. Across all data types, XGBoost outperformed GLMNET (mean AUROC 0.81 [95% confidence interval (CI), 0.80-0.82] vs 0.78 [95% CI, 0.77-0.79]). The multimodal XGBoost model incorporating CUIs parameterized as term frequency-inverse document frequency (TF-IDF) showed the highest discrimination performance (AUROC 0.82 [95% CI, 0.81-0.83]) over unimodal models (AUROC 0.79 [95% CI, 0.78-0.80]). Discussion: A multimodality approach with structured data and TF-IDF weighting of CUIs increased model performance over structured data-only models. Conclusion: These findings highlight the predictive power of CUIs when merged with structured data for clinical prediction models, which may improve the detection of postoperative AKI. Competing Interests: Dr Churpek is a named inventor on a patent for a risk stratification algorithm for hospitalized patients (U.S. patent # 11410777). The remaining authors have disclosed that they do not have any potential conflicts of interest. (© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2023
18. End-to-end clinical temporal information extraction with multi-head attention.
- Author
-
Miller T, Bethard S, Dligach D, and Savova G
- Abstract
Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state of the art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.
- Published
- 2023
19. Multi-Task Training with In-Domain Language Models for Diagnostic Reasoning.
- Author
-
Sharma B, Gao Y, Miller T, Churpek MM, Afshar M, and Dligach D
- Abstract
Generative artificial intelligence (AI) is a promising direction for augmenting clinical diagnostic decision support and reducing diagnostic errors, a leading contributor to medical errors. To further the development of clinical AI systems, the Diagnostic Reasoning Benchmark (DR.BENCH) was introduced as a comprehensive generative AI framework, comprised of six tasks representing key components in clinical reasoning. We present a comparative analysis of in-domain versus out-of-domain language models as well as multi-task versus single task training with a focus on the problem summarization task in DR.BENCH (Gao et al., 2023). We demonstrate that a multi-task, clinically-trained language model outperforms its general domain counterpart by a large margin, establishing a new state-of-the-art performance, with a ROUGE-L score of 28.55. This research underscores the value of domain-specific training for optimizing clinical diagnostic reasoning tasks.
- Published
- 2023
20. Overview of the Problem List Summarization (ProbSum) 2023 Shared Task on Summarizing Patients' Active Diagnoses and Problems from Electronic Health Record Progress Notes.
- Author
-
Gao Y, Dligach D, Miller T, Churpek MM, and Afshar M
- Abstract
The BioNLP Workshop 2023 launched a shared task on Problem List Summarization (ProbSum) in January 2023. The aim of this shared task is to attract future research efforts in building NLP models for real-world diagnostic decision support applications, where a system generating relevant and accurate diagnoses will augment the healthcare providers' decision-making process and improve the quality of care for patients. The goal for participants is to develop models that generate a list of diagnoses and problems using input from the daily care notes collected from the hospitalization of critically ill patients. Eight teams submitted their final systems to the shared task leaderboard. In this paper, we describe the tasks, datasets, evaluation metrics, and baseline systems. Additionally, the techniques and results of the evaluation of the different approaches tried by the participating teams are summarized.
- Published
- 2023
21. Improving the Transferability of Clinical Note Section Classification Models with BERT and Large Language Model Ensembles.
- Author
-
Zhou W, Dligach D, Afshar M, Gao Y, and Miller TA
- Abstract
Text in electronic health records is organized into sections, and classifying those sections into section categories is useful for downstream tasks. In this work, we attempt to improve the transferability of section classification models by combining the dataset-specific knowledge in supervised learning models with the world knowledge inside large language models (LLMs). Surprisingly, we find that zero-shot LLMs out-perform supervised BERT-based models applied to out-of-domain data. We also find that their strengths are synergistic, so that a simple ensemble technique leads to additional performance gains.
- Published
- 2023
22. Progress Note Understanding - Assessment and Plan Reasoning: Overview of the 2022 N2C2 Track 3 shared task.
- Author
-
Gao Y, Dligach D, Miller T, Churpek MM, Uzuner O, and Afshar M
- Subjects
- Humans, Natural Language Processing, Health Personnel, Electronic Health Records, Information Storage and Retrieval
- Abstract
Daily progress notes are a common note type in the electronic health record (EHR) where healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat with extraneous information that distracts from the diagnoses and treatment plans. The application of natural language processing (NLP) in the EHR is a growing field, with the majority of methods in information extraction. Few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, Assessment and Plan subsections where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict causal relations between the overall status of the patient contained in the Assessment section and its relation to each component of the Plan section which contains the diagnoses and treatment plans. The goal of the task was to identify and prioritize diagnoses as the first steps in diagnostic decision support to find the most relevant information in long documents like daily progress notes. We present the results of the 2022 N2C2 Track 3 and provide a description of the data, evaluation, participation and system performance. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2023. Published by Elsevier Inc.)
- Published
- 2023
23. Deployment of Real-time Natural Language Processing and Deep Learning Clinical Decision Support in the Electronic Health Record: Pipeline Implementation for an Opioid Misuse Screener in Hospitalized Adults.
- Author
-
Afshar M, Adelaine S, Resnik F, Mundt MP, Long J, Leaf M, Ampian T, Wills GJ, Schnapp B, Chao M, Brown R, Joyce C, Sharma B, Dligach D, Burnside ES, Mahoney J, Churpek MM, Patterson BW, and Liao F
- Abstract
Background: The clinical narrative in electronic health records (EHRs) carries valuable information for predictive analytics; however, its free-text form is difficult to mine and analyze for clinical decision support (CDS). Large-scale clinical natural language processing (NLP) pipelines have focused on data warehouse applications for retrospective research efforts. There remains a paucity of evidence for implementing NLP pipelines at the bedside for health care delivery. Objective: We aimed to detail a hospital-wide, operational pipeline to implement a real-time NLP-driven CDS tool and describe a protocol for an implementation framework with a user-centered design of the CDS tool. Methods: The pipeline integrated a previously trained open-source convolutional neural network model for screening opioid misuse that leveraged EHR notes mapped to standardized medical vocabularies in the Unified Medical Language System. A sample of 100 adult encounters were reviewed by a physician informaticist for silent testing of the deep learning algorithm before deployment. An end user interview survey was developed to examine the user acceptability of a best practice alert (BPA) to provide the screening results with recommendations. The planned implementation also included a human-centered design with user feedback on the BPA, an implementation framework with cost-effectiveness, and a noninferiority patient outcome analysis plan. Results: The pipeline was a reproducible workflow with a shared pseudocode for a cloud service to ingest, process, and store clinical notes as Health Level 7 messages from a major EHR vendor in an elastic cloud computing environment. Feature engineering of the notes used an open-source NLP engine, and the features were fed into the deep learning algorithm, with the results returned as a BPA in the EHR. On-site silent testing of the deep learning algorithm demonstrated a sensitivity of 93% (95% CI 66%-99%) and specificity of 92% (95% CI 84%-96%), similar to published validation studies. Before deployment, approvals were received across hospital committees for inpatient operations. Five interviews were conducted; they informed the development of an educational flyer and further modified the BPA to exclude certain patients and allow the refusal of recommendations. The longest delay in pipeline development was because of cybersecurity approvals, especially because of the exchange of protected health information between the Microsoft (Microsoft Corp) and Epic (Epic Systems Corp) cloud vendors. In silent testing, the resultant pipeline provided a BPA to the bedside within minutes of a provider entering a note in the EHR. Conclusions: The components of the real-time NLP pipeline were detailed with open-source tools and pseudocode for other health systems to benchmark. The deployment of medical artificial intelligence systems in routine clinical care presents an important yet unfulfilled opportunity, and our protocol aimed to close the gap in the implementation of artificial intelligence-driven CDS. Trial Registration: ClinicalTrials.gov NCT05745480; https://www.clinicaltrials.gov/ct2/show/NCT05745480. (©Majid Afshar, Sabrina Adelaine, Felice Resnik, Marlon P Mundt, John Long, Margaret Leaf, Theodore Ampian, Graham J Wills, Benjamin Schnapp, Michael Chao, Randy Brown, Cara Joyce, Brihat Sharma, Dmitriy Dligach, Elizabeth S Burnside, Jane Mahoney, Matthew M Churpek, Brian W Patterson, Frank Liao. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 20.04.2023.)
- Published
- 2023
- Full Text
- View/download PDF
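The silent-testing figures in the abstract above are sensitivity and specificity with 95% confidence intervals. The abstract does not say how the intervals were computed, so the following is only a minimal sketch that assumes a Wilson score interval. The function names and the counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (point estimate, (lower, upper)) for sensitivity and specificity."""
    return {
        # Sensitivity: share of true cases the screener flags.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Specificity: share of non-cases the screener leaves unflagged.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }
```

For example, `sensitivity_specificity(tp=45, fn=5, tn=90, fp=10)` (made-up counts) returns a sensitivity and a specificity of 0.9, each paired with its interval. The interval is wide when the number of true cases is small, which is one reason a 100-encounter sample can yield a broad sensitivity interval.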
24. DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.
- Author
-
Gao Y, Dligach D, Miller T, Caskey J, Sharma B, Churpek MM, and Afshar M
- Subjects
- Humans, Benchmarking, Problem Solving, Information Storage and Retrieval, Natural Language Processing, Artificial Intelligence
- Abstract
The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce the cognitive burden so fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error due to systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model diagnostic reasoning in humans with forward reasoning from data to diagnosis and potentially reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks coined as Diagnostic Reasoning Benchmarks, DR.BENCH, as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed to be a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate the state-of-the-art generative models on DR.BENCH. Experiments show that with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated in DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report the carbon footprint., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2023 Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
25. The Evaluation of a Clinical Decision Support Tool Using Natural Language Processing to Screen Hospitalized Adults for Unhealthy Substance Use: Protocol for a Quasi-Experimental Design.
- Author
-
Joyce C, Markossian TW, Nikolaides J, Ramsey E, Thompson HM, Rojas JC, Sharma B, Dligach D, Oguss MK, Cooper RS, and Afshar M
- Abstract
Background: Automated and data-driven methods for screening using natural language processing (NLP) and machine learning may replace resource-intensive manual approaches in the usual care of patients hospitalized with conditions related to unhealthy substance use. The rigorous evaluation of tools that use artificial intelligence (AI) is necessary to demonstrate effectiveness before system-wide implementation. An NLP tool to use routinely collected data in the electronic health record was previously validated for diagnostic accuracy in a retrospective study for screening unhealthy substance use. Our next step is a noninferiority design incorporated into a research protocol for clinical implementation with prospective evaluation of clinical effectiveness in a large health system., Objective: This study aims to provide a study protocol to evaluate health outcomes and the costs and benefits of an AI-driven automated screener compared to manual human screening for unhealthy substance use., Methods: A pre-post design is proposed to evaluate 12 months of manual screening followed by 12 months of automated screening across surgical and medical wards at a single medical center. The preintervention period consists of usual care with manual screening by nurses and social workers and referrals to a multidisciplinary Substance Use Intervention Team (SUIT). Facilitated by a NLP pipeline in the postintervention period, clinical notes from the first 24 hours of hospitalization will be processed and scored by a machine learning model, and the SUIT will be similarly alerted to patients who flagged positive for substance misuse. Flowsheets within the electronic health record have been updated to capture rates of interventions for the primary outcome (brief intervention/motivational interviewing, medication-assisted treatment, naloxone dispensing, and referral to outpatient care). 
Effectiveness in terms of patient outcomes will be determined by noninferior rates of interventions (primary outcome), as well as rates of readmission within 6 months, average time to consult, and discharge rates against medical advice (secondary outcomes) in the postintervention period by a SUIT compared to the preintervention period. A separate analysis will be performed to assess the costs and benefits to the health system by using automated screening. Changes from the pre- to postintervention period will be assessed in covariate-adjusted generalized linear mixed-effects models., Results: The study will begin in September 2022. Monthly data monitoring and Data Safety Monitoring Board reporting are scheduled every 6 months throughout the study period. We anticipate reporting final results by June 2025., Conclusions: The use of augmented intelligence for clinical decision support is growing with an increasing number of AI tools. We provide a research protocol for prospective evaluation of an automated NLP system for screening unhealthy substance use using a noninferiority design to demonstrate comprehensive screening that may be as effective as manual screening but less costly via automated solutions., Trial Registration: ClinicalTrials.gov NCT03833804; https://clinicaltrials.gov/ct2/show/NCT03833804., International Registered Report Identifier (irrid): DERR1-10.2196/42971., (©Cara Joyce, Talar W Markossian, Jenna Nikolaides, Elisabeth Ramsey, Hale M Thompson, Juan C Rojas, Brihat Sharma, Dmitriy Dligach, Madeline K Oguss, Richard S Cooper, Majid Afshar. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 19.12.2022.)
- Published
- 2022
- Full Text
- View/download PDF
26. Summarizing Patients' Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models.
- Author
-
Gao Y, Miller T, Xu D, Dligach D, Churpek MM, and Afshar M
- Abstract
Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We investigate the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. We provide a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III. T5 and BART are trained on general domain text, and we experiment with a data augmentation method and a domain adaptation pre-training method to increase exposure to medical vocabulary and knowledge. Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. Results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task.
- Published
- 2022
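The abstract above lists ROUGE among its evaluation methods. As a rough illustration of the idea only, the sketch below computes ROUGE-1 (unigram overlap with clipped counts) between a generated and a reference problem list. It assumes lowercased whitespace tokenization and no stemming, so it is a simplification and not the authors' evaluation code. The function name is hypothetical.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, F1) for unigram overlap between two strings."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because ROUGE-1 ignores word order and synonyms, a summary that says "renal failure" scores zero overlap against a reference that says "kidney injury". That limitation is consistent with the abstract pairing ROUGE with embedding-based and concept-based metrics.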
27. A scoping review of publicly available language tasks in clinical natural language processing.
- Author
-
Gao Y, Dligach D, Christensen L, Tesch S, Laffin R, Xu D, Miller T, Uzuner O, Churpek MM, and Afshar M
- Subjects
- Data Collection, Data Management, Humans, Information Storage and Retrieval, Electronic Health Records, Natural Language Processing
- Abstract
Objective: To provide a scoping review of papers on clinical natural language processing (NLP) shared tasks that use publicly available electronic health record data from a cohort of patients., Materials and Methods: We searched 6 databases, including biomedical research and computer science literature databases. A round of title/abstract screening and full-text screening were conducted by 2 reviewers. Our method followed the PRISMA-ScR guidelines., Results: A total of 35 papers with 48 clinical NLP tasks met inclusion criteria between 2007 and 2021. We categorized the tasks by the type of NLP problems, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced as potential clinical decision support applications, such as substance abuse detection, and phenotyping. We summarized the tasks by publication venue and dataset type., Discussion: The breadth of clinical NLP tasks continues to grow as the field of NLP evolves with advancements in language systems. However, gaps exist with divergent interests between the general domain NLP community and the clinical informatics community for task motivation and design, and in generalizability of the data sources. We also identified issues in data preparation., Conclusion: The existing clinical NLP tasks cover a wide range of topics and the field is expected to grow and attract more attention from both general domain NLP and clinical informatics community. We encourage future work to incorporate multidisciplinary collaboration, reporting transparency, and standardization in data preparation. We provide a listing of all the shared task papers and datasets from this review in a GitLab repository., (© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2022
- Full Text
- View/download PDF
28. Hierarchical Annotation for Building A Suite of Clinical Natural Language Processing Tasks: Progress Note Understanding.
- Author
-
Gao Y, Dligach D, Miller T, Tesch S, Laffin R, Churpek MM, and Afshar M
- Abstract
Applying methods in natural language processing on electronic health records (EHR) data is a growing field. Existing corpus and annotation focus on modeling textual features and relation prediction. However, there is a paucity of annotated corpus built to model clinical diagnostic thinking, a process involving text understanding, domain knowledge abstraction and reasoning. This work introduces a hierarchical annotation schema with three stages to address clinical text understanding, clinical reasoning, and summarization. We created an annotated corpus based on an extensive collection of publicly available daily progress notes, a type of EHR documentation that is collected in time series in a problem-oriented format. The conventional format for a progress note follows a Subjective, Objective, Assessment and Plan heading (SOAP). We also define a new suite of tasks, Progress Note Understanding, with three tasks utilizing the three annotation stages. The novel suite of tasks was designed to train and evaluate future NLP models for clinical text understanding, clinical knowledge representation, inference, and summarization.
- Published
- 2022
29. Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (SMART-AI): a retrospective deep learning study.
- Author
-
Afshar M, Sharma B, Dligach D, Oguss M, Brown R, Chhabra N, Thompson HM, Markossian T, Joyce C, Churpek MM, and Karnik NS
- Subjects
- Adult, Artificial Intelligence, Humans, Referral and Consultation, Retrospective Studies, United States, Alcoholism complications, Alcoholism diagnosis, Alcoholism therapy, Deep Learning, Opioid-Related Disorders
- Abstract
Background: Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse., Methods: The model was trained and developed on a reference dataset derived from a hospital-wide programme at Rush University Medical Center (RUMC), Chicago, IL, USA, that used structured diagnostic interviews to manually screen admitted patients over 27 months (between Oct 1, 2017, and Dec 31, 2019; n=54 915). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 h of notes in the EHR were mapped to standardised medical vocabulary and fed into single-label, multilabel, and multilabel with auxiliary-task neural network models. Temporal validation of the model was done using data from the subsequent 12 months on a subset of RUMC patients (n=16 917). External validation was done using data from Loyola University Medical Center, Chicago, IL, USA between Jan 1, 2007, and Sept 30, 2017 (n=1991 adult patients). The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration slope and intercept were measured with the unreliability index. Bias assessments were performed across demographic subgroups., Findings: The model was trained on a cohort that had 3·5% misuse (n=1921) with any type of substance. 220 (11%) of 1921 patients with substance misuse had more than one type of misuse. 
The multilabel convolutional neural network classifier had a mean AUROC of 0·97 (95% CI 0·96-0·98) during temporal validation for all types of substance misuse. The model was well calibrated and showed good face validity with model features containing explicit mentions of aberrant drug-taking behaviour. A false-negative rate of 0·18-0·19 and a false-positive rate of 0·03 between non-Hispanic Black and non-Hispanic White groups occurred. In external validation, the AUROCs for alcohol and opioid misuse were 0·88 (95% CI 0·86-0·90) and 0·94 (0·92-0·95), respectively., Interpretation: We developed a novel and accurate approach to leveraging the first 24 h of EHR notes for screening multiple types of substance misuse., Funding: National Institute on Drug Abuse, National Institutes of Health., Competing Interests: Declaration of interests MMC has a patent pending (ARCD. P0535US.P2) for risk stratification algorithms, not related to this Article, for hospitalised patients, and has received research support from EarlySense (Tel Aviv, Israel). All other authors declare no competing interests., (Copyright © 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY-NC-ND 4.0 license. All rights reserved.)
- Published
- 2022
- Full Text
- View/download PDF
30. Correction: Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.
- Author
-
Caskey J, McConnell IL, Oguss M, Dligach D, Kulikoff R, Grogan B, Gibson C, Wimmer E, DeSalvo TE, Nyakoe-Nyasani EE, Churpek MM, and Afshar M
- Abstract
[This corrects the article DOI: 10.2196/36119.]., (©John Caskey, Iain L McConnell, Madeline Oguss, Dmitriy Dligach, Rachel Kulikoff, Brittany Grogan, Crystal Gibson, Elizabeth Wimmer, Traci E DeSalvo, Edwin E Nyakoe-Nyasani, Matthew M Churpek, Majid Afshar. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 24.03.2022.)
- Published
- 2022
- Full Text
- View/download PDF
31. Identifying COVID-19 Outbreaks From Contact-Tracing Interview Forms for Public Health Departments: Development of a Natural Language Processing Pipeline.
- Author
-
Caskey J, McConnell IL, Oguss M, Dligach D, Kulikoff R, Grogan B, Gibson C, Wimmer E, DeSalvo TE, Nyakoe-Nyasani EE, Churpek MM, and Afshar M
- Subjects
- Contact Tracing, Disease Outbreaks, Humans, Public Health, SARS-CoV-2, COVID-19 epidemiology, Natural Language Processing
- Abstract
Background: In Wisconsin, COVID-19 case interview forms contain free-text fields that need to be mined to identify potential outbreaks for targeted policy making. We developed an automated pipeline to ingest the free text into a pretrained neural language model to identify businesses and facilities as outbreaks., Objective: We aimed to examine the precision and recall of our natural language processing pipeline against existing outbreaks and potentially new clusters., Methods: Data on cases of COVID-19 were extracted from the Wisconsin Electronic Disease Surveillance System (WEDSS) for Dane County between July 1, 2020, and June 30, 2021. Features from the case interview forms were fed into a Bidirectional Encoder Representations from Transformers (BERT) model that was fine-tuned for named entity recognition (NER). We also developed a novel location-mapping tool to provide addresses for relevant NER. Precision and recall were measured against manually verified outbreaks and valid addresses in WEDSS., Results: There were 46,798 cases of COVID-19, with 4,183,273 total BERT tokens and 15,051 unique tokens. The recall and precision of the NER tool were 0.67 (95% CI 0.66-0.68) and 0.55 (95% CI 0.54-0.57), respectively. For the location-mapping tool, the recall and precision were 0.93 (95% CI 0.92-0.95) and 0.93 (95% CI 0.92-0.95), respectively. Across monthly intervals, the NER tool identified more potential clusters than were verified in WEDSS., Conclusions: We developed a novel pipeline of tools that identified existing outbreaks and novel clusters with associated addresses. Our pipeline ingests data from a statewide database and may be deployed to assist local health departments for targeted interventions., (©John Caskey, Iain L McConnell, Madeline Oguss, Dmitriy Dligach, Rachel Kulikoff, Brittany Grogan, Crystal Gibson, Elizabeth Wimmer, Traci E DeSalvo, Edwin E Nyakoe-Nyasani, Matthew M Churpek, Majid Afshar. 
Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 08.03.2022.)
- Published
- 2022
- Full Text
- View/download PDF
32. Geometric Features Associated with Middle Cerebral Artery Bifurcation Aneurysm Formation: A Matched Case-Control Study.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Case-Control Studies, Computed Tomography Angiography, Female, Humans, Intracranial Aneurysm diagnostic imaging, Middle Cerebral Artery diagnostic imaging
- Abstract
Objectives: The pathogenesis of intracranial aneurysms is multifactorial and includes genetic, environmental, and anatomic influences. We aimed to identify image-based morphological parameters that were associated with middle cerebral artery (MCA) bifurcation aneurysms., Materials and Methods: We evaluated three-dimensional morphological parameters obtained from CT angiography (CTA) or digital subtraction angiography (DSA) from 317 patients with unilateral MCA bifurcation aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016. We chose the contralateral unaffected MCA bifurcation as the control group, in order to control for genetic and environmental risk factors. Diameters and angles of surrounding parent and daughter vessels of 634 MCAs were examined., Results: Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with smaller (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In a multivariable conditional logistic regression model we showed that smaller diameter size ratio (OR 0.0004, 95% CI 0.0001-0.15), larger daughter-daughter angles (OR 1.08, 95% CI 1.06-1.11) and larger parent-daughter angle ratios (OR 4.24, 95% CI 1.77-10.16) were significantly associated with MCA aneurysm presence after correcting for other variables. In order to account for possible changes to the vasculature by the aneurysm, a subgroup analysis of small aneurysms (≤ 3 mm) was performed and showed that the results were similar., Conclusions: Easily measurable morphological parameters of the surrounding vasculature of the MCA may provide objective metrics to assess MCA aneurysm formation risk in high-risk patients., (Copyright © 2021 Elsevier Inc. All rights reserved.)
- Published
- 2022
- Full Text
- View/download PDF
33. The Addition of United States Census-Tract Data Does Not Improve the Prediction of Substance Misuse.
- Author
-
To D, Joyce C, Kulshrestha S, Sharma B, Dligach D, Churpek M, and Afshar M
- Subjects
- Cohort Studies, Electronic Health Records, Humans, Social Class, United States, Censuses, Opioid-Related Disorders
- Abstract
Predictors from the structured data in the electronic health record (EHR) have previously been used for case-identification in substance misuse. We aim to examine the added benefit from census-tract data, a proxy for socioeconomic status, to improve identification. A cohort of 186,611 hospitalizations was derived between 2007 and 2017. Reference labels included alcohol misuse only, opioid misuse only, and both alcohol and opioid misuse. Baseline models were created using 24 EHR variables, and enhanced models were created with the addition of 48 census-tract variables from the United States American Community Survey. The absolute net reclassification index (NRI) was applied to measure the benefit in adding census-tract variables to baseline models. The baseline models already had good calibration and discrimination. Adding census-tract variables provided negligible improvement to sensitivity and specificity and NRI was less than 1% across substance groups. Our results show the census-tract added minimal value to prediction models., (©2021 AMIA - All rights reserved.)
- Published
- 2022
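The abstract above measures added value with the absolute net reclassification index (NRI). The sketch below shows one common two-category formulation, in which each hospitalization is classified as flagged (1) or not flagged (0) by the baseline and by the enhanced model, and the net number of correct reclassifications is divided by the total cohort size. The abstract does not give the authors' exact formulation or thresholds, so this definition and the function name are assumptions.

```python
def absolute_nri(y_true, base_pred, new_pred):
    """Absolute NRI for binary classifications (1 = flagged, 0 = not flagged).

    Net correct reclassifications among events and non-events, divided by
    the total number of cases.
    """
    if not (len(y_true) == len(base_pred) == len(new_pred)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    net = 0
    for truth, base, new in zip(y_true, base_pred, new_pred):
        if base == new:
            continue  # not reclassified by the enhanced model
        moved_up = new > base
        if truth == 1:
            net += 1 if moved_up else -1   # events should move up
        else:
            net += -1 if moved_up else 1   # non-events should move down
    return net / len(y_true)
```

Under this formulation, a value below 0.01 would mean that the enhanced model correctly reclassifies, on net, fewer than 1 in 100 hospitalizations, which matches the "less than 1%" reading in the abstract.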
34. Geometric variations associated with posterior communicating artery aneurysms.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S Jr, Castro VM, Dligach D, Finan S, Gainer V, Shadick N, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Angiography, Digital Subtraction, Cerebral Angiography, Circle of Willis diagnostic imaging, Computed Tomography Angiography, Female, Humans, Aneurysm, Ruptured, Intracranial Aneurysm diagnostic imaging
- Abstract
Background: Hemodynamic stress, conditioned by the morphology of the surrounding vasculature, plays an important role in aneurysm formation. Our goal was to identify image-based location-specific parameters that are associated with posterior communicating artery (PCoA) aneurysms., Methods: Three-dimensional morphological parameters obtained from CT angiography or digital subtraction angiography from 187 patients with unilateral PCoA aneurysms, diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016, were evaluated. In order to control for genetic and clinical risk factors, we chose the contralateral unaffected PCoA as a control group. We examined diameters and angles of the surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with small aneurysms (≤5 mm) only and an unmatched analysis of 432 PCoA aneurysms and 197 control patients without PCoA aneurysms were also performed., Results: In a multivariable conditional logistic regression model we showed that smaller diameter size ratio (OR 1.45×10⁻⁵, 95% CI 1.12×10⁻⁷ to 1.88×10⁻³) and larger daughter-daughter angle (OR 1.04, 95% CI 1.02 to 1.07) were significantly associated with PCoA aneurysm presence after correcting for other variables. In subgroup analyses of small aneurysms (≤5 mm) and in an unmatched analysis the significance and direction of these results were preserved., Conclusions: Larger daughter-daughter angles and smaller diameter size ratio are significantly associated with the presence of PCoA aneurysms. These simple parameters can be utilized to guide the risk assessment for the formation of PCoA aneurysms in high risk patients., Competing Interests: Competing interests: None declared., (© Author(s) (or their employer(s)) 2021. No commercial re-use. See rights and permissions. Published by BMJ.)
- Published
- 2021
- Full Text
- View/download PDF
35. Bias and fairness assessment of a natural language processing opioid misuse classifier: detection and mitigation of electronic health record data disadvantages across racial subgroups.
- Author
-
Thompson HM, Sharma B, Bhalla S, Boley R, McCluskey C, Dligach D, Churpek MM, Karnik NS, and Afshar M
- Subjects
- Electronic Health Records, Hispanic or Latino, Humans, Machine Learning, Natural Language Processing, Opioid-Related Disorders
- Abstract
Objectives: To assess fairness and bias of a previously validated machine learning opioid misuse classifier., Materials & Methods: Two experiments were conducted with the classifier's original (n = 1000) and external validation (n = 53 974) datasets from 2 health systems. Bias was assessed via testing for differences in type II error rates across racial/ethnic subgroups (Black, Hispanic/Latinx, White, Other) using bootstrapped 95% confidence intervals. A local surrogate model was estimated to interpret the classifier's predictions by race and averaged globally from the datasets. Subgroup analyses and post-hoc recalibrations were conducted to attempt to mitigate biased metrics., Results: We identified bias in the false negative rate (FNR = 0.32) of the Black subgroup compared to the FNR (0.17) of the White subgroup. Top features included "heroin" and "substance abuse" across subgroups. Post-hoc recalibrations eliminated bias in FNR with minimal changes in other subgroup error metrics. The Black FNR subgroup had higher risk scores for readmission and mortality than the White FNR subgroup, and a higher mortality risk score than the Black true positive subgroup (P < .05)., Discussion: The Black FNR subgroup had the greatest severity of disease and risk for poor outcomes. Similar features were present between subgroups for predicting opioid misuse, but inequities were present. Post-hoc mitigation techniques mitigated bias in type II error rate without creating substantial type I error rates. From model design through deployment, bias and data disadvantages should be systematically addressed., Conclusion: Standardized, transparent bias assessments are needed to improve trustworthiness in clinical machine learning models., (© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2021
- Full Text
- View/download PDF
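The abstract above assesses bias by comparing type II error rates (false negative rates) across subgroups with bootstrapped 95% confidence intervals. A minimal sketch of that idea follows, assuming a simple percentile bootstrap over the cases of one subgroup. The resampling scheme, function names, and defaults are illustrative assumptions and not the authors' code.

```python
import random


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), computed over cases whose true label is 1."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("no positive cases")
    return preds_for_positives.count(0) / len(preds_for_positives)


def bootstrap_fnr_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the FNR of one subgroup."""
    if 1 not in y_true:
        raise ValueError("no positive cases")
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]  # resample with replacement
        if not any(t == 1 for t, _ in sample):
            continue  # FNR is undefined without positives; draw again
        truths, preds = zip(*sample)
        stats.append(false_negative_rate(truths, preds))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Running `bootstrap_fnr_ci` separately on each racial/ethnic subgroup and checking whether the intervals overlap gives a rough version of the comparison described. Non-overlapping intervals are a conservative signal of a difference, and a formal test would bootstrap the difference in FNR directly.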
36. External validation of an opioid misuse machine learning classifier in hospitalized adult patients.
- Author
-
Afshar M, Sharma B, Bhalla S, Thompson HM, Dligach D, Boley RA, Kishen E, Simmons A, Perticone K, and Karnik NS
- Subjects
- Adult, Analgesics, Opioid, Electronic Health Records, Humans, Machine Learning, Patients, Opioid-Related Disorders diagnosis, Opioid-Related Disorders epidemiology
- Abstract
Background: Opioid misuse screening in hospitals is resource-intensive and rarely done. Many hospitalized patients are never offered opioid treatment. An automated approach leveraging routinely captured electronic health record (EHR) data may be easier for hospitals to institute. We previously derived and internally validated an opioid classifier in a separate hospital setting. The aim is to externally validate our previously published and open-source machine-learning classifier at a different hospital for identifying cases of opioid misuse., Methods: An observational cohort of 56,227 adult hospitalizations was examined between October 2017 and December 2019 during a hospital-wide substance use screening program with manual screening. Manually completed Drug Abuse Screening Test served as the reference standard to validate a convolutional neural network (CNN) classifier with coded word embedding features from the clinical notes of the EHR. The opioid classifier utilized all notes in the EHR and sensitivity analysis was also performed on the first 24 h of notes. Calibration was performed to account for the lower prevalence than in the original cohort., Results: Manual screening for substance misuse was completed in 67.8% (n = 56,227) with 1.1% (n = 628) identified with opioid misuse. The data for external validation included 2,482,900 notes with 67,969 unique clinical concept features. The opioid classifier had an AUC of 0.99 (95% CI 0.99-0.99) across the encounter and 0.98 (95% CI 0.98-0.99) using only the first 24 h of notes. In the calibrated classifier, the sensitivity and positive predictive value were 0.81 (95% CI 0.77-0.84) and 0.72 (95% CI 0.68-0.75). For the first 24 h, they were 0.75 (95% CI 0.71-0.78) and 0.61 (95% CI 0.57-0.64)., Conclusions: Our opioid misuse classifier had good discrimination during external validation. 
Our model may provide a comprehensive and automated approach to opioid misuse identification that augments current workflows and overcomes manual screening barriers.
- Published
- 2021
- Full Text
- View/download PDF
37. Comparison and interpretability of machine learning models to predict severity of chest injury.
- Author
-
Kulshrestha S, Dligach D, Joyce C, Gonzalez R, O'Rourke AP, Glazer JM, Stey A, Kruser JM, Churpek MM, and Afshar M
- Abstract
Objective: Trauma quality improvement programs and registries improve care and outcomes for injured patients. Designated trauma centers calculate injury scores using dedicated trauma registrars; however, many injuries arrive at nontrauma centers, leaving a substantial amount of data uncaptured. We propose automated methods to identify severe chest injury using machine learning (ML) and natural language processing (NLP) methods from the electronic health record (EHR) for quality reporting., Materials and Methods: A level I trauma center was queried for patients presenting after injury between 2014 and 2018. Prediction modeling was performed to classify severe chest injury using a reference dataset labeled by certified registrars. Clinical documents from trauma encounters were processed into concept unique identifiers for inputs to ML models: logistic regression with elastic net (EN) regularization, extreme gradient boosted (XGB) machines, and convolutional neural networks (CNN). The optimal model was identified by examining predictive and face validity metrics using global explanations., Results: Of 8952 encounters, 542 (6.1%) had a severe chest injury. CNN and EN had the highest discrimination, with an area under the receiver operating characteristic curve of 0.93 and calibration slopes between 0.88 and 0.97. CNN had better performance across risk thresholds with fewer discordant cases. Examination of global explanations demonstrated the CNN model had better face validity, with top features including "contusion of lung" and "hemopneumothorax.", Discussion: The CNN model featured optimal discrimination, calibration, and clinically relevant features selected., Conclusion: NLP and ML methods to populate trauma registries for quality analyses are feasible., (© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2021
38. Tobacco use and age are associated with different morphologic features of anterior communicating artery aneurysms.
- Author
-
Zhang J, Lai PMR, Can A, Mukundan S, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Adult, Age Factors, Aged, Aneurysm, Ruptured pathology, Female, Humans, Intracranial Aneurysm pathology, Male, Middle Aged, Risk Factors, Aneurysm, Ruptured epidemiology, Anterior Cerebral Artery pathology, Intracranial Aneurysm epidemiology, Tobacco Use epidemiology
- Abstract
We present a cohort of patients with anterior communicating artery (ACoA) aneurysms to investigate morphological characteristics and clinical factors associated with rupture of the aneurysms. 505 patients with ACoA aneurysms were identified at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016, with available CT angiography (CTA). Three-dimensional (3D) reconstructions were performed to evaluate aneurysmal morphologic features, including location, projection, irregularity, the presence of daughter dome, height, height/width ratio, and relationships between surrounding vessels. Patient risk factors assessed included patient age, sex, tobacco use, alcohol use, and family history of aneurysms and aneurysmal subarachnoid hemorrhage. Logistic regression was used to build a predictive ACoA score for rupture. Morphologic features associated with ruptured ACoA aneurysms were the presence of a daughter dome (OR 21.4, 95% CI 10.6-43.1), smaller neck diameter (OR 0.55, 95% CI 0.42-0.71), larger aspect ratio (OR 3.57, 95% CI 2.05-6.24), larger flow angle (OR 1.03, 95% CI 1.02-1.05), and smaller ipsilateral A2-ACoA angle (OR 0.98, 95% CI 0.97-1.00). Tobacco use was predominantly associated with morphological factors intrinsic to the aneurysm that were associated with rupture while younger age was also associated with morphologic features extrinsic to the aneurysm that were associated with rupture. The ACoA score had good predictive capacity for rupture with AUC = 0.92 using the 0.632 bootstrap cross-validation for correction of overfitting bias. Ruptured ACoA aneurysms were associated with morphological features that are simple to assess using a simple scoring system. Tobacco use and younger age were predominantly associated with intrinsic and extrinsic morphological features characteristic of rupture, respectively.
- Published
- 2021
39. Vascular Geometry Associated with Anterior Communicating Artery Aneurysm Formation.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S Jr, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Adult, Aged, Anterior Cerebral Artery pathology, Case-Control Studies, Cerebral Angiography, Circle of Willis pathology, Computed Tomography Angiography, Female, Humans, Imaging, Three-Dimensional, Male, Middle Aged, Organ Size, Aneurysm, Ruptured diagnostic imaging, Anterior Cerebral Artery diagnostic imaging, Circle of Willis diagnostic imaging, Intracranial Aneurysm diagnostic imaging
- Abstract
Objective: To identify clinical and morphologic risk factors correlated with anterior communicating artery (ACoA) aneurysm formation. Methods: Three-dimensional morphologic parameters obtained from computed tomography angiography or digital subtraction angiography from 504 patients with ACoA aneurysms and 201 patients with aneurysms in other locations that were diagnosed at Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 were evaluated. The presence of hypoplastic and aplastic A1 segments and diameters and angles of surrounding parent and daughter vessels were examined. Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses for small (≤3 mm) aneurysms only were also performed. Results: Aplastic and hypoplastic A1 segments were more common in the ACoA group (38.9% vs. 6.5% hypoplastic and 22.2% vs. 0.5% aplastic). In multivariable analysis, the presence of a hypoplastic A1 segment was associated with ACoA aneurysms. An A2-ACoA (daughter-daughter) angle was also significantly associated with ACoA aneurysms in multivariable analysis; however, as Pearson's correlation test between aneurysm width and daughter-daughter angle was significant, the daughter-daughter angle was most likely not independently associated with aneurysm presence, but rather might have been a result of the presence of an aneurysm. Subgroup analyses of small aneurysms (≤3 mm) and of unruptured aneurysms showed similar results. Conclusions: Our results demonstrate that of all the morphologic parameters, the presence of a hypoplastic A1 segment was the only parameter independently associated with the presence of ACoA aneurysms that was not correlated with aneurysm size and could aid as a simple screening parameter. (Copyright © 2020 Elsevier Inc. All rights reserved.)
- Published
- 2021
40. Prediction of severe chest injury using natural language processing from the electronic health record.
- Author
-
Kulshrestha S, Dligach D, Joyce C, Baker MS, Gonzalez R, O'Rourke AP, Glazer JM, Stey A, Kruser JM, Churpek MM, and Afshar M
- Subjects
- Electronic Health Records, Humans, Retrospective Studies, Unified Medical Language System, Natural Language Processing, Thoracic Injuries
- Abstract
Introduction: Trauma injury severity scores are currently calculated retrospectively from the electronic health record (EHR) using manual annotation by certified trauma coders. Natural language processing (NLP) of clinical documents in the EHR may enable automated injury scoring. We hypothesize that NLP with machine learning can discriminate between cases of severe and non-severe injury to the thorax after trauma. Methods: Clinical documents from a trauma center were examined between 2014 and 2018. Severe chest injury was defined as a thorax abbreviated injury score (AIS) >2 and served as the reference standard for supervised learning. Free text unigrams and concept unique identifiers (CUIs) from the Unified Medical Language Systems (UMLS) were extracted from clinical documents collected at one hour, four hours, and eight hours after patient arrival to the emergency department. Logistic regression models with elastic net regularization were tuned to maximize area under the receiver operating characteristic curve (AUROC) using 10-fold cross-validation on the training dataset (80%) and tested on a hold-out 20% dataset. Results: There were 6,891 traumas that met inclusion criteria. The complete data corpus consisted of 473,694 documents. Models trained using the first hour of data had a mean AUROC of 0.88 (95%CI [0.86, 0.89]); model discrimination and reclassification from the first hour significantly improved after eight hours with a mean AUROC of 0.94 (95%CI [0.93, 0.95]). Performance of models using CUIs were similar to unigrams (p>0.05). Models demonstrated excellent clinical face validity. Conclusions: Both CUIs and unigrams demonstrated excellent discrimination in predicting severity of chest injury using the first eight hours of clinical documents. Our model demonstrates that automated anatomical injury scoring is feasible and may be used for aggregation of data for trauma research and quality programs. Competing Interests: Declaration of Competing Interest No competing interests are declared. (Copyright © 2020 Elsevier Ltd. All rights reserved.)
- Published
- 2021
41. Morphological variables associated with ruptured basilar tip aneurysms.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S Jr, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Aged, Aneurysm, Ruptured etiology, Cerebral Angiography, Computed Tomography Angiography, Female, Humans, Image Processing, Computer-Assisted, Imaging, Three-Dimensional, Male, Middle Aged, Risk Factors, Aneurysm, Ruptured diagnostic imaging, Aneurysm, Ruptured pathology, Basilar Artery diagnostic imaging, Basilar Artery pathology, Intracranial Aneurysm diagnostic imaging, Intracranial Aneurysm pathology
- Abstract
Morphological factors of intracranial aneurysms and the surrounding vasculature could affect aneurysm rupture risk in a location specific manner. Our goal was to identify image-based morphological parameters that correlated with ruptured basilar tip aneurysms. Three-dimensional morphological parameters obtained from CT-angiography (CTA) or digital subtraction angiography (DSA) from 200 patients with basilar tip aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 were evaluated. We examined aneurysm wall irregularity, the presence of daughter domes, hypoplastic, aplastic or fetal PCoAs, vertebral dominance, maximum height, perpendicular height, width, neck diameter, aspect and size ratio, height/width ratio, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine statistical significance. In multivariable analysis, presence of a daughter dome, aspect ratio, and larger flow angle were significantly associated with rupture status. We also introduced two new variables, diameter size ratio and parent-daughter angle ratio, which were both significantly inversely associated with ruptured basilar tip aneurysms. Notably, multivariable analyses also showed that larger diameter size ratio was associated with higher Hunt-Hess score while smaller flow angle was associated with higher Fisher grade. These easily measurable parameters, including a new parameter that is unlikely to be affected by the formation of the aneurysm, could aid in screening strategies in high-risk patients with basilar tip aneurysms. One should note, however, that the changes in parameters related to aneurysm morphology may be secondary to aneurysm rupture rather than causal.
- Published
- 2021
42. Pre-training phenotyping classifiers.
- Author
-
Dligach D, Afshar M, and Miller T
- Subjects
- Humans, ROC Curve, Language, Natural Language Processing
- Abstract
Recent transformer-based pre-trained language models have become a de facto standard for many text classification tasks. Nevertheless, their utility in the clinical domain, where classification is often performed at encounter or patient level, is still uncertain due to the limitation on the maximum length of input. In this work, we introduce a self-supervised method for pre-training that relies on a masked token objective and is free from the limitation on the maximum input length. We compare the proposed method with supervised pre-training that uses billing codes as a source of supervision. We evaluate the proposed method on one publicly-available and three in-house datasets using the standard evaluation metrics such as the area under the ROC curve and F1 score. We find that, surprisingly, even though self-supervised pre-training performs slightly worse than supervised, it still preserves most of the gains from pre-training. (Copyright © 2020 Elsevier Inc. All rights reserved.)
- Published
- 2021
43. Surrounding vascular geometry associated with basilar tip aneurysm formation.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S Jr, Castro VM, Dligach D, Finan S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Adult, Aged, Angiography, Digital Subtraction, Basilar Artery diagnostic imaging, Computed Tomography Angiography, Female, Humans, Intracranial Aneurysm diagnostic imaging, Male, Middle Aged, Risk, Basilar Artery pathology, Intracranial Aneurysm etiology, Intracranial Aneurysm pathology
- Abstract
Hemodynamic stress is thought to play an important role in the formation of intracranial aneurysms, which is conditioned by the geometry of the surrounding vasculature. Our goal was to identify image-based morphological parameters that were associated with basilar artery tip aneurysms (BTA) in a location-specific manner. Three-dimensional morphological parameters obtained from CT-angiography (CTA) or digital subtraction angiography (DSA) from 207 patients with BTAs and a control group of 106 patients with aneurysms elsewhere to control for non-morphological factors, who were diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016, were evaluated. We examined the presence of hypoplastic, aplastic or fetal PCoAs, vertebral dominance, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine statistical significance. Sensitivity analyses with small (≤ 3 mm) aneurysms only and with angles excluded, were also performed. In multivariable analysis, daughter-daughter angle was directly, and parent artery diameter and diameter size ratio were inversely associated with BTAs. These results remained significant in the subgroup analysis of small aneurysms (width ≤ 3 mm) and when angles were excluded. These easily measurable and robust parameters that are unlikely to be affected by aneurysm formation could aid in risk stratification for the formation of BTAs in high-risk patients.
- Published
- 2020
44. External Validation of an Acute Respiratory Distress Syndrome Prediction Model Using Radiology Reports.
- Author
-
Mayampurath A, Churpek MM, Su X, Shah S, Munroe E, Patel B, Dligach D, and Afshar M
- Subjects
- Academic Medical Centers, Adult, Age Factors, Aged, Female, Hospital Mortality, Hospitals, Urban, Humans, Machine Learning, Male, Middle Aged, Natural Language Processing, Prospective Studies, Reproducibility of Results, Sex Factors, Socioeconomic Factors, Image Processing, Computer-Assisted methods, Intensive Care Units, Radiography, Thoracic methods, Respiratory Distress Syndrome diagnostic imaging, Respiratory Distress Syndrome mortality
- Abstract
Objectives: Acute respiratory distress syndrome is frequently under recognized and associated with increased mortality. Previously, we developed a model that used machine learning and natural language processing of text from radiology reports to identify acute respiratory distress syndrome. The model showed improved performance in diagnosing acute respiratory distress syndrome when compared to a rule-based method. In this study, our objective was to externally validate the natural language processing model in patients from an independent hospital setting. Design: Secondary analysis of data across five prospective clinical studies. Setting: An urban, tertiary care, academic hospital. Patients: Adult patients admitted to the medical ICU and at-risk for acute respiratory distress syndrome. Interventions: None. Measurements and Main Results: The natural language processing model was previously derived and internally validated in burn, trauma, and medical patients at Loyola University Medical Center. Two machine learning models were examined with the following text features from qualifying radiology reports: 1) word representations (n-grams) and 2) standardized clinical named entity mentions mapped from the National Library of Medicine Unified Medical Language System. The models were externally validated in a cohort of 235 patients at the University of Chicago Medicine, among which 110 (47%) were diagnosed with acute respiratory distress syndrome by expert annotation. During external validation, the n-gram model demonstrated good discrimination between acute respiratory distress syndrome and nonacute respiratory distress syndrome patients (C-statistic, 0.78; 95% CI, 0.72-0.84). The n-gram model had a higher discrimination for acute respiratory distress syndrome when compared with the standardized named entity model, although not statistically significant (C-statistic 0.78 vs 0.72; p = 0.09). The most important features in the model had good face validity for acute respiratory distress syndrome characteristics but differences in frequencies did occur between hospital settings. Conclusions: Our computable phenotype for acute respiratory distress syndrome had good discrimination in external validation and may be used by other health systems for case-identification. Discrepancies in feature representation are likely due to differences in characteristics of the patient cohorts.
- Published
- 2020
45. Age and morphology of posterior communicating artery aneurysms.
- Author
-
Zhang J, Can A, Lai PMR, Mukundan S Jr, Castro VM, Dligach D, Finan S, Yu S, Gainer VS, Shadick NA, Savova G, Murphy SN, Cai T, Weiss ST, and Du R
- Subjects
- Age Factors, Aged, Aneurysm, Ruptured diagnostic imaging, Cerebral Angiography, Computed Tomography Angiography, Female, Humans, Image Processing, Computer-Assisted, Imaging, Three-Dimensional, Intracranial Aneurysm diagnostic imaging, Male, Middle Aged, Multivariate Analysis, Natural Language Processing, Registries, Retrospective Studies, Risk, Aneurysm, Ruptured physiopathology, Intracranial Aneurysm physiopathology
- Abstract
Risk of intracranial aneurysm rupture could be affected by geometric features of intracranial aneurysms and the surrounding vasculature in a location specific manner. Our goal is to investigate the morphological characteristics associated with ruptured posterior communicating artery (PCoA) aneurysms, as well as patient factors associated with the morphological parameters. Three-dimensional morphological parameters in 409 patients with 432 PCoA aneurysms diagnosed at the Brigham and Women's Hospital and Massachusetts General Hospital between 1990 and 2016 who had available CT angiography (CTA) or digital subtraction angiography (DSA) were evaluated. Morphological parameters examined included aneurysm wall irregularity, presence of a daughter dome, presence of hypoplastic or aplastic A1 arteries and hypoplastic or fetal PCoA, perpendicular height, width, neck diameter, aspect and size ratio, height/width ratio, and diameters and angles of surrounding parent and daughter vessels. Univariable and multivariable statistical analyses were performed to determine the association of morphological parameters with rupture of PCoA aneurysms. Additional analyses were performed to determine the association of patient factors with the morphological parameters. Irregular, multilobed PCoA aneurysms with larger height/width ratios and larger flow angles were associated with ruptured PCoA aneurysms, whereas perpendicular height was inversely associated with rupture in a multivariable model. Older age was associated with lower aspect ratio, with a trend towards lower height/width ratio and smaller flow angle, features that are associated with a lower rupture risk. Morphological parameters are easy to assess and could help in risk stratification in patients with unruptured PCoA aneurysms. PCoA aneurysms diagnosed at older age have morphological features associated with lower risk.
- Published
- 2020
46. Validation of an alcohol misuse classifier in hospitalized patients.
- Author
-
To D, Sharma B, Karnik N, Joyce C, Dligach D, and Afshar M
- Subjects
- Adult, Case-Control Studies, Female, Humans, Male, Middle Aged, Reproducibility of Results, Sensitivity and Specificity, Tertiary Care Centers, Alcoholism diagnosis, Electronic Health Records, Inpatients, Natural Language Processing, Supervised Machine Learning
- Abstract
Background: Current modes of identifying alcohol misuse in hospitalized patients rely on self-report questionnaires and diagnostic codes that have limitations, including low sensitivity. Information in the clinical notes of the electronic health record (EHR) may further augment the identification of alcohol misuse. Natural language processing (NLP) with supervised machine learning has been successful at analyzing clinical notes and identifying cases of alcohol misuse in trauma patients. Methods: An alcohol misuse NLP classifier, previously developed on trauma patients who completed the Alcohol Use Disorders Identification Test, was validated in a cohort of 1000 hospitalized patients at a large, tertiary health system between January 1, 2007 and September 1, 2017. The clinical notes were processed using the clinical Text Analysis and Knowledge Extraction System. The National Institute on Alcohol Abuse and Alcoholism (NIAAA) guidelines for alcohol misuse were used during annotation of the medical records in our validation dataset. Results: The alcohol misuse classifier had an area under the receiver operating characteristic curve of 0.91 (95% CI 0.90-0.93) in the cohort of hospitalized patients. The sensitivity, specificity, positive predictive value, and negative predictive value were 0.88 (95% CI 0.85-0.90), 0.78 (95% CI 0.74-0.82), 0.85 (95% CI 0.82-0.87), and 0.82 (95% CI 0.78-0.86), respectively. The Hosmer-Lemeshow Test (p = 0.13) demonstrates good model fit. Additionally, there was a dose-dependent response in alcohol consumption behaviors across increasing strata of predicted probabilities for alcohol misuse. Conclusion: The alcohol misuse NLP classifier had good discrimination and test characteristics in hospitalized patients. An approach using the clinical notes with NLP and supervised machine learning may better identify alcohol misuse cases than conventional methods solely relying on billing diagnostic codes. Competing Interests: Declaration of Competing Interest No conflict of interest to disclose among the authors. (Copyright © 2019 Elsevier Inc. All rights reserved.)
- Published
- 2020
47. Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients.
- Author
-
Sharma B, Dligach D, Swope K, Salisbury-Afshar E, Karnik NS, Joyce C, and Afshar M
- Subjects
- Adult, Electronic Health Records, Humans, Inpatients, Medical Records, Unified Medical Language System, Machine Learning, Natural Language Processing, Opioid-Related Disorders diagnosis
- Abstract
Background: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) Mapping the corpus of documents to standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System) thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use-case for an opioid misuse classifier. Methods: An observational cohort sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. A case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models for training and testing included CUI codes, character-based, and n-gram features. Models applied were machine learning with neural network and logistic regression as well as expert consensus with a rule-based model for opioid misuse. The area under the receiver operating characteristic curves (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots measured model fit and calibration. Results: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top performing models with AUROCs > 0.90 included CUI codes as inputs to a convolutional neural network, max pooling network, and logistic regression model. The top calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top weighted CUI codes in logistic regression has the related terms 'Heroin' and 'Victim of abuse'. Conclusions: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
- Published
- 2020
48. Does BERT need domain adaptation for clinical negation detection?
- Author
-
Lin C, Bethard S, Dligach D, Sadeque F, Savova G, and Miller TA
- Subjects
- Algorithms, Datasets as Topic, Humans, Medical Records, Information Storage and Retrieval methods, Machine Learning, Natural Language Processing, Neural Networks, Computer
- Abstract
Introduction: Classifying whether concepts in an unstructured clinical text are negated is an important unsolved task. New domain adaptation and transfer learning methods can potentially address this issue. Objective: We examine neural unsupervised domain adaptation methods, introducing a novel combination of domain adaptation with transformer-based transfer learning methods to improve negation detection. We also want to better understand the interaction between the widely used bidirectional encoder representations from transformers (BERT) system and domain adaptation methods. Materials and Methods: We use 4 clinical text datasets that are annotated with negation status. We evaluate a neural unsupervised domain adaptation algorithm and BERT, a transformer-based model that is pretrained on massive general text datasets. We develop an extension to BERT that uses domain adversarial training, a neural domain adaptation method that adds an objective to the negation task, that the classifier should not be able to distinguish between instances from 2 different domains. Results: The domain adaptation methods we describe show positive results, but, on average, the best performance is obtained by plain BERT (without the extension). We provide evidence that the gains from BERT are likely not additive with the gains from domain adaptation. Discussion: Our results suggest that, at least for the task of clinical negation detection, BERT subsumes domain adaptation, implying that BERT is already learning very general representations of negation phenomena such that fine-tuning even on a specific corpus does not lead to much overfitting. Conclusion: Despite being trained on nonclinical text, the large training sets of models like BERT lead to large gains in performance for the clinical negation detection task. (© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2020
49. Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies.
- Author
-
Afshar M, Dligach D, Sharma B, Cai X, Boyda J, Birch S, Valdez D, Zelisko S, Joyce C, Modave F, and Price R
- Subjects
- Data Mining methods, Electronic Health Records, Humans, Patient Readmission, Machine Learning, Natural Language Processing, Unified Medical Language System, Vocabulary, Controlled
- Abstract
Objective: Natural language processing (NLP) engines such as the clinical Text Analysis and Knowledge Extraction System are a solution for processing notes for research, but optimizing their performance for a clinical data warehouse remains a challenge. We aim to develop a high throughput NLP architecture using the clinical Text Analysis and Knowledge Extraction System and present a predictive model use case. Materials and Methods: The CDW was comprised of 1 103 038 patients across 10 years. The architecture was constructed using the Hadoop data repository for source data and 3 large-scale symmetric processing servers for NLP. Each named entity mention in a clinical document was mapped to the Unified Medical Language System concept unique identifier (CUI). Results: The NLP architecture processed 83 867 802 clinical documents in 13.33 days and produced 37 721 886 606 CUIs across 8 standardized medical vocabularies. Performance of the architecture exceeded 500 000 documents per hour across 30 parallel instances of the clinical Text Analysis and Knowledge Extraction System including 10 instances dedicated to documents greater than 20 000 bytes. In a use-case example for predicting 30-day hospital readmission, a CUI-based model had similar discrimination to n-grams with an area under the curve receiver operating characteristic of 0.75 (95% CI, 0.74-0.76). Discussion and Conclusion: Our health system's high throughput NLP architecture may serve as a benchmark for large-scale clinical research using a CUI-based approach. (© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2019
50. Toward a clinical text encoder: pretraining for clinical natural language processing with applications to substance misuse.
- Author
-
Dligach D, Afshar M, and Miller T
- Subjects
- Big Data, Datasets as Topic, Deep Learning, Electronic Health Records, Humans, Information Storage and Retrieval methods, Algorithms, Clinical Coding methods, Natural Language Processing, Neural Networks, Computer, Substance-Related Disorders classification
- Abstract
Objective: Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks. Materials and Methods: Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task. Results: We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data. Discussion: We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task. Conclusions: We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach. (© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
- Published
- 2019