12 results on '"clinical text"'
Search Results
2. On the creation of a clinical gold standard corpus in Spanish: Mining adverse drug reactions.
- Author
-
Oronoz, Maite, Gojenola, Koldo, Pérez, Alicia, de Ilarraza, Arantza Díaz, and Casillas, Arantza
- Abstract
The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus has been assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning. [ABSTRACT FROM AUTHOR]
- Published
- 2015
3. Using large clinical corpora for query expansion in text-based cohort identification.
- Author
-
Zhu, Dongqing, Wu, Stephen, Carterette, Ben, and Liu, Hongfang
- Abstract
Highlights: Demonstrated the utility of an in-domain collection (clinical text) for query expansion. Analyzed the effect of external collection size on a mixture of relevance models. Any existing query expansion configuration can benefit from an in-domain collection. [ABSTRACT FROM AUTHOR]
- Published
- 2014
4. A controlled greedy supervised approach for co-reference resolution on clinical text.
- Author
-
Chowdhury, Md. Faisal Mahbub and Zweigenbaum, Pierre
- Abstract
Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly because the well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls generation of less-informative/sub-optimal training and test instances. Additionally, the approach also introduces some aggressive greedy strategies in chain clustering. The proposed approach has been tested on the official test corpus of the recently held i2b2/VA 2011 challenge. It achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B³ and CEAF scores). These results are comparable to the best systems of the challenge. What makes our proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores for each of the individual metrics rather than being biased towards a particular metric. [Copyright © Elsevier]
- Published
- 2013
5. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.
- Author
-
Demner-Fushman, Dina, Mork, James G., Shooshan, Sonya E., and Aronson, Alan R.
- Abstract
Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients’ problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications. [Copyright © Elsevier]
- Published
- 2010
6. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries
- Author
-
Ghada Alfattni, Niels Peek, and Goran Nenadic
- Subjects
Relation (database); Computer science; Health Informatics; Context (language use); Discharge summaries; NLP; Field (computer science); TLINKs; Humans; Language; Natural Language Processing; Deep learning; Clinical text; Relationship extraction; Patient Discharge; Semantics; Computer Science Applications; Memory, Short-Term; Artificial intelligence; Source text; Word (computer architecture); Sentence
- Abstract
Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). The current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encodes a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length context design is its incapability to model longer sentences, as important temporal information in the clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model to enable learning the important semantic information in long source text segments and to better determine which parts of the text are most important. We experimented with two embeddings and compared the performances to traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement with an F-score of 0.811, which is at least 10% better than state-of-the-art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on corpora in a general domain. Finally, we perform an error analysis to gain insight into the common errors made by the model.
- Published
- 2021
7. Building a semantically annotated corpus of clinical texts.
- Author
-
Roberts, Angus, Gaizauskas, Robert, Hepple, Mark, Demetriou, George, Guo, Yikun, Roberts, Ian, and Setzer, Andrea
- Abstract
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains. [Copyright © Elsevier]
- Published
- 2009
8. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries.
- Author
-
Alfattni, Ghada, Peek, Niels, and Nenadic, Goran
- Abstract
Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). The current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encodes a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length context design is its incapability to model longer sentences, as important temporal information in the clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model to enable learning the important semantic information in long source text segments and to better determine which parts of the text are most important. We experimented with two embeddings and compared the performances to traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement with an F-score of 0.811, which is at least 10% better than state-of-the-art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on corpora in a general domain. Finally, we perform an error analysis to gain insight into the common errors made by the model. [ABSTRACT FROM AUTHOR]
- Published
- 2021
9. Extracting and classifying diagnosis dates from clinical notes: A case study.
- Author
-
Fu, Julia T., Sholle, Evan, Krichevsky, Spencer, Scandura, Joseph, and Campion, Thomas R.
- Abstract
Myeloproliferative neoplasms (MPNs) are chronic hematologic malignancies that may progress over long disease courses. The original date of diagnosis is an important piece of information for patient care and research, but is not consistently documented. We describe an attempt to build a pipeline for extracting dates with natural language processing (NLP) tools and techniques and classifying them as relevant diagnoses or not. Inaccurate and incomplete date extraction and interpretation impacted the performance of the overall pipeline. Existing lightweight Python packages tended to have low specificity for identifying and interpreting partial and relative dates in clinical text. A rules-based regular expression (regex) approach achieved recall of 83.0% on dates manually annotated as diagnosis dates, and 77.4% on all annotated dates. With only 3.8% of annotated dates representing initial MPN diagnoses, additional methods of targeting candidate date instances may alleviate noise and class imbalance. [ABSTRACT FROM AUTHOR]
- Published
- 2020
10. An enhanced CRFs-based system for information extraction from radiology reports
- Author
-
Andrea Esuli, Diego Marcheggiani, and Fabrizio Sebastiani
- Subjects
Conditional random field; Information extraction; Computer science; Health Informatics; Conditional random fields; Machine learning; Domain (software engineering); Task (project management); Annotation; Data Mining; Computer Simulation; CRFS; Supervised learning; Clinical text; Computer Science Applications; Feature (computer vision); Artificial intelligence; Radiology; Natural language processing
- Abstract
We discuss the problem of performing information extraction from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well known linear-chain conditional random fields (LC-CRFs) learner and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of “positional features”, a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, applicatively interesting scenarios.
- Published
- 2012
11. Building a semantically annotated corpus of clinical texts
- Author
-
Angus Roberts, Mark Hepple, Yikun Guo, Robert Gaizauskas, George Demetriou, Ian Roberts, and Andrea Setzer
- Subjects
Scheme (programming language); Text corpus; Biomedical Research; Information extraction; Text mining; Computer science; Abstracting and Indexing; Information Storage and Retrieval; Guidelines as Topic; Health Informatics; Temporal annotation; Semantics; Medical Records; Annotation; User-Computer Interface; Text processing; Corpora; Component (UML); Neoplasms; Terminology as Topic; Humans; Evaluation; Internet; Information retrieval; Models, Statistical; Semantic annotation; Natural language processing; Clinical text; Computer Science Applications; Artificial intelligence; Gold standards; Annotation guidelines
- Abstract
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
12. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text
- Author
-
Alan R. Aronson, Sonya E. Shooshan, James G. Mork, and Dina Demner-Fushman
- Subjects
Biomedical knowledge; Metathesaurus; Computer science; Content views; Information Storage and Retrieval; Health Informatics; Article; UMLS; Text messaging; Natural Language Processing; Information retrieval; Recall; Character (computing); Unified Medical Language System; Search engine indexing; Publications; Clinical text; Computer Science Applications; Identification (information); Automatic indexing; Indexing; Artificial intelligence
- Abstract
Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients’ problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications.