Descriptor: "clinical text" / Topic: business - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"clinical text"' showing total 19 results

Start Over Descriptor "clinical text" Topic business

19 results on '"clinical text"'

1. Developing a Clinical Language Model for Swedish : Continued Pretraining of Generic BERT with In-Domain Data

Author: Anastasios Lamproudis, Aron Henriksson, and Hercules Dalianis
Subjects: Vocabulary, Computer and Information Sciences, Downstream (software development), Computer science, business.industry, media_common.quotation_subject, Data- och informationsvetenskap, computer.software_genre, Task (project management), Domain (software engineering), clinical text, language models, Added value, Language model, Diagnosis code, Artificial intelligence, natural language processing, business, computer, Natural language processing, Protected health information, media_common
Abstract: The use of pretrained language models, fine-tuned to perform a specific downstream task, has become widespread in NLP. Using a generic language model in specialized domains may, however, be sub-optimal due to differences in language use and vocabulary. In this paper, it is investigated whether an existing, generic language model for Swedish can be improved for the clinical domain through continued pretraining with clinical text. The generic and domain-specific language models are fine-tuned and evaluated on three representative clinical NLP tasks: (i) identifying protected health information, (ii) assigning ICD-10 diagnosis codes to discharge summaries, and (iii) sentence-level uncertainty prediction. The results show that continued pretraining on in-domain data leads to improved performance on all three downstream tasks, indicating that there is a potential added value of domain-specific language models for clinical NLP.
Published: 2021

2. Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis

Author: Barry Devereux, Jesus Martinez del Rincon, and Mark Ormerod
Subjects: transformer models, Computer science, Computer applications to medicine. Medical informatics, R858-859.7, Health Informatics, 02 engineering and technology, computer.software_genre, 03 medical and health sciences, representation learning, Health Information Management, Semantic similarity, Similarity (psychology), 0202 electrical engineering, electronic engineering, information engineering, natural language processing, Representation (mathematics), 030304 developmental biology, Transformer (machine learning model), 0303 health sciences, Original Paper, business.industry, Deep learning, clinical text, Test set, 020201 artificial intelligence & image processing, Artificial intelligence, biomedical NLP, business, computer, Feature learning, Natural language processing, Sentence
Abstract: Background Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations. Objective We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models. Methods Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis we investigated the relationship between the final model’s loss and surface features of the sentence pairs and assessed the decodability and representational similarity of the token vectors generated by each model. Results Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (with a first-place score of 0.90). In detailed qualitative and quantitative analyses of the model’s loss, we identified the system’s failure to correctly model semantic similarity when both sentence pairs contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet. We also found that a large amount information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set. Conclusions We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model’s outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these “black box” models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
Published: 2020
Full Text: View/download PDF

3. Incorporating Domain Knowledge into Natural Language Inference on Clinical Texts

Author: Yu Fang, Fengqi Yan, Maozhen Li, and Mingming Lu
Subjects: General Computer Science, Computer science, Inference, Attention mechanism, 02 engineering and technology, computer.software_genre, Logical consequence, Task (project management), medical domain knowledge, Natural language inference, 0202 electrical engineering, electronic engineering, information engineering, Open domain, General Materials Science, word representation, Class (computer programming), business.industry, General Engineering, 020206 networking & telecommunications, clinical text, natural language inference, Domain knowledge, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, computer, lcsh:TK1-9971, Natural language processing, Word (computer architecture)
Abstract: Making inference on clinical texts is a task which has not been fully studied. With the newly released, expert annotated MedNLI dataset, this task is being boosted. Compared with open domain data, clinical texts present unique linguistic phenomena, e.g., a large number of medical terms and abbreviations, different written forms for the same medical concept, which make inference much harder. Incorporating domain-specific knowledge is a way to eliminate this problem, in this paper, we assemble a new incorporating medical concept definitions module on the classic enhanced sequential inference model (ESIM), which first extracts the most relevant medical concept for each word, if it exists, then encodes the definition of this medical concept with a bidirectional long short-term network (BiLSTM) to obtain domain-specific definition representations, and attends these definition representations over vanilla word embeddings. The empirical evaluations are conducted to demonstrate that our model improves the prediction performance and achieves a high level of accuracy on the MedNLI dataset. Specifically, the knowledge enhanced word representations contribute significantly to entailment class. Institute of Electrical and Electronics Engineers
Published: 2019

4. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries

Author: Ghada Alfattni, Niels Peek, and Goran Nenadic
Subjects: Relation (database), Computer science, Health Informatics, Context (language use), Discharge summaries, computer.software_genre, NLP, Field (computer science), TLINKs, Humans, Language, Natural Language Processing, business.industry, Deep learning, Clinical text, Relationship extraction, Patient Discharge, Semantics, Computer Science Applications, Memory, Short-Term, Artificial intelligence, Source text, business, computer, Natural language processing, Word (computer architecture), Sentence
Abstract: Temporal relation extraction between health-related events is a widely studied task in clinical Natural Language Processing (NLP). The current state-of-the-art methods mostly rely on engineered features (i.e., rule-based modelling) and sequence modelling, which often encodes a source sentence into a single fixed-length context. An obvious disadvantage of this fixed-length context design is its incapability to model longer sentences, as important temporal information in the clinical text may appear at different positions. To address this issue, we propose an Attention-based Bidirectional Long Short-Term Memory (Att-BiLSTM) model to enable learning the important semantic information in long source text segments and to better determine which parts of the text are most important. We experimented with two embeddings and compared the performances to traditional state-of-the-art methods that require elaborate linguistic pre-processing and hand-engineered features. The experimental results on the i2b2 2012 temporal relation test corpus show that the proposed method achieves a significant improvement with an F-score of 0.811, which is at least 10% better than state-of-the-art in the field. We show that the model can be remarkably effective at classifying temporal relations when provided with word embeddings trained on corpora in a general domain. Finally, we perform an error analysis to gain insight into the common errors made by the model.
Published: 2021
Full Text: View/download PDF

5. The Impact of Pretrained Language Models on Negation and Speculation Detection in Cross-Lingual Medical Text: Comparative Study

Author: Paloma Martínez and Renzo M. Rivera Zavala
Subjects: Conditional random field, Word embedding, Computer science, Computer applications to medicine. Medical informatics, R858-859.7, Health Informatics, 02 engineering and technology, computer.software_genre, 03 medical and health sciences, Health Information Management, Named-entity recognition, Negation, Truth value, 0202 electrical engineering, electronic engineering, information engineering, natural language processing, 030304 developmental biology, Original Paper, contextual information, 0303 health sciences, business.industry, Deep learning, deep learning, clinical text, Information extraction, 020201 artificial intelligence & image processing, Artificial intelligence, Language model, long short-term memory, business, computer, Natural language processing
Abstract: Background Negation and speculation are critical elements in natural language processing (NLP)-related tasks, such as information extraction, as these phenomena change the truth value of a proposition. In the clinical narrative that is informal, these linguistic facts are used extensively with the objective of indicating hypotheses, impressions, or negative findings. Previous state-of-the-art approaches addressed negation and speculation detection tasks using rule-based methods, but in the last few years, models based on machine learning and deep learning exploiting morphological, syntactic, and semantic features represented as spare and dense vectors have emerged. However, although such methods of named entity recognition (NER) employ a broad set of features, they are limited to existing pretrained models for a specific domain or language. Objective As a fundamental subsystem of any information extraction pipeline, a system for cross-lingual and domain-independent negation and speculation detection was introduced with special focus on the biomedical scientific literature and clinical narrative. In this work, detection of negation and speculation was considered as a sequence-labeling task where cues and the scopes of both phenomena are recognized as a sequence of nested labels recognized in a single step. Methods We proposed the following two approaches for negation and speculation detection: (1) bidirectional long short-term memory (Bi-LSTM) and conditional random field using character, word, and sense embeddings to deal with the extraction of semantic, syntactic, and contextual patterns and (2) bidirectional encoder representations for transformers (BERT) with fine tuning for NER. Results The approach was evaluated for English and Spanish languages on biomedical and review text, particularly with the BioScope corpus, IULA corpus, and SFU Spanish Review corpus, with F-measures of 86.6%, 85.0%, and 88.1%, respectively, for NeuroNER and 86.4%, 80.8%, and 91.7%, respectively, for BERT. Conclusions These results show that these architectures perform considerably better than the previous rule-based and conventional machine learning–based systems. Moreover, our analysis results show that pretrained word embedding and particularly contextualized embedding for biomedical corpora help to understand complexities inherent to biomedical text.
Published: 2020
Full Text: View/download PDF

6. Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach

Author: Yongyi Gong, Boyu Chen, Yingying Qu, Heng Weng, and Xiaoyi Pan
Subjects: Normalization (statistics), Computer science, Computer applications to medicine. Medical informatics, R858-859.7, Health Informatics, Temporal expression extraction, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Heuristic rule, Health Information Management, Machine learning, Narrative, 030212 general & internal medicine, Pattern learning, Discharge summary, Temporal information, 030304 developmental biology, Original Paper, 0303 health sciences, Recall, business.industry, Temporal expression normalization, Clinical text, Artificial intelligence, Disease progress, F1 score, business, computer, Natural language processing
Abstract: Background Temporal information frequently exists in the representation of the disease progress, prescription, medication, surgery progress, or discharge summary in narrative clinical text. The accurate extraction and normalization of temporal expressions can positively boost the analysis and understanding of narrative clinical texts to promote clinical research and practice. Objective The goal of the study was to propose a novel approach for extracting and normalizing temporal expressions from Chinese narrative clinical text. Methods TNorm, a rule-based and pattern learning-based approach, has been developed for automatic temporal expression extraction and normalization from unstructured Chinese clinical text data. TNorm consists of three stages: extraction, classification, and normalization. It applies a set of heuristic rules and automatically generated patterns for temporal expression identification and extraction of clinical texts. Then, it collects the features of extracted temporal expressions for temporal type prediction and classification by using machine learning algorithms. Finally, the features are combined with the rule-based and a pattern learning-based approach to normalize the extracted temporal expressions. Results The evaluation dataset is a set of narrative clinical texts in Chinese containing 1459 discharge summaries of a domestic Grade A Class 3 hospital. The results show that TNorm, combined with temporal expressions extraction and temporal types prediction, achieves a precision of 0.8491, a recall of 0.8328, and a F1 score of 0.8409 in temporal expressions normalization. Conclusions This study illustrates an automatic approach, TNorm, that extracts and normalizes temporal expression from Chinese narrative clinical texts. TNorm was evaluated on the basis of discharge summary data, and results demonstrate its effectiveness on temporal expression normalization.
Published: 2020
Full Text: View/download PDF

7. A Pattern-Based Method for Medical Entity Recognition From Chinese Diagnostic Imaging Text

Author: Tianyong Hao, Junjie Chen, Zhaopeng Xu, Yuyang Chen, and Zihong Liang
Subjects: Relation (database), Computer science, business.industry, computer.software_genre, pattern-based strategy, clinical text, Set (abstract data type), Information extraction, Identification (information), Named-entity recognition, Artificial Intelligence, Medical imaging, Methods, Regular expression, Artificial intelligence, medical named entity recognition, information extraction, natural language processing, F1 score, business, computer, Natural language processing
Abstract: Background: The identification of medical entities and relations from electronic medical records is a fundamental research issue for medical informatics. However, the task of extracting valuable knowledge from these records is challenging due to its high complexity. The accurate identification of entity and relation is still an open research problem in medical information extraction. Methods: A pattern-based method for extracting certain tumor-related entities and attributes from Chinese unstructured diagnostic imaging text is proposed. This method is a composition of three steps. Firstly, an algorithm based on keyword matching is designed to obtain the primary sites of tumors. Then a set of regular expressions is applied to identify primary tumor size information. Finally, a set of rules is defined to acquire metastatic sites of tumors. Results: Our method achieves a recall of 0.697, a precision of 0.825 and an F1 score of 0.755 using an overall weighted metric. For each of the extraction tasks, the F1 scores are 0.784, 0.822 and 0.740. Conclusions: The method proves to be stable and robust with different amounts of testing data. It achieves a comparatively high performance in the CHIP 2018 open challenge, demonstrating its effectiveness in extracting tumor-related entities from Chinese diagnostic imaging text.
Published: 2019

8. An efficient prototype method to identify and correct misspellings in clinical text

Author: Yijun Shao, T. Elizabeth Workman, Qing Zeng-Treitler, and Guy Divita
Subjects: 0301 basic medicine, Research Report, Medical Records Systems, Computerized, Computer science, Pathology, Surgical, lcsh:Medicine, Dictionaries as Topic, computer.software_genre, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, Spelling analysis, 0302 clinical medicine, Error analysis, Text messaging, False positive paradox, Humans, Word2vec, Word2Vec, 030212 general & internal medicine, lcsh:Science (General), lcsh:QH301-705.5, Language, Natural Language Processing, business.industry, lcsh:R, Reproducibility of Results, General Medicine, Emergency department, Clinical text, Unified Medical Language System, Spelling, Term (time), Research Note, 030104 developmental biology, lcsh:Biology (General), Vocabulary, Controlled, Word embeddings, Edit distance, Artificial intelligence, business, computer, Spelling correction, Natural language processing, Algorithms, Medical Informatics, lcsh:Q1-390
Abstract: Objective Misspellings in clinical free text present challenges to natural language processing. With an objective to identify misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora, surgical pathology reports, and emergency department progress and visit notes, extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. Results In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979, respectively, for the surgical pathology reports, and emergency department progress and visit notes, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar among the two corpora, however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types. Electronic supplementary material The online version of this article (10.1186/s13104-019-4073-y) contains supplementary material, which is available to authorized users.
Published: 2019

9. Extracting Clinical Relations in Electronic Health Records Using Enriched Parse Trees

Author: Yoonsuck Choe, Jisung Kim, and Klaus Mueller
Subjects: Parsing, Semantic feature, business.industry, Computer science, relation extraction, Feature vector, Parse tree, computer.software_genre, Relationship extraction, clinical text, Feature (linguistics), General Earth and Planetary Sciences, Artificial intelligence, natural language processing, Tree kernel, F1 score, business, computer, support vector machine convolution tree kernel, Natural language processing, General Environmental Science
Abstract: Integrating semantic features into parse trees is an active research topic in open-domain natural language processing (NLP). We study six different parse tree structures enriched with various semantic features for determining entity relations in clinical notes using a tree kernel-based relation extraction system. We used the relation extraction task definition and the dataset from the popular 2010 i2b2/VA challenge for our evaluation. We found that the parse tree structure enriched with entity type suffixes resulted in the highest F1 score of 0.7725 and was the fastest. In terms of reducing the number of feature vectors in trained models, the entity type feature was most effective among the semantic features while adding semantic feature node was better than adding feature suffixes to the labels. Our study demonstrates that parse tree enhancements with semantic features are effective for clinical relation extraction.
Published: 2015
Full Text: View/download PDF

10. À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge

Author: Colin Cherry, Berry de Bruijn, Joel Martin, and Xiaodan Zhu
Subjects: Patient Discharge Summaries, Post hoc, Computer science, temporal analysis, relation extraction, Information Storage and Retrieval, Health Informatics, 02 engineering and technology, data extraction, computer.software_genre, medical record, Research and Applications, Task (project management), Time, Translational Research, Biomedical, 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Hospital discharge, Electronic Health Records, Humans, 030212 general & internal medicine, information extraction, natural language processing, semantics, Recall, accuracy, temporal reasoning, business.industry, medical specialist, prediction, Relationship extraction, clinical text, hospital discharge, Information extraction, machine learning, Artificial intelligence, F1 score, business, computer, Natural language processing
Abstract: Objective: An analysis of the timing of events is critical for a deeper understanding of the course of events within a patient record. The 2012 i2b2 NLP challenge focused on the extraction of temporal relationships between concepts within textual hospital discharge summaries. Materials and methods: The team from the National Research Council Canada (NRC) submitted three system runs to the second track of the challenge: typifying the time-relationship between pre-annotated entities. The NRC system was designed around four specialist modules containing statistical machine learning classifiers. Each specialist targeted distinct sets of relationships: local relationships, 'sectime'-type relationships, non-local overlap-type relationships, and non-local causal relationships. Results: The best NRC submission achieved a precision of 0.7499, a recall of 0.6431, and an F1 score of 0.6924, resulting in a statistical tie for first place. Post hoc improvements led to a precision of 0.7537, a recall of 0.6455, and an F1 score of 0.6954, giving the highest scores reported on this task to date. Discussion and conclusions: Methods for general relation extraction extended well to temporal relations, and gave top-ranked state-of-the-art results. Careful ordering of predictions within result sets proved critical to this success.
Published: 2013

11. Expanding a dictionary of marker words for uncertainty and negation using distributional semantics

Author: Aron Henriksson, Carita Paradis, Roza Baskalayci, Andreas Kerren, Alyaa Alfalahi, Maria Skeppstedt, Rickard Ahlbom, Lars Asker, Grouin, Cyril, Hamon, Thierry, Névéol, Aurélie, and Zweigenbaum, Pierre
Subjects: Information retrieval, Computer and Information Science, business.industry, Computer science, marker words, Statistical semantics, dictionary expansion, computer.software_genre, clinical text, Ranking (information retrieval), Negation, distributional semantics, negation, Artificial intelligence, Distributional semantics, uncertainty, business, computer, Systemvetenskap, informationssystem och informatik, Natural language processing, Word (group theory), Information Systems
Abstract: Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of semantically similar seed words is more successful than ranking them according to proximity to each individual seed word.
Published: 2015

12. Parsing Clinical Text

Author: Jungwei Fan, Hua Xu, Buzhou Tang, Yang Huang, Joshua C. Denny, and Min Jiang
Subjects: 020205 medical informatics, Computer science, Treebank, Health Informatics, parsing, 02 engineering and technology, computer.software_genre, Top-down parsing, NLP, Domain (software engineering), Medical language processing, 03 medical and health sciences, 0302 clinical medicine, Parser combinator, 0202 electrical engineering, electronic engineering, information engineering, Humans, 030212 general & internal medicine, natural language processing, Bracketing, Parsing, LR parser, business.industry, Programming language, Health Policy, Parse tree, Linguistics, Biomedical text mining, 3. Good health, Computer Science Applications, clinical text, TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES, Artificial intelligence, State (computer science), Software_PROGRAMMINGLANGUAGES, business, computer, Sentence, Medical Informatics, Natural language processing, Research Article, Bottom-up parsing
Abstract: Background Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. Methods In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using following two datasets: (1) A Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank based guideline; and (2) the MiPACQ Treebank, which is developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross validation method. Finally we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. Results Our results showed that the original parsers achieved lower performance in clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebank, all parsers achieved better performance, with the best performance from the Stanford parser that reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. When the combined clinical Treebanks and Penn Treebank was used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. Conclusions Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and combining clinical and open domain corpora might achieve optimal performance for parsing clinical text.
Published: 2014
Full Text: View/download PDF

13. A De-identification method for bilingual clinical texts of various note types

Author: Hyo Joung Choi, Jae-Ho Lee, Jihyun Park, Yongman Lyu, Moo Song Lee, Yongdon Shin, Woo Sung Kim, Chang-Min Choi, Yu Rang Park, and Soo-Yong Shin
Subjects: Bilingual Text, Text Mining, Anonymization, Multilingualism, computer.software_genre, Health informatics, Informed consent, Data Anonymization, Medicine, Electronic Health Records, Humans, De-identification, Narrative, Regular expression, Clinical Text, Natural Language Processing, Data anonymization, business.industry, General Medicine, Identifier, Health Records, Personal, Research Design, Patient Privacy, Original Article, Artificial intelligence, business, computer, Natural language processing, Algorithms, Medical Informatics
Abstract: De-identification of personal health information is essential in order not to require written patient informed consent. Previous de-identification methods were proposed using natural language processing technology in order to remove the identifiers in clinical narrative text, although these methods only focused on narrative text written in English. In this study, we propose a regular expression-based de-identification method used to address bilingual clinical records written in Korean and English. To develop and validate regular expression rules, we obtained training and validation datasets composed of 6,039 clinical notes of 20 types and 5,000 notes of 33 types, respectively. Fifteen regular expression rules were constructed using the development dataset and those rules achieved 99.87% precision and 96.25% recall for the validation dataset. Our de-identification method successfully removed the identifiers in diverse types of bilingual clinical narrative texts. This method will thus assist physicians to more easily perform retrospective research. Graphical Abstract
Published: 2014

14. An enhanced CRFs-based system for information extraction from radiology reports

Author: Andrea Esuli, Diego Marcheggiani, and Fabrizio Sebastiani
Subjects: Conditional random field, medicine.medical_specialty, Information extraction, Computer science, Health Informatics, Conditional random fields, computer.software_genre, Machine learning, Domain (software engineering), Task (project management), Annotation, medicine, Data Mining, Computer Simulation, CRFS, business.industry, Supervised learning, Clinical text, Computer Science Applications, Feature (computer vision), Artificial intelligence, Radiology, business, computer, Natural language processing
Abstract: We discuss the problem of performing information extraction from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well known linear-chain conditional random fields (LC-CRFs) learner and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of “positional features”, a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, applicatively interesting scenarios.
Published: 2012
Full Text: View/download PDF

15. Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus

Author: Jackie Cassell, John M. Carroll, Aleksandar Savkov, and Rob Koeling
Subjects: 0301 basic medicine, Linguistics and Language, Computer science, media_common.quotation_subject, Corpus annotation, Temporal annotation, Library and Information Sciences, computer.software_genre, Language and Linguistics, Education, 03 medical and health sciences, Annotation, 0302 clinical medicine, Text mining, Chunking (psychology), 030212 general & internal medicine, media_common, Original Paper, Information retrieval, business.industry, P0098, Named entities, Clinical text, R1, Punctuation, Syntax, Spelling, Named entity, Information extraction, Chunking, 030104 developmental biology, Artificial intelligence, Computational linguistics, business, computer, Natural language processing, Word order, Annotation guidelines
Abstract: The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Full Text: View/download PDF

16. Building a semantically annotated corpus of clinical texts

Author: Angus Roberts, Mark Hepple, Yikun Guo, Robert Gaizauskas, George Demetriou, Ian Roberts, and Andrea Setzer
Subjects: Scheme (programming language), Text corpus, Biomedical Research, Information extraction, Text mining, Computer science, Abstracting and Indexing, Information Storage and Retrieval, Guidelines as Topic, Health Informatics, Temporal annotation, computer.software_genre, Semantics, Medical Records, Annotation, User-Computer Interface, Text processing, Corpora, Component (UML), Neoplasms, Terminology as Topic, Humans, Evaluation, computer.programming_language, Internet, Information retrieval, Models, Statistical, Semantic annotation, business.industry, Natural language processing, Clinical text, Computer Science Applications, Artificial intelligence, business, computer, Gold standards, Annotation guidelines
Abstract: In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
Full Text: View/download PDF

17. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text

Author: Alan R. Aronson, Sonya E. Shooshan, James G. Mork, and Dina Demner-Fushman
Subjects: Biomedical knowledge, Metathesaurus, Computer science, Content views, Information Storage and Retrieval, Health Informatics, computer.software_genre, Article, UMLS, Text messaging, Natural Language Processing, Information retrieval, Recall, Character (computing), business.industry, Unified Medical Language System, Search engine indexing, Publications, Clinical text, Computer Science Applications, Identification (information), Automatic indexing, Indexing, Artificial intelligence, business, computer, Natural language processing
Abstract: Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients’ problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text. We found that suppression of highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and suppression of two character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications.
Full Text: View/download PDF

18. Terminology expansion with prototype embeddings: Extracting symptoms of urinary tract infection from clinical text

Author: Aron Henriksson, Hideyuki Tanushi, Emil Thiman, Mahbub Ul Alam, Pontus Naucler, and Hercules Dalianis
Subjects: medicine.medical_specialty, business.industry, Computer Sciences, Urinary system, Terminologies, Synonym Extraction, Word Embeddings, urologic and male genital diseases, Dermatology, female genital diseases and pregnancy complications, Terminology, Datavetenskap (datalogi), Medicine, business, Systemvetenskap, informationssystem och informatik, Clinical Text, Information Systems, Natural Language Processing
Abstract: Many natural language processing applications rely on the availability of domain-specific terminologies containing synonyms. To that end, semi-automatic methods for extracting additional synonyms of a given concept from corpora are useful, especially in low-resource domains and noisy genres such as clinical text, where nonstandard language use and misspellings are prevalent. In this study, prototype embeddings based on seed words were used to create representations for (i) specific urinary tract infection (UTI) symptoms and (ii) UTI symptoms in general. Four word embedding methods and two phrase detection methods were evaluated using clinical data from Karolinska University Hospital. It is shown that prototype embeddings can effectively capture semantic information related to UTI symptoms. Using prototype embeddings for specific UTI symptoms led to the extraction of more symptom terms compared to using prototype embeddings for UTI symptoms in general. Overall, 142 additional UTI symp tom terms were identified, yielding a more than 100% increment compared to the initial seed set. The mean average precision across all UTI symptoms was 0.51, and as high as 0.86 for one specific UTI symptom. This study provides an effective and cost-effective solution to terminology expansion with small amounts of labeled data.

19. Disorder recognition in clinical texts using multi-label structured SVM

Author: Donghong Ji, Yanan Lu, and Wutao Lin
Subjects: 0301 basic medicine, Scheme (programming language), Support Vector Machine, Information extraction, Computer science, Structured support vector machine, computer.software_genre, Machine learning, Biochemistry, Task (project management), 03 medical and health sciences, 0302 clinical medicine, Structural Biology, Multi-label, Data Mining, Humans, Disease, 030212 general & internal medicine, Molecular Biology, computer.programming_language, business.industry, Methodology Article, Applied Mathematics, Clinical text, Computer Science Applications, Support vector machine, Data set, 030104 developmental biology, ComputingMethodologies_PATTERNRECOGNITION, Artificial intelligence, business, computer
Abstract: Background Information extraction in clinical texts enables medical workers to find out problems of patients faster as well as makes intelligent diagnosis possible in the future. There has been a lot of work about disorder mention recognition in clinical narratives. But recognition of some more complicated disorder mentions like overlapping ones is still an open issue. This paper proposes a multi-label structured Support Vector Machine (SVM) based method for disorder mention recognition. We present a multi-label scheme which could be used in complicated entity recognition tasks. Results We performed three sets of experiments to evaluate our model. Our best F1-Score on the 2013 Conference and Labs of the Evaluation Forum data set is 0.7343. There are six types of labels in our multi-label scheme, all of which are represented by 24-bit binary numbers. The binary digits of each label contain information about different disorder mentions. Our multi-label method can recognize not only disorder mentions in the form of contiguous or discontiguous words but also mentions whose spans overlap with each other. The experiments indicate that our multi-label structured SVM model outperforms the condition random field (CRF) model for this disorder mention recognition task. The experiments show that our multi-label scheme surpasses the baseline. Especially for overlapping disorder mentions, the F1-Score of our multi-label scheme is 0.1428 higher than the baseline BIOHD1234 scheme. Conclusions This multi-label structured SVM based approach is demonstrated to work well with this disorder recognition task. The novel multi-label scheme we presented is superior to the baseline and it can be used in other models to solve various types of complicated entity recognition tasks as well.
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

19 results on '"clinical text"'

1. Developing a Clinical Language Model for Swedish : Continued Pretraining of Generic BERT with In-Domain Data

2. Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis

3. Incorporating Domain Knowledge into Natural Language Inference on Clinical Texts

4. Attention-based bidirectional long short-term memory networks for extracting temporal relationships from clinical discharge summaries

5. The Impact of Pretrained Language Models on Negation and Speculation Detection in Cross-Lingual Medical Text: Comparative Study

6. Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach

7. A Pattern-Based Method for Medical Entity Recognition From Chinese Diagnostic Imaging Text

8. An efficient prototype method to identify and correct misspellings in clinical text

9. Extracting Clinical Relations in Electronic Health Records Using Enriched Parse Trees

10. À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge

11. Expanding a dictionary of marker words for uncertainty and negation using distributional semantics

12. Parsing Clinical Text

13. A De-identification method for bilingual clinical texts of various note types

14. An enhanced CRFs-based system for information extraction from radiology reports

15. Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus

16. Building a semantically annotated corpus of clinical texts

17. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text

18. Terminology expansion with prototype embeddings: Extracting symptoms of urinary tract infection from clinical text

19. Disorder recognition in clinical texts using multi-label structured SVM

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

Publisher

19 results on '"clinical text"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources