Author: "Bethard, S." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Bethard, S."' showing total 27 results

Start Over Author "Bethard, S."

27 results on '"Bethard, S."'

1. Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks

Author: Calixto, I., Raganato, A., Pasini, T., Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., ILLC (FNWI), Language and Computation (ILLC, FNWI/FGw), Calixto, I, Raganato, A, Pasini, T, Toutanova [et al.], Kristina, Department of Digital Humanities, and Language Technology
Subjects: Vocabulary, Computer science, media_common.quotation_subject, 02 engineering and technology, entity linking, computer.software_genre, Semantics, 03 medical and health sciences, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, 6121 Languages, media_common, Point (typography), business.industry, deep learning, Hyperlink, 113 Computer and information sciences, Task (computing), 030221 ophthalmology & optometry, transformer, language model, 020201 artificial intelligence & image processing, Language model, Artificial intelligence, multilingual, business, computer, Word (computer architecture), Natural language processing, De facto standard
Abstract: Masked language models have quickly become the de facto standard when processing text. Recently, several approaches have been proposed to further enrich word representations with external knowledge sources such as knowledge graphs. However, these models are devised and evaluated in a monolingual setting only. In this work, we propose a language-independent entity prediction task as an intermediate training procedure to ground word representations on entity semantics and bridge the gap across different languages by means of a shared vocabulary of entities. We show that our approach effectively injects new lexical-semantic knowledge into neural models, improving their performance on different semantic tasks in the zero-shot crosslingual setting. As an additional advantage, our intermediate training does not require any supplementary input, allowing our models to be applied to new datasets right away. In our experiments, we use Wikipedia articles in up to 100 languages and already observe consistent gains compared to strong baselines when predicting entities using only the English Wikipedia. Further adding extra languages lead to improvements in most tasks up to a certain point, but overall we found it non-trivial to scale improvements in model transferability by training on ever increasing amounts of Wikipedia languages.
Published: 2021

2. Challenging distributional models with a conceptual network of philosophical terms

Author: Oortwijn, Y., Bloem, J., Sommerauer, P., Meyer, F., Zhou, W., Fokkens, A., Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., ILLC (FGw), and Logic and Language (ILLC, FNWI/FGw)
Subjects: Ground truth, Small data, Computer science, 06 humanities and the arts, 02 engineering and technology, 0603 philosophy, ethics and religion, Data science, Conceptual network, Blueprint, 060302 philosophy, Close reading, 0202 electrical engineering, electronic engineering, information engineering, Literary criticism, 020201 artificial intelligence & image processing
Abstract: Computational linguistic research on language change through distributional semantic (DS) models has inspired researchers from fields such as philosophy and literary studies, who use these methods for the exploration and comparison of comparatively small datasets traditionally analyzed by close reading. Research on methods for small data is still in early stages and it is not clear which methods achieve the best results. We investigate the possibilities and limitations of using distributional semantic models for analyzing philosophical data by means of a realistic use-case. We provide a ground truth for evaluation created by philosophy experts and a blueprint for using DS models in a sound methodological setup. We compare three methods for creating specialized models from small datasets. Though the models do not perform well enough to directly support philosophers yet, we find that models designed for small data yield promising directions for future work.
Published: 2021
Full Text: View/download PDF

3. EaSe: A Diagnostic Tool for VQA Based on Answer Diversity

Author: Jolly, S., Pezzelle, S., Nabi, M., Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tur, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y., Language and Computation (ILLC, FNWI/FGw), and ILLC (FNWI)
Subjects: Information retrieval, Computer science, 0202 electrical engineering, electronic engineering, information engineering, Question answering, 020201 artificial intelligence & image processing, Sample (statistics), 02 engineering and technology, 010501 environmental sciences, Entropy (energy dispersal), 01 natural sciences, 0105 earth and related environmental sciences
Abstract: We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an image, question sample. EASE is based on the pattern of answers provided by multiple annotators to a given question. In particular, it considers two aspects of the answers: (i) their Entropy; (ii) their Semantic content. First, we prove the validity of our diagnostic to identify samples that are easy/hard for state-of-art VQA models. Second, we show that EASE can be successfully used to select the most-informative samples for training/fine-tuning. Crucially, only information that is readily available in any VQA dataset is used to compute its scores.
Published: 2021
Full Text: View/download PDF

4. DUTH at SemEval-2018 Task 2: Emoji Prediction in Tweets

Author: Apidianaki, M, Mohammad, SM, May, J, Shutova, E, Bethard, S, Carpuat, M, Effrosynidis, D, Peikos, G, Symeonidis, S, Arampatzis, A, Effrosynidis D., Peikos G., Symeonidis S., Arampatzis A., Apidianaki, M, Mohammad, SM, May, J, Shutova, E, Bethard, S, Carpuat, M, Effrosynidis, D, Peikos, G, Symeonidis, S, Arampatzis, A, Effrosynidis D., Peikos G., Symeonidis S., and Arampatzis A.
Abstract: This paper describes the approach that was developed for SemEval 2018 Task 2 (Multilingual Emoji Prediction) by the DUTH Team. First, we employed a combination of preprocessing techniques to reduce the noise of tweets and produce a number of features. Then, we built several N-grams, to represent the combination of word and emojis. Finally, we trained our system with a tuned LinearSVC classifier. Our approach in the leaderboard ranked 18th amongst 48 teams.
Published: 2018

5. Predicting the Resolution of Referring Expressions from User Behavior

Author: Engonopoulos, N., Villalba, M., Titov, I., Koller, A., Yarowsky, D., Baldwin, T., Korhonen, A., Livescu, K., Bethard, S., ILLC (FNWI), and Language and Computation (ILLC, FNWI/FGw)
Published: 2013

6. Introduction

Author: Chinaei, H., Dreyer, M., Gillenwater, J., Hassan, S., Kwiatkowski, T., Lehr, M., Levenberg, A., Lin, Z., Mannem, P., Morley, E., Schneider, N., Bergsma, S., Bernhard, D., Bethard, S., Blunsom, P., Cahill, A., Carterette, B., Choi, Y., Clark, S., Crysmann, B., Curran, J., Elsner, M., Goldwater, S., Hachey, B., Hirschberg, J., Huenerfauth, M., Johnston, M., Kruschwitz, U., Kumaran, G., Lavrenko, V., Lin, C.-Y., Lopez, A., McClosky, D., Metzler, D., Mishra, T., Mohammad, S., Pado, S., Pastra, K., Pecina, P., Raghavan, H., Romportl, J., Schmid, H., Snyder, B., Stoyanov, V., Su, Y., Tetreault, J., Vanderwende, L., Wilks, Y., Yang, F., Yates, A., Yilmaz, E., Zeman, D., Petrovic, S., Pitler, E., Selfridge, E., Osborne, M., and Solorio, T.
Abstract: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL HLT 2011, 19 June 2011 through 24 June 2011, Portland, OR
Published: 2011

7. Crowdsourcing and language studies: The new generation of linguistic data

Author: Munro, R., Bethard, S., Kuperman, V., Lai, V., Melnick, R., Potts, C., Schnoebelen, T., and Tily, H.
Published: 2010

8. Timelines from Text: Identification of Syntactic Temporal Relations.

Author: Bethard, S., Martin, J.H., and Klingenstein, S.
Published: 2007
Full Text: View/download PDF

9. Age and gender prediction on health forum data

Author: Shrestha, P., Bethard, S., Ted Pedersen, Rey-Villamizar, N., Sadeque, F., and Solorio, T.

10. A piece of my mind. Taking my poison.

Author: Bethard S, Young RK, and Bethard, Suzanne
Published: 2006
Full Text: View/download PDF

11. End-to-end clinical temporal information extraction with multi-head attention.

Author: Miller T, Bethard S, Dligach D, and Savova G
Abstract: Understanding temporal relationships in text from electronic health records can be valuable for many important downstream clinical applications. Since Clinical TempEval 2017, there has been little work on end-to-end systems for temporal relation extraction, with most work focused on the setting where gold standard events and time expressions are given. In this work, we make use of a novel multi-headed attention mechanism on top of a pre-trained transformer encoder to allow the learning process to attend to multiple aspects of the contextualized embeddings. Our system achieves state of the art results on the THYME corpus by a wide margin, in both the in-domain and cross-domain settings.
Published: 2023

12. Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization.

Author: Xu D, Gopale M, Zhang J, Brown K, Begoli E, and Bethard S
Subjects: Humans, RxNorm, Systematized Nomenclature of Medicine, Natural Language Processing, Neural Networks, Computer, Patient Discharge Summaries, Unified Medical Language System
Abstract: Objective: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks including relation extraction, information retrieval, etc. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization., Materials and Methods: The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer., Results: Our generate-and-rank system was third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy was increased to 83.56% via improvements to how training data are generated from UMLS and incorporation of our UMLS semantic type regularizer., Discussion: Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training., Conclusions: Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network-based ranking model to accurately link phrases in text to UMLS concepts., (© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
Published: 2020
Full Text: View/download PDF

13. Rethinking domain adaptation for machine learning over clinical language.

Author: Laparra E, Bethard S, and Miller TA
Abstract: Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously under-studied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields., (© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
Published: 2020
Full Text: View/download PDF

14. Does BERT need domain adaptation for clinical negation detection?

Author: Lin C, Bethard S, Dligach D, Sadeque F, Savova G, and Miller TA
Subjects: Algorithms, Datasets as Topic, Humans, Medical Records, Information Storage and Retrieval methods, Machine Learning, Natural Language Processing, Neural Networks, Computer
Abstract: Introduction: Classifying whether concepts in an unstructured clinical text are negated is an important unsolved task. New domain adaptation and transfer learning methods can potentially address this issue., Objective: We examine neural unsupervised domain adaptation methods, introducing a novel combination of domain adaptation with transformer-based transfer learning methods to improve negation detection. We also want to better understand the interaction between the widely used bidirectional encoder representations from transformers (BERT) system and domain adaptation methods., Materials and Methods: We use 4 clinical text datasets that are annotated with negation status. We evaluate a neural unsupervised domain adaptation algorithm and BERT, a transformer-based model that is pretrained on massive general text datasets. We develop an extension to BERT that uses domain adversarial training, a neural domain adaptation method that adds an objective to the negation task, that the classifier should not be able to distinguish between instances from 2 different domains., Results: The domain adaptation methods we describe show positive results, but, on average, the best performance is obtained by plain BERT (without the extension). We provide evidence that the gains from BERT are likely not additive with the gains from domain adaptation., Discussion: Our results suggest that, at least for the task of clinical negation detection, BERT subsumes domain adaptation, implying that BERT is already learning very general representations of negation phenomena such that fine-tuning even on a specific corpus does not lead to much overfitting., Conclusion: Despite being trained on nonclinical text, the large training sets of models like BERT lead to large gains in performance for the clinical negation detection task., (© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.)
Published: 2020
Full Text: View/download PDF

15. UArizona at the MADE1.0 NLP Challenge.

Author: Xu D, Yadav V, and Bethard S
Abstract: MADE1.0 is a public natural language processing challenge aiming to extract medication and adverse drug events from Electronic Health Records. This work presents NER and RI systems developed by UArizona team for the MADE1.0 competition. We propose a neural NER system for medical named entity recognition using both local and context features for each individual word and a simple but effective SVM-based pairwise relation classification system for identifying relations between medical entities and attributes. Our system achieves 81.56%, 83.18%, and 59.85% F1 score in the three tasks of MADE1.0 challenge, respectively, ranked amongst the top three teams for Task 2 and 3.
Published: 2018

16. From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations.

Author: Laparra E, Xu D, and Bethard S
Abstract: This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character level multi-output neural network that outperforms previous state-of-the-art built on the TimeML schema. To compare predictions of systems that follow both SCATE and TimeML, we present a new scoring metric for time intervals. We also apply this new metric to carry out a comparative analysis of the annotations of both schemes in the same corpus.
Published: 2018
Full Text: View/download PDF

17. UArizona at the CLEF eRisk 2017 Pilot Task: Linear and Recurrent Models for Early Depression Detection.

Author: Sadeque F, Xu D, and Bethard S
Abstract: The 2017 CLEF eRisk pilot task focuses on automatically detecting depression as early as possible from a users' posts to Reddit. In this paper we present the techniques employed for the University of Arizona team's participation in this early risk detection shared task. We leveraged external information beyond the small training set, including a preexisting depression lexicon and concepts from the Unified Medical Language System as features. For prediction, we used both sequential (recurrent neural network) and non-sequential (support vector machine) models. Our models perform decently on the test data, and the recurrent neural models perform better than the non-sequential support vector machines while using the same feature sets.
Published: 2017

18. Towards generalizable entity-centric clinical coreference resolution.

Author: Miller T, Dligach D, Bethard S, Lin C, and Savova G
Subjects: Humans, APACHE, Electronic Health Records, Natural Language Processing, Semantics, Software
Abstract: Objective: This work investigates the problem of clinical coreference resolution in a model that explicitly tracks entities, and aims to measure the performance of that model in both traditional in-domain train/test splits and cross-domain experiments that measure the generalizability of learned models., Methods: The two methods we compare are a baseline mention-pair coreference system that operates over pairs of mentions with best-first conflict resolution and a mention-synchronous system that incrementally builds coreference chains. We develop new features that incorporate distributional semantics, discourse features, and entity attributes. We use two new coreference datasets with similar annotation guidelines - the THYME colon cancer dataset and the DeepPhe breast cancer dataset., Results: The mention-synchronous system performs similarly on in-domain data but performs much better on new data. Part of speech tag features prove superior in feature generalizability experiments over other word representations. Our methods show generalization improvement but there is still a performance gap when testing in new domains., Discussion: Generalizability of clinical NLP systems is important and under-studied, so future work should attempt to perform cross-domain and cross-institution evaluations and explicitly develop features and training regimens that favor generalizability. A performance-optimized version of the mention-synchronous system will be included in the open source Apache cTAKES software., (Copyright © 2017 Elsevier Inc. All rights reserved.)
Published: 2017
Full Text: View/download PDF

19. Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning.

Author: Osborne JD, Wyatt M, Westfall AO, Willig J, Bethard S, and Gordon G
Subjects: Electronic Health Records, Humans, International Classification of Diseases, Mandatory Reporting, Pathology, Clinical, United States, Data Mining methods, Machine Learning, Natural Language Processing, Neoplasms pathology
Abstract: Objective: To help cancer registrars efficiently and accurately identify reportable cancer cases., Material and Methods: The Cancer Registry Control Panel (CRCP) was developed to detect mentions of reportable cancer cases using a pipeline built on the Unstructured Information Management Architecture - Asynchronous Scaleout (UIMA-AS) architecture containing the National Library of Medicine's UIMA MetaMap annotator as well as a variety of rule-based UIMA annotators that primarily act to filter out concepts referring to nonreportable cancers. CRCP inspects pathology reports nightly to identify pathology records containing relevant cancer concepts and combines this with diagnosis codes from the Clinical Electronic Data Warehouse to identify candidate cancer patients using supervised machine learning. Cancer mentions are highlighted in all candidate clinical notes and then sorted in CRCP's web interface for faster validation by cancer registrars., Results: CRCP achieved an accuracy of 0.872 and detected reportable cancer cases with a precision of 0.843 and a recall of 0.848. CRCP increases throughput by 22.6% over a baseline (manual review) pathology report inspection system while achieving a higher precision and recall. Depending on registrar time constraints, CRCP can increase recall to 0.939 at the expense of precision by incorporating a data source information feature., Conclusion: CRCP demonstrates accurate results when applying natural language processing features to the problem of detecting patients with cases of reportable cancer from clinical notes. We show that implementing only a portion of cancer reporting rules in the form of regular expressions is sufficient to increase the precision, recall, and speed of the detection of reportable cancer cases when combined with off-the-shelf information extraction software and machine learning., (© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
Published: 2016
Full Text: View/download PDF

20. Multilayered temporal modeling for the clinical domain.

Author: Lin C, Dligach D, Miller TA, Bethard S, and Savova GK
Subjects: Algorithms, Natural Language Processing, Ownership, Time, Electronic Health Records, Information Storage and Retrieval methods
Abstract: Objective: To develop an open-source temporal relation discovery system for the clinical domain. The system is capable of automatically inferring temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity--from rough temporality expressed as event relations to the document creation time (DCT) to temporal containment to fine-grained classic Allen-style relations., Materials and Methods: We evaluated our systems on 2 clinical corpora. One is a subset of the Temporal Histories of Your Medical Events (THYME) corpus, which was used in SemEval 2015 Task 6: Clinical TempEval. The other is the 2012 Informatics for Integrating Biology and the Bedside (i2b2) challenge corpus. We designed multiple supervised machine learning models to compute the DCT relation and within-sentence temporal relations. For the i2b2 data, we also developed models and rule-based methods to recognize cross-sentence temporal relations. We used the official evaluation scripts of both challenges to make our results comparable with results of other participating systems. In addition, we conducted a feature ablation study to find out the contribution of various features to the system's performance., Results: Our system achieved state-of-the-art performance on the Clinical TempEval corpus and was on par with the best systems on the i2b2 2012 corpus. Particularly, on the Clinical TempEval corpus, our system established a new F1 score benchmark, statistically significant as compared to the baseline and the best participating system., Conclusion: Presented here is the first open-source clinical temporal relation discovery system. It was built using a multilayered temporal modeling strategy and achieved top performance in 2 major shared tasks., (© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
Published: 2016
Full Text: View/download PDF

21. Discovering body site and severity modifiers in clinical texts.

Author: Dligach D, Bethard S, Becker L, Miller T, and Savova GK
Subjects: Electronic Health Records, Humans, Anatomy, Natural Language Processing, Severity of Illness Index, Support Vector Machine
Abstract: Objective: To research computational methods for discovering body site and severity modifiers in clinical texts., Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors., Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740-0.908 and our method for discovering severity modifiers achieves F1 of 0.905-0.929., Discussion: Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures., Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES).
Published: 2014
Full Text: View/download PDF

22. ClearTK 2.0: Design Patterns for Machine Learning in UIMA.

Author: Bethard S, Ogren P, and Becker L
Abstract: ClearTK adds machine learning functionality to the UIMA framework, providing wrappers to popular machine learning libraries, a rich feature extraction library that works across different classifiers, and utilities for applying and evaluating machine learning models. Since its inception in 2008, ClearTK has evolved in response to feedback from developers and the community. This evolution has followed a number of important design principles including: conceptually simple annotator interfaces, readable pipeline descriptions, minimal collection readers, type system agnostic code, modules organized for ease of import, and assisting user comprehension of the complex UIMA framework.
Published: 2014

23. Temporal Annotation in the Clinical Domain.

Author: Styler WF 4th, Bethard S, Finan S, Palmer M, Pradhan S, de Groen PC, Erickson B, Miller T, Lin C, Savova G, and Pustejovsky J
Abstract: This article discusses the requirements of a formal specification for the annotation of temporal information in clinical narratives. We discuss the implementation and extension of ISO-TimeML for annotating a corpus of clinical notes, known as the THYME corpus. To reflect the information task and the heavily inference-based reasoning demands in the domain, a new annotation guideline has been developed, "the THYME Guidelines to ISO-TimeML (THYME-TimeML)". To clarify what relations merit annotation, we distinguish between linguistically-derived and inferentially-derived temporal orderings in the text. We also apply a top performing TempEval 2013 system against this new resource to measure the difficulty of adapting systems to the clinical domain. The corpus is available to the community and has been proposed for use in a SemEval 2015 task.
Published: 2014

24. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium.

Author: Pathak J, Bailey KR, Beebe CE, Bethard S, Carrell DC, Chen PJ, Dligach D, Endle CM, Hart LA, Haug PJ, Huff SM, Kaggal VC, Li D, Liu H, Marchant K, Masanz J, Miller T, Oniki TA, Palmer M, Peterson KJ, Rea S, Savova GK, Stancl CR, Sohn S, Solbrig HR, Suesse DB, Tao C, Taylor DP, Westberg L, Wu S, Zhuo N, and Chute CG
Subjects: Algorithms, Biomedical Research, Computer Security, Humans, Software, Vocabulary, Controlled, Data Mining, Electronic Health Records standards, Medical Informatics Applications, Natural Language Processing, Phenotype
Abstract: Research Objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction., Materials and Methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems-Mayo Clinic and Intermountain Healthcare-were used for development and validation. Extracted information was standardized and normalized to meaningful use (MU) conformant terminology and value set standards using Clinical Element Models (CEMs). These resources were used to demonstrate semi-automatic execution of MU clinical-quality measures modeled using the Quality Data Model (QDM) and an open-source rules engine., Results: Using CEMs and open-source natural language processing and terminology services engines-namely, Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) and Common Terminology Services (CTS2)-we developed a data-normalization platform that ensures data security, end-to-end connectivity, and reliable data flow within and across institutions. We demonstrated the applicability of this platform by executing a QDM-based MU quality measure that determines the percentage of patients between 18 and 75 years with diabetes whose most recent low-density lipoprotein cholesterol test result during the measurement year was <100 mg/dL on a randomly selected cohort of 273 Mayo Clinic patients. The platform identified 21 and 18 patients for the denominator and numerator of the quality measure, respectively. Validation results indicate that all identified patients meet the QDM-based criteria., Conclusions: End-to-end automated systems for extracting clinical information from diverse EHR systems require extensive use of standardized vocabularies and terminologies, as well as robust information models for storing, discovering, and processing that information. This study demonstrates the application of modular and open-source resources for enabling secondary use of EHR data through normalization into standards-based, comparable, and consistent format for high-throughput phenotyping to identify patient cohorts.
Published: 2013
Full Text: View/download PDF

25. A synchronous context free grammar for time normalization.

Author: Bethard S
Abstract: We present an approach to time normalization (e.g. the day before yesterday ⇒2013-04-12) based on a synchronous context free grammar. Synchronous rules map the source language to formally defined operators for manipulating times (FindEnclosed, StartAtEndOf, etc.). Time expressions are then parsed using an extended CYK+ algorithm, and converted to a normalized form by applying the operators recursively. For evaluation, a small set of synchronous rules for English time expressions were developed. Our model outperforms HeidelTime, the best time normalization system in TempEval 2013, on four different time normalization corpora.
Published: 2013

26. Towards temporal relation discovery from the clinical narrative.

Author: Savova G, Bethard S, Styler W, Martin J, Palmer M, Masanz J, and Ward W
Subjects: Humans, Methods, Semantics, Time, Disease Progression, Narration, Natural Language Processing
Abstract: Disease progression and understanding relies on temporal concepts. Discovery of automated temporal relations and timelines from the clinical narrative allows for mining large data sets of clinical text to uncover patterns at the disease and patient level. Our overall goal is the complex task of building a system for automated temporal relation discovery. As a first step, we evaluate enabling methods from the general natural language processing domain - deep parsing and semantic role labeling in predicate-argument structures - to explore their portability to the clinical domain. As a second step, we develop an annotation schema for temporal relations based on TimeML. In this paper we report results and findings from these first steps. Our next efforts will scale up the data collection to develop domain-specific modules for the enabling technologies within Mayo's open-source clinical Text Analysis and Knowledge Extraction System.
Published: 2009

27. Semantic role labeling for protein transport predicates.

Author: Bethard S, Lu Z, Martin JH, and Hunter L
Subjects: Animals, Classification methods, Humans, Information Storage and Retrieval methods, Information Theory, Pattern Recognition, Automated methods, Semantics, Terminology as Topic, Vocabulary, Controlled, Abstracting and Indexing methods, Computational Biology methods, Genes physiology, Natural Language Processing, Protein Transport genetics
Abstract: Background: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in the recent years, but mostly with data in newswire domains. Here, we report on a SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs - manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role., Results: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones., Conclusion: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.
Published: 2008
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

27 results on '"Bethard, S."'

1. Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks

2. Challenging distributional models with a conceptual network of philosophical terms

3. EaSe: A Diagnostic Tool for VQA Based on Answer Diversity

4. DUTH at SemEval-2018 Task 2: Emoji Prediction in Tweets

5. Predicting the Resolution of Referring Expressions from User Behavior

6. Introduction

7. Crowdsourcing and language studies: The new generation of linguistic data

8. Timelines from Text: Identification of Syntactic Temporal Relations.

9. Age and gender prediction on health forum data

10. A piece of my mind. Taking my poison.

11. End-to-end clinical temporal information extraction with multi-head attention.

12. Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization.

13. Rethinking domain adaptation for machine learning over clinical language.

14. Does BERT need domain adaptation for clinical negation detection?

15. UArizona at the MADE1.0 NLP Challenge.

16. From Characters to Time Intervals: New Paradigms for Evaluation and Neural Parsing of Time Normalizations.

17. UArizona at the CLEF eRisk 2017 Pilot Task: Linear and Recurrent Models for Early Depression Detection.

18. Towards generalizable entity-centric clinical coreference resolution.

19. Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning.

20. Multilayered temporal modeling for the clinical domain.

21. Discovering body site and severity modifiers in clinical texts.

22. ClearTK 2.0: Design Patterns for Machine Learning in UIMA.

23. Temporal Annotation in the Clinical Domain.

24. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium.

25. A synchronous context free grammar for time normalization.

26. Towards temporal relation discovery from the clinical narrative.

27. Semantic role labeling for protein transport predicates.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

27 results on '"Bethard, S."'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources