92 results on '"Rindflesch TC"'
Search Results
2. Combining relevance assignment with quality of the evidence to support guideline development.
- Author
Fiszman M, Bray BE, Shin D, Kilicoglu H, Bennett GC, Bodenreider O, Rindflesch TC, Safran C, Reti S, and Marin H
- Published
- 2010
3. Towards automatic recognition of scientifically rigorous clinical research evidence.
- Author
Kilicoglu H, Demner-Fushman D, Rindflesch TC, Wilczynski NL, and Haynes RB
- Abstract
The growing number of topically relevant biomedical publications readily available due to advances in document retrieval methods poses a challenge to clinicians practicing evidence-based medicine. It is increasingly time-consuming to acquire and critically appraise the available evidence. This problem could be addressed in part if methods were available to automatically recognize rigorous studies immediately applicable in a specific clinical situation. We approach the problem of recognizing studies containing usable clinical advice from retrieved topically relevant articles as a binary classification problem. The gold standard used in the development of PubMed clinical query filters forms the basis of our approach. We identify scientifically rigorous studies using supervised machine learning techniques (Naïve Bayes, support vector machine (SVM), and boosting) trained on high-level semantic features. We combine these methods using an ensemble learning method (stacking). The performance of learning methods is evaluated using precision, recall, and F1 score, in addition to area under the receiver operating characteristic (ROC) curve (AUC). Using a training set of 10,000 manually annotated MEDLINE citations and a test set of an additional 2,000 citations, we achieve 73.7% precision and 61.5% recall in identifying rigorous, clinically relevant studies with stacking over five feature-classifier combinations, and 82.5% precision and 84.3% recall in recognizing rigorous studies with treatment focus using stacking over a word + metadata feature vector. Our results demonstrate that a high-quality gold standard and advanced classification methods can help clinicians acquire the best evidence from the medical literature. [ABSTRACT FROM AUTHOR]
- Published
- 2009
4. Word sense disambiguation by selecting the best semantic type based on Journal Descriptor Indexing: preliminary experiment [corrected] [published erratum appears in J Am Soc Inf Sci Technol 2006 Mar;57(5):726].
- Author
Humphrey SM, Rogers WJ, Kilicoglu H, Demner-Fushman D, and Rindflesch TC
- Abstract
An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning that is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguous term "transport" has two meanings: 'Biological Transport', assigned the ST Cell Function, and 'Patient transport', assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing "transport" and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873, compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve performance of JDI and test its use in applications. [ABSTRACT FROM AUTHOR]
- Published
- 2006
5. Hierarchical concept indexing of full-text documents in the Unified Medical Language System Information Sources Map.
- Author
Wright LW, Nardini HKG, Aronson AR, and Rindflesch TC
- Published
- 1999
6. Investigating the role of interleukin-1 beta and glutamate in inflammatory bowel disease and epilepsy using discovery browsing.
- Author
Rindflesch TC, Blake CL, Cairelli MJ, Fiszman M, Zeiss CJ, and Kilicoglu H
- Subjects
- Brain metabolism, Humans, MEDLINE, Semantics, Data Mining, Epilepsy metabolism, Glutamic Acid metabolism, Inflammatory Bowel Diseases metabolism, Interleukin-1beta metabolism
- Abstract
Background: Structured electronic health records are a rich resource for identifying novel correlations, such as co-morbidities and adverse drug reactions. For drug development and better understanding of biomedical phenomena, such correlations need to be supported by viable hypotheses about the mechanisms involved, which can then form the basis of experimental investigations. Methods: In this study, we demonstrate the use of discovery browsing, a literature-based discovery method, to generate plausible hypotheses elucidating correlations identified from structured clinical data. The method is supported by the Semantic MEDLINE web application, which pinpoints interesting concepts and relevant MEDLINE citations that are used to build a coherent hypothesis. Results: Discovery browsing revealed a plausible explanation for the correlation between epilepsy and inflammatory bowel disease that was found in an earlier population study. The generated hypothesis involves interleukin-1 beta (IL-1 beta) and glutamate, and suggests that IL-1 beta influence on glutamate levels is involved in the etiology of both epilepsy and inflammatory bowel disease. Conclusions: The approach presented in this paper can supplement population-based correlation studies by enabling the scientist to identify literature that may justify the novel patterns identified in such studies, and it can underpin basic biomedical research that can lead to improved treatments and better healthcare outcomes.
- Published
- 2018
7. Assigning factuality values to semantic relations extracted from biomedical research literature.
- Author
Kilicoglu H, Rosemblat G, and Rindflesch TC
- Subjects
- Humans, Machine Learning, Natural Language Processing, Publications, Semantics, Biomedical Research, Data Mining
- Abstract
Biomedical knowledge claims are often expressed as hypotheses, speculations, or opinions, rather than explicit facts (propositions). Much biomedical text mining has focused on extracting propositions from biomedical literature. One such system is SemRep, which extracts propositional content in the form of subject-predicate-object triples called predications. In this study, we investigated the feasibility of assessing the factuality level of SemRep predications to provide more nuanced distinctions between predications for downstream applications. We annotated semantic predications extracted from 500 PubMed abstracts with seven factuality values (fact, probable, possible, doubtful, counterfact, uncommitted, and conditional). We extended a rule-based, compositional approach that uses lexical and syntactic information to predict factuality levels. We compared this approach to a supervised machine learning method that uses a rich feature set based on the annotated corpus. Our results indicate that the compositional approach is more effective than the machine learning method in recognizing the factuality values of predications. The annotated corpus as well as the source code and binaries for factuality assignment are publicly available. We will also incorporate the results of the better performing compositional approach into SemMedDB, a PubMed-scale repository of semantic predications extracted using SemRep.
- Published
- 2017
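SemRep predications are subject-predicate-object triples, and this study attaches one of seven factuality values to each. A minimal sketch of that pairing as a data structure, assuming a simple in-memory representation (the class names, field names, example predications, and the `filter_by_factuality` helper are hypothetical illustrations, not the format of the released corpus or of SemMedDB):

```python
from dataclasses import dataclass
from enum import Enum


class Factuality(Enum):
    """The seven factuality values named in the abstract."""
    FACT = "fact"
    PROBABLE = "probable"
    POSSIBLE = "possible"
    DOUBTFUL = "doubtful"
    COUNTERFACT = "counterfact"
    UNCOMMITTED = "uncommitted"
    CONDITIONAL = "conditional"


@dataclass(frozen=True)
class Predication:
    """A subject-predicate-object triple plus its assigned factuality value."""
    subject: str
    predicate: str
    obj: str  # named 'obj' to avoid confusion with the builtin 'object'
    factuality: Factuality


# Hypothetical predications for illustration only
preds = [
    Predication("drug_x", "TREATS", "disease_y", Factuality.FACT),
    Predication("drug_x", "TREATS", "disease_z", Factuality.POSSIBLE),
    Predication("gene_1", "CAUSES", "disease_y", Factuality.COUNTERFACT),
]


def filter_by_factuality(predications, allowed):
    """Keep only predications whose factuality value is in `allowed`."""
    return [p for p in predications if p.factuality in allowed]


confident = filter_by_factuality(preds, {Factuality.FACT, Factuality.PROBABLE})
print(confident)
```

Which values a downstream application treats as reliable enough is a design decision; the `{FACT, PROBABLE}` set above is only an example.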
8. Informatics Support for Basic Research in Biomedicine.
- Author
Rindflesch TC, Blake CL, Fiszman M, Kilicoglu H, Rosemblat G, Schneider J, and Zeiss CJ
- Subjects
- Humans, Semantics, Data Mining methods, Information Storage and Retrieval, MEDLINE
- Abstract
Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call "discovery browsing," which provides a principled way of navigating through selected aspects of some biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation in which a user is provided with high-level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer's disease. Out of the nearly 90,000 citations returned by the PubMed query "Alzheimer's disease," discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein. (Published by Oxford University Press on behalf of the Institute for Laboratory Animal Research 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.)
- Published
- 2017
9. Differentiating Sense through Semantic Interaction Data.
- Author
Workman TE, Weir C, and Rindflesch TC
- Subjects
- Cluster Analysis, Humans, Software, Algorithms, Natural Language Processing, Semantics, Terminology as Topic
- Abstract
Words that have different representations but are semantically related, such as dementia and delirium, can pose difficult issues in understanding text. We explore the use of interaction frequency data between semantic elements as a means to differentiate concept pairs, using semantic predications extracted from the biomedical literature. We applied datasets of features drawn from semantic predications for semantically related pairs to two Expectation Maximization clustering processes (without and with concept labels), then used all data to train and evaluate several concept-classifying algorithms. For the unlabeled datasets, 80% displayed the expected cluster count and similar or matching proportions; all labeled data exhibited similar or matching proportions when restricting the cluster count to unique labels. The highest-performing classifier achieved 89% accuracy, with F1 scores for individual concept classification ranging from 0.69 to 1. We conclude with a discussion of how these findings may be applied to natural language processing of clinical text.
- Published
- 2017
10. Link Prediction on a Network of Co-occurring MeSH Terms: Towards Literature-based Discovery.
- Author
Kastrin A, Rindflesch TC, and Hristovski D
- Subjects
- Algorithms, Area Under Curve, Data Mining, Knowledge Discovery, Medical Subject Headings
- Abstract
Objectives: Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We model LBD as a classification problem on a graph of MeSH terms, employing unsupervised and supervised link prediction methods to predict previously unknown connections between biomedical concepts. Methods: We evaluate the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures such as common neighbors (CN), Jaccard coefficient (JC), Adamic/Adar index (AA), and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. Results: Applying an unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In a supervised approach, we evaluate whether the proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87). Conclusions: The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.
- Published
- 2016
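The four proximity measures named in this abstract have standard textbook definitions. A minimal sketch of them on an undirected graph stored as a dict of neighbor sets, assuming a hypothetical five-node toy network (the helper names `build_graph` and `rank_pairs` are illustrative, not the authors' code, and the supervised random-forest step is not shown):

```python
import math
from itertools import combinations

Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def common_neighbors(g: Graph, u: str, v: str) -> int:
    # CN: number of nodes adjacent to both u and v
    return len(g[u] & g[v])


def jaccard(g: Graph, u: str, v: str) -> float:
    # JC: shared neighbors as a fraction of all neighbors of u or v
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g: Graph, u: str, v: str) -> float:
    # AA: each shared neighbor z adds 1 / log(degree(z)), so low-degree
    # shared neighbors weigh more than hubs (degree(z) >= 2 for u != v)
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])


def preferential_attachment(g: Graph, u: str, v: str) -> int:
    # PA: product of the two degrees
    return len(g[u]) * len(g[v])


def rank_pairs(g: Graph, measure) -> list[tuple[tuple[str, str], float]]:
    """Score every currently unlinked pair with `measure`, best first."""
    scores = [((u, v), measure(g, u, v))
              for u, v in combinations(sorted(g), 2) if v not in g[u]]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy network of five concepts
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"),
                 ("B", "D"), ("C", "D"), ("D", "E")])
for pair, score in rank_pairs(g, adamic_adar)[:3]:
    print(pair, round(score, 3))
```

In the unsupervised setting each measure ranks candidate links on its own; in the supervised setting described above, the four scores for a pair would become the feature vector passed to a classifier.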
11. Using Literature-Based Discovery to Explain Adverse Drug Effects.
- Author
Hristovski D, Kastrin A, Dinevski D, Burgun A, Žiberna L, and Rindflesch TC
- Subjects
- Humans, Data Mining methods, Drug-Related Side Effects and Adverse Reactions epidemiology, Drug-Related Side Effects and Adverse Reactions genetics, Pharmacogenetics methods, Pharmacovigilance
- Abstract
We report on our research in using literature-based discovery (LBD) to provide pharmacological and/or pharmacogenomic explanations for reported adverse drug effects. The goal of LBD is to generate novel and potentially useful hypotheses by analyzing the scientific literature and optionally some additional resources. Our assumption is that drugs have effects on some genes or proteins and that these genes or proteins are associated with the observed adverse effects. Therefore, by using LBD we try to find genes or proteins that link the drugs with the reported adverse effects. These genes or proteins can be used to provide insight into the processes causing the adverse effects. Initial results show that our method has the potential to assist in explaining reported adverse drug effects.
- Published
- 2016
12. Sortal anaphora resolution to enhance relation extraction from biomedical literature.
- Author
Kilicoglu H, Rosemblat G, Fiszman M, and Rindflesch TC
- Subjects
- Linguistics, Semantics, Biological Ontologies, Databases, Factual, Natural Language Processing
- Abstract
Background: Entity coreference is common in biomedical literature, and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. Results: We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive: 50% of the changes involved uninformative relations being replaced by more specific and informative ones, 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of the approximately 82 million semantic relations extracted from the entire PubMed. Conclusions: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.
- Published
- 2016
13. Spark, an application based on Serendipitous Knowledge Discovery.
- Author
Workman TE, Fiszman M, Cairelli MJ, Nahl D, and Rindflesch TC
- Subjects
- Internet, Models, Theoretical, PubMed, Semantics, User-Computer Interface, Information Seeking Behavior, Knowledge Bases, Medical Informatics Applications, Software
- Abstract
Findings from information-seeking behavior research can inform application development. In this report we provide a system description of Spark, an application based on findings from Serendipitous Knowledge Discovery studies and data structures known as semantic predications. Background information and the previously published IF-SKD model (outlining Serendipitous Knowledge Discovery in online environments) illustrate the potential use of information-seeking behavior in application design. A detailed overview of the Spark system illustrates how methodologies in design and retrieval functionality enable production of semantic predication graphs tailored to evoke Serendipitous Knowledge Discovery in users. (Copyright © 2016. Published by Elsevier Inc.)
- Published
- 2016
14. Networks of neuroinjury semantic predications to identify biomarkers for mild traumatic brain injury.
- Author
Cairelli MJ, Fiszman M, Zhang H, and Rindflesch TC
- Abstract
Objective: Mild traumatic brain injury (mTBI) has high prevalence in the military, among athletes, and in the general population worldwide (largely due to falls). Consequences can include a range of neuropsychological disorders. Unfortunately, such neural injury often goes undiagnosed due to the difficulty in identifying symptoms, so the discovery of an effective biomarker would greatly assist diagnosis; however, no single biomarker has been identified. We identify several body substances as potential components of a panel of biomarkers to support the diagnosis of mild traumatic brain injury. Methods: Our approach to diagnostic biomarker discovery combines ideas and techniques from systems medicine, natural language processing, and graph theory. We create a molecular interaction network that represents neural injury and is composed of relationships automatically extracted from the literature. We retrieve citations related to neurological injury and extract relationships (semantic predications) that contain potential biomarkers. After linking all relationships together to create a network representing neural injury, we filter the network by relationship frequency and concept connectivity to reduce the set to a manageable size of higher-interest substances. Results: A total of 99,437 relevant citations yielded 26,441 unique relations; 18,085 of these contained a potential biomarker as subject or object, with a total of 6,246 unique concepts. After filtering by graph metrics, the set was reduced to 1,021 relationships with 49 unique concepts, including 17 potential biomarkers. Conclusion: We created a network of relationships containing substances derived from 99,437 citations and filtered using graph metrics to provide a set of 17 potential biomarkers. We discuss the interaction of several of these (glutamate, glucose, and lactate) as the basis for more effective diagnosis than is currently possible. This method provides an opportunity to focus the effort of wet bench research on those substances with the highest potential as biomarkers for mTBI.
- Published
- 2015
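The abstract filters the neural-injury network "by relationship frequency and concept connectivity" without giving thresholds. A minimal sketch of such a two-step filter, assuming predications arrive as repeated (subject, predicate, object) tuples and that connectivity means node degree (the function name, the thresholds, and the placeholder concepts below are hypothetical, not the study's values):

```python
from collections import Counter


def filter_network(predications, min_freq, min_degree):
    """Keep frequent relations, then keep relations between well-connected concepts.

    This is a single pass; repeating the degree step until nothing changes
    would give something closer to a k-core.
    """
    # Step 1: relationship frequency (how often each exact triple was extracted)
    counts = Counter(predications)
    frequent = {triple for triple, n in counts.items() if n >= min_freq}

    # Step 2: concept connectivity (degree among the surviving relations)
    degree = Counter()
    for s, _, o in frequent:
        degree[s] += 1
        degree[o] += 1
    keep = {concept for concept, d in degree.items() if d >= min_degree}
    return {(s, p, o) for s, p, o in frequent if s in keep and o in keep}


# Hypothetical extracted predications (placeholders, not real study data)
preds = (
    [("sub_a", "ASSOCIATED_WITH", "injury")] * 3
    + [("sub_b", "ASSOCIATED_WITH", "injury")] * 3
    + [("sub_a", "INTERACTS_WITH", "sub_b")] * 2
    + [("sub_c", "ASSOCIATED_WITH", "injury")]
)
print(sorted(filter_network(preds, min_freq=2, min_degree=2)))
```

Here `sub_c` is dropped at the frequency step because its only relation was extracted once.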
15. Context-driven automatic subgraph creation for literature-based discovery.
- Author
Cameron D, Kavuluru R, Rindflesch TC, Sheth AP, Thirunarayan K, and Bodenreider O
- Subjects
- Algorithms, Databases, Factual, Humans, Medical Subject Headings, Models, Theoretical, Semantics, Cluster Analysis, Data Mining methods, Knowledge Discovery methods
- Abstract
Background: Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting scientific literature. Prior approaches to LBD include use of: (1) domain expertise and structured background knowledge to manually filter and explore the literature, (2) distributional statistics and graph-theoretic measures to rank interesting connections, and (3) heuristics to help eliminate spurious connections. However, manual approaches to LBD are not scalable, and purely distributional approaches may not be sufficient to obtain insights into the meaning of poorly understood associations. While several graph-based approaches have the potential to elucidate associations, their effectiveness has not been fully demonstrated. A considerable degree of a priori knowledge, heuristics, and manual filtering is still required. Objectives: In this paper we implement and evaluate a context-driven, automatic subgraph creation method that captures multifaceted complex associations between biomedical concepts to facilitate LBD. Given a pair of concepts, our method automatically generates a ranked list of subgraphs, which provide informative and potentially unknown associations between such concepts. Methods: To generate subgraphs, the set of all MEDLINE articles that contain either of the two specified concepts (A, C) is first collected. Then binary relationships or assertions automatically extracted from the MEDLINE articles, called semantic predications, are used to create a labeled directed predications graph. In this predications graph, a path is represented as a sequence of semantic predications. The hierarchical agglomerative clustering (HAC) algorithm is then applied to cluster paths that are bounded by the two concepts (A, C). For clustering, HAC relies on implicit semantics captured through Medical Subject Heading (MeSH) descriptors and on explicit semantics from the MeSH hierarchy. Paths that exceed a threshold of semantic relatedness are clustered into subgraphs based on their shared context. Finally, the automatically generated clusters are provided as a ranked list of subgraphs. Results: The subgraphs generated using this approach facilitated the rediscovery of 8 out of 9 existing scientific discoveries. In particular, they directly (or indirectly) led to the recovery of several intermediates (or B-concepts) between A- and C-terms, while also providing insights into the meaning of the associations. Such meaning is derived from predicates between the concepts, as well as the provenance of the semantic predications in MEDLINE. Additionally, by generating subgraphs on different thematic dimensions (such as Cellular Activity, Pharmaceutical Treatment, and Tissue Function), the approach may enable a broader understanding of the nature of complex associations between concepts. Finally, in a statistical evaluation to determine the interestingness of the subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE on average. Conclusion: These results suggest that leveraging the implicit and explicit semantics provided by manually assigned MeSH descriptors is an effective representation for capturing the underlying context of complex associations, along multiple thematic dimensions in LBD situations. (Published by Elsevier Inc.)
- Published
- 2015
16. Biomedical question answering using semantic relations.
- Author
Hristovski D, Dinevski D, Kastrin A, and Rindflesch TC
- Subjects
- Databases, Factual, Humans, Pharmacogenetics, Abstracting and Indexing, Algorithms, Information Storage and Retrieval, Natural Language Processing, Oligonucleotide Array Sequence Analysis, Semantics, Software
- Abstract
Background: The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. Results: We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total, 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). Conclusions: In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.
- Published
- 2015
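The abstract describes question answering as a search over a relational database of extracted semantic relations. A minimal sketch of that idea using Python's built-in `sqlite3` module, assuming a single hypothetical `relation` table with subject, predicate, object, and source-sentence columns (SemBT's actual schema, interface, and data are not described here; the rows below are invented placeholders):

```python
import sqlite3

# Hypothetical schema and rows for illustration; not SemBT's actual design
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT, sentence TEXT)"
)
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?, ?)",
    [
        ("drug_a", "INHIBITS", "gene_1", "Drug A inhibits gene 1 in vitro."),
        ("drug_a", "INHIBITS", "gene_2", "Gene 2 was inhibited by drug A."),
        ("drug_b", "INHIBITS", "gene_1", "Drug B inhibits gene 1."),
        ("drug_a", "TREATS", "disease_x", "Drug A treats disease X."),
    ],
)


def answer(conn, subject=None, predicate=None, obj=None):
    """Answer a question such as 'What does drug_a inhibit?' by filtering relations.

    Arguments left as None are the unknowns being asked about. Column names
    come from a fixed list; values are passed as SQL parameters.
    """
    clauses, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) or "1 = 1"
    sql = ("SELECT subject, predicate, object, sentence FROM relation "
           f"WHERE {where} ORDER BY object, subject")
    return conn.execute(sql, params).fetchall()


# "What does drug_a inhibit?" (returning the source sentence as evidence)
for row in answer(conn, subject="drug_a", predicate="INHIBITS"):
    print(row)
```

Returning the source sentence alongside each relation lets a user judge whether the extraction is correct, which is the kind of judgment the 80 domain experts made in the evaluation above.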
17. Constructing a Graph Database for Semantic Literature-Based Discovery.
- Author
Hristovski D, Kastrin A, Dinevski D, and Rindflesch TC
- Subjects
- Database Management Systems, Machine Learning, Semantics, Data Mining methods, Databases, Factual, Natural Language Processing, Periodicals as Topic, Terminology as Topic, Vocabulary, Controlled
- Abstract
Literature-based discovery (LBD) generates discoveries, or hypotheses, by combining what is already known in the literature. Potential discoveries have the form of relations between biomedical concepts; for example, a drug may be determined to treat a disease other than the one for which it was intended. LBD views the knowledge in a domain as a network: a set of concepts along with the relations between them. As a starting point, we used SemMedDB, a database of semantic relations between biomedical concepts extracted with SemRep from Medline. SemMedDB is distributed as a MySQL relational database, which is not well suited to handling network data. We transformed and uploaded SemMedDB into the Neo4j graph database and implemented the basic LBD discovery algorithms with the Cypher query language. We conclude that storing the data needed for semantic LBD is more natural in a graph database. Also, implementing LBD discovery algorithms is conceptually simpler with a graph query language than with standard SQL.
- Published
- 2015
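The abstract does not list the "basic LBD discovery algorithms"; the classic formulation is Swanson's ABC model, in which a starting concept A is linked to candidate concepts C only through intermediate B concepts. A minimal sketch of open discovery over subject-predicate-object triples, assuming an in-memory list and treating relations as directed from subject to object (the triples are hypothetical placeholders, not SemMedDB content, and this is plain Python rather than the authors' Cypher):

```python
from collections import defaultdict

# Hypothetical predications (subject, predicate, object); not real SemMedDB rows
triples = [
    ("drug_x", "INHIBITS", "gene_b1"),
    ("drug_x", "INTERACTS_WITH", "gene_b2"),
    ("gene_b1", "ASSOCIATED_WITH", "disease_c1"),
    ("gene_b2", "ASSOCIATED_WITH", "disease_c1"),
    ("gene_b2", "ASSOCIATED_WITH", "disease_c2"),
    ("drug_x", "TREATS", "disease_c2"),
]


def open_discovery(triples, a):
    """Return C concepts reachable from `a` via one intermediate B but not directly.

    Each result pairs a C concept with the set of B concepts linking it to A,
    ranked by how many intermediates support it (one simple heuristic).
    """
    out = defaultdict(set)
    for s, _, o in triples:
        out[s].add(o)
    direct = out[a]
    candidates = defaultdict(set)
    for b in direct:
        for c in out[b]:
            if c != a and c not in direct:  # skip links that are already known
                candidates[c].add(b)
    return sorted(candidates.items(), key=lambda kv: len(kv[1]), reverse=True)


for c, bs in open_discovery(triples, "drug_x"):
    print(c, "via", sorted(bs))
```

`disease_c2` is excluded because a direct `drug_x TREATS disease_c2` relation already exists; only the indirect `disease_c1` link is proposed as a hypothesis.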
18. Identifying plausible adverse drug reactions using knowledge extracted from the literature.
- Author
Shang N, Xu H, Rindflesch TC, and Cohen T
- Subjects
- Adverse Drug Reaction Reporting Systems, Algorithms, Biomedical Research, Humans, MEDLINE, ROC Curve, Data Mining methods, Drug-Related Side Effects and Adverse Reactions, Natural Language Processing, Semantics
- Abstract
Pharmacovigilance involves continually monitoring drug safety after drugs are put on the market. To aid this process, algorithms for the identification of strongly correlated drug/adverse drug reaction (ADR) pairs from data sources such as adverse event reporting systems or Electronic Health Records have been developed. These methods are generally statistical in nature and do not draw upon the large volumes of knowledge embedded in the biomedical literature. In this paper, we investigate the ability of scalable Literature-Based Discovery (LBD) methods to identify side effects of pharmaceutical agents. The advantage of LBD methods is that they can provide evidence from the literature to support the plausibility of a drug/ADR association, thereby assisting human review to validate the signal, which is an essential component of pharmacovigilance. To do so, we draw upon vast repositories of knowledge that have been extracted from the biomedical literature by two Natural Language Processing tools, MetaMap and SemRep. We evaluate two LBD methods that scale comfortably to the volume of knowledge available in these repositories. Specifically, we evaluate Reflective Random Indexing (RRI), a model based on concept-level co-occurrence, and Predication-based Semantic Indexing (PSI), a model that encodes the nature of the relationship between concepts to support reasoning analogically about drug-effect relationships. An evaluation set was constructed from the Side Effect Resource 2 (SIDER2), which contains known drug/ADR relations, and models were evaluated for their ability to "rediscover" these relations. In this paper, we demonstrate that both RRI and PSI can recover known drug-adverse event associations. However, PSI performed better overall and has the additional advantage of being able to recover the literature underlying the reasoning pathways it used to make its predictions. (Copyright © 2014 Elsevier Inc. All rights reserved.)
- Published
- 2014
19. Semantic processing to identify adverse drug event information from black box warnings.
- Author
Culbertson A, Fiszman M, Shin D, and Rindflesch TC
- Subjects
- Feasibility Studies, Humans, Internet, Semantics, United States, United States Food and Drug Administration, Drug Labeling, Drug-Related Side Effects and Adverse Reactions, Natural Language Processing, Prescription Drugs adverse effects
- Abstract
Adverse drug events account for two million combined injuries, hospitalizations, or deaths each year. Furthermore, there are few comprehensive, up-to-date, and free sources of drug information. Clinical decision support systems may significantly reduce the number of adverse drug events. However, these systems depend on up-to-date, comprehensive, and codified data to serve as input. The DailyMed website, a resource managed by the FDA and NLM, contains all currently approved drugs. We used a semantic natural language processing approach that successfully extracted information for adverse drug events, at-risk conditions, and susceptible populations from black box warning labels on this site. The precision, recall, and F-score were 94%, 52%, and 0.67 for adverse drug events; 80%, 53%, and 0.64 for conditions; and 95%, 44%, and 0.61 for populations. Overall performance was 90% precision, 51% recall, and 0.65 F-score. Information extracted can be stored in a structured format and may support clinical decision support systems.
- Published
- 2014
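The F-scores in the abstract above are consistent with the balanced F-score (F1), the harmonic mean of precision and recall; the abstract does not state the formula, so treating "F-score" as F1 is an assumption. A minimal sketch that recomputes some of the reported values from the published precision/recall pairs:

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported (rounded) in the abstract above.
reported = {
    "adverse drug events": (0.94, 0.52),
    "conditions": (0.80, 0.53),
    "overall": (0.90, 0.51),
}

for category, (p, r) in reported.items():
    # Two decimals, matching the abstract's F-score precision.
    print(f"{category}: F1 = {f_score(p, r):.2f}")
```

Because the inputs are rounded percentages, a recomputed value can differ from a published F-score in the last digit.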
20. Exploiting Literature-derived Knowledge and Semantics to Identify Potential Prostate Cancer Drugs.
- Author
-
Zhang R, Cairelli MJ, Fiszman M, Kilicoglu H, Rindflesch TC, Pakhomov SV, and Melton GB
- Abstract
In this study, we report on the performance of an automated approach to the discovery of potential prostate cancer drugs from the biomedical literature. We used the semantic relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations using SemRep, to extract potential relationships using knowledge of cancer drug pathways. Two cancer drug pathway schemas were constructed using these relationships extracted from SemMedDB. Through both pathway schemas, we found drugs already used for prostate cancer therapy and drugs not currently listed as prostate cancer medications. Our study demonstrates that the appropriate linking of relevant structured semantic relationships stored in SemMedDB can support the discovery of potential prostate cancer drugs.
- Published
- 2014
21. Large-scale structure of a network of co-occurring MeSH terms: statistical analysis of macroscopic properties.
- Author
-
Kastrin A, Rindflesch TC, and Hristovski D
- Subjects
- Algorithms, Humans, Models, Statistical, Principal Component Analysis, Computational Biology methods, Medical Subject Headings
- Abstract
Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using Pearson's chi-square test for independence. To characterize topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 with a diameter of three edges and clustering coefficient of 0.26. The Kolmogorov-Smirnov test rejected the power law as a plausible model for degree distribution. For the major MeSH network the average path length was 2.63 edges with a diameter of seven edges and clustering coefficient of 0.15. The Kolmogorov-Smirnov test failed to reject the power law as a plausible model. The power-law exponent was 5.07. In both networks it was evident that nodes with a lower degree exhibit higher clustering than those with a higher degree. After simulated attack, where we removed 10% of nodes with the highest degrees, the giant component of each of the two networks contains about 90% of all nodes. Because of small average path length and high degree of clustering the MeSH network is small-world. A power-law distribution is not a plausible model for the degree distribution. The network is highly modular and highly resistant to targeted and random attack, with minimal dissortativity.
- Published
- 2014
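The topological measures named in the abstract above (average path length, diameter, clustering coefficient) can be illustrated on a toy co-occurrence graph. This is a minimal sketch, not the authors' code: it assumes an unweighted, undirected graph, uses breadth-first search for hop distances, and uses the average local clustering coefficient, which is one common definition (the abstract does not say which variant was used). The chi-square network reduction step is omitted.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (concept, concept) co-occurrence pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_stats(graph):
    """Average shortest-path length and diameter over all connected pairs."""
    lengths = []
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                lengths.append(d)
    return sum(lengths) / len(lengths), max(lengths)


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


# Toy graph: a triangle A-B-C with a pendant node D attached to C.
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
avg_len, diameter = path_stats(g)
print(avg_len, diameter, average_clustering(g))
```

For MEDLINE-scale networks, a dedicated graph library would be the practical choice; the pure-Python version only shows what each measure computes.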
22. Augmenting microarray data with literature-based knowledge to enhance gene regulatory network inference.
- Author
-
Chen G, Cairelli MJ, Kilicoglu H, Shin D, and Rindflesch TC
- Subjects
- Databases, Factual, Oligonucleotide Array Sequence Analysis, Computational Biology methods, Data Mining methods, Gene Expression Profiling methods, Gene Regulatory Networks, Knowledge Bases
- Abstract
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. Sixty percent of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. 
This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
- Published
- 2014
23. Using semantic predications to uncover drug-drug interactions in clinical data.
- Author
-
Zhang R, Cairelli MJ, Fiszman M, Rosemblat G, Kilicoglu H, Rindflesch TC, Pakhomov SV, and Melton GB
- Subjects
- Angiotensin-Converting Enzyme Inhibitors administration & dosage, Angiotensin-Converting Enzyme Inhibitors adverse effects, Humans, Lisinopril administration & dosage, Lisinopril adverse effects, Selective Serotonin Reuptake Inhibitors administration & dosage, Selective Serotonin Reuptake Inhibitors adverse effects, Sertraline administration & dosage, Sertraline adverse effects, Drug Interactions, Semantics
- Abstract
In this study we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1's effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug-drug interactions using both schemas. For example, our results suggest that the interaction of Lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug-drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug-drug interaction pathways., (Copyright © 2014 Elsevier Inc. All rights reserved.)
- Published
- 2014
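The two drug-drug interaction schemas in the abstract above can be read as path patterns over subject-predicate-object triples. The sketch below is a hypothetical illustration, not the authors' implementation: the predications, concept names, and semantic-type table are invented toy data (not SemMedDB content), and matching ignores predicate names, whereas the paper's schemas may constrain them.

```python
from itertools import product

# Hypothetical toy predications (subject, predicate, object); NOT SemMedDB data.
# Predicate names are illustrative only; the matching below ignores them.
PREDICATIONS = [
    ("drug_a", "INHIBITS", "gene_x"),
    ("gene_x", "INTERACTS_WITH", "drug_b"),
    ("drug_a", "STIMULATES", "gene_y"),
    ("drug_b", "INHIBITS", "gene_z"),
    ("gene_y", "AFFECTS", "function_f"),
    ("gene_z", "AFFECTS", "function_f"),
]

# Hypothetical semantic-type lookup for each toy concept.
SEMANTIC_TYPE = {
    "drug_a": "drug", "drug_b": "drug",
    "gene_x": "gene", "gene_y": "gene", "gene_z": "gene",
    "function_f": "function",
}


def targets(subject, object_type):
    """Objects of the given type that `subject` has any predication with."""
    return {o for s, _, o in PREDICATIONS
            if s == subject and SEMANTIC_TYPE.get(o) == object_type}


def schema1(drug1, drug2):
    """Schema 1: drug1 -> gene -> drug2. Returns the linking genes."""
    return {g for g in targets(drug1, "gene") if drug2 in targets(g, "drug")}


def schema2(drug1, drug2):
    """Schema 2: drug1 -> gene1 -> function <- gene2 <- drug2."""
    paths = set()
    for g1, g2 in product(targets(drug1, "gene"), targets(drug2, "gene")):
        for func in targets(g1, "function") & targets(g2, "function"):
            paths.add((g1, g2, func))
    return paths


print(schema1("drug_a", "drug_b"))  # {'gene_x'}
print(schema2("drug_a", "drug_b"))  # {('gene_y', 'gene_z', 'function_f')}
```

In practice each drug pair from a patient's medication list would be passed to both functions, and any non-empty result flags a candidate interaction for review.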
24. Link prediction in a MeSH co-occurrence network: preliminary results.
- Author
-
Kastrin A, Rindflesch TC, and Hristovski D
- Subjects
- Pilot Projects, Semantics, Artificial Intelligence, MEDLINE, Medical Subject Headings, Natural Language Processing, Pattern Recognition, Automated methods, Periodicals as Topic, Terminology as Topic
- Abstract
Literature-based discovery (LBD) refers to automatic discovery of implicit relations from the scientific literature. Co-occurrence associations between biomedical concepts are commonly used in LBD. These co-occurrences can be represented as a network that consists of a set of nodes representing concepts and a set of edges representing their relationships (or links). In this paper we propose and evaluate a methodology for link prediction of implicit connections in a network of co-occurring Medical Subject Headings (MeSH®). The proposed approach is complementary to, and may augment, existing LBD methods. Link prediction was performed using Jaccard and Adamic-Adar similarity measures. The preliminary results showed high prediction performance, with area under the ROC curve of 0.78 and 0.82 for the two similarity measures, respectively.
- Published
- 2014
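The abstract above names the Jaccard and Adamic-Adar similarity measures for link prediction. Both score an unlinked node pair by the neighbors the two nodes share: Jaccard divides shared neighbors by the union of both neighborhoods, and Adamic-Adar sums 1/log(degree) over shared neighbors so that rarely connected neighbors count more. A minimal sketch on a toy graph; the node names are placeholders rather than real MeSH descriptors, and the ROC-based evaluation reported in the paper is not reproduced.

```python
import math


def build_graph(edges):
    """Undirected adjacency sets from co-occurring term pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def jaccard(graph, x, y):
    """Shared neighbors divided by the union of both neighborhoods."""
    union = graph[x] | graph[y]
    return len(graph[x] & graph[y]) / len(union) if union else 0.0


def adamic_adar(graph, x, y):
    """Sum of 1/log(degree) over shared neighbors; rare neighbors weigh more.

    A shared neighbor of two distinct nodes has degree >= 2, so log() > 0.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])


def rank_candidates(graph, score):
    """Score every currently unlinked node pair, highest score first."""
    nodes = sorted(graph)
    pairs = [(x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]
             if y not in graph[x]]
    return sorted(pairs, key=lambda pair: score(graph, *pair), reverse=True)


# Toy co-occurrence network with placeholder node names.
g = build_graph([("A", "B"), ("A", "C"), ("B", "C"),
                 ("B", "D"), ("C", "D"), ("D", "E")])
print(rank_candidates(g, jaccard)[0])      # ('A', 'D')
print(rank_candidates(g, adamic_adar)[0])  # ('A', 'D')
```

Here A and D share both of A's neighbors, so both measures rank the A-D pair as the most likely implicit connection.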
25. A methodology for extending domain coverage in SemRep.
- Author
-
Rosemblat G, Shin D, Kilicoglu H, Sneiderman C, and Rindflesch TC
- Subjects
- Information Storage and Retrieval, Unified Medical Language System, Natural Language Processing, Semantics
- Abstract
We describe a domain-independent methodology to extend SemRep coverage beyond the biomedical domain. SemRep, a natural language processing application originally designed for biomedical texts, uses the knowledge sources provided by the Unified Medical Language System (UMLS®). Ontological and terminological extensions to the system are needed in order to support other areas of knowledge. We extended SemRep's application by developing a semantic representation of a previously unsupported domain. This was achieved by adapting well-known ontology engineering phases and integrating them with the UMLS knowledge sources on which SemRep crucially depends. While the process to extend SemRep coverage has been successfully applied in earlier projects, this paper presents in detail the step-wise approach we followed and the mechanisms implemented. A case study in the field of medical informatics illustrates how the ontology engineering phases have been adapted for optimal integration with the UMLS. We provide qualitative and quantitative results, which indicate the validity and usefulness of our methodology., (Published by Elsevier Inc.)
- Published
- 2013
26. Semantic MEDLINE for discovery browsing: using semantic predications and the literature-based discovery paradigm to elucidate a mechanism for the obesity paradox.
- Author
-
Cairelli MJ, Miller CM, Fiszman M, Workman TE, and Rindflesch TC
- Subjects
- Humans, Semantics, Information Storage and Retrieval methods, MEDLINE, Natural Language Processing, Obesity complications, Obesity mortality
- Abstract
Applying the principles of literature-based discovery (LBD), we elucidate the paradox that obesity is beneficial in critical care despite contributing to disease generally. Our approach enhances a previous extension to LBD, called "discovery browsing," and is implemented using Semantic MEDLINE, which summarizes the results of a PubMed search into an interactive graph of semantic predications. The methodology allows a user to construct argumentation underpinning an answer to a biomedical question by engaging the user in an iterative process between system output and user knowledge. Components of the Semantic MEDLINE output graph identified as "interesting" by the user both contribute to subsequent searches and are constructed into a logical chain of relationships constituting an explanatory network in answer to the initial question. Based on this methodology we suggest that phthalates leached from plastic in critical care interventions activate PPAR gamma, which is anti-inflammatory and abundant in obese patients.
- Published
- 2013
27. Semantic processing to identify adverse drug event information from black box warnings.
- Author
-
Culbertson A, Fiszman M, Shin D, and Rindflesch TC
- Subjects
- Semantics, Drug Labeling, Drug-Related Side Effects and Adverse Reactions, Natural Language Processing
- Abstract
We utilized a semantic natural language processing approach to extract adverse drug event information from FDA black box warnings. Overall performance was 90% precision, 51% recall, and 0.65 F-Score. Information extracted can be stored in a structured format and may be useful to support clinical decision support systems.
- Published
- 2013
28. A literature-based assessment of concept pairs as a measure of semantic relatedness.
- Author
-
Workman TE, Rosemblat G, Fiszman M, and Rindflesch TC
- Subjects
- Humans, Information Storage and Retrieval, Physicians, PubMed, Statistics, Nonparametric, Natural Language Processing, Semantics, Subject Headings, Unified Medical Language System
- Abstract
The semantic relatedness between two concepts, according to human perception, is domain-rooted and reflects prior knowledge. We developed a new method for semantic relatedness assessment that reflects human judgment, utilizing semantic predications extracted from PubMed citations by SemRep. We compared the new method to other approaches utilizing path-based, statistical, and context vector methods, using a gold standard for evaluation. The new method outperformed all others, except one variation of the context vector technique. These findings have implications in several natural language processing applications, such as serendipitous knowledge discovery.
- Published
- 2013
29. Extending SemRep to the Public Health Domain.
- Author
-
Rosemblat G, Resnick MP, Auston I, Shin D, Sneiderman C, Fiszman M, and Rindflesch TC
- Abstract
We describe the use of a domain-independent methodology to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS®) (Humphreys, Lindberg, Schoolman, & Barnett, 1998) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information-savvy workforce. NLP and semantic techniques hold promise to help public health professionals navigate the growing ocean of information by organizing and structuring this knowledge into a focused public health framework paired with a user-friendly visualization application as a way to summarize results of PubMed searches in this field of knowledge.
- Published
- 2013
30. Natural language processing: state of the art and prospects for significant progress, a workshop sponsored by the National Library of Medicine.
- Author
-
Friedman C, Rindflesch TC, and Corn M
- Subjects
- United States, Education, National Library of Medicine (U.S.), Natural Language Processing
- Abstract
Natural language processing (NLP) is crucial for advancing healthcare because it is needed to transform relevant information locked in text into structured data that can be used by computer processes aimed at improving patient care and advancing medicine. In light of the importance of NLP to health, the National Library of Medicine (NLM) recently sponsored a workshop to review the state of the art in NLP focusing on text in English, both in biomedicine and in the general language domain. Specific goals of the NLM-sponsored workshop were to identify the current state of the art, grand challenges and specific roadblocks, and to identify effective use and best practices. This paper reports on the main outcomes of the workshop, including an overview of the state of the art, strategies for advancing the field, and obstacles that need to be addressed, resulting in recommendations for a research agenda intended to advance the field., (Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2013
31. Clustering cliques for graph-based summarization of the biomedical research literature.
- Author
-
Zhang H, Fiszman M, Shin D, Wilkowski B, and Rindflesch TC
- Subjects
- Algorithms, Biomedical Research, Cluster Analysis, Medical Subject Headings, Semantics, Data Mining methods, PubMed
- Abstract
Background: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts)., Results: SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings., Conclusions: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10 percentage points higher than the baseline (43% versus 33%). Compared with the reference standard from MeSH headings, recall, precision, and F-score were 0.64, 0.65, and 0.65, respectively.
- Published
- 2013
32. A graph-based recovery and decomposition of Swanson's hypothesis using semantic predications.
- Author
-
Cameron D, Bodenreider O, Yalamanchili H, Danh T, Vallabhaneni S, Thirunarayan K, Sheth AP, and Rindflesch TC
- Subjects
- Blood Viscosity, Computational Biology trends, Data Mining trends, Humans, Platelet Aggregation, Raynaud Disease, Computational Biology methods, Data Mining methods, Knowledge Discovery methods, Models, Theoretical, Semantics
- Abstract
Objectives: This paper presents a methodology for recovering and decomposing Swanson's Raynaud Syndrome-Fish Oil hypothesis semi-automatically. The methodology leverages the semantics of assertions extracted from biomedical literature (called semantic predications) along with structured background knowledge and graph-based algorithms to semi-automatically capture the informative associations originally discovered manually by Swanson. Demonstrating that Swanson's manually intensive techniques can be undertaken semi-automatically paves the way for fully automatic semantics-based hypothesis generation from scientific literature., Methods: Semantic predications obtained from biomedical literature allow the construction of labeled directed graphs which contain various associations among concepts from the literature. By aggregating such associations into informative subgraphs, some of the relevant details originally articulated by Swanson have been uncovered. However, by leveraging background knowledge to bridge important knowledge gaps in the literature, a methodology for semi-automatically capturing the detailed associations originally explicated in natural language by Swanson has been developed., Results: Our methodology not only recovered the three associations commonly recognized as Swanson's hypothesis, but also decomposed them into an additional 16 detailed associations, formulated as chains of semantic predications. Altogether, 14 out of the 19 associations that can be attributed to Swanson were retrieved using our approach. To the best of our knowledge, such an in-depth recovery and decomposition of Swanson's hypothesis has never been attempted., Conclusion: In this work, therefore, we presented a methodology to semi-automatically recover and decompose Swanson's RS-DFO hypothesis using semantic representations and graph algorithms. Our methodology provides new insights into potential prerequisites for semantics-driven Literature-Based Discovery (LBD). 
Based on our observations, three critical aspects of LBD include: (1) the need for more expressive representations beyond Swanson's ABC model; (2) an ability to accurately extract semantic information from text; and (3) the semantic integration of scientific literature and structured background knowledge., (Published by Elsevier Inc.)
- Published
- 2013
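The abstract above contrasts its approach with Swanson's ABC model, in which a hidden link between concepts A and C is explained by intermediate B concepts such that A relates to B and B relates to C. A minimal sketch of plain closed discovery over predication triples: the triples are hypothetical toy data loosely modeled on the Raynaud/fish oil example, not predications extracted in the paper, and the paper's method goes further by adding structured background knowledge and graph algorithms.

```python
# Hypothetical toy predications (subject, predicate, object), loosely modeled on
# the Raynaud/fish oil example; illustrative only, not data from the paper.
PREDICATIONS = {
    ("fish oil", "INHIBITS", "blood viscosity"),
    ("fish oil", "INHIBITS", "platelet aggregation"),
    ("fish oil", "TREATS", "hyperlipidemia"),
    ("blood viscosity", "ASSOCIATED_WITH", "Raynaud disease"),
    ("platelet aggregation", "ASSOCIATED_WITH", "Raynaud disease"),
}


def abc_chains(a, c, predications):
    """Closed discovery in the ABC model: every A -> B -> C predication chain."""
    chains = []
    # sorted() makes the output order deterministic for a set of triples.
    for subj1, pred1, b in sorted(predications):
        if subj1 != a:
            continue
        for subj2, pred2, obj2 in sorted(predications):
            if subj2 == b and obj2 == c:
                chains.append((a, pred1, b, pred2, c))
    return chains


for chain in abc_chains("fish oil", "Raynaud disease", PREDICATIONS):
    print(" -> ".join(chain))
```

Each returned tuple is one two-hop chain of semantic predications, the same shape as the "chains of semantic predications" the abstract uses to decompose Swanson's associations.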
33. Integration of data from omic studies with the literature-based discovery towards identification of novel treatments for neovascularization in diabetic retinopathy.
- Author
-
Maver A, Hristovski D, Rindflesch TC, and Peterlin B
- Subjects
- Animals, Humans, Mice, Rats, Signal Transduction genetics, Diabetic Retinopathy etiology, Neovascularization, Pathologic genetics, Transcriptome genetics
- Abstract
Diabetic retinopathy (DR) is a secondary complication of diabetes associated with retinal neovascularization and represents the leading cause of blindness in the adult population in the developed world. Despite research efforts, the nature of pathogenetic processes leading to DR is still unknown, making development of novel effective treatments difficult. Advances in omic technologies now offer unprecedented insight into global molecular alterations in DR, but identification of novel treatments based on massive amounts of data generated in omic studies still represents a considerable challenge. For this reason, we attempted to facilitate discovery of novel treatments for DR by complementing the interpretation of omic results using the vast body of information existing in the published literature with the literature-based discovery (LBD) approaches. To achieve this, we collected data from transcriptomic studies performed on retinal tissue from animal models of DR, performed a meta-analysis of these datasets and identified altered genes and pathways. Using the SemBT LBD framework, we have determined which therapies could regulate perturbed pathways or stabilize the gene expression alterations in DR. We show that by using this approach, we could not only re-identify drugs currently in use or in clinical trials, but also indicate novel treatment directions for ameliorating neovascularization processes in DR.
- Published
- 2013
34. Discovering discovery patterns with Predication-based Semantic Indexing.
- Author
-
Cohen T, Widdows D, Schvaneveldt RW, Davies P, and Rindflesch TC
- Subjects
- Algorithms, Drug Therapy, MEDLINE, Natural Language Processing, Pattern Recognition, Automated, Publications, Abstracting and Indexing methods, Drug Discovery, Pharmaceutical Preparations, Semantics
- Abstract
In this paper we utilize methods of hyperdimensional computing to mediate the identification of therapeutically useful connections for the purpose of literature-based discovery. Our approach, named Predication-based Semantic Indexing, is used to empirically identify sequences of relationships known as "discovery patterns", such as "drug x INHIBITS substance y, substance y CAUSES disease z" that link pharmaceutical substances to diseases they are known to treat. These sequences are derived from semantic predications extracted from the biomedical literature by the SemRep system, and subsequently utilized to direct the search for known treatments for a held out set of diseases. Rapid and efficient inference is accomplished through the application of geometric operators in PSI space, allowing for both the derivation of discovery patterns from a large set of known TREATS relationships, and the application of these discovered patterns to constrain the search for therapeutic relationships at scale. Our results include the rediscovery of discovery patterns that have been constructed manually by other authors in previous research, as well as the discovery of a set of previously unrecognized patterns. The application of these patterns to direct search through PSI space results in better recovery of therapeutic relationships than is accomplished with models based on distributional statistics alone. These results demonstrate the utility of efficient approximate inference in geometric space as a means to identify therapeutic relationships, suggesting a role for these methods in drug repurposing efforts. In addition, the results provide strong support for the utility of the discovery pattern approach pioneered by Hristovski and his colleagues., (Copyright © 2012 Elsevier Inc. All rights reserved.)
- Published
- 2012
35. SemMedDB: a PubMed-scale repository of biomedical semantic predications.
- Author
-
Kilicoglu H, Shin D, Fiszman M, Rosemblat G, and Rindflesch TC
- Subjects
- Algorithms, Data Mining, Humans, Unified Medical Language System, Databases, Factual, PubMed, Semantics
- Abstract
Summary: Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support., Availability and Implementation: The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. A UMLS Metathesaurus license is required., Contact: kilicogluh@mail.nih.gov.
- Published
- 2012
36. A closed literature-based discovery technique finds a mechanistic link between hypogonadism and diminished sleep quality in aging men.
- Author
-
Miller CM, Rindflesch TC, Fiszman M, Hristovski D, Shin D, Rosemblat G, Zhang H, and Strohl KP
- Subjects
- Humans, Hydrocortisone blood, MEDLINE, Male, Middle Aged, Sleep Initiation and Maintenance Disorders blood, Testosterone blood, Aging blood, Hypogonadism blood, Hypogonadism complications, Sleep Initiation and Maintenance Disorders complications
- Abstract
Study Objectives: Sleep quality commonly diminishes with age, and, further, aging men often exhibit a wider range of sleep pathologies than women. We used a freely available, web-based discovery technique (Semantic MEDLINE) supported by semantic relationships to automatically extract information from MEDLINE titles and abstracts., Design: We assumed that testosterone is associated with sleep (the A-C relationship in the paradigm) and looked for a mechanism to explain this association (B explanatory link) as a potential or partial mechanism underpinning the etiology of eroded sleep quality in aging men., Measurements and Results: Review of full-text papers in critical nodes discovered in this manner resulted in the proposal that testosterone enhances sleep by inhibiting cortisol. Using this discovery method, we posit, and could confirm as a novel hypothesis, cortisol as part of a mechanistic link elucidating the observed correlation between decreased testosterone in aging men and diminished sleep quality., Conclusions: This approach is publicly available and useful not only in this manner but also to generate from the literature alternative explanatory models for observed experimental results.
- Published
- 2012
37. Constructing a semantic predication gold standard from the biomedical literature.
- Author
-
Kilicoglu H, Rosemblat G, Fiszman M, and Rindflesch TC
- Subjects
- Humans, MEDLINE, Semantics, Unified Medical Language System, United States, Vocabulary, Controlled, Data Mining standards
- Abstract
Background: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology., Results: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by about 0.12 (from 0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations., Conclusions: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase indicates that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. 
Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.
- Published
- 2011
38. Degree centrality for semantic abstraction summarization of therapeutic studies.
- Author
-
Zhang H, Fiszman M, Shin D, Miller CM, Rosemblat G, and Rindflesch TC
- Subjects
- Algorithms, Humans, MEDLINE, Natural Language Processing, Unified Medical Language System, Information Storage and Retrieval methods, Semantics
- Abstract
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47)., (Published by Elsevier Inc.)
- Published
- 2011
39. Graph-based methods for discovery browsing with semantic predications.
- Author
Wilkowski B, Fiszman M, Miller CM, Hristovski D, Arabandi S, Rosemblat G, and Rindflesch TC
- Subjects
- Humans, Semantics, Unified Medical Language System, Depressive Disorder physiopathology, Information Storage and Retrieval methods, MEDLINE, Natural Language Processing
- Abstract
We present an extension to literature-based discovery that goes beyond making discoveries to a principled way of navigating through selected aspects of some biomedical domain. The method is a type of "discovery browsing" that guides the user through the research literature on a specified phenomenon. Poorly understood relationships may be explored through novel points of view, and potentially interesting relationships need not be known ahead of time. In a process of "cooperative reciprocity" the user iteratively focuses system output, thus controlling the large number of relationships often generated in literature-based discovery systems. The underlying technology exploits SemRep semantic predications represented as a graph of interconnected nodes (predication arguments) and edges (predicates). The system suggests paths in this graph, which represent chains of relationships. The methodology is illustrated with depressive disorder and focuses on the interaction of inflammation, circadian phenomena, and the neurotransmitter norepinephrine. Insight provided may contribute to enhanced understanding of the pathophysiology, treatment, and prevention of this disorder.
- Published
- 2011
40. Adapting Semantic Natural Language Processing Technology to Address Information Overload in Influenza Epidemic Management.
- Author
Keselman A, Rosemblat G, Kilicoglu H, Fiszman M, Jin H, Shin D, and Rindflesch TC
- Abstract
Explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot-test in which two information specialists use the adapted application for a realistic information seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design.
- Published
- 2010
41. A Knowledge Intensive Approach to Mapping Clinical Narrative to LOINC.
- Author
Fiszman M, Shin D, Sneiderman CA, Jin H, and Rindflesch TC
- Subjects
- Humans, Narration, Logical Observation Identifiers Names and Codes, Natural Language Processing
- Abstract
Many natural language processing systems are being applied to clinical text, yet clinically useful results are obtained only by honing a system to a particular context. We suggest that concentration on the information needed for this processing is crucial and present a knowledge intensive methodology for mapping clinical text to LOINC. The system takes published case reports as input and maps vital signs and body measurements and reports of diagnostic procedures to fully specified LOINC codes. Three kinds of knowledge are exploited: textual, ontological, and pragmatic (including information about physiology and the clinical process). Evaluation on 4809 sentences yielded precision of 89% and recall of 93% (F-score 0.91). Our method could form the basis for a system to provide semi-automated help to human coders.
- Published
- 2010
42. Biomedical text summarization to support genetic database curation: using Semantic MEDLINE to create a secondary database of genetic information.
- Author
Workman TE, Fiszman M, Hurdle JF, and Rindflesch TC
- Subjects
- Genetics, Medical, Humans, Subject Headings, United States, Databases, Genetic, Information Storage and Retrieval methods, MEDLINE, Natural Language Processing, Semantics, Terminology as Topic
- Abstract
Objective: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. Methods: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. Results: The system achieved recall of 46%, and precision of 88% (F-measure=0.61) by taking Gene References into Function (GeneRIFs) into account. Conclusion: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.
- Published
- 2010
43. Predication-based semantic indexing: permutations as a means to encode predications in semantic space.
- Author
Cohen T, Schvaneveldt RW, and Rindflesch TC
- Subjects
- MEDLINE, Models, Theoretical, Unified Medical Language System, Abstracting and Indexing methods, Information Storage and Retrieval, Natural Language Processing, Semantics
- Abstract
Corpus-derived distributional models of semantic distance between terms have proved useful in a number of applications. For both theoretical and practical reasons, it is desirable to extend these models to encode discrete concepts and the ways in which they are related to one another. In this paper, we present a novel vector space model that encodes semantic predications derived from MEDLINE by the SemRep system into a compact spatial representation. The associations captured by this method are of a different and complementary nature to those derived by traditional vector space models, and the encoding of predication types presents new possibilities for knowledge discovery and information retrieval.
- Published
- 2009
44. Semantic relations for interpreting DNA microarray data.
- Author
Hristovski D, Kastrin A, Peterlin B, and Rindflesch TC
- Subjects
- Humans, Parkinson Disease genetics, Systems Integration, Unified Medical Language System, Databases, Factual, Information Storage and Retrieval methods, Natural Language Processing, Oligonucleotide Array Sequence Analysis, Semantics
- Abstract
The results from microarray experiments, in the form of lists of over- and under-expressed genes, have great potential to support progress in biomedical research. However, results are not easy to interpret. Information about the function of the genes and their relation to other genes is needed, and this information is usually present in vast amounts of biomedical literature. Considerable effort is required to find, read and extract relevant information from the literature. A potential solution is to use computerized text analysis methods to extract relevant information. Our proposal enhances current methods in this regard and uses semantic relations extracted from biomedical text with the SemRep information extraction system. We describe an application that integrates microarray results with semantic relations and discuss its benefits in supporting enhanced access to the relevant literature for interpretation of results.
- Published
- 2009
45. Automatic summarization of MEDLINE citations for evidence-based medical treatment: a topic-oriented evaluation.
- Author
Fiszman M, Demner-Fushman D, Kilicoglu H, and Rindflesch TC
- Subjects
- Artificial Intelligence, Disease, Drug Therapy, Humans, Information Storage and Retrieval methods, Internet, MEDLINE, Semantics, User-Computer Interface, Vocabulary, Controlled, Electronic Data Processing methods, Evidence-Based Medicine methods, Medical Informatics methods, Natural Language Processing
- Abstract
As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for 53 diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p<0.01) and the increase in the overall score of clinical usefulness was 0.39 (p<0.05).
- Published
- 2009
46. Semantic processing to support clinical guideline development.
- Author
Fiszman M, Ortiz E, Bray BE, and Rindflesch TC
- Subjects
- United States, Artificial Intelligence, Information Storage and Retrieval methods, Natural Language Processing, Pattern Recognition, Automated methods, Practice Guidelines as Topic, Semantics, Vocabulary, Controlled
- Abstract
Clinical practice guidelines are one of the main resources for communicating evidence-based practice to health professionals. During guideline development, questions that express a knowledge gap are answered by finding relevant citations in MEDLINE and other biomedical databases. Determining citation relevance involves extensive manual review. We propose an automated method for finding relevant citations based on guideline question classification, semantic processing, and rules that match question classes with semantic predications. In this initial study, we focused on a pediatric cardiovascular risk factor guideline. The overall performance of the system was 40% recall, 88% precision (F0.5-score 0.71), and 98% specificity. We show that relevant and nonrelevant citations have clinically different semantic characteristics and suggest that this method has the potential to improve the efficiency of the literature review process in guideline development.
- Published
- 2008
47. Using semantic predications to characterize the clinical cardiovascular literature.
- Author
Bray BE, Fiszman M, Shin D, and Rindflesch TC
- Subjects
- Algorithms, Artificial Intelligence, Cardiovascular Diseases classification, Humans, Information Storage and Retrieval methods, Internationality, Pattern Recognition, Automated methods, Semantics, Terminology as Topic, Abstracting and Indexing methods, Cardiovascular Diseases diagnosis, Cardiovascular Diseases therapy, MEDLINE, Natural Language Processing, Periodicals as Topic classification, Periodicals as Topic statistics & numerical data
- Abstract
Using a database of semantic predications extracted from MEDLINE citations, we describe semantic characteristics of the cardiovascular literature, distinguishing therapy from diagnosis studies. Used with existing methodological filters in PubMed, the method may be useful for enhancing precision.
- Published
- 2008
48. Toward automatic recognition of high quality clinical evidence.
- Author
Kilicoglu H, Demner-Fushman D, Rindflesch TC, Wilczynski NL, and Haynes RB
- Subjects
- Canada, Quality Control, Semantics, Vocabulary, Controlled, Algorithms, Artificial Intelligence, Evidence-Based Medicine, MEDLINE, Natural Language Processing, Pattern Recognition, Automated methods, Periodicals as Topic
- Abstract
Automatic methods for recognizing topically relevant documents supported by high quality research can assist clinicians in practicing evidence-based medicine. We approach the challenge of identifying articles with high quality clinical evidence as a binary classification problem. Combining predictions from supervised machine learning methods and using deep semantic features, we achieve 73.5% precision and 67% recall.
- Published
- 2008
49. Knowledge-based methods to help clinicians find answers in MEDLINE.
- Author
Sneiderman CA, Demner-Fushman D, Fiszman M, Ide NC, and Rindflesch TC
- Subjects
- Abstracting and Indexing, Medical Subject Headings, Information Storage and Retrieval methods, Knowledge Bases, MEDLINE
- Abstract
Objectives: Large databases of published medical research can support clinical decision making by providing physicians with the best available evidence. The time required to obtain optimal results from these databases using traditional systems often makes accessing the databases impractical for clinicians. This article explores whether a hybrid approach of augmenting traditional information retrieval with knowledge-based methods facilitates finding practical clinical advice in the research literature. Design: Three experimental systems were evaluated for their ability to find MEDLINE citations providing answers to clinical questions of different complexity. The systems (SemRep, Essie, and CQA-1.0), which rely on domain knowledge and semantic processing to varying extents, were evaluated separately and in combination. Fifteen therapy and prevention questions in three categories (general, intermediate, and specific questions) were searched. The first 10 citations retrieved by each system were randomized, anonymized, and evaluated on a three-point scale. The reasons for ratings were documented. Measurements: Metrics evaluating the overall performance of a system (mean average precision, binary preference) and metrics evaluating the number of relevant documents in the first several presented to a physician were used. Results: Scores (mean average precision = 0.57, binary preference = 0.71) for fusion of the retrieval results of the three systems are significantly (p < 0.01) better than those for any individual system. All three systems present three to four relevant citations in the first five for any question type. Conclusion: The improvements in finding relevant MEDLINE citations due to knowledge-based processing show promise in assisting physicians to answer questions in clinical practice.
- Published
- 2007
50. Using the literature-based discovery paradigm to investigate drug mechanisms.
- Author
Ahlers CB, Hristovski D, Kilicoglu H, and Rindflesch TC
- Subjects
- Antipsychotic Agents pharmacology, Brain-Derived Neurotrophic Factor metabolism, Cytochrome P-450 CYP2D6 genetics, Cytochrome P-450 CYP2D6 metabolism, Humans, Prolactin genetics, Prolactin metabolism, Receptors, Glucocorticoid metabolism, Semantics, Tumor Necrosis Factor-alpha genetics, Tumor Necrosis Factor-alpha metabolism, Antipsychotic Agents therapeutic use, Information Storage and Retrieval, MEDLINE, Natural Language Processing, Neoplasms drug therapy
- Abstract
Drug therapies are often used effectively without their underlying mechanism being completely understood. We exploit the literature-based discovery paradigm to investigate these mechanisms and propose a discovery pattern that draws on semantic predications extracted from MEDLINE citations. The use of semantic predications and the discovery pattern provides a way to uncover previously unnoticed associations between pharmacologic and bioactive substances on the one hand and bioactive substances and disorders on the other. In this paper, we concentrate on research investigating the use of antipsychotic agents used for treatment of cancer. Our method resulted in five biomolecules that may provide a link between the antipsychotic agents and cancer: brain-derived neurotrophic factor, CYP2D6, glucocorticoid receptor, PRL, and TNF.
- Published
- 2007
Discovery Service for Jio Institute Digital Library