99 results on '"Literature based discovery"'
Search Results
2. Assessing the performance of large language models in literature screening for pharmacovigilance: a comparative study.
- Author
-
Li, Dan, Wu, Leihong, Zhang, Mingfeng, Shpyleva, Svitlana, Lin, Ying-Chi, Huang, Ho-Yin, Li, Ting, and Xu, Joshua
- Subjects
*LANGUAGE models, *ARTIFICIAL intelligence, *GENERATIVE pre-trained transformers, *DRUG monitoring, *MEDICATION safety
- Abstract
Pharmacovigilance plays a crucial role in ensuring the safety of pharmaceutical products. It involves the systematic monitoring of adverse events and the detection of potential safety concerns related to drugs. Manual literature screening for pharmacovigilance-related articles is a labor-intensive and time-consuming task, requiring streamlined solutions to cope with the continuous growth of literature. The primary objective of this study is to assess the performance of Large Language Models (LLMs) in automating literature screening for pharmacovigilance, aiming to enhance the process by identifying relevant articles more effectively. This study represents a novel application of LLMs, including OpenAI's GPT-3.5, GPT-4, and Anthropic's Claude2, in the field of pharmacovigilance, evaluating their ability to categorize medical publications as relevant or irrelevant for safety signal reviews. Our analysis encompassed N-shot learning, chain-of-thought reasoning, and evaluation metrics, with a focus on factors impacting accuracy. The findings highlight the promising potential of LLMs in literature screening, achieving a reproducibility of 93%, sensitivity of 97%, and specificity of 67%, showcasing notable strengths in terms of reproducibility and sensitivity, although with moderate specificity. Notably, performance improved when models were provided examples consisting of abstracts, labels, and corresponding reasoning explanations. Moreover, our exploration identified several potential contributing factors influencing prediction outcomes. These factors encompassed the choice of keywords and prompts, the balance of the examples, and variations in reasoning explanations. By configuring advanced LLMs for efficient screening of extensive literature databases, this study underscores the transformative potential of these models in drug safety monitoring. Furthermore, the insights gained from this study can inform the development of automated systems for pharmacovigilance, contributing to the ongoing efforts to ensure the safety and efficacy of pharmacovigilance products. [ABSTRACT FROM AUTHOR]
- Published
- 2024
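The sensitivity and specificity figures quoted in the abstract above are standard confusion-matrix ratios. The sketch below is only an illustration of how such figures could be computed for a binary relevant/irrelevant screening task; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity) for binary relevance labels.

    y_true and y_pred are equal-length sequences of 0/1 values, where 1
    means "relevant for safety review". Hypothetical helper for illustration.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # relevant, flagged
    fn = sum(1 for t, p in pairs if t and not p)      # relevant, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # irrelevant, rejected
    fp = sum(1 for t, p in pairs if not t and p)      # irrelevant, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels (made up): three relevant and three irrelevant abstracts.
sens, spec = screening_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 1, 1])
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

High sensitivity with moderate specificity, as reported, corresponds to few missed relevant articles at the cost of more irrelevant articles passed on for manual review.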
3. Link prediction for hypothesis generation: an active curriculum learning infused temporal graph-based approach.
- Author
-
Akujuobi, Uchenna, Kumari, Priyadarshini, Choi, Jihun, Badreddine, Samy, Maruyama, Kana, Palaniappan, Sucheendra K., and Besold, Tarek R.
- Abstract
Over the last few years Literature-based Discovery (LBD) has regained popularity as a means to enhance the scientific research process. The resurgent interest has spurred the development of supervised and semi-supervised machine learning models aimed at making previously implicit connections between scientific concepts/entities within often extensive bodies of literature explicit—i.e., suggesting novel scientific hypotheses. In doing so, understanding the temporally evolving interactions between these entities can provide valuable information for predicting the future development of entity relationships. However, existing methods often underutilize the latent information embedded in the temporal aspects of the interaction data. Motivated by applications in the food domain—where we aim to connect nutritional information with health-related benefits—we address the hypothesis-generation problem using a temporal graph-based approach. Given that hypothesis generation involves predicting future (i.e., still to be discovered) entity connections, in our view the ability to capture the dynamic evolution of connections over time is pivotal for a robust model. To address this, we introduce THiGER, a novel batch contrastive temporal node-pair embedding method. THiGER excels in providing a more expressive node-pair encoding by effectively harnessing node-pair relationships. Furthermore, we present THiGER-A, an incremental training approach that incorporates an active curriculum learning strategy to mitigate label bias arising from unobserved connections. By progressively training on increasingly challenging and high-utility samples, our approach significantly enhances the performance of the embedding model. Empirical validation of our proposed method demonstrates its effectiveness on established temporal-graph benchmark datasets, as well as on real-world datasets within the food domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Assessing the performance of large language models in literature screening for pharmacovigilance: a comparative study.
- Author
-
Dan Li, Leihong Wu, Mingfeng Zhang, Shpyleva, Svitlana, Ying-Chi Lin, Ho-Yin Huang, Ting Li, and Joshua Xu
- Subjects
*LANGUAGE models, *PROSODIC analysis (Linguistics), *GENERATIVE pre-trained transformers, *DRUG monitoring
- Abstract
Pharmacovigilance plays a crucial role in ensuring the safety of pharmaceutical products. It involves the systematic monitoring of adverse events and the detection of potential safety concerns related to drugs. Manual literature screening for pharmacovigilance-related articles is a labor-intensive and time-consuming task, requiring streamlined solutions to cope with the continuous growth of literature. The primary objective of this study is to assess the performance of Large Language Models (LLMs) in automating literature screening for pharmacovigilance, aiming to enhance the process by identifying relevant articles more effectively. This study represents a novel application of LLMs, including OpenAI's GPT-3.5, GPT-4, and Anthropic's Claude2, in the field of pharmacovigilance, evaluating their ability to categorize medical publications as relevant or irrelevant for safety signal reviews. Our analysis encompassed N-shot learning, chain-of-thought reasoning, and evaluation metrics, with a focus on factors impacting accuracy. The findings highlight the promising potential of LLMs in literature screening, achieving a reproducibility of 93%, sensitivity of 97%, and specificity of 67%, showcasing notable strengths in terms of reproducibility and sensitivity, although with moderate specificity. Notably, performance improved when models were provided examples consisting of abstracts, labels, and corresponding reasoning explanations. Moreover, our exploration identified several potential contributing factors influencing prediction outcomes. These factors encompassed the choice of keywords and prompts, the balance of the examples, and variations in reasoning explanations. By configuring advanced LLMs for efficient screening of extensive literature databases, this study underscores the transformative potential of these models in drug safety monitoring. Furthermore, the insights gained from this study can inform the development of automated systems for pharmacovigilance, contributing to the ongoing efforts to ensure the safety and efficacy of pharmacovigilance products. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Using word evolution to predict drug repurposing
- Author
-
Judita Preiss
- Subjects
Drug repurposing, Literature based discovery, Word evolution, Word embeddings, Deep learning, Computer applications to medicine. Medical informatics, R858-859.7
- Abstract
Background: Traditional literature based discovery is based on connecting knowledge pairs extracted from separate publications via a common mid point to derive previously unseen knowledge pairs. To avoid the over generation often associated with this approach, we explore an alternative method based on word evolution. Word evolution examines the changing contexts of a word to identify changes in its meaning or associations. We investigate the possibility of using changing word contexts to detect drugs suitable for repurposing. Results: Word embeddings, which represent a word’s context, are constructed from chronologically ordered publications in MEDLINE at bi-monthly intervals, yielding a time series of word embeddings for each word. Focusing on clinical drugs only, any drugs repurposed in the final time segment of the time series are annotated as positive examples. The decision regarding the drug’s repurposing is based either on the Unified Medical Language System (UMLS), or semantic triples extracted using SemRep from MEDLINE. Conclusions: The annotated data allows deep learning classification, with a 5-fold cross validation, to be performed and multiple architectures to be explored. Performance of 65% using UMLS labels, and 81% using SemRep labels is attained, indicating the technique’s suitability for the detection of candidate drugs for repurposing. The investigation also shows that different architectures are linked to the quantities of training data available and therefore that different models should be trained for every annotation approach.
- Published
- 2024
6. Using word evolution to predict drug repurposing.
- Author
-
Preiss, Judita
- Subjects
*DEEP learning, *DRUG repositioning, *MEDICAL language, *TIME series analysis, *VOCABULARY
- Abstract
Background: Traditional literature based discovery is based on connecting knowledge pairs extracted from separate publications via a common mid point to derive previously unseen knowledge pairs. To avoid the over generation often associated with this approach, we explore an alternative method based on word evolution. Word evolution examines the changing contexts of a word to identify changes in its meaning or associations. We investigate the possibility of using changing word contexts to detect drugs suitable for repurposing. Results: Word embeddings, which represent a word's context, are constructed from chronologically ordered publications in MEDLINE at bi-monthly intervals, yielding a time series of word embeddings for each word. Focusing on clinical drugs only, any drugs repurposed in the final time segment of the time series are annotated as positive examples. The decision regarding the drug's repurposing is based either on the Unified Medical Language System (UMLS), or semantic triples extracted using SemRep from MEDLINE. Conclusions: The annotated data allows deep learning classification, with a 5-fold cross validation, to be performed and multiple architectures to be explored. Performance of 65% using UMLS labels, and 81% using SemRep labels is attained, indicating the technique's suitability for the detection of candidate drugs for repurposing. The investigation also shows that different architectures are linked to the quantities of training data available and therefore that different models should be trained for every annotation approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Avoiding background knowledge: literature based discovery from important information
- Author
-
Judita Preiss
- Subjects
Literature based discovery, Subject–predicate–object triples, Machine learning, Timeslicing gold standard, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of $A \rightarrow B$ and $B \rightarrow C$ relations can be simply connected to deduce $A \rightarrow C$. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject $\rightarrow$ (predicate) $\rightarrow$ object triples as the $A \rightarrow B$ relations, but too many proposed connections remain for manual verification. Results: Based on the hypothesis that only a small number of subject–predicate–object triples extracted from a publication represent the paper’s novel contribution(s), we explore using BERT embeddings to identify these before literature based discovery is performed utilizing only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset (making use of the fact that a novel contribution is likely to be mentioned in both an abstract and the body of a paper) to build a training set, the resulting tool can be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards. Conclusions: The quantity of proposed knowledge pairs is reduced by a factor of $10^3$, and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as yet undiscovered candidate knowledge pairs based on important triples alongside this work.
- Published
- 2023
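The A → B, B → C, therefore A → C inference described in the abstract above can be sketched in a few lines. This is a minimal illustration of the general idea only, not the author's pipeline: the function name and the toy pairs are hypothetical, and no triple filtering or BERT-based importance scoring is attempted.

```python
from collections import defaultdict


def abc_candidates(pairs):
    """Derive unseen (A, C) pairs from known (A, B) and (B, C) pairs.

    Returns a dict mapping each candidate (A, C) pair to the set of
    linking B terms that connect it. Hypothetical helper for illustration.
    """
    known = set(pairs)
    successors = defaultdict(set)
    for a, b in known:
        successors[a].add(b)

    candidates = defaultdict(set)
    for a, bs in list(successors.items()):
        for b in bs:
            for c in successors.get(b, ()):
                # Keep only pairs not already stated in the literature.
                if c != a and (a, c) not in known:
                    candidates[(a, c)].add(b)
    return dict(candidates)


# Toy relations (made up): two B terms link drug_x to disease_y.
toy_pairs = [
    ("drug_x", "protein_1"),
    ("drug_x", "protein_2"),
    ("protein_1", "disease_y"),
    ("protein_2", "disease_y"),
]
print(abc_candidates(toy_pairs))
```

Even on small inputs the number of candidate pairs grows quickly with the number of shared B terms, which is the over-generation problem the abstract sets out to reduce.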
8. A Knowledge Graph Completion Method Applied to Literature-Based Discovery for Predicting Missing Links Targeting Cancer Drug Repurposing
- Author
-
Daowd, Ali, Abidi, Samina, Abidi, Syed Sibte Raza, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Michalowski, Martin, editor, Abidi, Syed Sibte Raza, editor, and Abidi, Samina, editor
- Published
- 2022
9. A New Literature-Based Discovery (LBD) Application Using the PubMed Database
- Author
-
Schofield, Matthew, Hristescu, Gabriela, Radu, Aurelian, Arabnia, Hamid, Series Editor, Arabnia, Hamid R., editor, Deligiannidis, Leonidas, editor, Shouno, Hayaru, editor, Tinetti, Fernando G., editor, and Tran, Quoc-Nam, editor
- Published
- 2021
10. Matching Biomedical Ontologies with Compact Evolutionary Algorithm
- Author
-
Xue, Xingsi, Tsai, Pei-Wei, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, and Zhu, Kenny Q., editor
- Published
- 2020
11. A Knowledge Graph of Mechanistic Associations Between COVID-19, Diabetes Mellitus, and Chronic Kidney Disease.
- Author
-
Barrett, Michael, Raza Abidi, Syed Sibte, Daowd, Ali, and Abidi, Samina
- Abstract
We present an automated knowledge synthesis and discovery framework to analyze published literature to identify and represent underlying mechanistic associations that aggravate chronic conditions due to COVID-19. Our literature-based discovery approach integrates text mining, knowledge graphs and medical ontologies to discover hidden and previously unknown pathophysiologic relations, dispersed across multiple public literature databases, between COVID-19 and chronic disease mechanisms. We applied our approach to discover mechanistic associations between COVID-19 and chronic conditions--i.e. diabetes mellitus and chronic kidney disease--to understand the long-term impact of COVID-19 on patients with chronic diseases. We found several gene-disease associations that could help identify mechanisms driving poor outcomes for COVID-19 patients with underlying conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
12. A Knowledge Graph of Mechanistic Associations Between COVID-19, Diabetes Mellitus, and Chronic Kidney Disease.
- Author
-
Barrett, Michael, Raza Abidi, Syed Sibte, Daowd, Ali, and Abidi, Samina
- Subjects
CHRONIC kidney failure, COVID-19, GENETICS, DIABETES, HEALTH outcome assessment, CONFERENCES & conventions, INTELLECT, LITERATURE reviews, COMORBIDITY, MEDICAL literature
- Abstract
We present an automated knowledge synthesis and discovery framework to analyze published literature to identify and represent underlying mechanistic associations that aggravate chronic conditions due to COVID-19. Our literature-based discovery approach integrates text mining, knowledge graphs and medical ontologies to discover hidden and previously unknown pathophysiologic relations, dispersed across multiple public literature databases, between COVID-19 and chronic disease mechanisms. We applied our approach to discover mechanistic associations between COVID-19 and chronic conditions--i.e. diabetes mellitus and chronic kidney disease--to understand the long-term impact of COVID-19 on patients with chronic diseases. We found several gene-disease associations that could help identify mechanisms driving poor outcomes for COVID-19 patients with underlying conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
13. Indirect association and ranking hypotheses for literature based discovery
- Author
-
Sam Henry and Bridget T. McInnes
- Subjects
Literature based discovery, Indirect association, Semantic relatedness, Semantic similarity, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatically ranking these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties. Results: We perform a comparison between several different hypothesis ranking methods for LBD, and compare them against our proposed indirect association measures. We intrinsically evaluate each method’s performance using its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method’s ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information, and another time-slicing dataset based on SemRep-extracted relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank. Conclusions: Results differ depending on the evaluation methods and datasets, but it is unclear if this is a result of biases in the evaluation datasets or if one method is truly better than another. We conclude that LTC and SBC are the best suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.
- Published
- 2019
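The abstract above mentions generating precision and recall curves by ranking term pairs and applying a threshold at each rank. A minimal sketch of that evaluation step follows; it assumes a gold standard given as a set of term pairs (for example, pairs that first appear in a later time slice), and the function name and toy data are hypothetical.

```python
def precision_recall_at_ranks(ranked_pairs, gold):
    """Return a list of (precision, recall) points, one per rank cutoff.

    ranked_pairs: candidate term pairs, best-scored first.
    gold: collection of pairs counted as correct. Hypothetical helper.
    """
    gold = set(gold)
    points = []
    hits = 0
    for k, pair in enumerate(ranked_pairs, start=1):
        if pair in gold:
            hits += 1
        precision = hits / k                        # correct among top k
        recall = hits / len(gold) if gold else 0.0  # gold pairs recovered
        points.append((precision, recall))
    return points


# Toy ranking (made up): the first and third candidates are in the gold set.
ranking = [("a", "c1"), ("a", "c2"), ("a", "c3")]
gold_pairs = {("a", "c1"), ("a", "c3")}
for p, r in precision_recall_at_ranks(ranking, gold_pairs):
    print(f"precision={p:.2f} recall={r:.2f}")
```

Any of the ranking measures named in the abstract could supply the ordering of `ranked_pairs`; the evaluation step itself does not depend on which measure is used.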
14. Is automatic detection of hidden knowledge an anomaly?
- Author
-
Judita Preiss
- Subjects
Literature based discovery, Anomaly detection, Unified medical language system, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: The quantity of documents being published requires researchers to specialize to a narrower field, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified and another form of selection is required before the results can be passed onto a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non trivial, interesting, hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches. Results: Two experiments are conducted: (1) to avoid errors arising from incorrect extraction of relations, the hypothesis is validated using manually annotated relations appearing in a thesaurus, and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. These allow an investigation of a potential upper bound and the detection of limitations yielded by automatic relation extraction. Conclusion: We apply one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections to rank connections by identifying outlying (interesting) ones and show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge to manually verify. We also demonstrate the statistical significance of this result.
- Published
- 2019
15. Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest
- Author
-
Sam Henry, D. Shanaka Wijesinghe, Aidan Myers, and Bridget T. McInnes
- Subjects
literature based discovery, metabolomics, knowledge discovery, text mining, natural language processing, lipidomics, Bibliography. Library science. Information resources
- Abstract
In this paper, we describe how we applied LBD techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies that were brought forth by this work. Metabolites are the end product of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways, and provide valuable insights toward the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. Results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, this prompted an investigation into the effects of LCAT on cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effects of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest in vivo using rat models.
- Published
- 2021
16. Characteristics of the similarity index in a Korean medical journal
- Author
-
Seunghyun Chung, Jeunghyuk Lee, Younsuk Lee, Ha Yeon Park, and Daehwan Kim
- Subjects
bibliometrics, literature based discovery, peer review, plagiarism, similarity index, social media, Anesthesiology, RD78.3-87.3
- Abstract
Background: Journal editors have exercised their control over submitted papers having a high similarity index. Despite widespread suspicion of possible plagiarism on a high similarity index, our study focused on the real effect of the similarity index on the value of a scientific paper. Methods: This research examined the percent values of the similarity index from 978 submitted (420 published) papers in the Korean Journal of Anesthesiology since 2012. Thus, this study aimed to identify the correlation between the similarity index and the value of a paper. The value of a paper was evaluated in two distinct phases (during a peer-review process vs. after publication), and the value of a published paper was evaluated in two aspects (academic citation vs. social media appearance). Results: Yearly mean values of the similarity index ranged from 16% to 19%. There were 254 papers cited at least once and 179 papers appearing at least once in social media. The similarity index affected the acceptance/rejection of a paper in various ways; although the influence was not linear and the cutoff measures were distinctive among the types of papers, both extremes were related to a high rate of rejection. After publication, the similarity index had no effect on academic citation or social media appearance according to the paper. Conclusions: The finding suggested that the similarity index no longer had an influence on academic citation or social media appearance according to the paper after publication, while the similarity index affected the acceptance/rejection of a submitted paper. Proofreading and intervention for finalizing the draft by the editors might play a role in achieving uniform quality of the publication.
- Published
- 2017
17. SemNet: Using Local Features to Navigate the Biomedical Concept Graph
- Author
-
Andrew R. Sedler and Cassie S. Mitchell
- Subjects
literature based discovery, unsupervised learning, text mining, natural language processing, Python, Biotechnology, TP248.13-248.65
- Abstract
Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops open-source software in Python to convert a database of semantic predications of all of PubMed's 27.9 million indexed abstracts into a semantic inference network and biomedical concept graph in Neo4j. The developed software, called SemNet, queries a modified version of the publicly available SemMedDB and computes feature vectors on source-target pairs. Each unique Unified Medical Language System (UMLS) concept is represented as a node and each predication as an edge. Each node is assigned one of 132 node labels (e.g., Amino Acid, Peptide, or Protein (AAPP); Gene or Genome (GG); etc.) and each edge is labeled with one of 58 predications (e.g., treats, causes, inhibits, etc.). SemNet computes a single feature value for each metapath, or sequence of node types, between a source node and user-specified target node(s). Several different types of metapath-based features (count, degree weighted path count, and HeteSim metric) are computed and vectorized. SemNet employs an unsupervised learning algorithm for rank aggregation (ULARA) to rank identified source nodes that are most relevant to the user-specified target node(s). Statistical analysis of correlation among identified source nodes or resultant literature network features is used to identify patterns that can guide future research. Analysis of high residual nodes is used to compare and contrast SemNet rankings between different targets of interest. An example SemNet use case is presented to assess “the differential impact of smoking on cognition in males and females” using the following target nodes: nicotine, learning, memory, tetrahydrocannabinol (THC), cigarette smoke, X chromosome, and Y chromosome. Detailed rankings are discussed. Overall results suggest a hypothesis where smoking negatively impacts cognition to a greater extent in females, but smoking has stronger cardiovascular impacts in males. In summary, SemNet provides an adoptable method for efficient LBD of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care.
- Published
- 2019
18. Ketamine: A Neglected Therapy for Alzheimer Disease
- Author
-
Neil R. Smalheiser
- Subjects
drug repurposing, discovery, literature based discovery, technological forecasting, neglected findings, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Published
- 2019
19. A Method to Accelerate and Visualize Iterative Clinical Paper Searching.
- Author
-
Yiqin Yu, Enliang Xu, Eryu Xia, He Huang, Bibo Hao, and Shilei Zhang
- Subjects
BIBLIOMETRICS, DATABASE searching, RECALL (Information retrieval), VISUALIZATION, CLINICAL trials
- Abstract
Clinical paper searching is a major task for clinical researchers, who need to collect authoritative and up-to-date evidence to support their research and clinical practice. Currently, this task requires a large amount of manual labor. Researchers usually spend a lot of time searching online repositories and iterating many times to get the final paper list. Systematic review is a special case, in which the paper searching process is a critical step. To address this challenge, this paper introduces a method to streamline the iterative paper searching process. It automatically selects the papers most likely to match, and then generates a new search strategy. All the intermediate results are visualized based on the paper citation graph. It combines technologies such as PageRank and topic-based clustering to accelerate paper searching tasks. The precision, recall, and execution time of the proposed method are then evaluated by comparison with published systematic reviews. [ABSTRACT FROM AUTHOR]
- Published
- 2019
20. Ketamine: A Neglected Therapy for Alzheimer Disease.
- Author
-
Smalheiser, Neil R.
- Subjects
KETAMINE abuse, KETAMINE, ALZHEIMER'S disease
- Abstract
Highlights from the article: No wonder that a search of Clinicaltrials.gov (carried out in April 2019) shows 152 trials of ketamine in depression, and 38 trials that involve various antidepressants in Alzheimer patients, but no registered trials that involve giving ketamine to Alzheimer patients. However, in most studies, the molecular effects of ketamine which are associated with antidepressant activity are not produced by memantine (Gideons et al., [10]), and indeed, memantine has failed to produce rapid antidepressant effects in human clinical trials of depression (Kishi et al., [16]). However, the antidepressant effects of (2R,6R)-HNK were not accompanied by NMDA receptor-dependent side effects, suggesting that ketamine metabolites may provide a safer means for treating depressed patients - and by implication, a safer venue for testing procognitive effects in Alzheimer patients. Comparison of antidepressant and side effects in mice after intranasal administration of (R,S)-ketamine, (R)-ketamine, and (S)-ketamine.
- Published
- 2019
21. Is automatic detection of hidden knowledge an anomaly?
- Author
-
Preiss, Judita
- Subjects
*THEORY of knowledge, *STATISTICAL significance, *ANOMALY detection (Computer security), *MEDICAL language, *NEUROBIOLOGY
- Abstract
Background: The quantity of documents being published requires researchers to specialize to a narrower field, meaning that inferable connections between publications (particularly from different domains) can be missed. This has given rise to automatic literature based discovery (LBD). However, unless heavily filtered, LBD generates more potential new knowledge than can be manually verified and another form of selection is required before the results can be passed onto a user. Since a large proportion of the automatically generated hidden knowledge is valid but generally known, we investigate the hypothesis that non trivial, interesting, hidden knowledge can be treated as an anomaly and identified using anomaly detection approaches. Results: Two experiments are conducted: (1) to avoid errors arising from incorrect extraction of relations, the hypothesis is validated using manually annotated relations appearing in a thesaurus, and (2) automatically extracted relations are used to investigate the hypothesis on publication abstracts. These allow an investigation of a potential upper bound and the detection of limitations yielded by automatic relation extraction. Conclusion: We apply one-class SVM and isolation forest anomaly detection algorithms to a set of hidden connections to rank connections by identifying outlying (interesting) ones and show that the approach increases the F1 measure by a factor of 10 while greatly reducing the quantity of hidden knowledge to manually verify. We also demonstrate the statistical significance of this result. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
22. Avoiding background knowledge: literature based discovery from important information
- Author
-
Preiss, Judita
- Published
- 2022
- Full Text
- View/download PDF
23. The impact factor of an open access journal does not contribute to an article’s citations [version 1; referees: 2 approved]
- Author
-
SK Chua, Ahmad M Qureshi, Vijay Krishnan, Dinker R Pai, Laila B Kamal, Sharmilla Gunasegaran, MZ Afzal, Lahiru Ambawatta, JY Gan, PY Kew, Than Winn, and Suneet Sood
- Subjects
Research Article ,Articles ,Public Engagement ,Publishing & Peer Review ,bibliometrics ,bibliometric analysis ,information science ,publications ,literature based discovery ,open access ,Web of Science ,Google Scholar - Abstract
Background: Citations of papers are positively influenced by the journal’s impact factor (IF). For non-open access (non-OA) journals, this influence may be due to the fact that high-IF journals are more often purchased by libraries, and are therefore more often available to researchers, than low-IF journals. This positive influence has not, however, been shown specifically for papers published in open access (OA) journals, which are universally accessible, and do not need library purchase. It is therefore important to ascertain if the IF influences citations in OA journals too. Methods: 203 randomized controlled trials (102 OA and 101 non-OA) published in January 2011 were included in the study. Five-year citations for papers published in OA journals were compared to those for non-OA journals. Source papers were derived from PubMed. Citations were retrieved from Web of Science, Scopus, and Google Scholar databases. The Thomson Reuters IF was used. Results: OA journals were found to have significantly more citations overall compared to non-OA journals (median 15.5 vs 12, p = 0.039). The IF did not correlate with citations for OA journals (Spearman’s rho = 0.187, p = 0.60). The increase in the citations with increasing IF was minimal for OA journals (beta coefficient = 3.346, 95% CI -0.464 to 7.156, p = 0.084). In contrast, the IF did show moderate correlation with citations for articles published in non-OA journals (Spearman’s rho = 0.514, p …). Conclusion: It is better to publish in an OA journal for more citations. It may not be worth paying high publishing fees for higher IF journals, because there is minimal gain in terms of increased number of citations. On the other hand, if one wishes to publish in a non-OA journal, it is better to choose one with a high IF.
- Published
- 2017
- Full Text
- View/download PDF
24. Stability and change in public health studies in Colombia and Mexico: an exploratory approach based on co-word analysis.
- Author
-
Vílchez-Román, Carlos and Quiliano-Terreros, Rocío
- Subjects
- *
PUBLIC health , *STATISTICAL correlation - Abstract
Objective. To determine the level of stability or change in topic areas published by public health journals in Latin America and the Caribbean, using keywords and co-word analysis, in order to support evidence-based research planning. Methods. Keywords were extracted from papers indexed in Scopus® that were published by the Revista de Salud Pública (RSP; Colombia), the Salud Pública de México (SPM; Mexico), and the Revista Peruana de Medicina Experimental y Salud Pública (RPMESP; Peru) for three periods: 2005-2007, 2008-2010, and 2011-2013. Co-word analysis was used to examine the extracted keywords. Textual information was analyzed using centrality measures (betweenness and closeness). The hypothesis of stability/change of thematic coverage was tested using Spearman's rho correlation coefficient. VOSviewer was used to visualize the co-word maps. Results. A moderate level of change in thematic coverage was observed in 2005-2010, as evidenced by the correlation coefficients between the first two 3-year periods, 2005-2007 and 2008-2010: 0.545 for RSP and 0.593 for SPM. However, in 2008-2013, more keywords remained constant from one period to the next, given the size of the correlation coefficients between the last two 3-year periods, 2008-2010 and 2011-2013: 0.727 for RSP and 0.605 for SPM. Conclusion. The research hypothesis was partially accepted given that just two consecutive 3-year periods showed a statistically significant degree of stability in thematic coverage in public health studies. In that sense, this study provides compelling evidence of the effectiveness of using a combined approach for examining the dynamics of thematic coverage: centrality measures for identifying the main keywords and visual inspection for detecting the structure of textual information. [ABSTRACT FROM AUTHOR]
- Published
- 2018
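The stability test described in the abstract above (Spearman's rho between keyword distributions in consecutive periods) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the choice to rank keyword frequencies over the union of both periods (counting missing keywords as 0), and the sample counts are all assumptions made for the example.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman_rho(period_a: Dict[str, int], period_b: Dict[str, int]) -> float:
    """Spearman's rho between keyword frequencies of two periods.

    Assumption: keywords absent from one period count as frequency 0.
    """
    keywords = sorted(set(period_a) | set(period_b))
    freq_a = [period_a.get(k, 0) for k in keywords]
    freq_b = [period_b.get(k, 0) for k in keywords]
    return pearson(average_ranks(freq_a), average_ranks(freq_b))


# Hypothetical keyword counts for two consecutive 3-year periods.
first_period = {"epidemiology": 12, "nutrition": 8, "dengue": 3}
second_period = {"epidemiology": 10, "nutrition": 9, "dengue": 1}
rho = spearman_rho(first_period, second_period)
```

A value of rho near 1 would indicate that the keyword ranking is largely preserved between periods (stability), while lower values would indicate a change in thematic coverage.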
25. Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications.
- Author
-
Mower, Justin, Subramanian, Devika, and Cohen, Trevor
- Abstract
Objective: The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods: Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results: The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion: Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
26. Distribution of 'Characteristic' Terms in MEDLINE Literatures
- Author
-
Vetle I. Torvik, Wei Zhou, and Neil R. Smalheiser
- Subjects
information retrieval ,term occurrence ,text mining ,annotation ,literature based discovery ,Information technology ,T58.5-58.64 - Abstract
Given the occurrence frequency of any term within any set of articles within MEDLINE, we define “characteristic” terms as words and phrases that occur in that literature more frequently than expected by chance (at p < 0.001 or better). In this report, we studied how the cut-off criterion varied as a function of literature size and term frequency in MEDLINE as a whole, and have compared the distribution of characteristic terms within a number of journal-defined, affiliation-defined and random literatures. We also investigated how the characteristic terms were distributed among MEDLINE titles, abstracts, and last sentence of abstracts, including “regularized” terms that appear both in the title and abstract of the same paper for at least one paper in the literature. For a set of 10 disciplinary journals, the characteristic terms comprised 18% of the total terms on average. Characteristic terms are utilized in several of our web-based services (Anne O’Tate and Arrowsmith), and should be useful for a variety of other information-processing tasks designed to improve text mining in MEDLINE.
- Published
- 2011
- Full Text
- View/download PDF
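The notion of a "characteristic" term in the abstract above (a term occurring in a literature more often than expected by chance, at p < 0.001) can be illustrated with a one-sided binomial test. The abstract does not state which statistical test the authors used, so the binomial model, the function names, and the numbers below are assumptions chosen only to make the idea concrete.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); suitable for small n only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_characteristic(docs_with_term: int, literature_size: int,
                      background_rate: float, alpha: float = 0.001) -> bool:
    """True if the term is over-represented relative to the background rate.

    background_rate is the fraction of all articles containing the term
    (for example, its frequency in MEDLINE as a whole).
    """
    p_value = binomial_upper_tail(docs_with_term, literature_size, background_rate)
    return p_value < alpha


# Hypothetical literature of 200 articles; the term appears in 1% of the
# background collection, so about 2 occurrences are expected by chance.
frequent_term = is_characteristic(30, 200, 0.01)
ordinary_term = is_characteristic(2, 200, 0.01)
```

Under this model the cut-off count needed to reach p < 0.001 depends on both the literature size and the background frequency, which is the dependence the abstract reports studying.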
27. The effect of word sense disambiguation accuracy on literature based discovery.
- Author
-
Preiss, Judita and Stevenson, Mark
- Subjects
- *
DATA mining , *TEXT processing (Computer science) , *RAYNAUD'S disease , *BLOOD viscosity , *GENE expression , *INFORMATION retrieval , *MEDICAL research , *MEDLINE , *ARTHRITIS Impact Measurement Scales - Abstract
Background: The volume of research published in the biomedical domain has increasingly led to researchers focussing on specific areas of interest and connections between findings being missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to over-generate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance. Methods: An existing LBD system is employed and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. Evaluation of the LBD output is carried out using a timeslicing approach, where hidden knowledge is generated from articles published prior to a certain cutoff date and a gold standard is extracted from publications after the cutoff date. Results: WSD accuracy varies depending on the approach used. The connection between the performance of the LBD and WSD systems is analysed to reveal a correlation between WSD accuracy and LBD performance. Conclusion: This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
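The timeslicing evaluation described in the abstract above can be sketched in a few lines: connections seen only after a cutoff date form the gold standard, and hypotheses generated from the earlier literature are scored against it. The data layout (concept pairs grouped by publication year), the function names, and the toy pairs are assumptions for illustration and do not reproduce the authors' system.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def timeslice_gold(pairs_by_year: Dict[int, Set[Pair]], cutoff: int) -> Set[Pair]:
    """Pairs that appear after the cutoff but never up to and including it."""
    before: Set[Pair] = set()
    after: Set[Pair] = set()
    for year, pairs in pairs_by_year.items():
        (before if year <= cutoff else after).update(pairs)
    return after - before


def precision_recall(candidates: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Score candidate hidden-knowledge pairs against the gold standard."""
    hits = len(candidates & gold)
    precision = hits / len(candidates) if candidates else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical explicit connections, keyed by publication year.
pairs_by_year = {
    2010: {("a", "b"), ("b", "c")},
    2015: {("a", "c"), ("b", "c"), ("c", "d")},
}
gold = timeslice_gold(pairs_by_year, cutoff=2012)

# Hypothetical LBD output generated from the pre-cutoff literature only.
candidates = {("a", "c"), ("a", "d")}
precision, recall = precision_recall(candidates, gold)
```

Pairs are stored here in a fixed order; a real pipeline would need to normalise pair order and concept identifiers before comparing sets.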
28. Avoiding background knowledge: literature based discovery from important information.
- Author
-
Preiss J
- Subjects
- Knowledge Discovery, Knowledge
- Abstract
Background: Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of A → B and B → C relations can be simply connected to deduce A → C. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject-(predicate)-object triples as the A → B relations, but too many proposed connections remain for manual verification. Results: Based on the hypothesis that only a small number of subject-predicate-object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these before literature based discovery is performed utilizing only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset (making use of the fact that a novel contribution is likely to be mentioned in both an abstract and the body of a paper) to build a training set, the resulting tool can be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards. Conclusions: The quantity of proposed knowledge pairs is reduced by a factor of [Formula: see text], and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as yet undiscovered candidate knowledge pairs based on important triples alongside this work. (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
29. SemPathFinder: Semantic path analysis for discovering publicly unknown knowledge.
- Author
-
Song, Min, Heo, Go Eun, and Ding, Ying
- Subjects
SEMANTICS ,PATH analysis (Statistics) ,BIOPHARMACEUTICS ,MEDICAL personnel ,DRUG side effects - Abstract
The enormous amount of biomedicine's natural-language texts creates a daunting challenge to discover novel and interesting patterns embedded in the text corpora that help biomedical professionals find new drugs and treatments. These patterns constitute entities such as genes, compounds, treatments, and side effects and their associations that spread across publications in different biomedical specialties. This paper proposes SemPathFinder to discover previously unknown relations in biomedical text. SemPathFinder overcomes the problems of Swanson's ABC model by using semantic path analysis to tell a story about plausible connections between biological terms. Storytelling-based semantic path analysis can be viewed as relation navigation for bio-entities that are semantically close to each other, and reveals insight into how a series of entity pairs is organized, and how it can be harnessed to explain seemingly unrelated connections. We apply SemPathFinder for two well-known use cases of Swanson's ABC model, and the experimental results show that SemPathFinder detects all intermediate terms except for one and also infers several interesting new hypotheses. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
30. Exploring relation types for literature-based discovery.
- Author
-
Preiss, Judita, Stevenson, Mark, and Gaizauskas, Robert
- Abstract
Objective: Literature-based discovery (LBD) aims to identify "hidden knowledge" in the medical literature by: (1) analyzing documents to identify pairs of explicitly related concepts (terms), then (2) hypothesizing novel relations between pairs of unrelated concepts that are implicitly related via a shared concept to which both are explicitly related. Many LBD approaches use simple techniques to identify semantically weak relations between concepts, for example, document co-occurrence. These generate huge numbers of hypotheses, difficult for humans to assess. More complex techniques rely on linguistic analysis, for example, shallow parsing, to identify semantically stronger relations. Such approaches generate fewer hypotheses, but may miss hidden knowledge. The authors investigate this trade-off in detail, comparing techniques for identifying related concepts to discover which are most suitable for LBD. Materials and methods: A generic LBD system that can utilize a range of relation types was developed. Experiments were carried out comparing a number of techniques for identifying relations. Two approaches were used for evaluation: replication of existing discoveries and the "time slicing" approach. Results: Previous LBD discoveries could be replicated using relations based either on document co-occurrence or linguistic analysis. Using relations based on linguistic analysis generated many fewer hypotheses, but a significantly greater proportion of them were candidates for hidden knowledge. Discussion and Conclusion: The use of linguistic analysis-based relations improves accuracy of LBD without overly damaging coverage. LBD systems often generate huge numbers of hypotheses, which are infeasible to manually review. Improving their accuracy has the potential to make these systems significantly more usable. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
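The two-step procedure described in the abstract above (find explicitly related concept pairs, then hypothesize relations between unrelated concepts that share a linking concept) can be sketched with document co-occurrence as the relation type. This is an illustrative toy, not the generic LBD system the authors built: the function names are hypothetical, and the four "documents" are invented sets of concepts.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_relations(documents: Iterable[Set[str]]) -> Dict[str, Set[str]]:
    """Step 1: treat concepts co-occurring in a document as explicitly related."""
    related: Dict[str, Set[str]] = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(concepts), 2):
            related[a].add(b)
            related[b].add(a)
    return dict(related)


def abc_hypotheses(related: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Step 2: propose (A, C) pairs that are not directly related but share
    at least one linking concept B; the value is the set of linking terms."""
    hypotheses: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for b, neighbours in related.items():
        for a, c in combinations(sorted(neighbours), 2):
            if c not in related[a]:
                hypotheses[(a, c)].add(b)
    return dict(hypotheses)


# Invented mini-corpus: each set holds the concepts found in one document.
documents = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "Raynaud's disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "Raynaud's disease"},
]
hypotheses = abc_hypotheses(build_relations(documents))
```

Even this four-document corpus yields two hypotheses, which hints at why co-occurrence-based systems produce far more candidates than a human can review and why stronger relation types are attractive.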
31. A Bird's-Eye View of Alzheimer's Disease Research: Reflecting Different Perspectives of Indexers, Authors, or Citers in Mapping the Field.
- Author
-
Lee, Dahee, Kim, Won Chul, Charidimou, Andreas, and Song, Min
- Subjects
- *
ALZHEIMER'S disease research , *BASAL ganglia diseases , *SENILE dementia , *BIBLIOMETRICS , *PARKINSON'S disease - Abstract
During the last 30 years, Alzheimer's disease (AD) research, aiming to understand the pathophysiology and to improve the diagnosis, management, and, ultimately, treatment of the disease, has grown rapidly. Recently, some studies have used simple bibliometric approaches to investigate research trends and advances in the field. In our study, we map the AD research field by applying entitymetrics, an extended concept of bibliometrics, to capture viewpoints of indexers, authors, or citers. Using the full-text documents with reference section retrieved from PubMed Central, we constructed four types of networks: MeSH-MeSH (MM), MeSH-Citation-MeSH (MCM), Keyphrase-Keyphrase (KK), and Keyphrase-Citation-Keyphrase (KCK) networks. The working hypothesis was that MeSH, keyphrase, and citation relationships reflect the views of indexers, authors, and/or citers, respectively. In comparative network and centrality analysis, we found that those views are different: indexers emphasize amyloid-related entities, including methodological terms, while authors focus on specific biomedical terms, including clinical syndromes. The more dense and complex networks of citing relationships reported in our study to a certain extent reflect the impact of basic science discoveries in AD. However, none of these could have had clinical relevance for patients without close collaboration between investigators in translational and clinical-related AD research (reflected in indexers' and authors' networks). Our approach has relevance for researchers in the field, since they can identify relations between different developments which are not otherwise evident. These developments, combined with advanced visualization techniques, might aid the discovery of novel interactions between genes and pathways or be used as a resource to advance clinical drug development. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
32. Knowledge construction around hybrid photovoltaic systems through literature-based discovery
- Author
-
Coral, Marco, Salazar, Miguel, Palacios, Luis, Moquillaza, Santiago, and Álvarez, José C.
- Subjects
text mining ,hybrid photovoltaic systems ,implicit knowledge ,Literature Based Discovery ,clustering - Abstract
Given the need to better understand how new knowledge is generated in renewable energy, particularly around hybrid photovoltaic systems, this study examines research produced under a research-collaboration structure, using literature-based discovery (LBD). A search string is developed to capture emerging, innovative, and exploratory aspects of hybrid photovoltaic systems. The steps followed prioritize automated extraction and search starting from the keywords, grouping the results by their references, and locating research on these semantic terms, concepts, and the relationships between them through text mining. The results yield keyword-based topic clusters linked by LBD, which identify dependencies between various analyses. The study concludes that LBD techniques generate correlations that are not explicitly visible, so additional analysis by an expert is needed to correctly determine the implicit knowledge.
- Published
- 2020
33. The contribution of syntactic-semantic approach to the search for complementary literatures for scientific or technical discovery.
- Author
-
Vicente-Gomila, Jose
- Abstract
The present paper tries to show that the current state of the art in syntactic and semantic analysis, in computer systems based on the theory of inventive problem solving known as TRIZ, may help in the task of literature based discovery (LBD). With structured, logical cause-and-effect links between concepts, LBD could be faster and require less expert involvement at the beginning of the process. The author tries to demonstrate the concept with two different problems: the hearing and balance disorder known as Meniere's disease, and some of the current problems in lithium-air batteries for electric vehicles. By using open literature based discovery from An to Bn and from Bn to Cn, together with the logical relationships of a real cause-and-effect approach, the author finds several relatively new concepts such as vitamin A. Other concepts, such as niacin or fish oil, are also found as potentially helpful in Meniere's disease. Secondly, using this procedure the author is able to find patents from disparate domains of expertise, such as patents about odor control or metal casting. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
34. Gene–disease association with literature based enrichment.
- Author
-
Tsafnat, Guy, Jasch, Dennis, Misra, Agam, Choong, Miew Keen, Lin, Frank P.-Y., and Coiera, Enrico
- Abstract
Highlights: [•] Knowledge-based functional enrichment for gene prioritization of high throughput data. [•] Automatic ontology generation from MEDLINE. [•] Novel and fully automatic literature-based discovery. [•] Literature ontologies perform better than expert-derived ones. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
35. Biomedical text mining: a bibliometric review.
- Author
-
Woszezenki, Cristiane Raquel and Gonçalves, Alexandre Leopoldo
- Abstract
Copyright of Perspectivas em Ciência da Informaçao is the property of Nova Economia and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2013
- Full Text
- View/download PDF
36. Indirect association and ranking hypotheses for literature based discovery
- Author
-
Henry, Sam and McInnes, Bridget T.
- Published
- 2019
- Full Text
- View/download PDF
37. SemNet: Using Local Features to Navigate the Biomedical Concept Graph
- Author
-
Cassie S. Mitchell and Andrew R Sedler
- Subjects
0301 basic medicine ,Histology ,Computer science ,literature based discovery ,Feature vector ,lcsh:Biotechnology ,Biomedical Engineering ,Inference ,Bioengineering ,02 engineering and technology ,text mining ,unsupervised learning ,Literature-based discovery ,03 medical and health sciences ,Software ,lcsh:TP248.13-248.65 ,natural language processing ,Technology Report ,computer.programming_language ,Information retrieval ,business.industry ,Unified Medical Language System ,Bioengineering and Biotechnology ,Cognition ,Python (programming language) ,021001 nanoscience & nanotechnology ,030104 developmental biology ,Unsupervised learning ,0210 nano-technology ,business ,computer ,Biotechnology ,Python - Abstract
Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops open-source software in Python to convert a database of semantic predications of all of PubMed's 27.9 million indexed abstracts into a semantic inference network and biomedical concept graph in Neo4j. The developed software, called SemNet, queries a modified version of the publicly available SemMedDB and computes feature vectors on source-target pairs. Each unique Unified Medical Language System (UMLS) concept is represented as a node and each predication as an edge. Each node is assigned one of 132 node labels (e.g., Amino Acid, Peptide, or Protein (AAPP); Gene or Genome (GG); etc.) and each edge is labeled with one of 58 predications (e.g., treats, causes, inhibits, etc.). SemNet computes a single feature value for each metapath, or sequence of node types, between a source node and user-specified target node(s). Several different types of metapath-based features (count, degree weighted path count, and HeteSim metric) are computed and vectorized. SemNet employs an unsupervised learning algorithm for rank aggregation (ULARA) to rank identified source nodes that are most relevant to the user-specified target node(s). Statistical analysis of correlation among identified source nodes or resultant literature network features is used to identify patterns that can guide future research. Analysis of high residual nodes is used to compare and contrast SemNet rankings between different targets of interest. An example SemNet use case is presented to assess "the differential impact of smoking on cognition in males and females" using the following target nodes: nicotine, learning, memory, tetrahydrocannabinol (THC), cigarette smoke, X chromosome, and Y chromosome. Detailed rankings are discussed. 
Overall results suggest a hypothesis where smoking negatively impacts cognition to a greater extent in females, but smoking has stronger cardiovascular impacts in males. In summary, SemNet provides an adoptable method for efficient LBD of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care.
- Published
- 2019
- Full Text
- View/download PDF
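The simplest of the metapath features named in the abstract above, the path count per metapath, can be sketched as follows. This is a stdlib-only illustration and not SemNet's code or API: the function name, the decision to treat edges as undirected and to ignore predicate labels, the hop limit, and the placeholder node names are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def metapath_counts(edges: Iterable[Tuple[str, str]], node_type: Dict[str, str],
                    source: str, target: str, max_hops: int = 2) -> Counter:
    """Count simple paths from source to target with at most max_hops edges,
    grouped by the sequence of node types along the path (the metapath)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    counts: Counter = Counter()

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            counts[tuple(node_type[n] for n in path)] += 1
            return
        if len(path) - 1 == max_hops:  # hop budget used up
            return
        for neighbour in adjacency[node]:
            if neighbour not in path:  # keep paths simple (no revisits)
                walk(neighbour, path + [neighbour])

    walk(source, [source])
    return counts


# Placeholder graph: node names and types are invented for the example.
node_type = {
    "drug_x": "Drug", "gene_1": "Gene", "gene_2": "Gene",
    "protein_1": "Protein", "process_y": "Process",
}
edges = [
    ("drug_x", "gene_1"), ("gene_1", "process_y"),
    ("drug_x", "gene_2"), ("gene_2", "process_y"),
    ("drug_x", "protein_1"), ("protein_1", "process_y"),
]
features = metapath_counts(edges, node_type, "drug_x", "process_y")
```

Each metapath contributes one entry to the feature vector for a source-target pair; the degree weighted path count and HeteSim features mentioned in the abstract would replace the plain count with a weighted score.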
38. Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries.
- Author
-
Bhasuran B
- Subjects
- Data Mining methods, Drug Repositioning, Biomedical Research, Machine Learning
- Abstract
The major outcomes and insights of scientific research and clinical study end up in the form of a publication or clinical record in an unstructured text format. Due to advancements in biomedical research, the published literature has grown tremendously in recent years. Scientists and clinical researchers face a big challenge in staying current with the knowledge and extracting hidden information from the sheer quantity of millions of published biomedical articles. The potential one-stop automated solution to this problem is biomedical literature mining. One of the long-standing goals in biology is to discover the disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, the empirical approaches and clinical affirmation are expensive and time-consuming. An in silico approach using text mining to identify the disease-causing genes can contribute towards biomarker discovery. This chapter presents a protocol on combining literature mining and machine learning for predicting biomedical discoveries, with a special emphasis on gene-disease relation based discovery. The protocol is presented as a literature based discovery (LBD) pipeline for gene-disease based discovery. The protocol includes our web based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep learning based method for association discovery. Our proposed deep learning based method can be generalized and applied to other important biomedical discoveries focusing on entities such as drug/chemical, or miRNA. (© 2022. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.)
- Published
- 2022
- Full Text
- View/download PDF
39. Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest.
- Author
-
Henry S, Wijesinghe DS, Myers A, and McInnes BT
- Abstract
In this paper, we describe how we applied LBD techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies which were brought forth by this work. Metabolites are the end product of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways, and provide valuable insights toward the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. Results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, this prompted an investigation into the effects of LCAT on cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effects of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest via in vivo studies with rat models. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2021 Henry, Wijesinghe, Myers and McInnes.)
- Published
- 2021
- Full Text
- View/download PDF
40. Stability and change in public health studies in Colombia and Mexico: an exploratory approach based on co-word analysis
- Author
-
Rocío Quiliano-Terreros and Carlos Vílchez-Román
- Subjects
medicine.medical_specialty ,Latin Americans ,Correlation coefficient ,Closeness ,Stability (learning theory) ,Review ,Colombia ,terminología como asunto ,Descoberta baseada em literatura ,03 medical and health sciences ,Perú ,0302 clinical medicine ,Literature based discovery ,Descubrimiento basado en la literatura ,salud pública ,saúde pública ,Statistics ,Peru ,medicine ,terminology as topic ,030212 general & internal medicine ,Mexico ,030505 public health ,Public health ,México ,public health ,Public Health, Environmental and Occupational Health ,terminologia como assunto ,Colômbia ,Visual inspection ,Geography ,Thematic map ,0305 other medical science ,Centrality - Abstract
To determine the level of stability or change in topic areas published by public health journals in Latin America and the Caribbean, using keywords and co-word analysis, in order to support evidence-based research planning. Keywords were extracted from papers indexed in Scopus® published by three journals. A moderate level of change in thematic coverage was observed in 2005 - 2010, as evidenced by the correlation coefficients for two of the 3-year periods, 2005 - 2007 and 2008 - 2010: 0.545 for RSP and 0.593 for SPM. However, in 2008 - 2013, more keywords remained constant from one period to the next, given the size of the correlation coefficients for the last two 3-year periods, 2008 - 2010 and 2011 - 2013: 0.727 for RSP and 0.605 for SPM. The research hypothesis was partially accepted given that just two consecutive 3-year periods showed a statistically significant degree of stability in thematic coverage in public health studies. In that sense, this study provides compelling evidence of the effectiveness of using a combined approach for examining the dynamics of thematic coverage: centrality measures for identifying the main keywords and visual inspection for detecting the structure of textual information.
- Published
- 2016
41. A survey on literature based discovery approaches in biomedical domain.
- Author
-
Gopalakrishnan, Vishrawas, Jha, Kishlay, Jin, Wei, and Zhang, Aidong
- Abstract
Literature Based Discovery (LBD) refers to the problem of inferring new and interesting knowledge by logically connecting independent fragments of information units through explicit or implicit means. This area of research, which incorporates techniques from Natural Language Processing (NLP), Information Retrieval and Artificial Intelligence, has significant potential to reduce discovery time in biomedical research fields. Formally introduced in 1986, LBD has grown to be a significant and a core task for text mining practitioners in the biomedical domain. Together with its inter-disciplinary nature, this has led researchers across domains to contribute in advancing this field of study. This survey attempts to consolidate and present the evolution of techniques in this area. We cover a variety of techniques and provide a detailed description of the problem setting, the intuition, the advantages and limitations of various influential papers. We also list the current bottlenecks in this field and provide a general direction of research activities for the future. In an effort to be comprehensive and for ease of reference for off-the-shelf users, we also list many publicly available tools for LBD. We hope this survey will act as a guide to both academic and industry (bio)-informaticians, introduce the various methodologies currently employed and also the challenges yet to be tackled. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
42. Exploring relation types for literature-based discovery
- Author
-
Mark Stevenson, Judita Preiss, and Robert Gaizauskas
- Subjects
Relation (database) ,Computer science ,literature based discovery ,knowledge discovery ,Information Storage and Retrieval ,Health Informatics ,02 engineering and technology ,text mining ,USable ,Machine learning ,computer.software_genre ,Literature-based discovery ,03 medical and health sciences ,Knowledge extraction ,Simple (abstract algebra) ,0202 electrical engineering, electronic engineering, information engineering ,Focus on Natural Language Processing ,natural language processing ,030304 developmental biology ,0303 health sciences ,Shallow parsing ,business.industry ,Linguistics ,Replication (computing) ,Range (mathematics) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Objective Literature-based discovery (LBD) aims to identify “hidden knowledge” in the medical literature by: (1) analyzing documents to identify pairs of explicitly related concepts (terms), then (2) hypothesizing novel relations between pairs of unrelated concepts that are implicitly related via a shared concept to which both are explicitly related. Many LBD approaches use simple techniques to identify semantically weak relations between concepts, for example, document co-occurrence. These generate huge numbers of hypotheses, difficult for humans to assess. More complex techniques rely on linguistic analysis, for example, shallow parsing, to identify semantically stronger relations. Such approaches generate fewer hypotheses, but may miss hidden knowledge. The authors investigate this trade-off in detail, comparing techniques for identifying related concepts to discover which are most suitable for LBD. Materials and methods A generic LBD system that can utilize a range of relation types was developed. Experiments were carried out comparing a number of techniques for identifying relations. Two approaches were used for evaluation: replication of existing discoveries and the “time slicing” approach. Results Previous LBD discoveries could be replicated using relations based either on document co-occurrence or linguistic analysis. Using relations based on linguistic analysis generated many fewer hypotheses, but a significantly greater proportion of them were candidates for hidden knowledge. Discussion and Conclusion The use of linguistic analysis-based relations improves accuracy of LBD without overly damaging coverage. LBD systems often generate huge numbers of hypotheses, which are infeasible to manually review. Improving their accuracy has the potential to make these systems significantly more usable.
- Published
- 2015
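The two-step procedure in the abstract above (find explicitly related concept pairs, then hypothesize links between unrelated concepts that share an intermediate) can be sketched in a few lines of Python. This is a minimal illustration, assuming the weakest relation type the abstract mentions (document co-occurrence); the function names and the toy corpus are hypothetical and are not taken from the authors' system.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_relations(documents):
    """Step 1: treat two concepts as explicitly related if they share a document."""
    related = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            related[a].add(b)
            related[b].add(a)
    return related


def hypothesize(related, source):
    """Step 2: find concepts C that are not directly related to the source
    but are reachable through a shared concept B. Returns {C: set of linking Bs}."""
    candidates = defaultdict(set)
    for b in related[source]:
        for c in related[b]:
            if c != source and c not in related[source]:
                candidates[c].add(b)
    return dict(candidates)


# Hypothetical toy corpus: each document is the set of concepts it mentions.
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud disease"},
    {"aspirin", "platelet aggregation"},
]
related = cooccurrence_relations(docs)
hypotheses = hypothesize(related, "fish oil")
for target, links in sorted(hypotheses.items()):
    print(target, "via", sorted(links))
```

Even on five documents the sketch returns every concept two hops away, which illustrates why co-occurrence relations produce large numbers of hypotheses on a real corpus.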
43. The impact factor of an open access journal does not contribute to an article’s citations
- Author
-
Vijay Krishnan, P Y Kew, Dinker R Pai, S K Chua, Than Winn, Suneet Sood, Laila B Kamal, Sharmilla Gunasegaran, M Z Afzal, J Y Gan, Ahmad Munir Qureshi, and Lahiru Ambawatta
- Subjects
Bibliometric analysis ,Web of science ,literature based discovery ,Scopus ,Bibliometrics ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,bibliometric analysis ,0302 clinical medicine ,Medicine ,030212 general & internal medicine ,publications ,Google Scholar ,General Pharmacology, Toxicology and Pharmaceutics ,open access ,General Immunology and Microbiology ,Impact factor ,business.industry ,05 social sciences ,Articles ,information science ,General Medicine ,Web of Science ,bibliometrics ,0509 other social sciences ,050904 information & library sciences ,business ,Public Engagement ,Open access journal ,Research Article ,Demography - Abstract
Background Citations of papers are positively influenced by the journal’s impact factor (IF). For non-open access (non-OA) journals, this influence may be due to the fact that high-IF journals are more often purchased by libraries, and are therefore more often available to researchers, than low-IF journals. This positive influence has not, however, been shown specifically for papers published in open access (OA) journals, which are universally accessible, and do not need library purchase. It is therefore important to ascertain if the IF influences citations in OA journals too. Methods 203 randomized controlled trials (102 OA and 101 non-OA) published in January 2011 were included in the study. Five-year citations for papers published in OA journals were compared to those for non-OA journals. Source papers were derived from PubMed. Citations were retrieved from Web of Science, Scopus, and Google Scholar databases. The Thomson Reuters IF was used. Results OA journals were found to have significantly more citations overall compared to non-OA journals (median 15.5 vs 12, p=0.039). The IF did not correlate with citations for OA journals (Spearman’s rho=0.187, p=0.60). The increase in the citations with increasing IF was minimal for OA journals (beta coefficient = 3.346, 95% CI -0.464, 7.156, p=0.084). In contrast, the IF did show moderate correlation with citations for articles published in non-OA journals (Spearman’s rho=0.514, p Conclusion It is better to publish in an OA journal for more citations. It may not be worth paying high publishing fees for higher IF journals, because there is minimal gain in terms of increased number of citations. On the other hand, if one wishes to publish in a non-OA journal, it is better to choose one with a high IF.
- Published
- 2017
- Full Text
- View/download PDF
44. A Method to Accelerate and Visualize Iterative Clinical Paper Searching.
- Author
-
Yu Y, Xu E, Xia E, Huang H, Hao B, and Zhang S
- Subjects
- Cluster Analysis, Humans, Research Personnel
- Abstract
Clinical paper searching is a major task for clinical researchers, who must collect authoritative and up-to-date evidence to support their research and clinical practice. Currently, this task requires a large amount of manual labor. Researchers usually spend a lot of time searching online repositories, iterating many times to arrive at the final paper list. Systematic review is a special case in which the paper searching process is a critical step. To address this challenge, this paper introduces a method to streamline the iterative paper searching process. It automatically selects the most probable matching papers and then generates a new search strategy. All the intermediate results are visualized based on the paper citation graph. It combines technologies such as PageRank and topic-based clustering to accelerate paper searching tasks. The precision, recall, and execution time of the proposed method are then evaluated by comparison with published systematic reviews.
- Published
- 2019
- Full Text
- View/download PDF
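The abstract above names PageRank over a paper citation graph as one ingredient of the method. As a rough illustration of that single ingredient (not the authors' pipeline, which also involves topic-based clustering and search-strategy generation), the following sketch ranks papers by power iteration. The citation data, the damping factor of 0.85, and the iteration count are all assumptions chosen for the example.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Rank papers by power iteration.

    `citations` maps each paper to the list of papers it cites.
    Papers with no outgoing citations spread their rank evenly over all papers.
    """
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new_rank[q] += share
            else:
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: A and B cite C, and C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
rank = pagerank(citations)
for paper in sorted(rank, key=rank.get, reverse=True):
    print(paper, round(rank[paper], 3))
```

In this toy graph the papers that receive citations (D and C) end up ranked above the papers that only cite others, which is the property a search tool could use to surface candidate papers first.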
45. SemNet: Using Local Features to Navigate the Biomedical Concept Graph.
- Author
-
Sedler AR and Mitchell CS
- Abstract
Literature-Based Discovery (LBD) aims to connect scientists across silos by assembling models of the literature to reveal previously hidden connections. Unfortunately, LBD systems have been unable to achieve user adoption on a large scale. This work develops open-source software in Python to convert a database of semantic predications of all of PubMed's 27.9 million indexed abstracts into a semantic inference network and biomedical concept graph in Neo4j. The developed software, called SemNet, queries a modified version of the publicly available SemMedDB and computes feature vectors on source-target pairs. Each unique Unified Medical Language System (UMLS) concept is represented as a node and each predication as an edge. Each node is assigned one of 132 node labels (e.g., Amino Acid, Peptide, or Protein (AAPP); Gene or Genome (GG); etc.) and each edge is labeled with one of 58 predications (e.g., treats, causes, inhibits, etc.). SemNet computes a single feature value for each metapath, or sequence of node types, between a source node and user-specified target node(s). Several different types of metapath-based features (count, degree weighted path count, and HeteSim metric) are computed and vectorized. SemNet employs an unsupervised learning algorithm for rank aggregation (ULARA) to rank identified source nodes that are most relevant to the user-specified target node(s). Statistical analysis of correlation among identified source nodes or resultant literature network features is used to identify patterns that can guide future research. Analysis of high residual nodes is used to compare and contrast SemNet rankings between different targets of interest. An example SemNet use case is presented to assess "the differential impact of smoking on cognition in males and females" using the following target nodes: nicotine, learning, memory, tetrahydrocannabinol (THC), cigarette smoke, X chromosome, and Y chromosome. Detailed rankings are discussed. 
Overall results suggest a hypothesis where smoking negatively impacts cognition to a greater extent in females, but smoking has stronger cardiovascular impacts in males. In summary, SemNet provides an adoptable method for efficient LBD of PubMed that extends beyond omics-only relationships to true multi-scalar connections that can provide actionable insight for predictive medicine, research prioritization, and clinical care.
- Published
- 2019
- Full Text
- View/download PDF
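The simplest of the metapath features named in the SemNet abstract above is the count: the number of paths between a source and a target node, grouped by the sequence of node types along the path. The sketch below illustrates that idea on an in-memory toy graph. It does not use Neo4j or SemMedDB, it ignores edge predications, and the node names, type labels other than GG and AAPP, and the two-edge path limit are assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy graph: node -> type label, plus undirected edges.
node_type = {
    "nicotine": "SUBSTANCE",
    "gene_1": "GG",
    "protein_1": "AAPP",
    "memory": "FUNCTION",
}
edges = [
    ("nicotine", "gene_1"),
    ("gene_1", "memory"),
    ("nicotine", "protein_1"),
    ("protein_1", "memory"),
    ("nicotine", "memory"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def metapath_counts(adjacency, node_type, source, target, max_edges=2):
    """Count simple paths from source to target, keyed by their node-type sequence."""
    counts = Counter()
    stack = [(source, (source,))]
    while stack:
        node, path = stack.pop()
        if node == target:
            counts[tuple(node_type[n] for n in path)] += 1
            continue
        if len(path) - 1 >= max_edges:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + (nxt,)))
    return counts


adjacency = build_adjacency(edges)
counts = metapath_counts(adjacency, node_type, "nicotine", "memory")
for metapath, count in sorted(counts.items()):
    print(" -> ".join(metapath), count)
```

Each key of `counts` corresponds to one metapath, so the counter can be read as a sparse feature vector for the source-target pair.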
46. Mineração de textos biomédicos: uma revisão bibliométrica
- Author
-
Cristiane Raquel Woszezenki and Alexandre Leopoldo Gonçalves
- Subjects
Text mining ,Literature based discovery ,Bioinformatics ,Bibliography. Library science. Information resources - Abstract
Text mining is increasingly being used to automate the extraction of important information from biomedical texts, enabling researchers to keep up with developments in biomedicine. Given the importance of this research field, this article presents a mapping of scientific publications on biomedical text mining and discusses the main tasks in the field to which researchers have devoted the most attention. To this end, bibliometrics was used, a technique for analyzing the development of a field of science in order to identify its characteristics. The mapping presented promotes knowledge of the history and current state of the research field and provides inputs that enrich the discussion of the directions research in the area has taken and the likely scientific trends for researchers and others interested in the topic.
- Full Text
- View/download PDF
47. Characteristics of the similarity index in a Korean medical journal
- Author
-
Younsuk Lee, Seunghyun Chung, Jeunghyuk Lee, Ha Yeon Park, and Daehwan Kim
- Subjects
Index (economics) ,020205 medical informatics ,literature based discovery ,social media ,similarity index ,02 engineering and technology ,Bibliometrics ,lcsh:RD78.3-87.3 ,Literature-based discovery ,03 medical and health sciences ,0302 clinical medicine ,Similarity (network science) ,Statistics ,0202 electrical engineering, electronic engineering, information engineering ,Medicine ,Social media ,030212 general & internal medicine ,Medical journal ,Clinical Research Article ,Information retrieval ,business.industry ,Anesthesiology and Pain Medicine ,lcsh:Anesthesiology ,bibliometrics ,plagiarism ,business ,Citation ,Value (mathematics) - Abstract
Background Journal editors have exercised their control over submitted papers having a high similarity index. Despite widespread suspicion of possible plagiarism on a high similarity index, our study focused on the real effect of the similarity index on the value of a scientific paper. Methods This research examined the percent values of the similarity index from 978 submitted (420 published) papers in the Korean Journal of Anesthesiology since 2012. Thus, this study aimed to identify the correlation between the similarity index and the value of a paper. The value of a paper was evaluated in two distinct phases (during a peer-review process vs. after publication), and the value of a published paper was evaluated in two aspects (academic citation vs. social media appearance). Results Yearly mean values of the similarity index ranged from 16% to 19%. There were 254 papers cited at least once and 179 papers appearing at least once in social media. The similarity index affected the acceptance/rejection of a paper in various ways; although the influence was not linear and the cutoff measures were distinctive among the types of papers, both extremes were related to a high rate of rejection. After publication, the similarity index had no effect on academic citation or social media appearance according to the paper. Conclusions The finding suggested that the similarity index no longer had an influence on academic citation or social media appearance according to the paper after publication, while the similarity index affected the acceptance/rejection of a submitted paper. Proofreading and intervention for finalizing the draft by the editors might play a role in achieving uniform quality of the publication.
- Published
- 2017
- Full Text
- View/download PDF
48. Kaba Küme Teorisinin Literatür Tabanlı Bilgi Keşfine Uygulanması
- Author
-
Mehmet Güleç, Fatih and Sever, Hayri
- Subjects
Literature based discovery - Abstract
Science is a body of academic studies that seek to explain what people wonder about. The results of academic studies are published in a scientific writing style so that they can be shared with other researchers, and these publications are considered the most important output of academic work. When other scientists examine these publications in light of their existing knowledge, new research ideas may arise. Studies that examine publications using information systems, simulating how new research ideas arise in the human mind, are referred to as Literature Based Discovery. In Literature Based Discovery studies, pieces of information are created by associating terms according to the publications they share. New ideas are then sought by applying a chain rule to these pieces of information. The ABC model focuses on the subject the user is interested in and finds terms that are not directly related to the subject but are indirectly related to it through shared terms. In scientific progress, interdisciplinary work across parallel fields is regarded as the key to innovation and development. Similarly, scientists are expected to master one field while also having knowledge in several others. Literature Based Discovery (LBD) applications emerged from the idea of simulating, in computer systems, the new ideas born of the synergy obtained by bringing different studies together. LBD aims to generate new and important ideas by crossing relations obtained from scientific publications. With LBD, the goal is to draw new and original inferences by bringing multiple studies together in a complementary way. To this end, relations between terms are established through the publications in which they co-occur.
- Published
- 2013
49. Distribution of 'Characteristic' Terms in MEDLINE Literatures
- Author
-
Neil R. Smalheiser, Wei Zhou, and Vetle I. Torvik
- Subjects
Information retrieval ,Distribution (number theory) ,lcsh:T58.5-58.64 ,Computer science ,lcsh:Information technology ,literature based discovery ,MEDLINE ,Function (mathematics) ,text mining ,Term (time) ,Set (abstract data type) ,Literature-based discovery ,annotation ,term occurrence ,information retrieval ,Sentence ,Information Systems - Abstract
Given the occurrence frequency of any term within any set of articles within MEDLINE, we define “characteristic” terms as words and phrases that occur in that literature more frequently than expected by chance (at p < 0.001 or better). In this report, we studied how the cut-off criterion varied as a function of literature size and term frequency in MEDLINE as a whole, and have compared the distribution of characteristic terms within a number of journal-defined, affiliation-defined and random literatures. We also investigated how the characteristic terms were distributed among MEDLINE titles, abstracts, and last sentence of abstracts, including “regularized” terms that appear both in the title and abstract of the same paper for at least one paper in the literature. For a set of 10 disciplinary journals, the characteristic terms comprised 18% of the total terms on average. Characteristic terms are utilized in several of our web-based services (Anne O’Tate and Arrowsmith), and should be useful for a variety of other information-processing tasks designed to improve text mining in MEDLINE.
- Published
- 2011
- Full Text
- View/download PDF
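The definition of a characteristic term in the abstract above (a term that occurs in a literature more often than expected by chance, at p < 0.001) can be illustrated with a one-sided significance test. The abstract does not say which test the authors used, so the sketch below assumes a binomial model in which each article in the literature contains the term with the background probability observed in MEDLINE as a whole. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_characteristic(k, n, background_rate, alpha=0.001):
    """True if seeing the term in k of n articles is unlikely under the background rate."""
    return binomial_upper_tail(k, n, background_rate) < alpha


# Hypothetical counts: a literature of 200 articles and a term found in 2% of MEDLINE,
# so about 4 occurrences would be expected by chance.
literature_size = 200
background_rate = 0.02
for occurrences in (5, 20):
    print(occurrences, is_characteristic(occurrences, literature_size, background_rate))
```

Because the tail probability depends on both the literature size and the background rate, the number of occurrences needed to pass the cut-off changes with each, which matches the dependence the abstract says was studied.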
50. Unsupervised Mining of Knowledge Gaps in Scientific Literature
- Author
-
Fernandez, Silvia, Jourlin, Pierre, Sanjuan, Eric, Laboratoire Informatique d'Avignon (LIA), and Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI
- Subjects
ACM: H.: Information Systems ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,Formal Concept Lattices ,Literature Based Discovery ,[INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR] ,ACM: D.: Software ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,Natural Language Processing - Abstract
Literature Based Discovery (LBD) relies on the identification of gaps in the scientific literature. Most of the existing methods are supervised and rely on the use of specific large knowledge domain databases like MEDLINE for medical study. We present here a tractable approach based on Natural Language Processing techniques with few linguistic resources and Formal Concept Lattice exploration. Entities are automatically extracted from full-text scientific papers based on their acronym forms. An unsupervised classification is built using syntax and WordNet relations. Resulting classes are clustered into multiple formal concepts and the knowledge gaps are identified in the resulting Galois Lattice. The feasibility and the relevance of the outcome are analyzed on a large corpus of full-text journal articles dealing with nuclear energy research.
- Published
- 2010