23 results for "triple extraction"
Search Results
2. A Study on Double-Headed Entities and Relations Prediction Framework for Joint Triple Extraction.
- Author
- Xiao, Yanbing, Chen, Guorong, Du, Chongling, Li, Lang, Yuan, Yu, Zou, Jincheng, and Liu, Jingcheng
- Subjects
- NATURAL language processing, KNOWLEDGE graphs, DATA mining
- Abstract
Relational triple extraction, a fundamental procedure in constructing knowledge graphs from natural language, plays a crucial and irreplaceable role in information extraction research. In this paper, we propose a Double-Headed Entities and Relations Prediction (DERP) framework, which divides entity recognition into two stages, head entity recognition and tail entity recognition, and uses the recognized head and tail entities, together with the corresponding relation, as inputs. The DERP framework further incorporates a triple prediction module to improve the accuracy and completeness of joint relational triple extraction. We conducted experiments on two English datasets, NYT and WebNLG, and two Chinese datasets, DuIE2.0 and CMeIE-V2, and compared the English results with those of ten baseline models. The experimental results demonstrate the effectiveness of the proposed DERP framework for triple extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
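The two-stage decomposition described in this abstract can be sketched in a few lines. The tiny rule-style "recognizers" below are hypothetical stand-ins for the learned DERP modules, not the authors' code; only the shape of the pipeline (heads, then tails, then triple prediction) follows the abstract.

```python
# Toy sketch of a two-stage extraction pipeline: stage 1 proposes head
# entities, stage 2 proposes tail entities conditioned on the heads, and a
# final triple-prediction step pairs each (head, tail) with a relation.
# The fixed return values stand in for real model predictions.

SENTENCE = "Paris is the capital of France"

def recognize_heads(sentence):
    # Stand-in for the head-entity recognizer.
    return ["Paris"]

def recognize_tails(sentence, heads):
    # Stand-in for the tail-entity recognizer, conditioned on the heads.
    return {"Paris": ["France"]}

def predict_triples(sentence, heads, tails):
    # Stand-in for the triple-prediction module: assigns a relation
    # label to every (head, tail) pair it deems valid.
    triples = []
    for h in heads:
        for t in tails.get(h, []):
            triples.append((h, "capital_of", t))
    return triples

heads = recognize_heads(SENTENCE)
tails = recognize_tails(SENTENCE, heads)
print(predict_triples(SENTENCE, heads, tails))
# [('Paris', 'capital_of', 'France')]
```

The point of the decomposition is that tail recognition can condition on the already-recognized heads, rather than tagging both ends independently.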
3. A Knowledge Graph Embedding Model Based on Cyclic Consistency—Cyclic_CKGE.
- Author
- Li, Jialong, Guo, Zhonghua, He, Jiahao, Ma, Xiaoyan, and Ma, Jing
- Subjects
- KNOWLEDGE graphs, INFORMATION retrieval, PROBLEM solving, DATA visualization
- Abstract
Most existing medical knowledge graphs are incomplete and must be completed via prediction to obtain a full knowledge graph. To solve this problem, we propose a knowledge graph embedding model based on cyclic consistency (Cyclic_CKGE). The model first uses the "graph" constructed from the head entity and relation to predict the tail entity, and then uses the "inverse graph" constructed from the predicted tail entity and relation to predict the head entity back; the semantic-space distance between the reconstructed head entity and the original head entity should be very small, which addresses the reversibility problem of the network. With only 0.46 M parameters, Cyclic_CKGE achieves the best results on FB15k-237, reaching 0.425 Hits@10, while the best competing model, R-GCN, has more than 8 M parameters and reaches 0.417 Hits@10. Overall, Cyclic_CKGE's parameter efficiency is more than 17 times that of R-GCN and more than 8 times that of DistMult. To better demonstrate the model's practical application, we construct a visual medical information platform based on a medical knowledge graph. The platform offers three kinds of disease information retrieval: conditional query, path query, and multi-symptom disease inference. This provides a theoretical method and a practical example for realizing knowledge graph visualization. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
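The cyclic-consistency idea from this abstract can be illustrated with a TransE-style additive composition (an assumption on my part; the abstract does not specify the scoring function): predict the tail as h + r, predict the head back from the tail, and penalize the distance between the reconstructed and original head.

```python
# Sketch of a cyclic-consistency check over toy 2-d embeddings (hypothetical
# vectors, TransE-style composition assumed). A trained model would minimize
# cycle_loss; with exact arithmetic the reconstruction is (nearly) perfect.

def add(u, v):  return [a + b for a, b in zip(u, v)]
def sub(u, v):  return [a - b for a, b in zip(u, v)]
def dist(u, v): return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

head = [0.2, 0.5]
relation = [0.1, -0.3]

tail_pred = add(head, relation)        # "graph": (h, r) -> predicted tail
head_back = sub(tail_pred, relation)   # "inverse graph": (t, r) -> head
cycle_loss = dist(head_back, head)     # cyclic-consistency penalty

print(round(cycle_loss, 12))  # 0.0
```

In training, this loss term is added to the usual link-prediction objective so that forward and inverse predictions stay consistent.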
4. A triple joint extraction method combining hybrid embedding and relational label embedding
- Author
- Jianfeng DAI, Xingyu CHEN, Ligang DONG, and Xian JIANG
- Subjects
- triple extraction, relational embedding, BERT, attention mechanism, pointer annotation, Telecommunication, TK5101-6720, Technology
- Abstract
The purpose of triple extraction is to obtain relationships between entities from unstructured text and apply them to downstream tasks. The embedding mechanism has a great impact on the performance of a triple extraction model: the embedding vectors should carry rich semantic information closely related to the relation extraction task. In Chinese datasets, the information carried by individual words varies widely, and word-segmentation errors can lose semantic information. To avoid this, a triple joint extraction method combining hybrid embedding and relational label embedding (HEPA) was designed: a hybrid embedding scheme combining character embedding and word embedding was proposed to reduce the errors caused by word segmentation. A relational embedding mechanism that fuses text and relation labels was added, and an attention mechanism was used to distinguish the relevance of entities in a sentence to different relation labels, thus improving matching accuracy. Entities were matched with pointer annotation, which improves extraction of overlapping relational triples. Comparative experiments on the public DuIE dataset show that the F1 score of HEPA improves by 2.8% over the best-performing baseline model (CasRel).
- Published
- 2023
- Full Text
- View/download PDF
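The hybrid-embedding idea above can be sketched concretely. The vectors and tables below are invented for illustration (HEPA's embeddings are learned): a word's representation concatenates a word-level vector with the average of its character-level vectors, so characters can partly compensate when the word table is wrong or missing an entry.

```python
# Toy hybrid (character + word) embedding: concatenate the word vector with
# the mean of the character vectors. All vectors here are hypothetical.

WORD_VEC = {"北京": [1.0, 0.0]}                    # word-level table (may miss entries)
CHAR_VEC = {"北": [0.2, 0.4], "京": [0.6, 0.0]}   # character-level table

def hybrid_embed(word):
    chars = [CHAR_VEC.get(c, [0.0, 0.0]) for c in word]
    char_avg = [sum(dim) / len(chars) for dim in zip(*chars)]
    word_part = WORD_VEC.get(word, [0.0, 0.0])     # zero vector if out-of-vocabulary
    return word_part + char_avg                    # list concatenation

print(hybrid_embed("北京"))  # word part [1.0, 0.0] followed by char average
```

An out-of-vocabulary word still gets a non-trivial representation from its characters, which is the motivation the abstract gives for mixing the two granularities.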
5. An Entity-Relation Joint Extraction Method Based on Two Independent Sub-Modules From Unstructured Text
- Author
- Su Liu, Wenqi Lyu, Xiao Ma, and Jike Ge
- Subjects
- BERT, cascade decoding, entity recognition, relation extraction, triple extraction, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Extracting entity, relation, and attribute information from unstructured text is crucial for constructing large-scale knowledge graphs (KG). Existing research approaches either focus on entity recognition before relation extraction or employ unified annotation. However, these methods overlook the intrinsic relation between entity recognition and relation extraction, resulting in ineffective handling of triple overlap issues where multiple relations share the same entity in a sentence. To address these challenges, this paper proposes an entity-relation joint extraction model comprising two independent sub-modules: one for extracting the head entity and the other for extracting the tail entity and its corresponding relation. The model generates candidate entities and relations by enumerating token sequences in sentences, and then uses the two sub-modules to predict entities and relations. The predicted entities and relations are jointly decoded to obtain relational triples, avoiding error propagation and solving redundancy, entity overlap, and poor generalization. Extensive experiments demonstrate that our model achieves state-of-the-art performance on WebNLG, NYT, WebNLG*, and NYT* public benchmarks. It outperforms all baselines on the WebNLG* dataset, showing significant improvements in different types of triples: normal, SEO, and EPO by 3.8%, 2.9%, and 5.5%, respectively, compared to ETL-Span. For the NYT* dataset, our method improves by 5.7% in triples of Normal type, thereby confirming its effectiveness.
- Published
- 2023
- Full Text
- View/download PDF
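The two-sub-module decomposition and joint decoding described above can be sketched as follows. The stand-in "sub-modules" return fixed predictions (not the paper's model); the sketch only shows why joint decoding naturally keeps overlapping triples that share a head entity (the SEO case).

```python
# Minimal sketch: sub-module A proposes head entities, sub-module B proposes
# (relation, tail) pairs for each head, and joint decoding enumerates every
# consistent combination. Predictions below are hypothetical stand-ins.

def submodule_heads(sentence):
    return ["Obama"]                        # stand-in head-entity prediction

def submodule_rel_tails(sentence, head):
    # One head can carry several relations: the single-entity-overlap case.
    return [("born_in", "Hawaii"), ("president_of", "USA")]

def joint_decode(sentence):
    return [(h, r, t)
            for h in submodule_heads(sentence)
            for r, t in submodule_rel_tails(sentence, h)]

print(joint_decode("Obama, born in Hawaii, was president of the USA"))
# [('Obama', 'born_in', 'Hawaii'), ('Obama', 'president_of', 'USA')]
```

Because the decoder combines the two modules' outputs instead of emitting one label per token, the shared entity "Obama" appears in both triples without conflict.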
6. 融入关系形式化概念的端到端三元组抽取 (End-to-end triple extraction incorporating relational formal concepts).
- Author
- 程春雷, 邹静, 叶青, 张素华, 蓝勇, and 杨瑞
- Abstract
Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
7. 融合混合嵌入与关系标签嵌入的三元组联合抽取方法 (A triple joint extraction method combining hybrid embedding and relational label embedding).
- Author
- 戴剑锋, 陈星妤, 董黎刚, and 蒋献
- Abstract
Copyright of Telecommunications Science is the property of Beijing Xintong Media Co., Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
8. A Knowledge Graph Embedding Model Based on Cyclic Consistency—Cyclic_CKGE
- Author
- Jialong Li, Zhonghua Guo, Jiahao He, Xiaoyan Ma, and Jing Ma
- Subjects
- knowledge graph, intelligent medical systems, triple extraction, disease relational reasoning, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Most existing medical knowledge graphs are incomplete and must be completed via prediction to obtain a full knowledge graph. To solve this problem, we propose a knowledge graph embedding model based on cyclic consistency (Cyclic_CKGE). The model first uses the "graph" constructed from the head entity and relation to predict the tail entity, and then uses the "inverse graph" constructed from the predicted tail entity and relation to predict the head entity back; the semantic-space distance between the reconstructed head entity and the original head entity should be very small, which addresses the reversibility problem of the network. With only 0.46 M parameters, Cyclic_CKGE achieves the best results on FB15k-237, reaching 0.425 Hits@10, while the best competing model, R-GCN, has more than 8 M parameters and reaches 0.417 Hits@10. Overall, Cyclic_CKGE's parameter efficiency is more than 17 times that of R-GCN and more than 8 times that of DistMult. To better demonstrate the model's practical application, we construct a visual medical information platform based on a medical knowledge graph. The platform offers three kinds of disease information retrieval: conditional query, path query, and multi-symptom disease inference. This provides a theoretical method and a practical example for realizing knowledge graph visualization.
- Published
- 2023
- Full Text
- View/download PDF
9. A Study on Double-Headed Entities and Relations Prediction Framework for Joint Triple Extraction
- Author
- Yanbing Xiao, Guorong Chen, Chongling Du, Lang Li, Yu Yuan, Jincheng Zou, and Jingcheng Liu
- Subjects
- triple extraction, entity recognition, relation extraction, joint extraction, Mathematics, QA1-939
- Abstract
Relational triple extraction, a fundamental procedure in constructing knowledge graphs from natural language, plays a crucial and irreplaceable role in information extraction research. In this paper, we propose a Double-Headed Entities and Relations Prediction (DERP) framework, which divides entity recognition into two stages, head entity recognition and tail entity recognition, and uses the recognized head and tail entities, together with the corresponding relation, as inputs. The DERP framework further incorporates a triple prediction module to improve the accuracy and completeness of joint relational triple extraction. We conducted experiments on two English datasets, NYT and WebNLG, and two Chinese datasets, DuIE2.0 and CMeIE-V2, and compared the English results with those of ten baseline models. The experimental results demonstrate the effectiveness of the proposed DERP framework for triple extraction.
- Published
- 2023
- Full Text
- View/download PDF
10. A Novel Conditional Knowledge Graph Representation and Construction
- Author
- Zheng, Tingyue, Xu, Ziqiang, Li, Yufan, Zhao, Yuan, Wang, Bin, Yang, Xiaochun, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Fang, Lu, editor, Chen, Yiran, editor, Zhai, Guangtao, editor, Wang, Jane, editor, Wang, Ruiping, editor, and Dong, Weisheng, editor
- Published
- 2021
- Full Text
- View/download PDF
11. 基于地理位置信息的知识图谱查询方法 (A knowledge graph query method based on geographic location information).
- Author
- 李怡霈 and 王宇翔
- Subjects
- KNOWLEDGE graphs, K-nearest neighbor classification, NATURAL languages
- Abstract
Existing knowledge graph query methods ignore the geographic location information of entities themselves, so they do not support geographic location related queries. In view of this problem, on the basis of the hybrid knowledge graph integrating geographic location information, this study proposes a knowledge graph query method based on geographic location information. By extracting the triples from the query problem, the corresponding query graph is constructed to understand the natural language query problem. The query problems based on geographic location information are divided into six categories, and combined with the existing semantic query methods for fact-based problems, the corresponding knowledge graph query methods are studied according to the query graph or K-nearest neighbor search idea. Experimental results show that the accuracy rate of the proposed method can reach more than 77%, which can provide effective support for query based on geographic location information. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
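The K-nearest-neighbour part of a location-aware query, mentioned in the abstract above, can be sketched directly. The places and coordinates are invented for illustration; the paper's six query categories and query-graph construction are not reproduced here.

```python
# Toy KNN search over geo-tagged knowledge-graph entities: rank candidate
# places by Euclidean distance from the query point and keep the k nearest.
from math import hypot

PLACES = {"cafe": (0.0, 1.0), "museum": (2.0, 2.0), "park": (0.5, 0.0)}  # hypothetical

def knn(query_xy, k):
    ranked = sorted(PLACES, key=lambda p: hypot(PLACES[p][0] - query_xy[0],
                                                PLACES[p][1] - query_xy[1]))
    return ranked[:k]

print(knn((0.0, 0.0), 2))  # ['park', 'cafe']
```

A real system would use great-circle distance over latitude/longitude and a spatial index rather than a linear scan, but the ranking idea is the same.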
12. Representation and association of Chinese financial equity knowledge driven by multilayer ontology.
- Author
- Zhenghao Liu, Zhijian Zhang, Xi Zeng, and Huakui Lv
- Subjects
- KNOWLEDGE representation (Information theory), CONCEPTUAL structures, ONTOLOGY, FINANCIAL management, ONTOLOGIES (Information retrieval)
- Abstract
Aiming at the complex financial ownership structures and isolated data organization found in practice, this study applies multi-layer hierarchical methods to construct a domain-ontology model. The three dimensions of industry, company, and internal environment were integrated, and a concept cube was designed and constructed based on knowledge extraction and text classification, providing a multi-level, fine-grained knowledge representation and association method for financial equity knowledge. The experimental results show that the concept-cube structure represents semantic information as dense low-dimensional vectors, which greatly enhances semantic relevance and interpretability. The multilayer ontology-driven ownership structure reflects a variety of knowledge association patterns; in the "Intelligent Financial Big Data System" developed by the research team, association queries across the three categories of relationships (industry, enterprise, and internal environment) are realized, as well as dynamic analysis and supervision of typical financial management problems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
13. SF-GPT: A training-free method to enhance capabilities for knowledge graph construction in LLMs.
- Author
- Sun, Lizhuang, Zhang, Peng, Gao, Fang, An, Yuan, Li, Zhixing, and Zhao, Yuanwei
- Subjects
- LANGUAGE models, KNOWLEDGE graphs, INFORMATION retrieval, NOISE
- Abstract
Knowledge graphs (KGs) are constructed by extracting knowledge triples from text and fusing knowledge, enhancing information retrieval efficiency. Current methods for knowledge triple extraction include "Pretrain and Fine-tuning" and Large Language Models (LLMs). The former shifts effort from manual extraction to dataset annotation and suffers from performance degradation with different test and training set distributions. LLMs-based methods face errors and incompleteness in extraction. We introduce SF-GPT, a training-free method to address these issues. Firstly, we propose the Entity Extraction Filter (EEF) module to filter triple generation results, addressing evaluation and cleansing challenges. Secondly, we introduce a training-free Entity Alignment Module based on Entity Alias Generation (EAG), tackling semantic richness and interpretability issues in LLM-based knowledge fusion. Finally, our Self-Fusion Subgraph strategy uses multi-response self-fusion and a common entity list to filter triple results, reducing noise from LLMs' multi-responses. In experiments, SF-GPT showed a 55.5% increase in recall and a 32.6% increase in F1 score on the BDNC dataset compared to the UniRel model trained on the NYT dataset and achieved a 5% improvement in F1 score compared to GPT-4+EEF baseline on the WebNLG dataset in the case of a fusion round of three. SF-GPT offers a promising way to extract knowledge from unstructured information. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
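The self-fusion filtering idea from this abstract can be sketched in a simplified form (the real SF-GPT also runs an entity-extraction filter and alias-based entity alignment, which are omitted here): keep only triples whose entities appear in every sampled response, discarding noise that a single response hallucinated.

```python
# Toy multi-response self-fusion: build the common entity list across all
# responses, then keep only triples whose head and tail are both common.
# The example responses are invented for illustration.

responses = [
    [("Paris", "capital_of", "France"), ("Paris", "located_in", "Texas")],
    [("Paris", "capital_of", "France"), ("France", "member_of", "EU")],
    [("Paris", "capital_of", "France")],
]

def common_entities(resps):
    sets = [{e for h, _, t in r for e in (h, t)} for r in resps]
    return set.intersection(*sets)

def fuse(resps):
    common = common_entities(resps)
    kept = {tr for r in resps for tr in r
            if tr[0] in common and tr[2] in common}
    return sorted(kept)

print(fuse(responses))  # [('Paris', 'capital_of', 'France')]
```

With three fusion rounds, only the triple supported by every response survives, which matches the abstract's motivation of reducing noise from the LLM's multiple responses.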
14. A Methodology for Open Information Extraction and Representation from Large Scientific Corpora: The CORD-19 Data Exploration Use Case.
- Author
- Papadopoulos, Dimitris, Papadakis, Nikolaos, and Litke, Antonis
- Subjects
- DATA mining, CONCEPT mapping, REPRESENTATIONS of graphs, ELECTRONIC records, CORPORA, NATURAL language processing, INFORMATION storage & retrieval systems
- Abstract
Featured Application: Open Information Extraction on the COVID-19 Open Research Dataset (CORD-19). The usefulness of automated information extraction tools in generating structured knowledge from unstructured and semi-structured machine-readable documents is limited by challenges related to the variety and intricacy of the targeted entities, the complex linguistic features of heterogeneous corpora, and the computational availability needed to scale readily to large amounts of text. In this paper, we argue that resolving the redundancy and ambiguity of subject–predicate–object (SPO) triples in open information extraction systems has to be treated as an equally important step in order to ensure the quality and precision of the generated triples. To this end, we propose a pipeline approach for information extraction from large corpora, encompassing a series of natural language processing tasks. Our methodology consists of four steps: i. in-place coreference resolution, ii. extractive text summarization, iii. parallel triple extraction, and iv. entity enrichment and graph representation. We demonstrate our methodology on a large medical dataset (CORD-19), relying on state-of-the-art tools to fulfil the aforementioned steps and extract triples that are subsequently mapped to a comprehensive ontology of biomedical concepts. We evaluate the effectiveness of our information extraction method by comparing it in terms of precision, recall, and F1-score with state-of-the-art OIE engines, and demonstrate its capabilities on a set of data exploration tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
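The four-step pipeline listed in this abstract can be sketched as a chain of functions. Each function body below is a trivial stand-in (the paper uses state-of-the-art coreference, summarization, and OIE tools); only the step ordering follows the methodology.

```python
# Skeleton of the four-step OIE pipeline: coreference resolution ->
# extractive summarization -> triple extraction -> entity enrichment.
# All stage implementations are illustrative stand-ins.

def coreference_resolve(text):
    return text.replace("It", "The virus")       # stand-in coreference step

def summarize(text):
    return text.split(". ")[0] + "."             # stand-in: keep first sentence

def extract_triples(text):
    # Stand-in OIE: naive subject-predicate-object split of one clause.
    words = text.rstrip(".").split()
    return [(words[0] + " " + words[1], words[2], " ".join(words[3:]))]

def enrich(triples):
    # Stand-in for ontology mapping: attach a placeholder concept ID.
    return [{"spo": t, "ontology_id": None} for t in triples]

doc = "The virus binds ACE2 receptors. It spreads quickly."
triples = extract_triples(summarize(coreference_resolve(doc)))
print(enrich(triples))
```

Running coreference before summarization, as the pipeline order implies, prevents the summary from keeping an unresolvable pronoun.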
15. Improving Question-Answering for Portuguese Using Triples Extracted from Corpora
- Author
- Rodrigues, Ricardo, Gomes, Paulo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Silva, João, editor, Ribeiro, Ricardo, editor, Quaresma, Paulo, editor, Adami, André, editor, and Branco, António, editor
- Published
- 2016
- Full Text
- View/download PDF
16. RAPPORT — A Portuguese Question-Answering System
- Author
- Rodrigues, Ricardo, Gomes, Paulo, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Pereira, Francisco, editor, Machado, Penousal, editor, Costa, Ernesto, editor, and Cardoso, Amílcar, editor
- Published
- 2015
- Full Text
- View/download PDF
17. A Methodology for Open Information Extraction and Representation from Large Scientific Corpora: The CORD-19 Data Exploration Use Case
- Author
- Dimitris Papadopoulos, Nikolaos Papadakis, and Antonis Litke
- Subjects
- information extraction, triple extraction, bioinformatics, data mining, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
The usefulness of automated information extraction tools in generating structured knowledge from unstructured and semi-structured machine-readable documents is limited by challenges related to the variety and intricacy of the targeted entities, the complex linguistic features of heterogeneous corpora, and the computational availability needed to scale readily to large amounts of text. In this paper, we argue that resolving the redundancy and ambiguity of subject–predicate–object (SPO) triples in open information extraction systems has to be treated as an equally important step in order to ensure the quality and precision of the generated triples. To this end, we propose a pipeline approach for information extraction from large corpora, encompassing a series of natural language processing tasks. Our methodology consists of four steps: i. in-place coreference resolution, ii. extractive text summarization, iii. parallel triple extraction, and iv. entity enrichment and graph representation. We demonstrate our methodology on a large medical dataset (CORD-19), relying on state-of-the-art tools to fulfil the aforementioned steps and extract triples that are subsequently mapped to a comprehensive ontology of biomedical concepts. We evaluate the effectiveness of our information extraction method by comparing it in terms of precision, recall, and F1-score with state-of-the-art OIE engines, and demonstrate its capabilities on a set of data exploration tasks.
- Published
- 2020
- Full Text
- View/download PDF
18. A Methodology for Open Information Extraction and Representation from Large Scientific Corpora: The CORD-19 Data Exploration Use Case
- Author
- Antonis Litke, Nikolaos Papadakis, and Dimitris Papadopoulos
- Subjects
- Information extraction, Triple extraction, Bioinformatics, Data mining, Computer science, Ontology (information science), Coreference, Redundancy (engineering), Representation (mathematics), Pipeline (software), Automatic summarization, Artificial intelligence, Natural language processing
- Abstract
The usefulness of automated information extraction tools in generating structured knowledge from unstructured and semi-structured machine-readable documents is limited by challenges related to the variety and intricacy of the targeted entities, the complex linguistic features of heterogeneous corpora, and the computational availability needed to scale readily to large amounts of text. In this paper, we argue that resolving the redundancy and ambiguity of subject–predicate–object (SPO) triples in open information extraction systems has to be treated as an equally important step in order to ensure the quality and precision of the generated triples. To this end, we propose a pipeline approach for information extraction from large corpora, encompassing a series of natural language processing tasks. Our methodology consists of four steps: i. in-place coreference resolution, ii. extractive text summarization, iii. parallel triple extraction, and iv. entity enrichment and graph representation. We demonstrate our methodology on a large medical dataset (CORD-19), relying on state-of-the-art tools to fulfil the aforementioned steps and extract triples that are subsequently mapped to a comprehensive ontology of biomedical concepts. We evaluate the effectiveness of our information extraction method by comparing it in terms of precision, recall, and F1-score with state-of-the-art OIE engines, and demonstrate its capabilities on a set of data exploration tasks.
- Published
- 2020
- Full Text
- View/download PDF
19. PREDOSE: A semantic web platform for drug abuse epidemiology using social media.
- Author
- Cameron, Delroy, Smith, Gary A., Daniulaityte, Raminta, Sheth, Amit P., Dave, Drashti, Chen, Lu, Anand, Gaurish, Carlson, Robert, Watkins, Kera Z., and Falck, Russel
- Abstract
Highlights: • We present a semantic web platform for drug abuse research using social media. • Social media texts could be an important resource in identifying new epidemiological trends. • Extraction of appropriate semantic information may be beneficial to epidemiological research. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
20. Robust triple extraction with cascade bidirectional capsule network.
- Author
- Zhang, Ningyu, Deng, Shumin, Ye, Hongbin, Zhang, Wei, and Chen, Huajun
- Subjects
- CAPSULE neural networks, CASCADE connections
- Abstract
Recent approaches have witnessed the success of neural models for triple extraction. However, we empirically observe that previous approaches may fail on ambiguous text expressed in similar contexts and may generate triples that contradict common sense. Such issues severely hinder the generalization of triple extraction in real-world applications. Motivated by capsule networks' power to model latent structures and the implicit entity-relation schema, we propose a novel Cascade Bidirectional Capsule Network (CBCapsule) to address these issues. We first introduce a cascade capsule network to dynamically aggregate context representations, and then propose a bidirectional routing mechanism to encourage interaction between the high-level (e.g., relation) and low-level (e.g., entity) capsules. Experimental results on three benchmarks show that our proposed approach is more efficient than the baselines and generalizes more robustly to complex surface forms. • The first model to use capsule networks for triple extraction. • Models the part-whole relations between entities and relations. • A robust model for triple extraction with diverse surface forms. • Handles overlapping relational triples. • Performs well on benchmarks and obtains better results in robust settings. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. Triple extraction method enables high quality mass spectrometry-based proteomics and phospho-proteomics for eventual multi-omics integration studies.
- Author
- Sanchez-Quiles V, Shi MJ, Dingli F, Krucker C, Loew D, Bernard-Pierrot I, and Radvanyi F
- Subjects
- Animals, Genomics, Mass Spectrometry, Workflow, Proteome, Proteomics
- Abstract
Large-scale multi-omic analysis allows a thorough understanding of different physiological or pathological conditions, particularly cancer. Here, an extraction method simultaneously yielding DNA, RNA and protein (hereafter referred to as "triple extraction", TEx) was tested for its suitability to unbiased, system-wide proteomic investigation. Largely proven efficient for transcriptomic and genomic studies, we aimed at exploring TEx compatibility with mass spectrometry-based proteomics and phospho-proteomics, as compared to a standard urea extraction. TEx is suitable for the shotgun investigation of proteomes, providing similar results as the urea-based protocol at both the qualitative and quantitative levels. TEx is likewise compatible with the exploration of phosphorylation events, actually providing a higher number of correctly localized sites than urea, although the nature of extracted modifications appears somewhat distinct between the two techniques. These results highlight that the presented protocol is well suited for the examination of the proteome and modified proteome of this bladder cancer cell model, as efficiently as other more widely used workflows for mass spectrometry-based analysis. Potentially applicable to other mammalian cell types and tissues, TEx represents an advantageous strategy for multi-omics on scarce and/or heterogeneous samples. (© 2021 Wiley-VCH GmbH.)
- Published
- 2021
- Full Text
- View/download PDF
22. PREDOSE: A semantic web platform for drug abuse epidemiology using social media
- Author
- Drashti Dave, Robert G. Carlson, Russel S. Falck, Lu Chen, Gary Alan Smith, Delroy Cameron, Amit P. Sheth, Gaurish Anand, Kera Z. Watkins, and Raminta Daniulaityte
- Subjects
- Relationship extraction, Sentiment extraction, Substance-Related Disorders, Triple extraction, User-generated content, Prescription drug abuse, Opioid abuse, Health Informatics, Entity identification, Ontology (information science), Drug Abuse Ontology, Information extraction, Content analysis, Domain knowledge, Social media, Semantic Web, Internet, Data science
- Abstract
Objectives: The role of social media in biomedical knowledge mining, including clinical, medical and healthcare informatics, prescription drug abuse epidemiology and drug pharmacology, has become increasingly significant in recent years. Social media offers opportunities for people to share opinions and experiences freely in online communities, which may contribute information beyond the knowledge of domain professionals. This paper describes the development of a novel semantic web platform called PREDOSE (PREscription Drug abuse Online Surveillance and Epidemiology), which is designed to facilitate the epidemiologic study of prescription (and related) drug abuse practices using social media. PREDOSE uses web forum posts and domain knowledge, modeled in a manually created Drug Abuse Ontology (DAO, pronounced "dow"), to facilitate the extraction of semantic information from User Generated Content (UGC) through a combination of lexical, pattern-based and semantics-based techniques. In a previous study, PREDOSE was used to obtain the datasets from which new knowledge in drug abuse research was derived. Here, we report on various platform enhancements, including an updated DAO, new components for relationship and triple extraction, and tools for content analysis, trend detection and emerging-pattern exploration, which enhance the capabilities of the PREDOSE platform. Given these enhancements, PREDOSE is now better equipped to impact drug abuse research by alleviating traditionally labor-intensive content analysis tasks. Methods: Using custom web crawlers that scrape UGC from publicly available web forums, PREDOSE first automates the collection of web-based social media content for subsequent semantic annotation. The annotation scheme is modeled in the DAO, and includes domain-specific knowledge such as prescription (and related) drugs, methods of preparation, side effects, and routes of administration.
The DAO is also used to help recognize three types of data, namely: (1) entities, (2) relationships and (3) triples. PREDOSE then uses a combination of lexical and semantic-based techniques to extract entities and relationships from the scraped content, and a top-down approach for triple extraction that uses patterns expressed in the DAO. In addition, PREDOSE uses publicly available lexicons to identify initial sentiment expressions in text, and then a probabilistic optimization algorithm (from related research) to extract the final sentiment expressions. Together, these techniques enable the capture of fine-grained semantic information, which facilitate search, trend analysis and overall content analysis using social media on prescription drug abuse. Moreover, extracted data are also made available to domain experts for the creation of training and test sets for use in evaluation and refinements in information extraction techniques. Results A recent evaluation of the information extraction techniques applied in the PREDOSE platform indicates 85% precision and 72% recall in entity identification, on a manually created gold standard dataset. In another study, PREDOSE achieved 36% precision in relationship identification and 33% precision in triple extraction, through manual evaluation by domain experts. Given the complexity of the relationship and triple extraction tasks and the abstruse nature of social media texts, we interpret these as favorable initial results. Extracted semantic information is currently in use in an online discovery support system, by prescription drug abuse researchers at the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. Conclusion A comprehensive platform for entity, relationship, triple and sentiment extraction from such abstruse texts has never been developed for drug abuse research. 
PREDOSE has already demonstrated the importance of mining social media by providing data from which new findings in drug abuse research were uncovered. Given the recent platform enhancements, including the refined DAO, components for relationship and triple extraction, and tools for content, trend and emerging pattern analysis, it is expected that PREDOSE will play a significant role in advancing drug abuse epidemiology in future.
- Published
- 2013
- Full Text
- View/download PDF
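The lexicon-plus-pattern extraction described in this record can be sketched in miniature. The drug and effect lexicons and the single pattern below are invented for illustration; PREDOSE's DAO-driven patterns and semantic techniques are far richer.

```python
# Toy lexicon + pattern triple extraction: match a "<drug> gave me/caused
# <effect>" pattern, then validate both slots against domain lexicons before
# emitting a (drug, causes, effect) triple. Lexicons here are hypothetical.
import re

DRUGS   = {"loperamide"}
EFFECTS = {"euphoria"}

def extract_triple(post):
    m = re.search(r"(\w+)\s+(?:gave me|caused)\s+(\w+)", post.lower())
    if m and m.group(1) in DRUGS and m.group(2) in EFFECTS:
        return (m.group(1), "causes", m.group(2))
    return None  # pattern matched nothing, or slots failed lexicon validation

print(extract_triple("Loperamide gave me euphoria for an hour"))
# ('loperamide', 'causes', 'euphoria')
```

Validating pattern slots against a curated lexicon is one simple way to keep precision acceptable on noisy forum text, which is the trade-off the evaluation figures above (high entity precision, lower triple precision) reflect.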
23. Modeliranje sopojavitev besed z metodami strojnega učenja (Modeling word co-occurrences with machine learning methods)
- Author
- Sipoš, Ruben and Demšar, Janez
- Subjects
- računalništvo, triple extraction, modeliranje sopojavitev besed, triples, computer science, izračun trojk, word n-grams, strojno učenje, univerzitetni študij, modeling word co-occurrences, udc:004(043.2), machine learning, diploma, diplomske naloge, posploševanje konceptov, concept abstraction, n-terke besed, trojke
- Published
- 2014
Discovery Service for Jio Institute Digital Library