DocR-BERT: Document-Level R-BERT for Chemical-Induced Disease Relation Extraction via Gaussian Probability Distribution
- Source :
- IEEE Journal of Biomedical and Health Informatics. 26:1341-1352
- Publication Year :
- 2022
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2022.
Abstract
- Chemical-induced disease (CID) relation extraction from biomedical articles plays an important role in disease treatment and drug development. Existing methods fail to capture complete document-level semantic information because they ignore the semantic information of entities that occur in different sentences. In this work, we propose an effective document-level relation extraction model that automatically extracts intra- and inter-sentential CID relations from articles. First, the model employs BERT to generate contextual semantic representations of the title, the abstract, and the shortest dependency paths (SDPs). Second, to enhance the semantic representation of the whole document, cross attention combined with self-attention (named cross2self-attention) between the abstract, title, and SDPs is proposed to learn their mutual semantic information. Third, to distinguish the importance of the target entity in different sentences, a Gaussian probability distribution is used to compute the weights of the co-occurrence sentence and its adjacent entity sentences. More complete semantic information about the target entity is collected from all of its occurrences in the document via the presented document-level R-BERT (DocR-BERT). Finally, the related representations are concatenated and fed into a softmax function to extract CID relations. We evaluated the model on the CDR corpus provided by BioCreative V. Without external resources, the proposed model outperforms other state-of-the-art models, achieving F1-scores of 53.5%, 70%, and 63.7% on the inter-sentential, intra-sentential, and overall CDR dataset, respectively. The experimental results indicate that cross2self-attention, the Gaussian probability distribution, and DocR-BERT each improve CID extraction performance. Furthermore, the mutual semantic information learned by cross2self-attention from the abstract towards the title can significantly influence the performance of document-level biomedical relation extraction tasks.
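The Gaussian weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the function names `gaussian_sentence_weights` and `weighted_sum`, the use of sentence-index distance from the co-occurrence sentence, the `sigma` parameter, and the normalisation to a sum of 1. It should not be read as the authors' implementation.

```python
import math


def gaussian_sentence_weights(num_sentences, center, sigma=1.0):
    """Weight each sentence by a Gaussian centred on the co-occurrence sentence.

    Assumption (not from the paper): the distance is the difference between
    sentence indices, and the weights are normalised so they sum to 1.
    """
    if num_sentences <= 0:
        raise ValueError("num_sentences must be positive")
    if not 0 <= center < num_sentences:
        raise ValueError("center must be a valid sentence index")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    # Unnormalised Gaussian density evaluated at each sentence index.
    raw = [
        math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
        for i in range(num_sentences)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_sum(vectors, weights):
    """Combine per-sentence entity vectors into one vector using the weights."""
    if len(vectors) != len(weights):
        raise ValueError("vectors and weights must have the same length")
    dim = len(vectors[0])
    return [
        sum(w * vec[d] for vec, w in zip(vectors, weights))
        for d in range(dim)
    ]


# Hypothetical usage: five sentences, entities co-occur in sentence 2.
weights = gaussian_sentence_weights(num_sentences=5, center=2, sigma=1.0)
toy_vectors = [[float(i), 1.0] for i in range(5)]  # stand-ins for BERT outputs
pooled = weighted_sum(toy_vectors, weights)
```

In this sketch the co-occurrence sentence receives the largest weight and sentences farther away contribute progressively less; a larger `sigma` spreads the weight more evenly across the document.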
- Subjects :
- Dependency (UML)
Computer science
Gaussian
Normal Distribution
Relationship extraction
Semantics
Computer Science Applications
Document level
Text mining
Health Information Management
Softmax function
Humans
Probability distribution
Artificial intelligence
Electrical and Electronic Engineering
Sentence
Natural language processing
Probability
Biotechnology
Details
- ISSN :
- 2168-2208 and 2168-2194
- Volume :
- 26
- Database :
- OpenAIRE
- Journal :
- IEEE Journal of Biomedical and Health Informatics
- Accession number :
- edsair.doi.dedup.....da4ab0bfd2b1987480189ed626d61eea