
DocR-BERT: Document-Level R-BERT for Chemical-Induced Disease Relation Extraction via Gaussian Probability Distribution

Authors :
Huayue Chen
Zhengguang Li
Ruihua Qi
Hongfei Lin
Heng Chen
Source :
IEEE Journal of Biomedical and Health Informatics. 26:1341-1352
Publication Year :
2022
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2022.

Abstract

Chemical-induced disease (CID) relation extraction from biomedical articles plays an important role in disease treatment and drug development. Existing methods are insufficient for capturing complete document-level semantic information because they ignore the semantic information of entities mentioned across different sentences. In this work, we propose an effective document-level relation extraction model that automatically extracts intra- and inter-sentential CID relations from articles. First, the model employs BERT to generate contextual semantic representations of the title, the abstract, and the shortest dependency paths (SDPs). Second, to enhance the semantic representation of the whole document, cross attention with self-attention (named cross2self-attention) between the abstract, the title, and the SDPs is proposed to learn their mutual semantic information. Third, to distinguish the importance of the target entity in different sentences, a Gaussian probability distribution is used to compute the weights of the co-occurrence sentence and its adjacent entity sentences; more complete semantic information about the target entity is then collected from all of its mentions in the document via the proposed document-level R-BERT (DocR-BERT). Finally, the resulting representations are concatenated and fed into a softmax function to extract CID relations. We evaluated the model on the CDR corpus provided by BioCreative V. Without external resources, the proposed model outperforms other state-of-the-art models, achieving F1-scores of 53.5%, 70%, and 63.7% on the inter-sentential, intra-sentential, and overall CDR dataset, respectively. The experimental results indicate that cross2self-attention, the Gaussian probability distribution, and DocR-BERT effectively improve CID extraction performance. Furthermore, the mutual semantic information learned by the cross self-attention from the abstract towards the title significantly influences the performance of document-level biomedical relation extraction tasks.
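To make the Gaussian weighting step more concrete, the sketch below (not taken from the paper) shows one way sentence-level weights could be computed: each sentence containing a mention of the target entity is weighted by a Gaussian centred on the sentence where the chemical and disease co-occur, so nearer mentions contribute more. The function name, the sigma parameter, and the normalisation are illustrative assumptions, since the abstract does not give the exact parameterisation.

```python
import numpy as np

def gaussian_sentence_weights(mention_sentences, cooccurrence_sentence, sigma=1.5):
    """Weight entity-mention sentences by a Gaussian centred on the co-occurrence sentence.

    mention_sentences      : indices of sentences in which the target entity is mentioned
    cooccurrence_sentence  : index of the sentence where both target entities co-occur
    sigma                  : spread hyperparameter (assumed; not specified in the abstract)
    """
    d = np.asarray(mention_sentences, dtype=float) - cooccurrence_sentence
    w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()  # normalise so the weights sum to 1

# Example: entity mentioned in sentences 0, 2, 3 and 7; co-occurrence in sentence 2.
# The co-occurrence sentence receives the largest weight; distant mentions decay smoothly.
print(gaussian_sentence_weights([0, 2, 3, 7], cooccurrence_sentence=2).round(3))
```

In a setup like this, the resulting weights would be used to pool the BERT representations of the entity mentions across sentences before they are combined with the title, abstract, and SDP representations.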

Details

ISSN :
2168-2208 and 2168-2194
Volume :
26
Database :
OpenAIRE
Journal :
IEEE Journal of Biomedical and Health Informatics
Accession number :
edsair.doi.dedup.....da4ab0bfd2b1987480189ed626d61eea