
BioMDSum: An Effective Hybrid Biomedical Multi-Document Summarization Method Based on PageRank and Longformer Encoder-Decoder

Authors :
Azzedine Aftiss
Salima Lamsiyah
Said Ouatik El Alaoui
Christoph Schommer
Source :
IEEE Access, Vol 12, Pp 188013-188031 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Biomedical multi-document summarization (BioMDSum) involves automatically generating concise and informative summaries from collections of related biomedical documents. While extractive summarization methods have shown promise, they often produce incoherent summaries. On the other hand, fully abstractive methods yield coherent summaries but demand extensive training datasets and computational resources due to the typically lengthy nature of biomedical documents. To address these challenges, we propose a hybrid summarization method that combines the strengths of both approaches. The proposed method consists of two main phases: (i) an extractive summarization phase that uses k-means clustering to group similar sentences based on the cosine similarity between their embeddings, which are generated by the Sentence-BERT model, followed by the PageRank algorithm for sentence scoring and selection; and (ii) an abstractive summarization phase that fine-tunes a Longformer Encoder-Decoder (LED) transformer model to generate a concise and coherent summary from the sentences selected during the extractive phase. We conducted several experiments on the standard biomedical multi-document summarization datasets Cochrane and MS^2. The results demonstrate that the proposed method is competitive and outperforms recent state-of-the-art systems based on ROUGE evaluation measures. Specifically, our model achieved ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, and METEOR scores of 29.41%, 6.57%, 18.31%, 85.95%, and 22.15% on the Cochrane dataset, and 28.79%, 8.22%, 17.93%, 85.51%, and 25.17% on the MS^2 dataset, respectively. Furthermore, an ablation analysis shows that integrating extractive and abstractive phases in our hybrid summarization method enhances the overall performance of the proposed approach.
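
The extractive phase described in the abstract (clustering sentence embeddings, then ranking sentences with PageRank) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the sentence embeddings have already been computed (for example by a Sentence-BERT model) and are passed in as a plain array. The helper names (`kmeans`, `pagerank`, `select_sentences`), the damping factor of 0.85, the L2 normalization used to approximate cosine-based clustering, and the choice to run PageRank separately inside each cluster are all assumptions made for illustration. The abstractive phase (fine-tuning LED on the selected sentences) is not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def pagerank(sim, damping=0.85, n_iter=100, tol=1e-8):
    """Power-iteration PageRank on a weighted similarity graph."""
    n = sim.shape[0]
    W = np.clip(sim, 0.0, None).copy()  # drop negative similarities
    np.fill_diagonal(W, 0.0)            # no self-loops
    col_sums = W.sum(axis=0)
    # Column-normalize; nodes without outgoing weight link uniformly.
    P = np.where(col_sums > 0, W / np.where(col_sums > 0, col_sums, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        r_new = (1.0 - damping) / n + damping * (P @ r)
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r


def select_sentences(embeddings, k, per_cluster=1):
    """Return sorted indices of the top-ranked sentence(s) in each cluster."""
    X = np.asarray(embeddings, dtype=float)
    # Assumption: unit-normalize so Euclidean k-means tracks cosine similarity.
    X = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    labels = kmeans(X, k)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        scores = pagerank(X[idx] @ X[idx].T)  # cosine similarity graph
        top = idx[np.argsort(-scores)[:per_cluster]]
        selected.extend(top.tolist())
    return sorted(selected)
```

Under these assumptions, calling `select_sentences(embeddings, k)` yields the indices of the sentences that would be concatenated and handed to the abstractive model. Ranking within each cluster is one way to keep the selection diverse; ranking over the whole sentence graph would be an equally plausible reading of the abstract.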

Details

Language :
English
ISSN :
2169-3536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.f706fb4ab66c45afbfa9140a4072fb95
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3514915