
Cross-Domain Document Summarization Model via Two-Stage Curriculum Learning.

Authors :
Lee, Seungsoo
Kim, Gyunyeop
Kang, Sangwoo
Source :
Electronics (2079-9292); Sep2024, Vol. 13 Issue 17, p3425, 12p
Publication Year :
2024

Abstract

Generative document summarization is a natural language processing technique that generates short summary sentences while preserving the content of long texts. Various fine-tuned pre-trained document summarization models have been proposed, each trained on a specific single text-summarization dataset. However, each text-summarization dataset usually specializes in a particular downstream task, so a single dataset cannot cover all cases across multiple domains. Accordingly, when a generative document summarization model is fine-tuned on a specific dataset, it performs well on that dataset, whereas its performance degrades by up to 45% on datasets not used during training. In short, summarization models perform well on in-domain inputs, where the dataset domain is the same during training and evaluation, but perform poorly on out-of-domain inputs. In this paper, we propose a new curriculum-learning method that uses mixed datasets while training a generative summarization model, making it more robust to out-of-domain datasets. Our method outperformed the XSum-trained baseline, with 10%, 20%, and 10% lower performance degradation on CNN/DM, one of the two test datasets used. [ABSTRACT FROM AUTHOR]
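The abstract describes a two-stage curriculum that starts from a single dataset and moves to mixed datasets. The paper's exact scheduling is not given in this record, so the following is only a minimal illustrative sketch: a hypothetical batch scheduler that samples exclusively from an in-domain dataset in stage 1, then in stage 2 linearly ramps up the probability of sampling from out-of-domain data. All function and variable names here are assumptions, not the authors' implementation.

```python
import random

def two_stage_schedule(in_domain, out_domain, stage1_steps, stage2_steps, seed=0):
    """Hypothetical two-stage curriculum sampler (not the paper's code).

    Stage 1: every training step draws an example from the in-domain dataset.
    Stage 2: the chance of drawing an out-of-domain example ramps linearly
    from near 0 to 1, mixing the datasets with increasing difficulty.
    Returns the ordered list of sampled examples.
    """
    rng = random.Random(seed)
    schedule = []
    # Stage 1: in-domain only, to fit the base summarization task first.
    for _ in range(stage1_steps):
        schedule.append(rng.choice(in_domain))
    # Stage 2: gradually shift sampling mass toward out-of-domain data.
    for step in range(stage2_steps):
        p_out = (step + 1) / stage2_steps  # ramps from 1/N up to 1.0
        pool = out_domain if rng.random() < p_out else in_domain
        schedule.append(rng.choice(pool))
    return schedule

# Example: CNN/DM-style examples as in-domain, XSum-style as out-of-domain.
sched = two_stage_schedule(["cnndm_0", "cnndm_1"], ["xsum_0", "xsum_1"], 3, 5)
```

The linear ramp is one simple curriculum choice; a real implementation might instead use fixed mixing ratios per stage or order examples by a difficulty score.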

Details

Language :
English
ISSN :
2079-9292
Volume :
13
Issue :
17
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
179646935
Full Text :
https://doi.org/10.3390/electronics13173425