Cross-Domain Document Summarization Model via Two-Stage Curriculum Learning.
- Source :
- Electronics (2079-9292); Sep2024, Vol. 13 Issue 17, p3425, 12p
- Publication Year :
- 2024
Abstract
- Generative document summarization is a natural language processing technique that generates short summary sentences while preserving the content of long texts. Various fine-tuned pre-trained document summarization models have been proposed, each using a specific single text-summarization dataset. However, each text-summarization dataset usually specializes in a particular downstream task, so it is difficult to cover all cases involving multiple domains with a single dataset. Accordingly, when a generative document summarization model is fine-tuned to a specific dataset, it performs well on that dataset, whereas its performance degrades by up to 45% on datasets not used during training. In short, summarization models perform well on in-domain inputs, where the dataset domain is the same during training and evaluation, but poorly on out-domain inputs. In this paper, we propose a new curriculum-learning method that uses mixed datasets while training a generative summarization model, making it more robust on out-domain datasets. Compared to the baseline model's performance, our method achieved 10%, 20%, and 10% lower performance degradation on XSum and on CNN/DM, one of the two test datasets used. [ABSTRACT FROM AUTHOR]
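The mixed-dataset curriculum described in the abstract could be sketched as a two-stage training schedule: a first stage on the in-domain dataset only, then a second stage that mixes in out-domain examples. This is a minimal illustrative sketch; the stage counts, the mixing ratio, and the function name `two_stage_curriculum` are assumptions, not the paper's actual configuration.

```python
import random


def two_stage_curriculum(in_domain, out_domain,
                         stage1_epochs=2, stage2_epochs=2,
                         mix_ratio=0.5, seed=0):
    """Build a two-stage curriculum schedule over training examples.

    Stage 1 uses only the in-domain dataset; stage 2 mixes in a fraction
    (mix_ratio) of out-domain examples and shuffles the combined pool.
    Returns a list of (stage, epoch, examples) tuples that a training
    loop could iterate over.  Hyperparameters here are illustrative.
    """
    rng = random.Random(seed)
    schedule = []
    # Stage 1: fine-tune on the in-domain data alone.
    for epoch in range(stage1_epochs):
        schedule.append((1, epoch, list(in_domain)))
    # Stage 2: expose the model to a mixture of in- and out-domain data.
    for epoch in range(stage2_epochs):
        mixed = list(in_domain) + [x for x in out_domain
                                   if rng.random() < mix_ratio]
        rng.shuffle(mixed)
        schedule.append((2, epoch, mixed))
    return schedule
```

In a real setup, each `examples` list would feed one epoch of fine-tuning for a pre-trained summarization model; the in-domain set might be the training split of CNN/DM and the out-domain set drawn from XSum, matching the test datasets named in the abstract.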
- Subjects :
- AUTOMATIC summarization
- TEXT summarization
Details
- Language :
- English
- ISSN :
- 20799292
- Volume :
- 13
- Issue :
- 17
- Database :
- Complementary Index
- Journal :
- Electronics (2079-9292)
- Publication Type :
- Academic Journal
- Accession number :
- 179646935
- Full Text :
- https://doi.org/10.3390/electronics13173425