ElmNet: a benchmark dataset for generating headlines from Persian papers

Authors :
Mohammad E. Shenassa
Behrouz Minaei-Bidgoli
Source :
Multimedia Tools and Applications. 81:1853-1866
Publication Year :
2021
Publisher :
Springer Science and Business Media LLC, 2021.

Abstract

Headline generation is a challenging subtask of abstractive text summarization whose output should be a summary shorter than one sentence. It would be valuable to develop a dataset for evaluating abstractive summarization methods on this task in the Persian language. There are several datasets for headline generation in Persian, but most are not large enough to be used by more sophisticated text summarization methods, such as deep learning models. Moreover, all of these datasets focus on daily news, and there is no dataset for summarizing scientific Persian papers. In this article, we present “ElmNet,” a headline generation dataset of about 400,000 abstract/headline pairs from scientific papers, gathered from six major publishers of scientific articles in Persian. We also evaluate the performance of the most important deep learning-based headline generation methods on the proposed dataset. The results show that the performance of state-of-the-art methods on this task is comparable to their results on existing English datasets.

Details

ISSN :
1573-7721 and 1380-7501
Volume :
81
Database :
OpenAIRE
Journal :
Multimedia Tools and Applications
Accession number :
edsair.doi...........f9a9e2248b043b7f065ec2aee9e81cc3
Full Text :
https://doi.org/10.1007/s11042-021-11641-1