ElmNet: a benchmark dataset for generating headlines from Persian papers
- Source :
- Multimedia Tools and Applications. 81:1853-1866
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- Headline generation is a challenging subtask of abstractive text summarization whose output should be a summary shorter than one sentence. It would be valuable to develop a dataset for evaluating abstractive summarization methods on this task in the Persian language. Several headline generation datasets exist for Persian, but most of them are not large enough for more sophisticated text summarization methods such as deep learning models. Moreover, all of these datasets focus on daily news, and no dataset exists for summarizing Persian scientific papers. In this article, we present “ElmNet,” a headline generation dataset of about 400,000 abstract/headline pairs of scientific papers, gathered from six major publishers of scientific articles in Persian. We also evaluate the performance of the most important deep learning-based headline generation methods on the proposed dataset. The results show that the performance of state-of-the-art methods on this task is comparable to their results on existing English datasets.
- Subjects :
- Computer Networks and Communications; Computer science; Deep learning; Comparability; Headline; Automatic summarization; Task (project management); Hardware and Architecture; Media Technology; Benchmark (computing); Artificial intelligence; Software; Natural language processing; Sentence; Persian
Details
- ISSN :
- 1573-7721 and 1380-7501
- Volume :
- 81
- Database :
- OpenAIRE
- Journal :
- Multimedia Tools and Applications
- Accession number :
- edsair.doi...........f9a9e2248b043b7f065ec2aee9e81cc3
- Full Text :
- https://doi.org/10.1007/s11042-021-11641-1