1. GAE-ISumm: Unsupervised Graph-Based Summarization of Indian Languages
- Authors
Lakshmi Sireesha Vakada, Anudeep Ch, Mounika Marreddy, Subba Reddy Oota, and Radhika Mamidi
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Document summarization aims to create a precise and coherent summary of a text document. Many deep learning summarization models have been developed, mainly for English, and they often require a large training corpus along with efficient pre-trained language models and tools. However, summarization models for low-resource Indian languages are often limited by rich morphological variation, syntax, and semantic differences. In this paper, we propose GAE-ISumm, an unsupervised Indic summarization model that extracts summaries from text documents. In particular, our proposed model, GAE-ISumm, uses a Graph Autoencoder (GAE) to jointly learn text representations and a document summary. We also provide TELSUM, a manually annotated Telugu summarization dataset, to experiment with GAE-ISumm. Further, we experiment with publicly available Indian language summarization datasets to investigate the effectiveness of GAE-ISumm on other Indian languages. Our experiments with GAE-ISumm across seven languages yield the following observations: (i) it is competitive with or better than state-of-the-art results on all datasets, (ii) it reports benchmark results on TELSUM, and (iii) including positional and cluster information in the proposed model improves the quality of the summaries.
- Comment
9 pages, 7 figures
- Published
- 2022
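
To make the approach described in the abstract more concrete, the snippet below is a minimal sketch of an unsupervised graph-autoencoder extractive summarizer, not the authors' GAE-ISumm implementation. The TF-IDF sentence features, cosine-similarity adjacency, one-layer GCN encoder with an inner-product decoder, and centrality-based sentence ranking are all assumptions introduced here for illustration.

```python
# Minimal sketch: unsupervised graph-autoencoder (GAE) extractive summarization.
# Assumptions (not from the paper): TF-IDF sentence features, cosine-similarity
# adjacency, a one-layer GCN encoder, an inner-product decoder, and ranking of
# sentences by centrality in the learned embedding space.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_graph(sentences, threshold=0.1):
    """Sentence-similarity graph: nodes are sentences, edges are cosine links."""
    feats = TfidfVectorizer().fit_transform(sentences).toarray()
    sim = cosine_similarity(feats)
    adj = (sim > threshold).astype(np.float32)
    np.fill_diagonal(adj, 1.0)  # add self-loops
    deg = adj.sum(1)
    norm = np.diag(deg ** -0.5) @ adj @ np.diag(deg ** -0.5)  # symmetric normalization
    return (torch.tensor(feats, dtype=torch.float32),
            torch.tensor(adj), torch.tensor(norm))


class GAE(nn.Module):
    """One-layer GCN encoder with an inner-product decoder."""

    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.w = nn.Linear(in_dim, hid_dim, bias=False)

    def forward(self, x, norm_adj):
        z = torch.relu(norm_adj @ self.w(x))  # sentence (node) embeddings
        recon = torch.sigmoid(z @ z.t())      # reconstructed adjacency
        return z, recon


def summarize(sentences, k=3, epochs=200):
    """Train the GAE to reconstruct the sentence graph, then pick k sentences."""
    x, adj, norm = build_graph(sentences)
    model = GAE(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(epochs):
        z, recon = model(x, norm)
        loss = nn.functional.binary_cross_entropy(recon, adj)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Rank sentences by embedding-space centrality (sum of cosine similarities).
    z = nn.functional.normalize(z.detach(), dim=1)
    scores = (z @ z.t()).sum(1)
    top = sorted(scores.topk(min(k, len(sentences))).indices.tolist())
    return [sentences[i] for i in top]
```

Calling `summarize(list_of_sentences, k=3)` returns the three highest-ranked sentences in document order. GAE-ISumm additionally incorporates positional and cluster information, which this sketch omits.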