GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations

Authors :
Mehnaz, Laiba
Mahata, Debanjan
Gosangi, Rakesh
Gunturi, Uma Sushmitha
Jain, Riya
Gupta, Gauri
Kumar, Amardeep
Lee, Isabelle
Acharya, Anish
Shah, Rajiv Ratn
Publication Year :
2021

Abstract

Code-switching is the communication phenomenon where speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multi-lingual communities worldwide. This makes it essential to develop techniques for summarizing and understanding these conversations. Towards this objective, we introduce abstractive summarization of Hindi-English code-switched conversations and develop the first code-switched conversation summarization dataset, GupShup, which contains over 6,831 conversations in Hindi-English and their corresponding human-annotated summaries in English and Hindi-English. We present a detailed account of the entire data collection and annotation processes. We analyze the dataset using various code-switching statistics. We train state-of-the-art abstractive summarization models and report their performance using both automated metrics and human evaluation. Our results show that multi-lingual mBART and multi-view seq2seq models obtain the best performance on the new dataset.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2104.08578
Document Type :
Working Paper