Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts

Authors :
Moshiur Rahman Faisal
Ashrin Mobashira Shifa
Md Hasibur Rahman
Mohammed Arif Uddin
Rashedur M. Rahaman
Source :
Data in Brief, Vol 55, Pp 110760- (2024)
Publication Year :
2024
Publisher :
Elsevier, 2024.

Abstract

Advances in information technology continue to reshape global communication, which makes emotion detection an increasingly important task in natural language processing. Interpreting emotions remains difficult in linguistically diverse contexts, notably in low-resource languages such as Bengali, and the emergence of Banglish compounds the problem. To address this gap, we present “Bengali & Banglish,” a dataset of 80,098 labelled samples across six emotion classes. The dataset fills a void in fine-grained emotion classification for Bengali and pioneers emotion detection in Banglish. Using BanglaBERT, we obtain a weighted F1 score of 71.30% for Bengali and 64.59% for Banglish. The dataset also supports Bengali-to-Banglish machine translation, contributing to the advancement of language processing models. A Cohen's Kappa score of 93.5% indicates that the annotations are reliable and consistent. This research underscores the importance of linguistic diversity in NLP and provides a resource for improving emotion detection in Bengali and Banglish across digital platforms.
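
The abstract reports two metrics, a weighted F1 score and Cohen's Kappa, without stating how they were computed. As a rough illustration only, the sketch below implements the standard textbook definitions of both in plain Python. The function names, the emotion labels, and the toy data are hypothetical and are not taken from the dataset or from the authors' evaluation code. Kappa is conventionally reported on a scale up to 1, so the 93.5% in the abstract would correspond to 0.935.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)  # number of gold samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


def cohen_kappa(rater_a, rater_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # independently, given each rater's own label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, for illustration only (not from the dataset).
gold = ["joy", "joy", "anger", "sadness"]
pred = ["joy", "anger", "anger", "sadness"]

print(round(weighted_f1(gold, pred), 4))
print(round(cohen_kappa(gold, pred), 4))
```

The same two lists are reused for both calls only to keep the example short. In practice, weighted F1 compares model predictions against gold labels, while Cohen's Kappa compares the labels assigned by two annotators.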

Details

Language :
English
ISSN :
23523409
Volume :
55
Article number :
110760
Database :
Directory of Open Access Journals
Journal :
Data in Brief
Publication Type :
Academic Journal
Accession number :
edsdoj.3eae1cf033c84442921d556fc41befa7
Document Type :
article
Full Text :
https://doi.org/10.1016/j.dib.2024.110760