Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts.

Authors :
Faisal MR
Shifa AM
Rahman MH
Uddin MA
Rahaman RM
Source :
Data in brief [Data Brief] 2024 Jul 20; Vol. 55, pp. 110760. Date of Electronic Publication: 2024 Jul 20 (Print Publication: 2024).
Publication Year :
2024

Abstract

The ever-evolving global landscape of communication, driven by Information Technology advancements, underscores the importance of emotion detection in natural language processing. However, challenges persist in interpreting emotions within linguistically diverse contexts, notably in low-resource languages like Bengali, compounded by the emergence of Banglish. To address this gap, we present "Bengali & Banglish," an extensive dataset comprising 80,098 labelled samples across six emotion classes. Our dataset fills a void in fine-grained emotion classification for Bengali and pioneers in emotion detection in Banglish. We achieve significant performance metrics through meticulous annotation and rigorous evaluation, including a weighted F1 score of 71.30% for Bengali and 64.59% for Banglish using BanglaBERT. Also, our dataset facilitates Bengali-to-Banglish Machine Translation, contributing to the advancement of language processing models. Furthermore, our dataset demonstrates a high Cohen's Kappa score of 93.5%, affirming the reliability and consistency of our annotations. This research underscores the importance of linguistic diversity in NLP and provides a valuable resource for enhancing Emotion Detection capabilities in Bengali and Banglish across digital platforms.
(© 2024 The Author(s).)
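
The abstract reports two metrics, Cohen's Kappa for annotator agreement and a weighted F1 score for classification. For readers unfamiliar with them, the following is a minimal sketch of how each is conventionally computed. It is not the authors' evaluation code. The function names and the toy labels ("joy", "anger", "sadness") are hypothetical, since the record does not name the six emotion classes.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators labelling the same items.

    Undefined (division by zero) when chance agreement equals 1,
    e.g. when both annotators use a single label throughout.
    """
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Toy data (hypothetical labels, not taken from the dataset).
labels_a = ["joy", "joy", "anger", "sadness"]
labels_b = ["joy", "anger", "anger", "sadness"]

kappa = cohen_kappa(labels_a, labels_b)
f1 = weighted_f1(labels_a, labels_b)
print(f"kappa={kappa:.4f}, weighted F1={f1:.4f}")
```

Kappa is conventionally reported on a scale where 1 is perfect agreement, so the 93.5% in the abstract presumably corresponds to a value of 0.935 on that scale.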

Details

Language :
English
ISSN :
2352-3409
Volume :
55
Database :
MEDLINE
Journal :
Data in brief
Publication Type :
Academic Journal
Accession number :
39183968
Full Text :
https://doi.org/10.1016/j.dib.2024.110760