1. Cyberbullying detection of resource constrained language from social media using transformer-based approach
- Author
- Syed Sihab-Us-Sakib, Md. Rashadur Rahman, Md. Shafiul Alam Forhad, and Md. Atiq Aziz
- Subjects
Cyberbullying classification, Social media, Transformers, Machine Learning, Low resource language, Natural Language Processing, Computational linguistics. Natural language processing, P98-98.5
- Abstract
The rise of the internet and social media has facilitated diverse interactions among individuals, but it has also led to an increase in cyberbullying—a phenomenon with detrimental effects on mental health, including the potential to induce suicidal thoughts. To combat this issue, we have developed the Cyberbullying Bengali Dataset (CBD), a novel resource containing 2751 manually labeled texts categorized into five classes, including various forms of cyberbullying and non-bullying instances. In our study on cyberbullying detection, we conducted an extensive evaluation of various machine learning and deep learning models. Specifically, we examined Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest (RF) among the traditional machine learning models. For deep learning models, we explored Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). We have also experimented with state-of-the-art transformer architectures, including m-BERT, BanglaBERT, and XLM-RoBERTa. After rigorous experimentation, XLM-RoBERTa emerged as the most effective model, achieving a significant F1-score of 0.83 and an accuracy of 82.61%, outperforming all other models. Our work provides insights into effective cyberbullying detection on platforms like Facebook, YouTube, and Instagram.
- Published
- 2024