Classification of toxic comments unified through diverse internet forums.

Authors :: Kaur, Gull
Kumar, Aakash
Chauhan, Aarjav
Babbar, Abhishek
Source :: AIP Conference Proceedings. 2023, Vol. 2754 Issue 1, p1-6. 6p.
Publication Year :: 2023
Abstract: In the last half-decade, India has seen exponential growth in the Internet and social media. This growth has led to better communication among friends and families and to the free spread of information, content, opinions, and ideas. Some users misuse this freedom and make social media platforms intolerable. The volume of detrimental content online, such as toxic comments, is too large for humans to moderate manually. This study creates a homogeneous dataset by manually labelling comments taken from social platforms and combining them with some publicly available datasets. The comments are classified into two categories, toxic and non-toxic. This work presents our unified dataset, which covers a wide spectrum of comments, and an approach to classifying Hinglish comments using the BERT transformer model. The study also includes training baseline models and reporting their performance on selected evaluation criteria. The BERT model outperformed the baseline and other models trained on the unified dataset. This study gives importance to Hinglish comments and provides an implementation for classifying them, to make internet platforms more secure and friendly for regional-language users. [ABSTRACT FROM AUTHOR]
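The abstract does not name the "selected evaluation criteria" used to compare the BERT model with the baselines. As an illustration only, the sketch below computes accuracy, precision, recall, and F1 for a binary toxic/non-toxic task, which are common choices for this kind of comparison. The function name `binary_metrics`, the label encoding (1 = toxic, 0 = non-toxic), and the sample labels are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumed encoding (not from the paper): 1 = toxic, 0 = non-toxic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, flagged toxic
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-toxic, flagged toxic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-toxic, passed

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions for eight comments.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Running the same function on the predictions of each model (baseline and BERT) over a shared held-out split would give directly comparable scores, under the assumption that these metrics match the ones the authors selected.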