Italian Linguistic Features for Toxic Language Detection in Social Media

Authors :
Leonardo Grotti
Source :
IJCoL, Vol 10, Iss 1 (2024)
Publication Year :
2024
Publisher :
Accademia University Press, 2024.

Abstract

This study addresses the urgent issue of toxic language on social media platforms, focusing on the detection of toxic comments on popular Italian Facebook pages. We build upon the framework suggested by the LiLaH project, a standardized framework for analyzing hateful content in multiple languages, including Dutch, English, French, Slovene, and Croatian. We start by examining the linguistic features of Italian toxic language on social media. Our analysis reveals that toxic comments in Italian tend to be longer and contain fewer unique emojis than non-toxic comments, while both exhibit similar lexical diversity. To evaluate the impact of linguistic features on the performance of state-of-the-art models, we fine-tune three pre-trained language models (PoliBERT, UmBERTo, and bert-base-italian-xxl-uncased). Although these features correlate significantly with comment toxicity, including them worsens the best model’s performance.
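
As a rough illustration of the kind of surface features the abstract mentions (comment length, unique emojis, lexical diversity), the following minimal sketch computes them for a single comment. It is not the study's actual feature extraction: the function name `comment_features`, the whitespace-free word tokenizer, the emoji character ranges, and the use of a type-token ratio as the diversity measure are all assumptions made for illustration.

```python
import re

# Assumption: a rough emoji matcher covering common emoji blocks only;
# the study's own emoji handling is not described in this record.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# Assumption: tokens are runs of word characters, lowercased.
TOKEN_RE = re.compile(r"\w+")


def comment_features(text):
    """Return simple surface features for one comment (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text.lower())
    emojis = EMOJI_RE.findall(text)
    return {
        # Comment length measured in word tokens.
        "n_tokens": len(tokens),
        # Number of distinct emoji characters in the comment.
        "n_unique_emojis": len(set(emojis)),
        # Type-token ratio as a simple stand-in for lexical diversity.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


features = comment_features("Che bello bello 😀😀🎉")
```

Features like these could then be compared between toxic and non-toxic comments, or supplied to a classifier alongside a language model's output; how the study combines them with the fine-tuned models is not specified here.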

Details

Language :
English
ISSN :
24994553
Volume :
10
Issue :
1
Database :
Directory of Open Access Journals
Journal :
IJCoL
Publication Type :
Academic Journal
Accession number :
edsdoj.92f78d451dac4b6f9619438b2bf5f7e3
Document Type :
article
Full Text :
https://doi.org/10.4000/125no