Improving hate speech detection using Cross-Lingual Learning.

Authors :: Firmino, Anderson Almeida; de Souza Baptista, Cláudio; de Paiva, Anselmo Cardoso
Source :: Expert Systems with Applications. Jan2024, Vol. 235, pN.PAG-N.PAG. 1p.
Publication Year :: 2024
Abstract: The growth of social media worldwide has brought both social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach applies transfer learning with Pre-Trained Language Models (PTLMs), drawing on languages with large available corpora to solve the task in languages with fewer resources. The proposed methodology comprises four stages: corpora acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments using Pre-Trained Language Models in English, Italian, and Portuguese (BERT and XLM-R) to determine which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The experimental results show that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved a new state-of-the-art result (F1-measure = 92%).
• The development of a new methodology for hate speech detection.
• Portuguese hate speech detection using Cross-Lingual Learning.
• Up to 20% performance improvement over other models using the OffComBr-2 corpus.
[ABSTRACT FROM AUTHOR]
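The evaluation stage reports an F1-measure, which can be illustrated with a minimal sketch. The abstract does not say which averaging scheme was used, so the version below assumes the plain binary F1 on the positive (hate speech) class. The function name `f1_measure` and the toy labels are hypothetical and are not taken from the paper or its corpora.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (assumed here to mean 'hate speech')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (1 = hate speech, 0 = not hate speech).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_measure(y_true, y_pred)
print(f"F1-measure = {score:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a high score requires both to be high, which is why it is often preferred over accuracy on imbalanced corpora.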