1. Improving hate speech detection using Cross-Lingual Learning.
- Author
-
Firmino, Anderson Almeida; de Souza Baptista, Cláudio; de Paiva, Anselmo Cardoso
- Subjects
-
*HATE speech, *AUTOMATIC speech recognition, *LANGUAGE models, *NATURAL language processing, *PORTUGUESE language, *ITALIAN language
- Abstract
The growth of social media worldwide has brought social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach uses transfer learning from Pre-Trained Language Models (PTLM) with large corpora available to solve problems in languages with fewer resources for the specific task. The proposed methodology comprises four stages: corpora acquisition, the PTLM definition, training strategies, and evaluation. We carried out experiments using Pre-Trained Language Models in English, Italian, and Portuguese (BERT and XLM-R) to verify which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source language and the OffComBr-2 corpus in Portuguese (the target language). The results of the experiments showed that the proposed methodology is promising: for the OffComBr-2 corpus, the best state-of-the-art result was obtained (F1-measure = 92%).
• The development of a new methodology for hate speech detection.
• Portuguese hate speech detection using Cross-Lingual Learning.
• Up to 20% performance improvement over other models using the OffComBr-2 corpus.
[ABSTRACT FROM AUTHOR]
- Published
- 2024