CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets

Authors :
Pablo Gamallo
Sattam Almatarneh
Francisco Pena
Source :
SemEval@NAACL-HLT, Scopus-Elsevier
Publication Year :
2019
Publisher :
Association for Computational Linguistics, 2019.

Abstract

This article describes the strategy submitted by the CiTIUS-COLE team to SemEval 2019 Task 5, a binary classification task in which the system predicts whether a tweet in English or in Spanish is hateful against women or immigrants. The proposed strategy relies on combining linguistic features to improve the classifier’s performance. More precisely, the method combines textual and lexical features, pairing word embeddings with a bag of words in Term Frequency-Inverse Document Frequency (TF-IDF) representation. The system reaches about 81% F1 on the training dataset, but on the official test dataset its F1 for the hate speech class drops to 36% for English and 64% for Spanish.
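
As a rough illustration of the feature combination the abstract mentions, the sketch below concatenates a TF-IDF bag-of-words vector, an averaged word embedding, and two lexicon-based counts into one feature row per tweet. It is not the authors' implementation: the function names, the smoothed IDF formula, the toy embeddings, and the placeholder lexicon are all assumptions made for this example, and the abstract does not say which classifier consumes the features.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption of this sketch), so no term gets zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the embeddings of the tokens that have one; zeros otherwise."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def lexicon_features(doc, lexicon):
    """Two lexical features: raw count and ratio of lexicon hits."""
    hits = sum(1 for t in doc if t in lexicon)
    return [float(hits), hits / (len(doc) or 1)]


def combine_features(docs, embeddings, dim, lexicon):
    """Concatenate TF-IDF, mean-embedding, and lexicon features per document."""
    vocab, tfidf = tfidf_vectors(docs)
    rows = [
        tv + mean_embedding(doc, embeddings, dim) + lexicon_features(doc, lexicon)
        for doc, tv in zip(docs, tfidf)
    ]
    return vocab, rows


# Toy inputs: hypothetical values, not data or resources from the paper.
TOY_DOCS = [
    "they should all go home".split(),
    "welcome home to all".split(),
]
TOY_EMBEDDINGS = {"home": [0.5, 0.1], "welcome": [0.9, -0.2], "go": [-0.3, 0.4]}
TOY_LEXICON = {"go"}  # placeholder, not a real hate-speech lexicon

vocab, rows = combine_features(TOY_DOCS, TOY_EMBEDDINGS, 2, TOY_LEXICON)
```

Each row in `rows` can then be passed to any standard classifier. Concatenation keeps the sparse lexical signal and the dense embedding signal side by side, though in practice the blocks would likely need scaling so that no single one dominates.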

Details

Database :
OpenAIRE
Journal :
Proceedings of the 13th International Workshop on Semantic Evaluation
Accession number :
edsair.doi.dedup.....1f6fd9548992f69744cae6fd7beb0c0d
Full Text :
https://doi.org/10.18653/v1/s19-2068