CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets

Authors :: Pablo Gamallo
Sattam Almatarneh
Francisco Pena
Source :: SemEval@NAACL-HLT, Scopus-Elsevier
Publication Year :: 2019
Publisher :: Association for Computational Linguistics, 2019.
Abstract: This article describes the strategy submitted by the CiTIUS-COLE team to SemEval 2019 Task 5, a binary classification task in which the system predicts whether or not a tweet in English or Spanish is hateful against women or immigrants. The proposed strategy relies on combining linguistic features to improve the classifier’s performance. More precisely, the method combines textual and lexical features, pairing word embeddings with a bag-of-words Term Frequency-Inverse Document Frequency (TF-IDF) representation. The system reaches about 81% F1 when applied to the training dataset, but its F1 drops to 36% for English and 64% for Spanish on the official test dataset, with respect to the hate speech class.
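
The feature combination mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, the averaging of word vectors, the toy embeddings, and the function names `tfidf_vectors`, `mean_embedding`, and `combine_features` are all assumptions made for illustration. The sketch only shows one plausible way to concatenate a TF-IDF bag-of-words vector with an averaged word-embedding vector for each tweet; the additional textual and lexical features and the classifier itself are left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, one common variant (assumed; the record does not give the weighting).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of tokens found in `embeddings`; zeros if none are found."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combine_features(docs, embeddings, dim):
    """Concatenate each document's TF-IDF vector with its mean embedding."""
    vocab, tfidf = tfidf_vectors(docs)
    combined = [
        tv + mean_embedding(doc, embeddings, dim) for doc, tv in zip(docs, tfidf)
    ]
    return vocab, combined


# Toy data (hypothetical): two tokenized tweets and 2-dimensional embeddings.
docs = [["go", "home", "now"], ["welcome", "home"]]
toy_embeddings = {"go": [1.0, 0.0], "home": [0.0, 1.0], "welcome": [0.5, 0.5]}
vocab, features = combine_features(docs, toy_embeddings, dim=2)
```

Each row of `features` has one entry per vocabulary term followed by `dim` embedding values, so it could be passed to any standard classifier. Tokens missing from the embedding table (here `"now"`) are simply skipped when averaging.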

Subjects :: Spanish language
Computer science
business.industry
computer.software_genre
Class (biology)
SemEval
Term (time)
Task (project management)
Test (assessment)
Bag-of-words model
Classifier (linguistics)
Artificial intelligence
business
tf–idf
computer
Natural language processing

Database :: OpenAIRE
Journal :: Proceedings of the 13th International Workshop on Semantic Evaluation
Accession number :: edsair.doi.dedup.....1f6fd9548992f69744cae6fd7beb0c0d
Full Text :: https://doi.org/10.18653/v1/s19-2068
