Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation
- Source :
- RiuNet. Repositorio Institucional de la Universitat Politècnica de València
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- The proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this is still quite a challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach that combines the multi-head self-attention of BERT with reasoning over a word graph. The attention mechanism captures relationships among words in context while a language model is learned. These relationships are then used to build a graph, from which we identify the most relevant words using eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords on retrieving offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. The results highlight some points to consider when training models with available datasets.
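The graph step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attention matrix below is invented toy data standing in for aggregated BERT self-attention, and the symmetrization, the self-loop removal, and the helper names `eigenvector_centrality` and `rank_keywords` are assumptions made only for illustration.

```python
import numpy as np


def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Approximate the principal eigenvector of a non-negative symmetric
    adjacency matrix by power iteration.

    The update uses (adj + I) instead of adj: the shift keeps the same
    eigenvectors but avoids oscillation on bipartite-like graphs.
    """
    n = adj.shape[0]
    x = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iters):
        x_new = adj @ x + x
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def rank_keywords(tokens, attention, top_k=3):
    """Rank tokens by eigenvector centrality in a word graph whose edge
    weights come from an attention matrix (hypothetical helper)."""
    # Assumption: treat the graph as undirected by averaging both directions.
    weights = (attention + attention.T) / 2.0
    # Assumption: drop self-loops so a token cannot vote for itself.
    np.fill_diagonal(weights, 0.0)
    scores = eigenvector_centrality(weights)
    order = np.argsort(-scores)[:top_k]
    return [(tokens[i], float(scores[i])) for i in order]


# Toy input: attention[i, j] is how strongly token i attends to token j.
# These numbers are invented; in practice they would come from a model.
tokens = ["this", "post", "is", "rude", "spam"]
attention = np.array([
    [0.10, 0.20, 0.10, 0.50, 0.10],
    [0.10, 0.10, 0.10, 0.50, 0.20],
    [0.10, 0.10, 0.10, 0.60, 0.10],
    [0.10, 0.20, 0.10, 0.30, 0.30],
    [0.10, 0.20, 0.10, 0.50, 0.10],
])

ranked = rank_keywords(tokens, attention, top_k=3)
for token, score in ranked:
    print(f"{token}\t{score:.3f}")
```

In this toy graph every token attends most strongly to "rude", so that token receives the highest centrality. With a real model, the attention weights would typically be averaged over heads and layers before building the graph, which is a further design choice not specified in this record.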
- Subjects :
- Language identification
business.industry
Computer science
Keyword extraction
Offensive
Attention mechanism
Context (language use)
Management Science and Operations Research
Library and Information Sciences
computer.software_genre
Graph representation
Computer Science Applications
Task (project management)
Unsupervised keyword extraction
Hardware and Architecture
Offensive language detection
Graph (abstract data type)
Social media
Language model
Artificial intelligence
business
computer
Natural language processing
Details
- ISSN :
- 1617-4917 and 1617-4909
- Volume :
- 27
- Database :
- OpenAIRE
- Journal :
- Personal and Ubiquitous Computing
- Accession number :
- edsair.doi.dedup.....1922f0ef0cc5bfbe50de39950801819a
- Full Text :
- https://doi.org/10.1007/s00779-021-01605-5