The problem of varying annotations to identify abusive language in social media content.

Authors :: Seemann, Nina
Lee, Yeong Su
Höllig, Julian
Geierhos, Michaela
Source :: Natural Language Engineering; Nov 2023, Vol. 29, Issue 6, pp. 1561-1585, 25 pp.
Publication Year :: 2023
Abstract: With the increase of user-generated content on social media, the detection of abusive language has become crucial and is therefore reflected in several shared tasks that have been performed in recent years. The development of automatic detection systems is desirable, and the classification of abusive social media content can be solved with the help of machine learning. The basis for successful development of machine learning models is the availability of consistently labeled training data. But a diversity of terms and definitions of abusive language is a crucial barrier. In this work, we analyze a total of nine datasets—five English and four German datasets—designed for detecting abusive online content. We provide a detailed description of the datasets, that is, for which tasks the dataset was created, how the data were collected, and its annotation guidelines. Our analysis shows that there is no standard definition of abusive language, which often leads to inconsistent annotations. As a consequence, it is difficult to draw cross-domain conclusions, share datasets, or use models for other abusive social media language tasks. Furthermore, our manual inspection of a random sample of each dataset revealed controversial examples. We highlight challenges in data annotation by discussing those examples, and present common problems in the annotation process, such as contradictory annotations and missing context information. Finally, to complement our theoretical work, we conduct generalization experiments on three German datasets. [ABSTRACT FROM AUTHOR]

Subjects :: MACHINE learning
USER-generated content
SOCIAL media
NATURAL language processing
ANNOTATIONS
LANGUAGE & languages

Details

Language :: English
ISSN :: 1351-3249
Volume :: 29
Issue :: 6
Database :: Complementary Index
Journal :: Natural Language Engineering
Publication Type :: Academic Journal
Accession number :: 174038041
Full Text :: https://doi.org/10.1017/S1351324923000098
