Introducing the Gab Hate Corpus: Defining and applying hate-based rhetoric to social media posts at scale

Authors :: Mohammad Atari
Gwenyth Portillo-Wightman
Kim Y
Park C
Joe Hoover
Brendan F. Kennedy
Shreya Havaldar
Leigh Yeh
Wang C
Morteza Dehghani
Hussain A
Wang X
Coombs K
Aida Mostafazadeh Davani
Olmos G
Lara A
Elaine Gonzalez
Ali Omrani
Omary A
Zhang Y
Azatian A
Publication Year :: 2018
Publisher :: Center for Open Science
Abstract: We present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.com, each annotated for the presence of “hate-based rhetoric” by a minimum of three annotators. Posts were labeled according to a coding typology derived from a synthesis of hate speech definitions across legal precedent, previous hate speech coding typologies, and definitions from psychology and sociology, comprising hierarchical labels indicating dehumanizing and violent speech as well as indicators of targeted groups and rhetorical framing. We provide inter-annotator agreement statistics and perform a classification analysis in order to validate the corpus and establish performance baselines. The GHC complements existing hate speech datasets in its theoretical grounding and by providing a large, representative sample of richly annotated social media posts.

Subjects :: Artificial intelligence
Psychology
Natural language processing
