Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features

Authors :
Andrew G. West
B. Thomas Adler
Paolo Rosso
Luca de Alfaro
Santiago M. Mola-Velasco
Source :
Computational Linguistics and Intelligent Text Processing ISBN: 9783642194368, CICLing (2), RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Publication Year :
2011
Publisher :
Springer Berlin Heidelberg, 2011.

Abstract

Wikipedia is an online encyclopedia that anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.

The authors from Universitat Politècnica de València also thank the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). UPenn contributions were supported in part by ONR MURI N00014-07-1-0907. This research was partially supported by award 1R01GM089820-01A1 from the National Institute of General Medical Sciences, and by ISSDM, a UCSC-LANL educational collaboration.

Details

ISBN :
978-3-642-19436-8
ISBNs :
9783642194368
Database :
OpenAIRE
Journal :
Computational Linguistics and Intelligent Text Processing ISBN: 9783642194368, CICLing (2), RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Accession number :
edsair.doi.dedup.....542cffb5a20ffacb8453ad3b20780303