Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features
- Source :
- Computational Linguistics and Intelligent Text Processing ISBN: 9783642194368, CICLing (2), RiuNet. Repositorio Institucional de la Universitat Politècnica de València
- Publication Year :
- 2011
- Publisher :
- Springer Berlin Heidelberg, 2011.
Abstract
- Wikipedia is an online encyclopedia that anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art of all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.
- The authors from Universitat Politècnica de València also thank the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). UPenn contributions were supported in part by ONR MURI N00014-07-1-0907. This research was partially supported by award 1R01GM089820-01A1 from the National Institute of General Medical Sciences, and by ISSDM, a UCSC-LANL educational collaboration.
- Subjects :
- Database Management
Information retrieval
Computer science
media_common.quotation_subject
Data Mining and Knowledge Discovery
Constructive
GeneralLiterature_MISCELLANEOUS
Task (project management)
Set (abstract data type)
World Wide Web
Metadata
Online encyclopedia
Computer Languages and Systems (LENGUAJES Y SISTEMAS INFORMATICOS)
Natural language
Reputation
media_common
Details
- ISBN :
- 978-3-642-19436-8
- ISBNs :
- 9783642194368
- Database :
- OpenAIRE
- Journal :
- Computational Linguistics and Intelligent Text Processing ISBN: 9783642194368, CICLing (2), RiuNet. Repositorio Institucional de la Universitat Politècnica de València
- Accession number :
- edsair.doi.dedup.....542cffb5a20ffacb8453ad3b20780303