An Effective Identification Technology for Online News Comment Spammers in Internet Media

Authors :
Huayou Si
Wen Sun
Jilin Zhang
Jian Wan
Neal N. Xiong
Li Zhou
Yongjian Ren
Source :
IEEE Access, Vol 7, Pp 37792-37806 (2019)
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

The development of the mobile Internet is changing the way we communicate with others. Internet media, including online news and social networks, have gradually become the main mobile crowdsourcing applications for information dissemination and user communication. However, the potential business opportunities have stimulated the emergence of a large number of spammers, who post false statements, advertisements, pornographic content, and links to phishing websites on these media for commercial gain, which seriously degrades the experience of normal users. To reduce the harm of false information, spammer identification has therefore been studied extensively. However, traditional spammer identification techniques involve high data costs and perform poorly, and most of them focus on social networks, while little research addresses online news. In this paper, we propose an effective technique for identifying online news comment spammers based on the label propagation algorithm (LPA), making full use of user comment behaviors and comment contents. First, we collect a large amount of news and comments from NetEase News and manually label some users in the data as spammers or normal users to construct a labeled dataset. Then, a set of behavioral and semantic features is extracted and quantified from the user comment behaviors and comment contents by statistical analysis. Next, we propose the identification technique based on the LPA. Finally, the feature values are input into the proposed technique in different combinations, and experiments and evaluations are carried out to determine the most effective combination of features and to improve the technique. The results show that the proposed technique involves a lower data cost and achieves better identification performance than some traditional techniques based on supervised classifiers.
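
The abstract does not spell out how the LPA is formulated in the paper, so the following is only a generic sketch of graph-based label propagation, not the authors' implementation. It assumes each user is described by a numeric feature vector (the two features used in the example, comments per hour and the share of comments containing a link, are hypothetical), that similarity is measured with an RBF kernel of width `sigma`, and that manually labeled users stay clamped to their labels while the remaining users take on the labels of their nearest neighbors.

```python
import math


def label_propagation(features, labels, sigma=1.0, iterations=100):
    """Generic label propagation sketch (an assumption, not the paper's exact method).

    features: list of numeric feature vectors, one per user.
    labels:   list with a class label for manually labeled users and None otherwise.
    Returns a list with one predicted class label per user.
    """
    n = len(features)
    classes = sorted({lab for lab in labels if lab is not None})

    def similarity(a, b):
        # RBF kernel on squared Euclidean distance (an assumed similarity measure).
        dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-dist_sq / (2 * sigma ** 2))

    # Fully connected similarity graph without self-loops.
    weights = [
        [similarity(features[i], features[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

    # Row-normalise the weights into transition probabilities.
    transitions = []
    for row in weights:
        total = sum(row)
        transitions.append([w / total if total > 0 else 0.0 for w in row])

    def one_hot(i):
        return [1.0 if labels[i] == c else 0.0 for c in classes]

    # Labeled users start with their known label, unlabeled users with a uniform guess.
    uniform = [1.0 / len(classes)] * len(classes)
    dist = [one_hot(i) if labels[i] is not None else list(uniform) for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp manually labeled users so their labels never drift.
                updated.append(one_hot(i))
            else:
                # Unlabeled users absorb the weighted label mass of their neighbors.
                updated.append([
                    sum(transitions[i][j] * dist[j][k] for j in range(n))
                    for k in range(len(classes))
                ])
        dist = updated

    return [classes[max(range(len(classes)), key=lambda k: d[k])] for d in dist]


# Hypothetical data: [comments per hour, share of comments containing a link].
users = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
known = [0, None, None, 1, None, None]  # 0 = normal user, 1 = spammer
predicted = label_propagation(users, known)
```

Only two of the six users carry a manual label here, which illustrates the lower labeling cost the abstract attributes to a propagation-based approach compared with a fully supervised classifier.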

Details

Language :
English
ISSN :
2169-3536
Volume :
7
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.12fabd1476f0439290c533e9939aa9e8
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2019.2900474