
An Effective Identification Technology for Online News Comment Spammers in Internet Media

Authors :
Wen Sun
Neal N. Xiong
Huayou Si
Jilin Zhang
Yongjian Ren
Li Zhou
Jian Wan
Source :
IEEE Access, Vol 7, Pp 37792-37806 (2019)
Publication Year :
2019
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2019.

Abstract

With the development of the mobile Internet, the way we communicate with others is changing. Internet media, including online news and social networks, have gradually become the main mobile crowdsourcing applications for information dissemination and user communication. However, the potential business opportunities have stimulated the emergence of a large number of spammers, who post false statements, advertisements, pornographic content, and phishing websites on these media for commercial gain, seriously degrading the experience of normal users. Therefore, to reduce the harm of false information, research on spammer identification technology has been carried out extensively. However, traditional spammer identification technologies involve high data costs and perform poorly, and most of them focus on social networks, while little research addresses online news. In this paper, we propose an effective technology for identifying online news comment spammers based on the label propagation algorithm (LPA), making full use of users' comment behaviors and contents. First, we collect a large amount of news and comments from NetEase News and manually label some users in the data as spammers or normal users to construct a labeled dataset. Then, a set of behavioral and semantic features is extracted from the users' comment behaviors and comment contents and quantified by statistical analysis. Next, we propose the identification technology based on the LPA. Finally, the feature values are input into the proposed technology in different combinations, and experiments and evaluations are carried out to determine the most effective combination of features and improve the technology. The results show that the proposed technology involves a lower data cost and achieves a better identification effect than some traditional technologies based on supervised classifiers.
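The abstract does not specify how the user graph is built, which features are used, or which LPA variant is applied, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a fully connected Gaussian (RBF) similarity graph over per-user feature vectors and the common semi-supervised form of label propagation, in which manually labeled users are clamped and each unlabeled user repeatedly takes the weighted average of its neighbours' spammer scores. It also sketches the "different combinations of features" step as an exhaustive search scored by F1. All names (`rbf_weights`, `label_propagation`, `f1_score`, `best_feature_subset`, `sigma`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def rbf_weights(X, sigma=1.0):
    """Fully connected Gaussian similarity graph over user feature vectors (an assumption)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def label_propagation(X, labels, sigma=1.0, max_iter=1000, tol=1e-6):
    """labels[i] is 1 (spammer), 0 (normal user), or None (unlabeled).

    Returns hard predictions and the spammer score of every user.
    """
    W = rbf_weights(X, sigma)
    n = len(X)
    # Unlabeled users start undecided at 0.5; labeled users keep their label.
    F = [0.5 if lab is None else float(lab) for lab in labels]
    for _ in range(max_iter):
        new_F = []
        for i in range(n):
            if labels[i] is not None:
                new_F.append(float(labels[i]))  # clamp manually labeled users
                continue
            total = sum(W[i])
            if total == 0.0:
                new_F.append(F[i])  # isolated user: keep the current score
            else:
                # Weighted average of the neighbours' current spammer scores.
                new_F.append(sum(W[i][j] * F[j] for j in range(n)) / total)
        delta = max(abs(a - b) for a, b in zip(new_F, F))
        F = new_F
        if delta < tol:
            break
    # Ties at exactly 0.5 (e.g. isolated users) default to "normal".
    return [1 if f > 0.5 else 0 for f in F], F


def f1_score(y_true, y_pred):
    """F1 score with the spammer class (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_feature_subset(X, labels, y_true, eval_idx, sigma=1.0):
    """Exhaustively try every non-empty combination of feature columns."""
    n_feat = len(X[0])
    best_cols, best_score = None, -1.0
    for k in range(1, n_feat + 1):
        for cols in combinations(range(n_feat), k):
            Xs = [[row[c] for c in cols] for row in X]
            pred, _ = label_propagation(Xs, labels, sigma)
            score = f1_score([y_true[i] for i in eval_idx], [pred[i] for i in eval_idx])
            if score > best_score:  # keep the first subset reaching the best score
                best_cols, best_score = cols, score
    return best_cols, best_score


if __name__ == "__main__":
    # Hypothetical toy data: 3 features per user, only users 0 and 3 labeled.
    X = [[0.0, 0.1, 3.0], [0.2, 0.0, 0.0], [0.1, 0.3, 2.0],
         [5.0, 4.9, 0.5], [4.8, 5.2, 2.5], [5.1, 5.0, 1.0]]
    labels = [0, None, None, 1, None, None]
    y_true = [0, 0, 0, 1, 1, 1]
    pred, _ = label_propagation(X, labels)
    print(pred)  # [0, 0, 0, 1, 1, 1]
    print(best_feature_subset(X, labels, y_true, eval_idx=[1, 2, 4, 5]))  # ((0,), 1.0)
```

Only two of the six toy users carry manual labels, which illustrates why a semi-supervised approach can lower the labeling (data) cost compared with a fully supervised classifier. The exhaustive subset search grows as 2^d - 1 in the number of features d, so it is only practical for small feature sets; scores are computed on unlabeled users only, so the clamped labels do not inflate the result.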

Details

ISSN :
21693536
Volume :
7
Database :
OpenAIRE
Journal :
IEEE Access
Accession number :
edsair.doi.dedup.....fba6d013aabc23eb3b9e9e9e47c5c1f5