An Effective Identification Technology for Online News Comment Spammers in Internet Media
- Source :
- IEEE Access, Vol 7, Pp 37792-37806 (2019)
- Publication Year :
- 2019
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2019.
Abstract
- The development of the mobile Internet is changing the way we communicate with others. Internet media, including online news and social networks, have gradually become the main mobile crowdsourcing applications for information dissemination and user communication. However, the potential business opportunities have stimulated the emergence of a large number of spammers, who post false statements, advertisements, pornographic content, and links to phishing websites on these media for commercial gain, which seriously degrades the experience of normal users. To reduce the harm of false information, spammer identification has therefore been researched extensively. However, traditional spammer identification technologies involve high data costs and perform poorly, and most of them concentrate on social networks, while little research addresses online news. In this paper, we propose an effective technology for identifying online news comment spammers based on the label propagation algorithm (LPA) that makes full use of user comment behaviors and comment contents. First, we collect a large amount of news and comments from NetEase News and manually label some users in the data as spammers or normal users to construct a labeled dataset. Then, a set of behavioral and semantic features is extracted and quantified from the user comment behaviors and comment contents by statistical analysis. Next, we propose the identification technology based on the LPA. Finally, the feature values are input into the proposed technology in different combinations, and experiments and evaluations are carried out to determine the most effective combination of features and to improve the technology. The results show that the proposed technology involves a lower data cost and achieves a better identification effect than some traditional technologies based on supervised classifiers.
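The abstract names label propagation over user features as the core step but gives no details, so the following is only a minimal sketch of generic graph-based label propagation, not the paper's method. The two features per user (for example, a normalized comment frequency and a duplicate-comment ratio), the Gaussian similarity, the `sigma` value, and the function names are all assumptions made for illustration.

```python
import math


def rbf_similarity(x, y, sigma):
    """Gaussian similarity between two feature vectors (assumed kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def propagate_labels(features, seed_labels, sigma=0.3, iterations=100):
    """Spread a few manual labels to unlabeled users over a similarity graph.

    features    -- one feature vector per user (hypothetical features)
    seed_labels -- dict mapping user index to 1 (spammer) or 0 (normal)
    Returns a list with a 0/1 prediction for every user.
    """
    n = len(features)
    # Dense similarity graph without self-loops.
    weights = [
        [0.0 if i == j else rbf_similarity(features[i], features[j], sigma)
         for j in range(n)]
        for i in range(n)
    ]
    # Spammer score per user; unlabeled users start undecided at 0.5.
    scores = [float(seed_labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seed_labels:
                # Manually labeled users keep their label (clamping).
                updated.append(float(seed_labels[i]))
                continue
            total = sum(weights[i])
            if total == 0.0:
                updated.append(scores[i])
                continue
            # Unlabeled users take the weighted average of their neighbors.
            neighbor_sum = sum(w * s for w, s in zip(weights[i], scores))
            updated.append(neighbor_sum / total)
        scores = updated
    return [1 if score >= 0.5 else 0 for score in scores]


# Toy data: users 0-2 behave alike, users 3-5 behave alike.
users = [[0.9, 0.8], [0.85, 0.9], [0.8, 0.75],
         [0.1, 0.1], [0.15, 0.05], [0.2, 0.1]]
seeds = {0: 1, 3: 0}  # only two users are labeled by hand
predicted = propagate_labels(users, seeds)
print(predicted)  # [1, 1, 1, 0, 0, 0]
```

Only two of the six toy users carry a manual label, which illustrates the low labeling cost that the abstract contrasts with supervised classifiers. Which features and which graph construction work best is exactly what the paper's experiments evaluate, so the choices above should not be read as its results.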
- Subjects :
- mobile crowdsourcing applications
General Computer Science
online news comment
Computer science
Information Dissemination
Internet media
02 engineering and technology
Crowdsourcing
Field (computer science)
World Wide Web
0202 electrical engineering, electronic engineering, information engineering
Feature (machine learning)
General Materials Science
Set (psychology)
Spammer identification
business.industry
General Engineering
020207 software engineering
Construct (python library)
Phishing
Identification (information)
label propagation algorithm
020201 artificial intelligence & image processing
lcsh:Electrical engineering. Electronics. Nuclear engineering
business
lcsh:TK1-9971
Details
- ISSN :
- 2169-3536
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....fba6d013aabc23eb3b9e9e9e47c5c1f5