Content Based Spam Text Classification: An Empirical Comparison between English and Chinese
- Source :
- INCoS
- Publication Year :
- 2013
- Publisher :
- IEEE, 2013.
Abstract
- Spam text, including e-mail and SMS, is a real and growing problem, primarily due to the widespread availability of digital handsets and the Internet. Filtering spam text has become a prominent topic across various fields of study. The text bodies of different forms of communication expose a channel for spammers. In this study, text datasets in English and Chinese are pre-processed, and classical classifiers are applied to the pre-processed datasets so that the accuracy of the same classifier can be compared across the two languages. The behavior of the classifiers on English and on Chinese text is evaluated, and the experimental results are discussed. Most existing spam text detection methods are based on English, and classifiers suited to English text classification are insufficient for Chinese text classification.
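The pipeline the abstract outlines (pre-process the text, then apply the same classical classifier to each language and compare accuracy) might look roughly like the sketch below. This is an illustration only, not the paper's method: the record does not say which classifiers or pre-processing steps were used, so the multinomial Naive Bayes model, the per-character tokenization of Chinese, the names `tokenize`, `NaiveBayes`, and `accuracy`, and the tiny training sets are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Turn text into bag-of-words tokens.

    Runs of ASCII letters become lowercased word tokens. Each CJK character
    becomes its own token, because Chinese text has no spaces between words.
    (Assumption: a real system would more likely use a word segmenter.)
    """
    return [t.lower() for t in re.findall(r"[A-Za-z]+|[\u4e00-\u9fff]", text)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for tok in tokens:                       # smoothed log likelihoods
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented toy messages, for illustration only (not data from the paper).
english = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("see you at lunch tomorrow", "ham"),
]
chinese = [
    ("免费中奖 点击领取", "spam"),
    ("免费现金 立即领取", "spam"),
    ("明天中午开会", "ham"),
    ("明天一起吃午饭", "ham"),
]

# The same classifier type is trained separately on each language.
en_model = NaiveBayes().fit(*zip(*english))
zh_model = NaiveBayes().fit(*zip(*chinese))
```

With held-out labeled messages in each language, calling `accuracy(en_model, ...)` and `accuracy(zh_model, ...)` would give the kind of per-language comparison the abstract describes. The only language-specific step in this sketch is `tokenize`, which is where differences between English and Chinese pre-processing would first show up.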
- Subjects :
- Information retrieval
Empirical comparison
Channel (digital image)
Noisy text analytics
Computer science
business.industry
Filter (software)
ComputingMethodologies_PATTERNRECOGNITION
Text mining
Bag-of-words model
Classifier (linguistics)
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
The Internet
business
Details
- Database :
- OpenAIRE
- Journal :
- 2013 5th International Conference on Intelligent Networking and Collaborative Systems
- Accession number :
- edsair.doi...........54736257aec5def3126739b1df52210a
- Full Text :
- https://doi.org/10.1109/incos.2013.21