1. Bayesian spam filtering for Vietnamese emails
- Author
Vu Duc Lung and Truong Nguyen Vu
- Subjects
Computer science, business.industry, InformationSystems_INFORMATIONSYSTEMSAPPLICATIONS, Vietnamese, Email address harvesting, Filter (signal processing), computer.software_genre, language.human_language, Spamming, Naive Bayes classifier, Bag-of-words model, language, Artificial intelligence, Data mining, business, computer, Natural language processing, Natural language, Email filtering
- Abstract
Spam filtering is a considerable concern for researchers, and a number of techniques and email filtering systems have been implemented. They are, however, not very effective for the Vietnamese language. Although methods for filtering English spam email can still be applied to Vietnamese, Vietnamese has its own particular characteristics. The biggest difference is that a meaningful word in Vietnamese is usually a compound. When a compound used in spam email is separated into single words, the resulting words are ones commonly used in both spam and ham emails, which makes it difficult for the system to filter spam. The objective of this paper is to present a new model that applies Naive Bayesian algorithms to the analysis of Vietnamese word segmentation. The demonstration and evaluation of this model show the feasibility of this technique for filtering Vietnamese email.
- Published
- 2012
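
The abstract's central point, that splitting a Vietnamese compound into single syllables weakens the evidence available to a Naive Bayes filter, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the four toy training messages, the convention that underscores join the syllables of a compound (as if a word segmenter had already run), the Laplace smoothing value `alpha=1.0`, and the names `tokenize`, `NaiveBayes`, `spam_ratio`, and `train`.

```python
import math
from collections import Counter

# Toy, hand-made training set (an assumption, not data from the paper).
# Underscores join the syllables of a compound, as if a word segmenter had run.
TRAIN = [
    ("nhận quà miễn_phí", "spam"),
    ("khuyến_mãi miễn_phí", "spam"),
    ("đóng học_phí", "ham"),
    ("báo_cáo chi_phí", "ham"),
]


def tokenize(text, keep_compounds=True):
    """Split on whitespace; optionally break compounds into single syllables."""
    tokens = text.lower().split()
    if keep_compounds:
        return tokens
    return [syllable for tok in tokens for syllable in tok.split("_")]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()
        self.vocab = set()

    def fit(self, labelled_tokens):
        for tokens, label in labelled_tokens:
            self.docs[label] += 1
            self.counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def token_prob(self, token, label):
        """Smoothed estimate of P(token | label)."""
        total = sum(self.counts[label].values())
        return (self.counts[label][token] + self.alpha) / (
            total + self.alpha * len(self.vocab)
        )

    def spam_ratio(self, token):
        """P(token | spam) / P(token | ham); values near 1 carry little evidence."""
        return self.token_prob(token, "spam") / self.token_prob(token, "ham")

    def predict(self, tokens):
        n_docs = sum(self.docs.values())

        def log_score(label):
            score = math.log(self.docs[label] / n_docs)
            for tok in tokens:
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log(self.token_prob(tok, label))
            return score

        return max(("spam", "ham"), key=log_score)


def train(keep_compounds):
    return NaiveBayes().fit(
        [(tokenize(text, keep_compounds), label) for text, label in TRAIN]
    )


compound_model = train(keep_compounds=True)
syllable_model = train(keep_compounds=False)

print(compound_model.spam_ratio("miễn_phí"))  # compound kept whole
print(syllable_model.spam_ratio("phí"))       # shared syllable after splitting
print(compound_model.predict(tokenize("đóng học_phí")))
```

On this toy data the compound "miễn_phí" has a spam-to-ham likelihood ratio of 36/13 (about 2.8), while the syllable "phí", which also appears in the ham compounds "học_phí" and "chi_phí", has a ratio of 18/19 (about 0.95) and so tells the classifier almost nothing. A real system would replace the hand-inserted underscores with the output of a Vietnamese word segmenter.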