Tokenising, Stemming and Stopword Removal on Anti-spam Filtering Domain.
- Source :
- Current Topics in Artificial Intelligence; 2006, p449-458, 10p
- Publication Year :
- 2006
Abstract
- Junk e-mail detection and filtering can be considered a cost-sensitive classification problem. Nevertheless, the preprocessing methods and noise reduction strategies used to improve computational efficiency in text classification may not be as effective in e-mail filtering. This is demonstrated here through a comparative study of stopword removal, stemming and different tokenising schemes. The final goal is to preprocess the training e-mail corpora of several content-based techniques for spam filtering (machine learning approaches and case-based systems). Sound conclusions are drawn from the experiments carried out, in which different scenarios are taken into consideration. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISBNs :
- 9783540459149
- Database :
- Complementary Index
- Journal :
- Current Topics in Artificial Intelligence
- Publication Type :
- Book
- Accession number :
- 32887929
- Full Text :
- https://doi.org/10.1007/11881216_47