Tokenising, Stemming and Stopword Removal on Anti-spam Filtering Domain.

Authors :
Marín, Roque
Onaindía, Eva
Bugarín, Alberto
Santos, José
Méndez, J. R.
Iglesias, E. L.
Fdez-Riverola, F.
Díaz, F.
Corchado, J. M.
Source :
Current Topics in Artificial Intelligence; 2006, p449-458, 10p
Publication Year :
2006

Abstract

Junk e-mail detection and filtering can be considered a cost-sensitive classification problem. Nevertheless, the preprocessing methods and noise reduction strategies used to improve computational efficiency in text classification may not be as effective in e-mail filtering. This is demonstrated here through a comparative study of stopword removal, stemming and different tokenising schemes. The final goal is to preprocess the training e-mail corpora of several content-based techniques for spam filtering (machine approaches and case-based systems). Sound conclusions are drawn from the experiments carried out, in which different scenarios are taken into consideration. [ABSTRACT FROM AUTHOR]
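
For readers unfamiliar with the three preprocessing steps named in the abstract, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the stopword list, the suffix list, the two tokenising schemes and all function names (`tokenise`, `remove_stopwords`, `naive_stem`, `preprocess`) are assumptions chosen only to show how each step can be switched on or off for comparison. The stemmer is a deliberately naive suffix stripper, not Porter's algorithm.

```python
import re

# Tiny illustrative stopword list (an assumption, not the list used in the paper).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "is", "in", "for", "you", "your"}

# Hypothetical suffix list for a deliberately naive stemmer (not Porter's algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenise(text, keep_punctuation=False):
    """Split text into lowercase tokens under one of two simple schemes."""
    if keep_punctuation:
        # Scheme A: split on whitespace only, so "now!!!" stays a single token.
        return text.lower().split()
    # Scheme B: keep only runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, keep_punctuation=False, drop_stopwords=True, stem=True):
    """Run the selected preprocessing steps in order and return the tokens."""
    tokens = tokenise(text, keep_punctuation)
    if drop_stopwords:
        tokens = remove_stopwords(tokens)
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    message = "Claim your FREE prizes now!!!"
    print(preprocess(message))
    print(preprocess(message, keep_punctuation=True, drop_stopwords=False, stem=False))
```

Comparing the two calls shows the kind of trade-off the abstract alludes to: the fully preprocessed output loses the repeated punctuation, which in a spam setting may itself be a useful signal, while the whitespace-only scheme keeps it at the cost of a larger vocabulary.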

Details

Language :
English
ISBNs :
9783540459149
Database :
Complementary Index
Journal :
Current Topics in Artificial Intelligence
Publication Type :
Book
Accession number :
32887929
Full Text :
https://doi.org/10.1007/11881216_47