Enhancing Effectiveness of Dimension Reduction in Text Classification.

Authors :
Seyyedi, Seyyed Hossein
Minaei-Bidgoli, Behrouz
Source :
International Journal on Artificial Intelligence Tools. Jun2017, Vol. 26 Issue 3, p-1. 21p.
Publication Year :
2017

Abstract

Nowadays, text is one of the most prevalent forms of data, and text classification is a widely used data mining task with various application fields. One mass-produced instance of text is email. As a communication medium, despite having many advantages, email suffers from a serious problem: the number of spam emails has steadily increased in recent years, leading to considerable irritation. Therefore, spam detection has emerged as a separate field of text classification. A primary challenge of text classification, which is more severe in spam detection and impedes the process, is the high dimensionality of the feature space. Various dimension reduction methods have been proposed that produce a lower-dimensional space than the original. These methods are divided mainly into two groups: feature selection and feature extraction. This research deals with dimension reduction in the text classification task and performs experiments specifically in the spam detection field. We employ Information Gain (IG) and Chi-square Statistic (CHI) as well-known feature selection methods. We also propose a new feature extraction method called Sprinkled Semantic Feature Space (SSFS). Furthermore, this paper presents a new hybrid method called IG_SSFS, in which we combine the selection and extraction processes to reap the benefits of both. To evaluate these methods in the spam detection field, experiments are conducted on several well-known email datasets. According to the results, SSFS demonstrated superior effectiveness over the basic selection methods in terms of improving classifiers' performance, and IG_SSFS further enhanced the performance while consuming less processing time. [ABSTRACT FROM AUTHOR]
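
For readers unfamiliar with the two feature selection scores named in the abstract, the following is a minimal sketch of how Information Gain and the chi-square statistic are commonly computed for a single binary term-presence feature. It is not the authors' implementation: the function names, the toy labels, and the "spam"/"ham" encoding are assumptions made for illustration, and SSFS and IG_SSFS are not shown because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(present, labels):
    """Entropy of the labels minus their entropy conditioned on term presence."""
    n = len(labels)
    with_term = [y for x, y in zip(present, labels) if x]
    without_term = [y for x, y in zip(present, labels) if not x]
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without_term) if part
    )
    return entropy(labels) - conditional


def chi_square(present, labels, positive="spam"):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    pairs = list(zip(present, labels))
    a = sum(1 for x, y in pairs if x and y == positive)          # term present, spam
    b = sum(1 for x, y in pairs if x and y != positive)          # term present, not spam
    c = sum(1 for x, y in pairs if not x and y == positive)      # term absent, spam
    d = sum(1 for x, y in pairs if not x and y != positive)      # term absent, not spam
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


# Hypothetical toy data: four emails and two candidate terms.
labels = ["spam", "spam", "ham", "ham"]
free_present = [1, 1, 0, 0]    # term occurs only in the spam emails
hello_present = [1, 0, 1, 0]   # term occurs equally in both classes

ig_free = information_gain(free_present, labels)
ig_hello = information_gain(hello_present, labels)
chi_free = chi_square(free_present, labels)
chi_hello = chi_square(hello_present, labels)
```

In this toy example the term that perfectly separates the classes receives the higher score under both measures, so a selection step that keeps the top-scoring terms would retain it and drop the uninformative one.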

Details

Language :
English
ISSN :
02182130
Volume :
26
Issue :
3
Database :
Academic Search Index
Journal :
International Journal on Artificial Intelligence Tools
Publication Type :
Academic Journal
Accession number :
123732063
Full Text :
https://doi.org/10.1142/S0218213017500087