A Hybrid Text Classification Model based on Rough Sets and Genetic Algorithms

Authors :: Zhen Hua
Xiaoyue Wang
Rujiang Bai
Source :: SNPD
Publication Year :: 2008
Publisher :: IEEE, 2008.
Abstract: Automatic categorization of documents into predefined taxonomies is a crucial step in data mining and knowledge discovery. Standard machine learning techniques such as support vector machines (SVM) and related large-margin methods have been applied successfully to this task. Unfortunately, the high dimensionality of the input feature vectors reduces classification speed, and the kernel parameter settings chosen for the SVM during training affect classification accuracy. Feature selection is another factor that affects classification accuracy. The objective of this work is to reduce the dimensionality of the feature vectors and to optimize the SVM parameters, improving both classification accuracy and speed. To improve classification speed, we apply rough set theory to reduce the feature vector space. To improve classification accuracy, we present a genetic algorithm approach for feature selection and parameter optimization. Experimental results indicate that our method is more effective than traditional SVM methods and other traditional methods.

Subjects :: Computer science
business.industry
Feature vector
Document classification
Pattern recognition
Linear classifier
Feature selection
computer.software_genre
Machine learning
k-nearest neighbors algorithm
Support vector machine
Statistical classification
ComputingMethodologies_PATTERNRECOGNITION
Rough set
Artificial intelligence
business
computer

Database :: OpenAIRE
Journal :: 2008 Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
Accession number :: edsair.doi...........d6388a6d4bd9ecd4f6f428e4f96577ff
Full Text :: https://doi.org/10.1109/snpd.2008.142