Some issues on scalable feature selection¹
¹ This is an extended version of the paper presented at the Fourth World Congress of Expert Systems: Application of Advanced Information Technologies, held in Mexico City in March 1998.

Authors :: Rudy Setiono
Huan Liu
Source :: Expert Systems with Applications. 15:333-339
Publication Year :: 1998
Publisher :: Elsevier BV, 1998.
Abstract: Feature selection determines relevant features in the data. It is often applied in pattern classification, data mining, and machine learning. A special concern for feature selection nowadays is that databases are normally very large, both vertically (number of instances) and horizontally (number of features). In addition, feature sets may grow as the data collection process continues. Effective solutions are needed to accommodate these practical demands. This paper concentrates on three issues: a large number of features, a large data size, and an expanding feature set. For the first issue, we suggest a probabilistic algorithm to select features. For the second issue, we present a scalable probabilistic algorithm that expedites feature selection further and can scale up without sacrificing the quality of selected features. For the third issue, we propose an incremental algorithm that adapts to the newly extended feature set and captures "concept drifts" by removing features from both the previously selected and the newly added ones. We expect that research on scalable feature selection will be extended to distributed and parallel computing and will have an impact on applications of data mining and machine learning.
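The abstract does not spell out how the first (probabilistic) algorithm works. As a rough illustration only, the sketch below assumes a Las Vegas-style random search over feature subsets, scored by an inconsistency rate: instances that agree on the chosen features but disagree on the class count as inconsistent. It keeps the smallest random subset that is no more inconsistent than the full feature set. This is not claimed to be the paper's exact procedure; `inconsistency_rate`, `random_subset_filter`, `max_tries`, and `seed` are hypothetical names, and features are assumed to be discrete.

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that would be misclassified if every group of
    identical feature-value patterns (restricted to `subset`) were assigned
    its majority class. Assumes discrete feature values."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def random_subset_filter(rows, labels, max_tries=1000, seed=0):
    """Hypothetical Las Vegas-style filter: repeatedly draw a random subset no
    larger than the current best and accept it if its inconsistency rate does
    not exceed that of the full feature set."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    threshold = inconsistency_rate(rows, labels, all_features)
    best = all_features
    for _ in range(max_tries):
        # Only sizes up to the current best are drawn, so an accepted
        # candidate is never larger than the subset it replaces.
        size = rng.randint(1, len(best))
        candidate = sorted(rng.sample(all_features, size))
        if inconsistency_rate(rows, labels, candidate) <= threshold:
            best = candidate
    return best


if __name__ == "__main__":
    # Toy data: the class equals feature 0; features 1 and 2 are noise.
    rows = [(0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0), (0, 1, 1), (1, 0, 1)]
    labels = [r[0] for r in rows]
    print(random_subset_filter(rows, labels))
```

Scoring a candidate needs only one pass over the data, which is what makes random search over subsets affordable when the number of features is large. For very large data sets, a variant could score candidates on a random sample of rows first. That is an assumption about how such scaling might be approached, not a description of the paper's scalable algorithm.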

Subjects :: Computer science
Feature extraction
General Engineering
Feature selection
Machine learning
Computer Science Applications
Randomized algorithm
k-nearest neighbors algorithm
Artificial Intelligence
Feature (computer vision)
Scalability
Feature (machine learning)
Data mining
Feature learning
