Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach
- Source :
- Soft Computing. 23:2083-2100
- Publication Year :
- 2017
- Publisher :
- Springer Science and Business Media LLC, 2017.
Abstract
- Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally contain irrelevant and redundant features alongside the relevant ones, and these degrade the clustering result. Feature selection is therefore necessary: selecting a subset of relevant features improves the discriminative ability of the feature set, which in turn improves the clustering result. Although many metaheuristics have been proposed to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require the objects' class information a priori, which is unknown in unsupervised feature selection. Second, feature subset selection is based on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they have difficulty avoiding local optima owing to a poor balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually conflicting validity measures, the silhouette index (Sil) and feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in the binary gravitational search algorithm (BGSA), a recent metaheuristic based on Newton's law of gravity, in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective methods in a wrapper framework (MBGSA, MOPSO, and NSGA-II) and with Pearson’s linear correlation coefficient (FM-CC) as a multi-objective filter method. We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also confirms this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The quality of the final solution achieved by IMBGSAFS is superior to that of its competitors in terms of clustering accuracy and/or smaller subset size.
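The Pareto-based trade-off between the silhouette index (Sil, to be maximized) and feature cardinality (d, to be minimized) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dominates` and `pareto_front` are hypothetical, and the Sil values below are made-up placeholders standing in for scores that would come from running K-means on each candidate feature subset.

```python
from typing import List, Tuple

# A candidate is (binary feature mask, silhouette score).
# The silhouette scores here are illustrative placeholders, not real results.
Candidate = Tuple[Tuple[int, ...], float]


def objectives(cand: Candidate) -> Tuple[float, int]:
    """Return (Sil, d): silhouette score and number of selected features."""
    mask, sil = cand
    return sil, sum(mask)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one.

    Sil is maximized; feature cardinality d is minimized.
    """
    sil_a, d_a = objectives(a)
    sil_b, d_b = objectives(b)
    no_worse = sil_a >= sil_b and d_a <= d_b
    strictly_better = sil_a > sil_b or d_a < d_b
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for i, c in enumerate(cands)
        if not any(dominates(o, c) for j, o in enumerate(cands) if j != i)
    ]


# Hypothetical population of feature masks over four features.
candidates: List[Candidate] = [
    ((1, 1, 1, 1), 0.62),
    ((1, 0, 1, 0), 0.60),
    ((1, 0, 0, 0), 0.41),
    ((0, 1, 1, 1), 0.55),  # dominated by (1, 0, 1, 0): lower Sil, more features
    ((0, 1, 0, 0), 0.30),  # dominated by (1, 0, 0, 0): lower Sil, same d
]

front = pareto_front(candidates)
```

Under these assumed scores, `front` keeps the three masks that trade clustering quality against subset size. A single final solution would then be picked from such an archive by a separate criterion, which the abstract states is the F-measure.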
- Subjects :
- 0209 industrial biotechnology
Computer science
business.industry
Correlation clustering
k-means clustering
Feature selection
Pattern recognition
02 engineering and technology
Theoretical Computer Science
Data set
020901 industrial engineering & automation
CURE data clustering algorithm
Feature (computer vision)
0202 electrical engineering, electronic engineering, information engineering
Canopy clustering algorithm
FLAME clustering
020201 artificial intelligence & image processing
Geometry and Topology
Artificial intelligence
Cluster analysis
business
Software
Details
- ISSN :
- 1433-7479 and 1432-7643
- Volume :
- 23
- Database :
- OpenAIRE
- Journal :
- Soft Computing
- Accession number :
- edsair.doi...........b85d1e06ecabb91ad88cd94b56c6f0e3
- Full Text :
- https://doi.org/10.1007/s00500-017-2923-x