1. Unsupervised Text Feature Selection Using Memetic Dichotomous Differential Evolution
- Author
Hong Xie, Kok Wai Wong, Chun Che Fung, and Ibraheem Al-Jadir
- Subjects
wrapper ,Optimization problem ,lcsh:T55.4-60.8 ,Computer science ,Feature selection ,02 engineering and technology ,lcsh:QA75.5-76.95 ,Theoretical Computer Science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,memetic ,lcsh:Industrial engineering. Management engineering ,hybridization ,Numerical Analysis ,filter ,Pattern recognition ,Filter (signal processing) ,Document clustering ,Computational Mathematics ,Computational Theory and Mathematics ,Feature (computer vision) ,Differential evolution ,Simulated annealing ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Artificial intelligence ,lcsh:Electronic computers. Computer science ,optimization
- Abstract
Feature Selection (FS) methods have been studied extensively in the literature, and they are a crucial component of machine learning techniques. However, unsupervised text feature selection has not been well studied for document clustering problems. Feature selection can be modelled as an optimization problem because of the large number of potentially valid feature subsets. In this paper, a memetic method that combines Differential Evolution (DE) with Simulated Annealing (SA) is proposed for unsupervised FS. Because each feature takes only two values, indicating its presence or absence, a binary version of DE is required. A dichotomous DE is used for this purpose, and the proposed method is named Dichotomous Differential Evolution Simulated Annealing (DDESA). The method replaces the standard DE mutation with dichotomous mutation, which is more effective for binary representations. The Mean Absolute Distance (MAD) filter is used as the internal evaluation measure for feature subsets. The proposed method was compared with other state-of-the-art methods, including standard DE combined with SA (named DESA in this paper), on five benchmark datasets. The F-micro and F-macro (F-scores) and the Average Distance of Document to Cluster (ADDC) were used as evaluation measures, together with the Reduction Rate (RR). Test results showed that the proposed DDESA outperformed the other tested methods in unsupervised text feature selection.
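The abstract names the MAD filter as the internal measure used to score a feature subset. The sketch below is not taken from the paper. It assumes the definition commonly used in unsupervised text FS, where each term's score is the mean absolute deviation of its weights from that term's mean weight across documents. The abstract does not say how DDESA turns per-term scores into a subset fitness, so `subset_fitness`, which averages the scores of the selected terms, is a hypothetical choice.

```python
import numpy as np

def mad_scores(X: np.ndarray) -> np.ndarray:
    """Per-term MAD score for a documents-by-terms weight matrix (e.g. TF-IDF).

    Assumed definition: MAD_j = (1/n) * sum_i |X[i, j] - mean_j|.
    """
    X = np.asarray(X, dtype=float)
    return np.mean(np.abs(X - X.mean(axis=0)), axis=0)

def subset_fitness(X: np.ndarray, mask) -> float:
    """Hypothetical subset fitness: mean MAD score of the terms switched on in a 0/1 mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0  # an empty subset carries no information
    return float(mad_scores(X)[mask].mean())
```

Consider a toy matrix with one term whose weight varies and one term that is constant. The varying term scores 1.0 and the constant term scores 0.0. An unsupervised filter would therefore prefer to keep the first term.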
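The abstract says DDESA replaces the standard DE mutation with dichotomous mutation so that the operator works directly on 0/1 feature masks. The sketch below shows one reading of that operator as it is usually described in the binary DE literature. Bits on which two randomly chosen parents agree are inherited unchanged. Bits on which they disagree are replaced by a fresh random bit. The paper's exact operator, including any scaling factor or dichotomous crossover, may differ. The function name and signature are hypothetical.

```python
import numpy as np

def dichotomous_mutation(pop: np.ndarray, i: int, rng: np.random.Generator) -> np.ndarray:
    """Build a binary mutant for individual i from two distinct random parents.

    Assumed operator: where parents r1 and r2 agree, copy their bit;
    where they disagree, draw a random 0/1 bit instead.
    """
    candidates = [k for k in range(pop.shape[0]) if k != i]
    r1, r2 = rng.choice(candidates, size=2, replace=False)
    a, b = pop[r1], pop[r2]
    differ = a != b  # the "dichotomous" split: agreeing vs. disagreeing positions
    random_bits = rng.integers(0, 2, size=a.shape).astype(pop.dtype)
    return np.where(differ, random_bits, a)
```

Consider parents that share a bit pattern, for example two documents' masks that both keep a particular term. The mutant is guaranteed to preserve that consensus. Only the contested positions are explored at random.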
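In a memetic DE+SA hybrid such as the one the abstract describes, SA typically acts as a local search that refines candidate feature masks. The sketch below is a generic SA local search over single-bit flips. It uses the standard Metropolis acceptance rule and geometric cooling. The parameters `t0`, `alpha` and `steps` are illustrative assumptions and are not the paper's settings. The abstract does not state where in the DE loop the paper applies SA.

```python
import math
import random
from typing import Callable, List, Tuple

def sa_refine(mask: List[int], fitness: Callable[[List[int]], float],
              t0: float = 1.0, alpha: float = 0.95, steps: int = 200,
              seed: int = 0) -> Tuple[List[int], float]:
    """Simulated annealing over single-bit flips of a 0/1 feature mask, maximising `fitness`."""
    rng = random.Random(seed)
    current = list(mask)
    cur_fit = fitness(current)
    best, best_fit = list(current), cur_fit
    t = t0
    for _ in range(steps):
        neighbour = list(current)
        j = rng.randrange(len(neighbour))
        neighbour[j] = 1 - neighbour[j]  # toggle one feature in or out of the subset
        new_fit = fitness(neighbour)
        delta = new_fit - cur_fit
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(delta / t)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_fit = neighbour, new_fit
            if cur_fit > best_fit:
                best, best_fit = list(current), cur_fit
        t = max(t * alpha, 1e-12)  # geometric cooling, kept strictly positive
    return best, best_fit
```

Accepting some worse moves while the temperature is high lets the search escape local optima that pure bit-flip hill climbing would get stuck in. As `t` shrinks, the search behaves increasingly like greedy refinement.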
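Two of the evaluation measures listed in the abstract have simple closed forms. The Reduction Rate is shown below as the fraction of original features removed, (n - m) / n. ADDC is shown as the mean, over clusters, of the average distance from each document to its cluster centroid. Both are sketches of definitions commonly used in the text clustering literature. In particular, Euclidean distance is an assumption, and the paper may use a different distance such as cosine.

```python
import numpy as np

def reduction_rate(n_original: int, n_selected: int) -> float:
    """Fraction of features removed by selection: (n - m) / n."""
    return (n_original - n_selected) / n_original

def addc(X: np.ndarray, labels) -> float:
    """Average Distance of Documents to their Cluster centroid (Euclidean, assumed).

    ADDC = (1/k) * sum over clusters c of mean_{d in c} ||d - centroid_c||.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    per_cluster = []
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        per_cluster.append(np.linalg.norm(members - centroid, axis=1).mean())
    return float(np.mean(per_cluster))
```

For example, keeping 250 of 1,000 terms gives a Reduction Rate of 0.75. A lower ADDC indicates more compact clusters.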
- Published
- 2020