1. Parallel random forest algorithm combining gain ratio and stacked auto-encoders.
- Author
- 刘卫明, 陈伟达, 毛伊敏, and 陈志刚
- Subjects
- *RANDOM forest algorithms, *LATIN hypercube sampling, *ACTIVE learning, *BIG data, *ELECTRONIC data processing, *ALGORITHMS
- Abstract
In big data environments, the random forest algorithm suffers from an excess of redundant and irrelevant features, feature subspaces that carry too little information, and low parallelization efficiency. To address these issues, this paper presents PRFGRSAE, a parallel random forest algorithm that combines gain ratio with stacked auto-encoders. First, the algorithm introduces DRNGRSAE, a strategy that filters redundant and irrelevant features out of the feature set and then extracts features with stacked auto-encoders, effectively reducing the number of redundant and irrelevant features. Second, it proposes SSLF, a strategy that combines Latin hypercube sampling with normalized correlation degree; it builds feature subspaces with high spatial expressiveness by performing multi-layer division sampling on the feature set, which ensures the information content of each feature subspace. Finally, it proposes DSVLA, a reducer allocation strategy based on variable action learning automata, which distributes clusters evenly across reducers and thereby improves parallelization efficiency. Experimental results show that, compared with the IMRF, KSMRF, and GAPRF algorithms, PRFGRSAE achieves a significantly better speedup ratio and accuracy. The algorithm can therefore obtain higher accuracy and parallel efficiency when processing big data, especially data sets with many features. [ABSTRACT FROM AUTHOR]
- Published
- 2023