DQPFS: Distributed quadratic programming based feature selection for big data.

Authors :
Soheili, Majid
Eftekhari-Moghadam, Amir Masoud
Source :
Journal of Parallel & Distributed Computing. Apr2020, Vol. 138, p1-14. 14p.
Publication Year :
2020

Abstract

With the advent of Big Data, the scalability of machine learning algorithms has become more crucial than ever. Feature selection, an essential preprocessing technique, can improve the performance of learning algorithms on large-scale datasets by removing irrelevant and redundant features. Owing to their lack of scalability, most classical feature selection algorithms are poorly suited to the voluminous data of the Big Data era. QPFS is a traditional feature weighting algorithm that has been used in many feature selection applications. Inspired by classical QPFS, this paper proposes a scalable algorithm called DQPFS, built on the Apache Spark cluster computing model. The experimental study is performed on three big datasets that have both a large number of instances and a large number of features. Assessment criteria including accuracy, execution time, speed-up, and scale-out are computed. In addition, the results of the proposed algorithm are compared with classical QPFS and with DiRelief, a recently proposed distributed feature selection algorithm. The empirical results show that the proposed method has (a) better scale-out than DiRelief, (b) significantly lower execution time than DiRelief, (c) lower execution time than QPFS, and (d) better accuracy with the Naïve Bayes classifier than DiRelief on two of the three datasets.

• Proposing a distributed and scalable feature selection algorithm for Big Data, DQPFS.
• An experimental study on three big datasets over a real cluster of computers.
• Comparing the DQPFS algorithm with a similar distributed algorithm, DiRelief.
• The proposed method has better execution time and speed-up than the DiRelief algorithm.

[ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
07437315
Volume :
138
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
141664380
Full Text :
https://doi.org/10.1016/j.jpdc.2019.12.001