Improving time efficiency in big data through progressive sampling-based classification model.

Authors :
Bangera, Nandita
Kayarvizhy
Luharuka, Shubham
Manek, Asha S.
Source :
Indonesian Journal of Electrical Engineering & Computer Science; Jan2024, Vol. 33 Issue 1, p248-260, 13p
Publication Year :
2024

Abstract

The proposed system aims to overcome challenges posed by large databases, data imbalance, heterogeneity, and multidimensionality by using progressive sampling as the basis of a novel classification model. It leverages sampling techniques to enhance processing performance and overcome memory restrictions. The random forest regressor feature importance technique, with the Gini significance method, is employed to identify important characteristics and reduce the number of features used for classification. The system utilizes diverse classifiers such as random forest, ensemble learning, support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression, allowing flexibility in handling different data types and achieving high accuracy in classification tasks. By iteratively applying progressive sampling to the dataset restricted to the best features, the proposed technique aims to significantly improve performance compared to using the entire dataset. This approach focuses computational resources on the most informative subsets of data, reducing time complexity. Results show that the system can achieve over 85% accuracy with only 5-10% of the original data size, providing accurate predictions while reducing data processing requirements. In conclusion, the proposed system combines progressive sampling with feature selection based on random forest regressor feature importance (RFRFI-PS) and a range of classifiers to address challenges in large databases and improve classification accuracy. It demonstrates promising results in accuracy and time complexity reduction. [ABSTRACT FROM AUTHOR]
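
The progressive sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the sampling schedule or the stopping rule, so the geometric schedule (`start`, `factor`) and the plateau tolerance (`tol`) are assumptions, the synthetic data and all function names are hypothetical, and a simple nearest-centroid classifier stands in for the classifiers named in the abstract. The feature selection step is omitted.

```python
import random
from statistics import mean


def make_data(n, seed=0):
    """Hypothetical synthetic two-class data with two numeric features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        rows.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return rows


def fit_centroids(rows):
    """Stand-in classifier: store the per-class mean of each feature."""
    centroids = {}
    for label in {y for _, y in rows}:
        points = [x for x, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*points)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(centroids, rows):
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in rows)


def progressive_sample(train, test, start=0.01, factor=2.0, tol=0.005, seed=0):
    """Grow the training sample geometrically until accuracy stops improving.

    Assumed stopping rule: stop when the gain over the previous, smaller
    sample is below `tol`, or when the whole training set has been used.
    Returns a list of (fraction, sample_size, accuracy) tuples.
    """
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)  # a prefix of a shuffled list is a random sample
    frac, prev_acc, history = start, None, []
    while True:
        n = max(2, int(len(shuffled) * frac))
        model = fit_centroids(shuffled[:n])
        acc = accuracy(model, test)
        history.append((frac, n, acc))
        if frac >= 1.0 or (prev_acc is not None and acc - prev_acc < tol):
            break
        prev_acc, frac = acc, min(1.0, frac * factor)
    return history


if __name__ == "__main__":
    train_rows = make_data(5000, seed=1)
    test_rows = make_data(1000, seed=2)
    for frac, n, acc in progressive_sample(train_rows, test_rows):
        print(f"fraction={frac:.2f} rows={n} accuracy={acc:.3f}")
```

Each pass trains only on a prefix of the shuffled data, so the cost of the early passes stays small, and the loop ends as soon as a larger sample no longer improves held-out accuracy. That is the general mechanism by which a small fraction of the data can suffice. The figures this toy data produces say nothing about the results reported in the paper.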

Details

Language :
English
ISSN :
2502-4752
Volume :
33
Issue :
1
Database :
Complementary Index
Journal :
Indonesian Journal of Electrical Engineering & Computer Science
Publication Type :
Academic Journal
Accession number :
175002650
Full Text :
https://doi.org/10.11591/ijeecs.v33.i1.pp248-260