1. Non-MapReduce computing for intelligent big data analysis.
- Author
- Sun, Xudong, Zhao, Lingxiang, Chen, Jiaqi, Cai, Yongda, Wu, Dingming, and Huang, Joshua Zhexue
- Subjects
- *DISTRIBUTED computing, *DISTRIBUTED algorithms, *DATA analysis, *DATA transmission systems, *ALGORITHMS, *BIG data, *MACHINE learning
- Abstract
- MapReduce is a popular paradigm in distributed computing, but it is inefficient for executing iterative algorithms over a big distributed dataset because of its heavy data communication overhead. Non-MapReduce computing is an alternative for improving computing efficiency and data scalability when iterative algorithms are used to process big distributed datasets on clusters. In this paper, we investigate the Non-MapReduce approach to distributed computing and use Spark implementations of machine learning algorithms to discuss the problems of MapReduce in executing iterative algorithms over a big distributed dataset and the advantages of Non-MapReduce for the same tasks. We present a method to build a new machine learning library made of sequential algorithms for distributed computing. We use experimental results to compare the computing efficiency and data scalability of MapReduce and Non-MapReduce in executing six machine learning algorithms over big datasets. [ABSTRACT FROM AUTHOR]
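The abstract does not spell out how the Non-MapReduce approach is implemented, so the following Python sketch only illustrates the contrast it describes, under stated assumptions. It fits a one-variable linear regression by gradient descent on hypothetical synthetic data split into blocks. The MapReduce-style version needs one map-and-reduce communication round per iteration; the Non-MapReduce-style version assumes each block runs an ordinary sequential algorithm to completion and the local models are combined once by simple averaging. The function names, the averaging step, and the data are illustrative assumptions, not the authors' library or Spark's API.

```python
import random


def make_partitions(num_parts, rows_per_part, seed=0):
    """Hypothetical data y = 3x + 2 + noise, split into equal-size blocks."""
    rng = random.Random(seed)
    parts = []
    for _ in range(num_parts):
        block = []
        for _ in range(rows_per_part):
            x = rng.uniform(-1.0, 1.0)
            block.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)))
        parts.append(block)
    return parts


def local_gradient(block, w, b):
    """Summed squared-error gradients over one block (the 'map' side)."""
    gw = gb = 0.0
    for x, y in block:
        err = (w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb


def mapreduce_gd(parts, iters=200, lr=0.5):
    """MapReduce style: every iteration maps over all blocks and reduces the
    partial gradients, i.e. one global communication round per iteration."""
    w = b = 0.0
    n = sum(len(p) for p in parts)
    rounds = 0
    for _ in range(iters):
        partials = [local_gradient(p, w, b) for p in parts]  # map
        gw = sum(g[0] for g in partials) / n                 # reduce
        gb = sum(g[1] for g in partials) / n
        rounds += 1
        w -= lr * gw
        b -= lr * gb
    return w, b, rounds


def sequential_gd(block, iters=200, lr=0.5):
    """Plain single-machine gradient descent on one block; no communication."""
    w = b = 0.0
    m = len(block)
    for _ in range(iters):
        gw, gb = local_gradient(block, w, b)
        w -= lr * gw / m
        b -= lr * gb / m
    return w, b


def non_mapreduce_gd(parts, iters=200, lr=0.5):
    """Non-MapReduce style (assumed): run the sequential algorithm on each
    block independently, then average the local models in a single round."""
    models = [sequential_gd(p, iters, lr) for p in parts]
    w = sum(m[0] for m in models) / len(models)
    b = sum(m[1] for m in models) / len(models)
    return w, b, 1


if __name__ == "__main__":
    parts = make_partitions(num_parts=8, rows_per_part=500)
    print("MapReduce-style (w, b, rounds):", mapreduce_gd(parts))
    print("Non-MapReduce-style (w, b, rounds):", non_mapreduce_gd(parts))
```

Both versions should recover parameters close to w = 3 and b = 2 on this data, but the MapReduce-style loop communicates 200 times while the assumed Non-MapReduce variant communicates once. One-shot averaging is only one possible combination rule and is not guaranteed to match the global solution for every algorithm.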
- Published
- 2024