Data mining on vast data sets as a cluster system benchmark

Authors :: Alexander Heinecke, Roman Karlstetter, Dirk Pflüger, Hans-Joachim Bungartz
Source :: Concurrency and Computation: Practice and Experience. 28:2145-2165
Publication Year :: 2015
Publisher :: Wiley, 2015.
Abstract: Comparing different accelerated cluster architectures with a single application is demanding, because the application has to be optimized for the platform-dependent features of each architecture. In this work, we demonstrate such an optimization for a data mining algorithm that solves regression and classification problems on vast data sets. Our technique is based on least squares regression, and its major component is the iterative, matrix-free solution of a linear system of equations. By processing data sets ranging from several hundred thousand instances to several million data points in strong-scaling and weak-scaling settings, we estimate the amount of parallelism needed to unleash the performance of classic CPU-based machines and of clusters employing Intel Xeon Phi coprocessors and NVIDIA Kepler GPUs. Only in strong-scaling experiments do GPUs and coprocessors suffer from the tremendous amount of parallelism they need: at large scale (more than 64 nodes/accelerators), they are outperformed by dual-socket Intel Sandy Bridge nodes. In weak-scaling scenarios, however, a single accelerator achieves a speed-up of more than 2x over an entire CPU node. Copyright © 2015 John Wiley & Sons, Ltd.
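The central computational step named in the abstract, an iterative matrix-free solve of the least squares system, can be illustrated with a minimal Python/NumPy sketch. It assumes a regularized normal-equation system (Φ^T Φ + λI)α = Φ^T y solved with conjugate gradients (CG). The 1D hat-function basis, the λI regularizer, the chunk size, and the names hat_basis, apply_system, rhs and conjugate_gradient are illustrative assumptions, not the paper's discretization or its optimized CPU/GPU/Xeon Phi kernels. "Matrix-free" here means the system matrix is never assembled: every operator application streams over the data in chunks and evaluates the basis functions on the fly.

```python
import numpy as np


def hat_basis(x, n):
    # Hypothetical basis (assumption): n piecewise-linear hat functions on a
    # uniform grid over [0, 1]; returns a (len(x), n) block computed on demand.
    nodes = np.linspace(0.0, 1.0, n)
    h = nodes[1] - nodes[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / h)


def apply_system(alpha, x, n, lam, chunk=1024):
    # Computes (Phi^T Phi + lam * I) @ alpha by streaming over the data in
    # chunks; neither Phi nor Phi^T Phi is ever stored in full.
    out = lam * alpha
    for start in range(0, len(x), chunk):
        phi = hat_basis(x[start:start + chunk], n)
        out += phi.T @ (phi @ alpha)
    return out


def rhs(x, y, n, chunk=1024):
    # Computes Phi^T y with the same chunked, on-the-fly evaluation.
    b = np.zeros(n)
    for start in range(0, len(x), chunk):
        phi = hat_basis(x[start:start + chunk], n)
        b += phi.T @ y[start:start + chunk]
    return b


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    # Plain CG for a symmetric positive definite operator that is given only
    # as a function; stops when ||r|| <= tol * ||b||.
    sol = np.zeros_like(b)
    r = b - apply_A(sol)
    p = r.copy()
    rs = r @ r
    stop = tol * np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= stop:
            break
        Ap = apply_A(p)
        step = rs / (p @ Ap)
        sol += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return sol


# Usage: fit y = sin(2*pi*x) from 10,000 random samples.
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = np.sin(2 * np.pi * x)
n, lam = 17, 1e-6
alpha = conjugate_gradient(lambda v: apply_system(v, x, n, lam), rhs(x, y, n))
pred = hat_basis(np.array([0.25]), n) @ alpha
```

In this sketch each CG iteration is one full pass over all data points, so the work per iteration grows with the data set size; that data-parallel loop is the natural place to distribute work across CPU cores, nodes, or accelerators.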