Model-Based Estimation of the Communication Cost of Hybrid Data-Parallel Applications on Heterogeneous Clusters.

Authors :
Rico-Gallego, Juan-Antonio
Lastovetsky, Alexey L.
Diaz-Martin, Juan-Carlos
Source :
IEEE Transactions on Parallel & Distributed Systems. Nov 2017, Vol. 28 Issue 11, p3215-3228. 14p.
Publication Year :
2017

Abstract

Heterogeneous systems composed of CPUs and accelerators sharing communication channels of different performance are becoming mainstream in HPC but, at the same time, their complexity makes it difficult to optimize the deployment of a data-parallel application. Recent analytical tools such as Functional Performance Models, combined with advanced partitioning algorithms, manage to achieve a balanced configuration by distributing the workload unevenly, according to the performance of the different processing units. Unfortunately, such uneven distribution of the computation load leads to communication imbalances that very often render the previous workload-balancing efforts worthless. Finding the optimal communication scheme without expensive testing on the executing platform requires an analytical approach to estimating the communication cost of different configurations of the application. With this goal in mind, we propose and discuss an extension of the τ-Lop communication performance model to cover heterogeneous architectures. In order to provide a quantitative assessment of this extended model, we conduct experiments with two representative computational kernels, the SUMMA algorithm and the 2D wave equation solver. The τ-Lop predictions are compared against the HLogGP model and the observed costs for a variety of configurations, hardware resources and problem sizes. [ABSTRACT FROM PUBLISHER]
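The idea of balancing computation by distributing the workload unevenly, according to the performance of each processing unit, can be illustrated with a minimal sketch. It assumes a constant-speed model (a single speed number per processing unit) rather than the size-dependent speed functions of Functional Performance Models. The function names and example speeds are hypothetical, and the sketch reproduces neither the paper's partitioning algorithms nor the τ-Lop model.

```python
def proportional_partition(n_units: int, speeds: list[float]) -> list[int]:
    """Split n_units of work among processing units in proportion to their speeds.

    Hypothetical helper. Assumes a constant-speed model (one number per unit),
    a simplification of Functional Performance Models, in which speed
    depends on the problem size.
    """
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total = sum(speeds)
    ideal = [n_units * s / total for s in speeds]  # fractional ideal shares
    parts = [int(x) for x in ideal]                # floor of each share
    # Give the leftover units to the largest fractional remainders
    # (largest-remainder rounding); ties keep the original order.
    leftover = n_units - sum(parts)
    order = sorted(range(len(speeds)), key=lambda i: ideal[i] - parts[i], reverse=True)
    for i in order[:leftover]:
        parts[i] += 1
    return parts


def compute_times(parts: list[int], speeds: list[float]) -> list[float]:
    """Estimated computation time per unit under the same model: work / speed."""
    return [p / s for p, s in zip(parts, speeds)]
```

For example, with one accelerator assumed to be four times faster than each of two CPU sockets, `proportional_partition(1200, [4.0, 1.0, 1.0])` gives `[800, 200, 200]`, and every unit gets the same estimated computation time of 200.0. The resulting unequal block sizes are exactly what the abstract identifies as the source of communication imbalance, which a model such as τ-Lop is meant to estimate.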

Details

Language :
English
ISSN :
10459219
Volume :
28
Issue :
11
Database :
Academic Search Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
125562515
Full Text :
https://doi.org/10.1109/TPDS.2017.2715809