
A Reconfigurable Compute-in-the-Network FPGA Assistant for High-Level Collective Support with Distributed Matrix Multiply Case Study

Authors :
Justin Broaddus
Martin C. Herbordt
Tong Geng
Derek Schafer
Pouya Haghi
Anqi Guo
Anthony Skjellum
Source :
FPT
Publication Year :
2020
Publisher :
IEEE, 2020.

Abstract

Collectives are a fundamental part of HPC applications, and their optimization has undergone decades of study. In recent years, collectives have been accelerated with in-network hardware support, initially in the NIC but more recently also in the switch. This support is limited, however, to a very small set of scalar operations. In this work, we first propose that these collectives be extended to operations on composite data types such as matrices. We then demonstrate how these high-level collectives can be supported in an FPGA-based switch. Specifically, we propose a reconfigurable compute-in-the-network FPGA assistant, FPin, to implement high-level collectives in MPI. To maintain streaming packet processing while retaining reuse-based, compute-intensive processing, we propose a bulk-streaming message passing interface along with a methodology to tune communication-computation overlap. As a proof of concept, we evaluate the efficiency of the FPGA assistant with the ubiquitous distributed matrix multiply kernel, PGEMM. Experimental results show that PGEMM accelerated with high-level collective support achieves average speedups of 2.4× and 1.8× on an FPGA cluster, compared to the state-of-the-art COSMA algorithm on Stampede2 Skylake, for float and complex float data types, respectively.
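The core idea in the abstract is that a reduction collective can combine whole matrices instead of individual scalars. A small single-process simulation makes this concrete. It is a minimal sketch under stated assumptions and does not reproduce FPin's implementation. The k-dimension split, the function names (`matmul`, `mat_add`, `matrix_reduce`, `split_k`, `distributed_matmul`), and the use of plain Python lists are all hypothetical illustration choices, and no MPI, FPGA, or network is involved. Each simulated "rank" computes a partial product, and a matrix-valued reduce (element-wise sum) assembles the result C = A·B.

```python
from typing import List, Tuple

Matrix = List[List[complex]]  # entries may be real or complex numbers


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense multiply: (m x k) @ (k x n) -> (m x n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def mat_add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum: the reduction operator applied to whole matrices."""
    return [[xi + yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]


def matrix_reduce(contributions: List[Matrix]) -> Matrix:
    """Hypothetical 'high-level' reduce: each message is an entire matrix,
    combined with mat_add rather than a scalar-only operator."""
    result = contributions[0]
    for part in contributions[1:]:
        result = mat_add(result, part)
    return result


def split_k(a: Matrix, b: Matrix, ranks: int) -> List[Tuple[Matrix, Matrix]]:
    """Split A by columns and B by rows along the shared k dimension,
    giving one (A_i, B_i) block pair per simulated rank."""
    k = len(b)
    step = (k + ranks - 1) // ranks  # ceiling division
    blocks = []
    for start in range(0, k, step):
        stop = min(start + step, k)
        blocks.append(([row[start:stop] for row in a], b[start:stop]))
    return blocks


def distributed_matmul(a: Matrix, b: Matrix, ranks: int) -> Matrix:
    """Each rank computes A_i @ B_i locally; the matrix reduce sums them."""
    partials = [matmul(a_blk, b_blk) for a_blk, b_blk in split_k(a, b, ranks)]
    return matrix_reduce(partials)


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8]]
    B = [[1, 0], [0, 1], [1, 1], [2, -1]]
    print(distributed_matmul(A, B, ranks=2))  # [[12, 1], [28, 5]]
```

With `ranks=2`, the two partial products are `[[1, 2], [5, 6]]` and `[[11, -1], [23, -1]]`, and the matrix reduce sums them to `[[12, 1], [28, 5]]`, which matches a direct `matmul(A, B)`. The same code accepts complex entries, echoing the float and complex float cases in the evaluation. A scalar-only in-network reduction would have to treat each of these matrix elements as a separate operation, whereas here one reduce step handles a whole matrix message.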

Details

Database :
OpenAIRE
Journal :
2020 International Conference on Field-Programmable Technology (ICFPT)
Accession number :
edsair.doi...........17ffca5255583b1aea5ed48932bd1bbd
Full Text :
https://doi.org/10.1109/icfpt51103.2020.00030