Back to Search
Start Over
OPTWEB: A Lightweight Fully Connected Inter-FPGA Network for Efficient Collectives.
- Source :
-
IEEE Transactions on Computers . Jun2021, Vol. 70 Issue 6, p849-862. 14p. - Publication Year :
- 2021
-
Abstract
- Modern FPGA accelerators can be equipped with many high-bandwidth network I/Os, e.g., 64 x 50 Gbps, enabled by onboard optics or co-packaged optics. Some dozens of tightly coupled FPGA accelerators form an emerging computing platform for distributed data processing. However, a conventional indirect packet network using Ethernet's Intellectual Properties imposes an unacceptably large amount of the logic for handling such high-bandwidth interconnects on an FPGA. Besides the indirect network, another approach builds a direct packet network. Existing direct inter-FPGA networks have a low-radix network topology, e.g., 2-D torus. However, the low-radix network has the disadvantage of a large diameter and large average shortest path length that increases the latency of collectives. To mitigate both problems, we propose a lightweight, fully connected inter-FPGA network called OPTWEB for efficient collectives. Since all end-to-end separate communication paths are statically established using onboard optics, raw block data can be transferred with simple link-level synchronization. Once each source FPGA assigns a communication stream to a path by its internal switch logic between memory-mapped and stream interfaces for remote direct memory access (RDMA), a one-hop transfer is provided. Since each FPGA performs input/output of the remote memory access between all FPGAs simultaneously, multiple RDMAs efficiently form collectives. The OPTWEB network provides 0.71-μsec start-up latency of collectives among multiple Intel Stratix 10 MX FPGA cards with onboard optics. The OPTWEB network consumes 31.4 and 57.7 percent of adaptive logic modules for aggregate 400-Gbps and 800-Gbps interconnects on a custom Stratix 10 MX 2100 FPGA, respectively. The OPTWEB network reduces by 40 percent the cost compared to a conventional packet network. [ABSTRACT FROM AUTHOR]
- Subjects :
- *DISTRIBUTED computing
*FIELD programmable gate arrays
*COMPUTING platforms
Subjects
Details
- Language :
- English
- ISSN :
- 00189340
- Volume :
- 70
- Issue :
- 6
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Computers
- Publication Type :
- Academic Journal
- Accession number :
- 150448421
- Full Text :
- https://doi.org/10.1109/TC.2021.3068715