High-Quality Fault Resiliency in Fat Trees

Authors :
Gliksberg, John
Capra, Antoine
Louvet, Alexandre
Garcia, Pedro Javier
Sohier, Devan
Source :
IEEE Micro, 2020, 40 (1), pp.44-49. ⟨10.1109/MM.2019.2949978⟩
Publication Year :
2022

Abstract

Coupling regular topologies with optimised routing algorithms is key in pushing the performance of interconnection networks of supercomputers. In this paper we present Dmodc, a fast deterministic routing algorithm for Parallel Generalised Fat-Trees (PGFTs) which minimises congestion risk even under massive network degradation caused by equipment failure. Dmodc computes forwarding tables with a closed-form arithmetic formula by relying on a fast preprocessing phase. This allows complete re-routing of networks with tens of thousands of nodes in less than a second. In turn, this greatly helps centralised fabric management react to faults with high-quality routing tables and no impact to running applications in current and future very large-scale HPC clusters.

Comment: arXiv admin note: text overlap with arXiv:2211.11817

Details

Database :
arXiv
Journal :
IEEE Micro, 2020, 40 (1), pp.44-49. ⟨10.1109/MM.2019.2949978⟩
Publication Type :
Report
Accession number :
edsarx.2211.13101
Document Type :
Working Paper