Resource Conscious Diagnosis and Reconfiguration for NoC Permanent Faults.

Authors :
Parikh, Ritesh
Bertacco, Valeria
Source :
IEEE Transactions on Computers; 7/1/2016, Vol. 65 Issue 7, p2241-2256, 16p
Publication Year :
2016

Abstract

Networks-on-chip (NoCs) have been increasingly adopted in recent years due to the extensive integration of many components in modern multicore processors and system-on-chip designs. At the same time, transistor reliability is becoming a major concern due to the continuous scaling of silicon. As the sole medium of on-chip communication, it is critical for a NoC to be able to tolerate many permanent transistor failures. In this paper, we propose uDIREC, a unified framework for permanent fault diagnosis and subsequent reconfiguration in NoCs, which provides graceful performance degradation with an increasing number of faults. Upon in-field transistor failures, uDIREC leverages a fine-resolution diagnosis mechanism to disable faulty components very sparingly. At its core, uDIREC employs MOUNT, a novel routing algorithm to find reliable and deadlock-free routes that utilize all the still-functional links in the NoC. We implement uDIREC's reconfiguration as a truly distributed hardware solution, still keeping the area overhead at a minimum. We also propose a software-implemented reconfiguration that provides greater integration with our software-based diagnosis scheme, at the cost of the implementation's distributed nature. Regardless of the adopted implementation scheme, uDIREC places no restriction on topology, router architecture, or the number and location of faults. Experimental results show that uDIREC, implemented in a 64-node NoC, drops 3× fewer nodes and provides greater than 25 percent throughput improvement (beyond 15 faults) when compared to other state-of-the-art fault-tolerance solutions. uDIREC's improvement over prior art grows further with more faults, making it an effective NoC reliability solution for a wide range of fault rates. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0018-9340
Volume :
65
Issue :
7
Database :
Complementary Index
Journal :
IEEE Transactions on Computers
Publication Type :
Academic Journal
Accession number :
116115808
Full Text :
https://doi.org/10.1109/TC.2015.2479586