
Fault-tolerant message switching based on wormhole switching and backtracking

Authors :
M. Sueishi
Masato Kitakami
Hideo Ito
Source :
PRDC
Publication Year :
2004
Publisher :
IEEE, 2004.

Abstract

Parallel computers are now widely used for applications that require large amounts of computation. In a NO Remote memory Access (NORA) parallel computer, many processors are connected by communication links, and results are obtained through communication among the processors. The message switching method, which controls message transmission in the parallel computer, is one of the most important factors in its performance. Since parallel computers include many processors, their failure rate is very high, and many fault-tolerant switching methods have been proposed. The existing methods have problems, however, such as low communication throughput, low fault-tolerant capability, and large hardware overhead. We propose a fault-tolerant switching method that improves on wormhole switching. The proposed method inserts dummy flits, which carry no information, after the header flit (the first flit of the packet). Backtracking is implemented without hardware overhead by overwriting a dummy flit with the header flit. Computer simulation shows that, for example, in a 16 by 16 2D torus, the throughput of the proposed method is almost equal to that of existing methods that require large hardware overhead when there are fewer than 40 faulty nodes.
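The dummy-flit idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the packet layout, the number of dummy flits, or the exact overwrite rule, so everything below is an assumption made for illustration: the names `Flit`, `build_packet`, and `backtrack_one_hop` are hypothetical, and the sketch assumes that one backtracking step copies the header contents onto the dummy flit directly behind it, consuming that dummy flit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flit:
    kind: str                       # "header", "dummy", or "data"
    value: Optional[object] = None  # routing info for a header, payload for data


def build_packet(dest, payload, n_dummy: int) -> List[Flit]:
    """Header flit first, then n_dummy information-free dummy flits, then data flits.

    The layout and the choice of n_dummy are assumptions, not taken from the paper.
    """
    dummies = [Flit("dummy") for _ in range(n_dummy)]
    data = [Flit("data", p) for p in payload]
    return [Flit("header", dest)] + dummies + data


def backtrack_one_hop(packet: List[Flit]) -> Optional[List[Flit]]:
    """Assumed backtracking step: overwrite the dummy flit behind the header
    with the header contents, so the header effectively retreats one position.

    Returns the shortened packet, or None when no dummy flit is left
    (in this sketch, the packet can then backtrack no further).
    """
    if len(packet) < 2 or packet[0].kind != "header" or packet[1].kind != "dummy":
        return None
    new_header = Flit("header", packet[0].value)
    return [new_header] + packet[2:]


# Usage: a packet with two dummy flits can retreat two hops in this model.
packet = build_packet(dest=(3, 5), payload=["a", "b"], n_dummy=2)
once = backtrack_one_hop(packet)
twice = backtrack_one_hop(once)
exhausted = backtrack_one_hop(twice)  # no dummy flits remain
```

In this reading, the number of dummy flits bounds how far a blocked header can retreat, and the data flits are never touched, which is consistent with the abstract's claim that no extra hardware is needed; the actual mechanism in the paper may differ.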

Details

Database :
OpenAIRE
Journal :
10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings.
Accession number :
edsair.doi...........ce011a489b7498a1265bfb824a685c21
Full Text :
https://doi.org/10.1109/prdc.2004.1276569