Redesigning the message logging model for high performance.

Authors :
Bouteiller, Aurelien
Bosilca, George
Dongarra, Jack
Source :
Concurrency & Computation: Practice & Experience; Nov 2010, Vol. 22 Issue 16, p2196-2211, 16p, 3 Diagrams, 1 Chart, 3 Graphs
Publication Year :
2010

Abstract

Over the past decade the number of processors used in high performance computing has increased to hundreds of thousands. As a direct consequence, and while the computational power follows the trend, the mean time between failures (MTBF) has suffered and is now being counted in hours. In order to circumvent this limitation, a number of fault-tolerant algorithms as well as execution environments have been developed using the message passing paradigm. Among them, message logging has been proved to achieve a better overall performance when the MTBF is low, mainly due to a faster failure recovery. However, message logging suffers from a high overhead when no failure occurs. Therefore, in this paper we discuss a refinement of the message logging model intended to improve the failure-free message logging performance. The proposed approach simultaneously removes useless memory copies and reduces the number of logged events. We present the implementation of a pessimistic message logging protocol in Open MPI and compare it with the previous reference implementation MPICH-V2. The results outline a several order of magnitude improvement on the performance and a zero overhead for most messages. Published in 2010 by John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1532-0626
Volume :
22
Issue :
16
Database :
Complementary Index
Journal :
Concurrency & Computation: Practice & Experience
Publication Type :
Academic Journal
Accession number :
54357917