Fine-Grained Checkpoint Recovery for Application-Specific Instruction-Set Processors.

Authors :
Li, Tuo
Shafique, Muhammad
Ambrose, Jude Angelo
Henkel, Jorg
Parameswaran, Sri
Source :
IEEE Transactions on Computers. Apr2017, Vol. 66 Issue 4, p647-660. 14p.
Publication Year :
2017

Abstract

Checkpoint recovery (CR) is a classic fault-tolerance technique, which enables computing systems to execute correctly even when affected by transient faults. Although a number of software- and hardware-based approaches for CR exist, these approaches are usually either too large or too slow, or require extensive modifications to the software and the caching/memory schemes. In this paper, we propose a novel CR approach, which is based on re-engineering the instruction set of a target processor. We take the base instruction set and augment the native micro-operations, i.e., an architectural description language (ADL), with additional micro-operations to perform checkpointing at the granularity of basic blocks. The recovery mechanism is realized by three custom instructions, which can undo the corruptions caused by transient faults during instruction execution, including incorrectly modified values of general-purpose registers, data memory, and special-purpose registers (PC, status registers, etc.). Our checkpoint storage is sized according to the application program executed. The experimental results show that our approach degrades the system performance by just 0.76 percent when there is no fault, and introduces an area overhead of 44 percent on average and 79 percent in the worst case. During the fault injection test with the benchmark applications, the recovery took just 62 clock cycles (worst case). [ABSTRACT FROM PUBLISHER]
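
The general idea of checkpointing at basic-block granularity and undoing corrupted state can be illustrated with a small software model. The sketch below is not the authors' hardware design or their three custom instructions, which the abstract does not specify. It is a toy undo log, and every name in it (`ToyCore`, `begin_block`, `write_reg`, `write_mem`, `recover`, `demo`) is hypothetical. It assumes that a fault is detected before the next block begins, and it uses an unbounded dictionary where the paper sizes its checkpoint storage to the application.

```python
class ToyCore:
    """Toy processor state with an undo log that is reset at each basic-block entry.

    Hypothetical illustration only; not the mechanism described in the paper.
    """

    def __init__(self, num_regs=4, mem_size=16):
        self.regs = [0] * num_regs
        self.mem = [0] * mem_size
        self.pc = 0
        self._undo = {}      # (kind, index) -> value held at block entry
        self._saved_pc = 0   # PC at block entry

    def begin_block(self, pc):
        """Take a checkpoint: remember the entry PC and discard the old log."""
        self.pc = pc
        self._saved_pc = pc
        self._undo.clear()

    def write_reg(self, idx, value):
        # Log the old value only on the first write to this register in the block.
        self._undo.setdefault(("reg", idx), self.regs[idx])
        self.regs[idx] = value

    def write_mem(self, addr, value):
        # Log the old value only on the first write to this address in the block.
        self._undo.setdefault(("mem", addr), self.mem[addr])
        self.mem[addr] = value

    def recover(self):
        """Roll registers, memory, and PC back to their values at block entry."""
        for (kind, idx), old in self._undo.items():
            if kind == "reg":
                self.regs[idx] = old
            else:
                self.mem[idx] = old
        self.pc = self._saved_pc
        self._undo.clear()


def demo():
    core = ToyCore()
    core.begin_block(pc=0)
    core.write_reg(1, 7)
    core.write_mem(3, 42)
    core.begin_block(pc=8)   # new block: the earlier writes are now committed
    core.write_reg(1, 999)   # pretend a transient fault corrupted this block
    core.write_mem(3, -1)
    core.pc = 12
    core.recover()           # undo only the writes made since pc=8
    return core.regs[1], core.mem[3], core.pc
```

Calling `demo()` restores register 1 and memory address 3 to the values committed by the first block and resets the PC to the entry of the faulty block, so that block could be re-executed. Logging only the first write per location keeps the log proportional to the number of distinct locations a block modifies, which is one plausible reason a block-level scheme can keep its storage small.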

Details

Language :
English
ISSN :
00189340
Volume :
66
Issue :
4
Database :
Academic Search Index
Journal :
IEEE Transactions on Computers
Publication Type :
Academic Journal
Accession number :
121854119
Full Text :
https://doi.org/10.1109/TC.2016.2606378