Back to Search Start Over

Comparative analysis of soft-error sensitivity in LU decomposition algorithms on diverse GPUs.

Authors :
Leon, German
Badia, Jose M.
Belloch, Jose A.
Lindoso, Almudena
Entrena, Luis
Source :
Journal of Supercomputing. Jun2024, Vol. 80 Issue 9, p12844-12862. 19p.
Publication Year :
2024

Abstract

Graphics processing units (GPUs) have become integral to embedded systems and supercomputing centres due to their large memory, cutting-edge technology and high performance per watt. However, their susceptibility to transient errors requires a comprehensive analysis of error sensitivity, as well as the development of error mitigation techniques and fault-tolerant algorithms. This study focuses on evaluating the soft-error sensitivity of two distinct versions of LU decomposition algorithms implemented on two very different GPUs—a low-power SoC embedded GPU and a high-performance massively parallel GPU. Through extensive fault injection campaigns on both GPUs, we examine the vulnerability of the algorithms, identify error causes, and determine critical code components requiring enhanced protection. The experiments reveal that most single bit flip fault injections in the instruction results lead to erroneous outcomes or unrecoverable errors. Notably, efficient GPU resource utilisation can increase the number of masked errors, thereby enhancing error resilience. Additionally, while different parts of the code exhibit similar error occurrence types and rates, the propagation of errors to elements within the result matrix differs significantly. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09208542
Volume :
80
Issue :
9
Database :
Academic Search Index
Journal :
Journal of Supercomputing
Publication Type :
Academic Journal
Accession number :
177648342
Full Text :
https://doi.org/10.1007/s11227-024-05925-0