
Hardware/software solutions to enable the use of high-performance processors in the most stringent safety-critical systems

Authors :
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Kosmidis, Leonidas
Abella Ferrer, Jaume
Alcaide Portet, Sergi
Source :
TDX (Tesis Doctorals en Xarxa)
Publication Year :
2023

Abstract

(English) Future safety-critical systems require a boost in guaranteed performance in order to satisfy the increasing performance demands of state-of-the-art complex software features. One approach to meeting these performance requirements is the use of High-Performance Computing (HPC) components, which can deliver more computing power than current safety-critical components. However, the dependability support of these HPC components is not the same as that of safety-critical components, so HPC components can jeopardize the functional safety of the entire system, especially since some of the highest-criticality functionalities may be executed entirely on top of these components (e.g., neural networks on a Graphics Processing Unit (GPU)). Based on the safety requirements of performance-hungry critical applications, such as those for autonomous operation, these HPC components must comply with the highest criticality levels and hence include the required dependability support. The overarching goal of this thesis is to present techniques to achieve that in different HPC components. In particular, we focus on GPUs and multicores. The techniques presented aim at providing diverse redundant execution, as needed to avoid Common Cause Failures (CCFs), which are those defeating safety measures (e.g., pure redundancy) as a consequence of a single-point fault (e.g., a fault affecting both redundant instances identically). Such a solution is comparable to the lockstep execution employed on safety-critical processors. The first set of contributions of this thesis focuses on enabling diverse redundant execution on a single GPU. We propose two different solutions: (1) a slight hardware modification affecting the internal scheduler of the GPU and (2) a software-only approach that requires knowledge of the hardware resources of the GPU. In these contributions, we also analyze the staggering created by the inherent CPU-GPU interaction.
Finally, the last contributi
(Catalan, translated to English) Future safety-critical systems will need an increase in guaranteed performance in order to satisfy the high computing demand of the complex software that is currently the state of the art. One strategy to achieve this performance is to use components from the field of high-performance computing (HPC), since they can provide more computing power than current safety-critical components. However, the dependability support of these HPC components is not the same as that of safety-critical components, and the use of HPC components can threaten the functional safety of the whole system, especially if some of the high-criticality functionalities can be executed entirely on one of these components (for example, neural networks on a graphics processing unit (GPU)). Based on the dependability requirements of applications that demand great computing power, such as those for autonomous operation, these high-performance components must meet the highest dependability levels and therefore include mechanisms that provide this dependability. The general objective of this thesis is to present techniques to achieve these levels in high-performance components. In particular, we focus on graphics processing units and multicore systems. The techniques presented are intended to provide diverse redundant execution, which is needed to avoid common cause failures (CCFs): failures that dependability methods (such as simple redundancy) cannot protect against and that have a single point of failure (a fault affecting both redundant instances identically). These solutions are comparable to the lockstep execution used in safety-critical systems.
The first group of contributions of this thesis is based on enabling diverse redundant execution on a single graphics processing unit. Prop
Postprint (published version)

Details

Database :
OAIster
Journal :
TDX (Tesis Doctorals en Xarxa)
Notes :
155 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1427129813
Document Type :
Electronic Resource