282 results on '"Kosmidis, Leonidas"'
Search Results
52. Evaluation of High-Level Programming Models for High-Performance Critical Systems
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Kosmidis, Leonidas, and Peralta Quesada, Cristina
- Abstract
Upcoming safety-critical systems require high-performance processing, which can be provided by the multi-cores and embedded GPUs found in several systems-on-chip (SoCs) targeting these domains. So far, only low-level programming models and APIs, such as CUDA or OpenCL, have been evaluated. In this Master's thesis, we evaluate the effectiveness of higher-level programming models, such as OpenACC and SYCL, for critical applications executed on such embedded platforms. In particular, we are interested in two aspects: performance and programmability. To conduct our study, we use the GPU4S Bench benchmarking suite for space and a pedestrian detection application representing the automotive sector. We port both to the new programming models and analyze their behavior. We perform our evaluation on a representative embedded platform, the NVIDIA Xavier AGX, which is considered a good candidate for future safety-critical systems in both domains, and compare our results with other programming models.
- Published
- 2022
53. On the MC/DC code coverage of Vulkan SC GPU code
- Author
- Barcelona Supercomputing Center, Martín Alemán, Jaime Luis, Agenjo, Antonio, Carretero Jiménez, Sergio, and Kosmidis, Leonidas
- Abstract
Next-generation avionics systems require high performance, which can be provided by graphics processing units (GPUs). The newly introduced Vulkan SC API enables the development of safety-critical GPU software with complex control flow, whose certification is subject to DO-178C certifiability objectives, such as MC/DC code coverage. In this paper, we explain for the first time how MC/DC coverage can be applied to Vulkan SC code, as well as the types of potential development errors that can arise in GPU programming. We show how GPU code can be converted into equivalent sequential CPU code and how both versions can achieve 100% MC/DC code coverage. This work was performed within the Airbus TANIAGPU Project ADS (E/200). It was also partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity, the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC-2020-045931-I (Spanish State Research Agency / Agencia Estatal de Investigación (AEI) / http://dx.doi.org/10.13039/501100011033) and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
54. Functional and timing implications of transient faults in critical systems
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Kritikakou, Angeliki, Nikolaou, Panagiota, Rodríguez Ferrández, Iván, Paturel, Joseph, Kosmidis, Leonidas, Michael, Maria K., Sentieys, Olivier, and Steenari, David
- Abstract
Embedded systems in critical domains, such as the automotive, aviation and space domains, are often required to guarantee both functional and temporal correctness. To deal with transient faults, fault analysis and mitigation approaches are implemented at various levels of the system design in order to maintain functional correctness. However, transient faults and their mitigation methods have a timing impact, which can affect the temporal correctness of the system. In this work, we expose the functional and timing implications of transient faults for critical systems. More precisely, we first highlight the timing effect of transient faults occurring in the combinational and sequential logic of a processor. Furthermore, we propose a full-stack vulnerability analysis that drives the design of selective hardware-based mitigation for real-time applications. Last, we study the timing impact of software-based reliability mitigation methods applied in a COTS GPU, using a fault-tolerant middleware. This work has been partially funded by ANR-FASY (ANR-21-CE25-0008-01) and received funding from ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity “Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments” and the GPU4S (GPU for Space) project. Moreover, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), by the European Union’s Horizon 2020 grant agreement No 739551 (KIOS CoE) and by the Government of the Republic of Cyprus through the Cyprus Deputy Ministry of Research, Innovation and Digital Policy. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
55. Achieving diverse redundancy for GPU Kernels
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Alcaide Portet, Sergi, Kosmidis, Leonidas, Hernández Luz, Carles, and Abella Ferrer, Jaume
- Abstract
Autonomous driving requires high-performance computing devices, including general-purpose CPUs as well as specific accelerators, with GPUs having a key role due to their flexibility. Safety-critical microcontrollers have achieved ASIL-D compliance by implementing diverse redundancy with on-chip lockstep execution. However, a GPU does not provide diverse redundancy natively and thus fails to reach ASIL-D, which could only be reached with fully redundant lockstepped GPUs (2 GPUs) or by pairing a GPU with another accelerator. Both options may be infeasible due to procurement costs and the additional power, space and reliability costs of accommodating two devices. In this work, we present a variety of solutions to enable diverse redundant execution using only one GPU by taking advantage of the internal redundancy GPUs already have. We provide two minimally intrusive hardware solutions and a software-only solution, with the latter evaluated directly on a real platform. In the case of the software-only solution, kernel execution on the GPU may require tailoring some parameters. With that objective, we also propose an algorithm that performs such tailoring automatically to guarantee software-only diverse redundancy on GPUs. Overall, our solutions allow achieving ASIL-D with a single GPU, either with software-only solutions on a commercial off-the-shelf GPU or, more efficiently, by introducing minor changes in the GPU design. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
56. Compiler support for an AI-oriented SIMD extension of a space processor
- Author
- Barcelona Supercomputing Center, Solé i Bonet, Marc, and Kosmidis, Leonidas
- Abstract
In this ongoing research paper, we present our work on the compiler support for an AI-oriented SIMD extension called SPARROW. The SPARROW hardware design was developed during a recently defended, award-winning Master's thesis and targets Cobham Gaisler's space processors LEON3 and NOEL-V. We present the compiler support we have included in two compiler toolchains, GCC and LLVM, as well as a SIMD intrinsics library for easy programmability. Compiler modifications are kept to a minimum in order to enable incremental qualification of the toolchains. We present our experience working with the two compilers and performance results for both on top of an FPGA implementation of the target space processor. This work was funded by the Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033 and IJC-2020-045931-I) and partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
57. Contention tracking in GPU last-level cache
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Barrera Herrera, Javier Enrique, Kosmidis, Leonidas, Tabani, Hamid, Abella Ferrer, Jaume, and Cazorla Almeida, Francisco Javier
- Abstract
The last-level cache (LLC) is one of the GPU’s main shared resources: it contributes to improving performance but also increases the performance variability of individual kernels. This is detrimental in scenarios in which some level of performance predictability is required. While predictability can be regained by deploying cache partitioning (isolation) mechanisms, isolation negatively affects performance efficiency. This work shows that leaving the LLC unpartitioned, while providing the ability to track the contention that kernels generate on each other, allows kernels to share LLC space, hence increasing efficiency, and gives the system designer a clear view of how the kernels affect each other in the LLC so as to balance performance and predictability goals. In this line, we propose GPU demotion counters (GDC), a low-overhead hardware mechanism to track the contention that kernels generate on each other in the shared LLC. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC-2020-045931-I funded by MCIN/AEI/10.13039/501100011033 and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773). Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
58. Space compression algorithms acceleration on embedded multi-core and GPU platforms
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Jover Álvarez, Álvaro, Rodríguez Ferrández, Iván, Kosmidis, Leonidas, and Steenari, David
- Abstract
Future space missions will require increased on-board computing power to process and compress massive amounts of data. Consequently, embedded multi-core and GPU platforms, which have been shown to be beneficial for data processing, are being considered. However, the acceleration of data compression, an inherently sequential task, has not been explored. In this ongoing research paper, we parallelize two space compression standards on both CPUs and GPUs using two candidate embedded GPU platforms for space, showing that despite the challenging nature of CCSDS algorithms, their parallelization is possible and can provide significant performance benefits. This work was funded by the Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033 and IJC-2020-045931-I) and partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
59. SPARROW: A low-cost hardware/software co-designed SIMD microarchitecture for AI operations in space processors
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Solé i Bonet, Marc, and Kosmidis, Leonidas
- Abstract
Recently, there has been increasing interest in the use of artificial intelligence for on-board processing, as indicated by the latest space missions, which cannot be satisfied by existing low-performance space-qualified processors. Although COTS AI accelerators can provide the required performance, they are not designed to meet space requirements. In this work, we co-design a low-cost SIMD micro-architecture integrated in a space-qualified processor, which can significantly increase its performance. Our solution has no impact on the processor's 100 MHz frequency and consumes minimal area thanks to its innovative design compared to conventional vector micro-architectures. For the minimum configuration of our baseline space processor, our results indicate a performance boost of up to 9.3× for commonly used AI-related and image processing algorithms and 5.5× for a complex, space-relevant inference application, with just a 30% area increase. This work was supported by ESA through the GPU4S (GPU for Space) project, the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465), the HiPEAC Network of Excellence and a first prize in Xilinx’s University Open Hardware Competition 2021 in the student category. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
60. Real-time high-performance computing for embedded control systems
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Kosmidis, Leonidas, Nicolás Ramírez, Carlos Fernando, and Calderón Torres, Alejandro Josué
- Abstract
The real-time control systems industry is moving towards the consolidation of multiple computing systems into fewer and more powerful ones, aiming for a reduction in size, weight, and power. The increasing demand for higher performance in other critical domains like autonomous driving has led the industry to recently include embedded GPUs for the implementation of advanced functionalities. The highly parallel architecture of GPUs could also be leveraged in the control systems industry to develop more advanced, energy-efficient, and scalable control systems. However, the closed-source and non-deterministic nature of GPUs complicates the resource provisioning analysis required for the implementation of critical real-time systems. On the other hand, there is no indication of the integration of GPUs in the traditional development cycle of control systems, which is oriented to the use of a model-based design approach. Recently, some model-based design tools vendors have extended their development frameworks with GPU code generation capabilities targeting hybrid computing platforms, so that the model-based design environment now enables the concurrent analysis of more complex and diverse functions by simulation and automating the deployment to the final target. However, there is no indication whether these tools are well-suited for the design and development of time-sensitive systems. Motivated by these challenges, in this thesis, we contribute to the state of the art of real-time control systems towards the adoption of embedded GPUs by providing tools to facilitate the resource provisioning analysis and the integration in the model-based design development cycle. First, we present a methodology and an automated tool to extract the properties of GPU memory allocators. This tool allows the computation of the real amount of memory used by GPU applications, facilitating a correct resource provisioning analysis. 
Then, we present a library which allows the characterization of, Postprint (published version)
- Published
- 2022
61. Sources of single event effects in the NVIDIA Xavier SoC family under proton irradiation
- Author
- Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Rodríguez Ferrández, Iván, Tali, Maris, Kosmidis, Leonidas, Rovituso, Marta, and Steenari, David
- Abstract
In this paper, we characterise two embedded GPU devices from the NVIDIA Xavier System-on-Chip (SoC) family using a proton beam. We compare the NVIDIA Xavier NX and Industrial devices, which target commercial and automotive applications, respectively. We evaluate the Single-Event Effect (SEE) rate of both modules and of their sub-components, both the CPU and the GPU, using different power modes, and we try for the first time to identify the exact sources of these effects using the on-line testing facilities included in their ARM-based system. Our conclusion is that the most sensitive part of the CPU complex of the SoC is the tag array of the various cache structures, while no errors were observed in the GPU, probably because of its fast execution compared to the CPU part of the application during the radiation campaign. This work was supported by ESA through the 4000136514/21/NL/GLC/my co-funded PhD activity “Mixed Software/Hardware-based Fault-tolerance Techniques for Complex COTS System-on-Chip in Radiation Environments” and the GPU4S (GPU for Space) project. Moreover, it was partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB-C21 and IJC2020-045931-I (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033) and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).
- Published
- 2022
62. SPARROW: A low-cost hardware/software co-designed SIMD microarchitecture for AI operations in space processors
- Author
- Bonet, Marc Sole, Kosmidis, Leonidas, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
- Subjects
Hardware, Image processing, Neon, Microcontroladors, Microcontrollers, Microarchitecture, Program processors, Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], Space missions
- Abstract
Recently, there has been increasing interest in the use of artificial intelligence for on-board processing, as indicated by the latest space missions, which cannot be satisfied by existing low-performance space-qualified processors. Although COTS AI accelerators can provide the required performance, they are not designed to meet space requirements. In this work, we co-design a low-cost SIMD micro-architecture integrated in a space-qualified processor, which can significantly increase its performance. Our solution has no impact on the processor's 100 MHz frequency and consumes minimal area thanks to its innovative design compared to conventional vector micro-architectures. For the minimum configuration of our baseline space processor, our results indicate a performance boost of up to 9.3× for commonly used AI-related and image processing algorithms and 5.5× for a complex, space-relevant inference application, with just a 30% area increase. This work was supported by ESA through the GPU4S (GPU for Space) project, the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033), the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465), the HiPEAC Network of Excellence and a first prize in Xilinx’s University Open Hardware Competition 2021 in the student category.
- Published
- 2022
63. SPARROW: A Low-Cost Hardware/Software Co-designed SIMD Microarchitecture for AI Operations in Space Processors
- Author
- Bonet, Marc Sole, primary and Kosmidis, Leonidas, additional
- Published
- 2022
64. DO-178C Certification of General-Purpose GPU Software: Review of Existing Methods and Future Directions
- Author
- Trompouki, Matina Maria, primary and Kosmidis, Leonidas, additional
- Published
- 2021
65. OBPMark (On-Board Processing Benchmarks) – Open Source Computational Performance Benchmarks for Space Applications
- Author
- Steenari, David, Kosmidis, Leonidas, Rodriguez-Ferrandez, Ivan, Jover-Alvarez, Alvaro, and Förster, Kyra
- Subjects
obdp2021, obdp, on-board processing
- Abstract
Computational benchmarking of on-board processing performance for space applications has often been done on a case-by-case basis, taking into account only a small subset of devices and specific, often proprietary, applications, limiting domain coverage and reproducibility. While commercial benchmarks exist for embedded systems, they are usually limited to CPUs and are based on synthetic algorithms that are not relevant to space. Consequently, they are not generally suitable for assessing highly parallel processors (GPUs, DSPs, etc.) and/or hardware implementations (i.e. ASICs and FPGAs), which are commonplace in space systems. For on-board processing, there are a number of application types which recur over multiple missions. These applications and algorithms often drive the overall computational requirements of the mission, e.g. in the case of image and radar processing, RF signal processing and compression. In each case, there are certain performance metrics, such as the number of pixels processed per second, which are well known and easily understood by designers and users. Finally, with the rise of machine learning in on-board space applications, tasks such as image classification and object detection using SVMs and CNNs are becoming common. OBPMark (On-Board Processing Benchmarks) defines a set of benchmarks covering the typical classes of applications commonly found on board spacecraft. The benchmark suite is publicly available to enable easy comparison of different systems and to quickly down-select possible processing solutions for a mission. It is open source and includes multiple implementations, and it is easily extensible, allowing porting and optimization to target platforms, including heterogeneous ones, for fair comparison. Currently, implementations in standard C, OpenMP, OpenCL and CUDA are included.
A technical note defining the algorithms used is also provided to allow implementers to provide additional dedicated versions, including reference inputs and outputs for correctness verification as well as an optional automated launching framework for reproducibility. This also allows the benchmarks to be implemented in FPGAs, while ensuring equivalence with the reference implementations. Five categories of benchmarks are defined: 1) Image Processing Pipelines; 2) Standard Compression Algorithms; 3) Standard Encryption Algorithms; 4) Processing Building Blocks; and 5) Machine Learning Inference. In each category, specific benchmarks are included, e.g. both image and radar image compression. Recommended parameters for the CCSDS compression standards 121.0, 122.0 and 123.0 are provided. The processing building blocks include, e.g., FIR filters and FFT processing. Two ML applications have been chosen: cloud screening and ship detection. Both will be provided as standard pre-trained machine learning models, in both floating-point and quantized integer versions, to support multiple microarchitectures. The specification of OBPMark has been initiated by ESA together with BSC as an open source project to allow transparent and open performance comparison of devices and systems. The project will also maintain a list of available benchmark results in its open repository. The work has been carried out both internally at ESA and at BSC through the ongoing ESA-funded GPU4S activity, whose optimised versions of algorithmic building blocks, implemented in the open source GPU4S Bench benchmarking suite, were used as a basis.
- Published
- 2021
66. GPU4S (GPUs for Space): Are we there yet?
- Author
- Kosmidis, Leonidas, Rodriguez-Ferrandez, Iván, Jover-Alvarez, Alvaro, Cabo, Guillem, Alcaide, Sergi, Lachaize, Jérôme, Notebaert, Olivier, Certain, Antoine, and Steenari, David
- Subjects
obdp2021, obdp, on-board processing
- Abstract
In this contribution, we provide an overview of the results and lessons learnt from the ongoing ESA-funded GPU4S (GPU for Space) project, performed by BSC as prime contractor and ADS as subcontractor. Embedded GPUs can provide significant computational power at low power consumption for large amounts of data, allowing the use of software for on-board processing. They offer more flexibility and easier reconfiguration than FPGAs and can support several different processing tasks through reuse of compute resources. Moreover, they can leverage an abundance of specialised developers familiar with widely used programming models, resulting in an overall lower cost. The purpose of this exploratory project is to address the increased on-board processing performance needs of future missions by exploring the possibility of using embedded GPUs in space and studying the initial steps required for their adoption. In particular, our goal is the evaluation of GPU IP for possible future space processors as well as the evaluation of COTS GPUs. We performed a survey of existing and future algorithms used in space across all divisions of ADS, to identify which domains expect higher performance needs and whether their algorithms have good characteristics for GPU parallelisation. We concluded that most space algorithms are a good fit for the GPU programming model, something we later also confirmed experimentally. In another survey, we studied the available hardware solutions and their software ecosystem. We focused on embedded GPU IPs from European providers, to identify the most appropriate one for a radiation-hardened implementation in an ASIC or FPGA in the long term. Additionally, we covered the most important embedded COTS GPU solutions, to identify the most appropriate one for lower-cost, short-term adoption. We also expanded our survey to open source IP and GPU-like solutions. From this extensive coverage, we selected a set of embedded GPUs to benchmark.
For this, we have defined GPU4S Bench [1], an open source embedded GPU benchmarking suite consisting of algorithmic building blocks from multiple space domains identified in our space survey. GPU4S Bench also provides the basis and optimised implementations of these algorithms for GPUs and multi-core CPUs used in ESA’s OBPMark, an open source benchmarking suite for general on-board processing devices. In addition to these benchmarks, we ported complex space applications, such as the Euclid NIR, the image processing and the CCSDS compression benchmarks from OBPMark, demonstrating that GPUs can significantly benefit existing and, above all, future space processing in terms of performance and power consumption, including efficiency. In our contribution, we will present a summary of the obtained results. Finally, we identified issues, such as radiation effect mitigation, thermal management and procurement of GPU devices, which need to be addressed for the adoption of GPUs in space; we proposed potential solutions and defined a roadmap. Overall, our conclusion is that embedded GPUs have a high potential for meeting the performance needs of future missions and can significantly reduce cost while offering new capabilities. [1] GPU4S Bench: Design and Implementation of an Open GPU Benchmarking Suite for Space On-board Processing: https://www.ac.upc.edu/app/research-reports/public/html/research_center_index-CAP-2019,en.html
- Published
- 2021
67. DO-178C certification of general-purpose GPU software: review of existing methods and future directions
- Author
- Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Trompouki, Matina Maria, and Kosmidis, Leonidas
- Abstract
General-Purpose GPU software is considered for use in avionics to satisfy the increased computational requirements of future systems. Therefore, like all airborne software, it needs to be certified following the DO-178C guidance. In this work, we review the existing methods in the literature, we analyse their advantages and disadvantages, and we discuss how they can be combined to obtain certification with lower effort and cost. Our focus is restricted to application-level software, under the premise that successful completion of verification of avionics graphics GPU software products has been demonstrated, so their GPU compiler has been considered acceptable for these already DO-178C certified products, or qualified GPU compilers exist. Finally, we discuss upcoming solutions for certified general purpose GPU computing., This work was performed within the Airbus TANIAGPU Project ADS (E/200) in collaboration with the project partners Airbus Defence and Space, Madrid, Spain and CoreAVI, Canada. It was also partially supported by the European Space Agency (ESA) through the GPU4S (GPU for Space) activity, the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 (Spanish State Research Agency / http://dx.doi.org/10.13039/501100011033) and the HiPEAC Network of Excellence., Peer Reviewed, Postprint (author's final draft)
- Published
- 2021
68. Assessing and improving the suitability of model-based design for GPU-accelerated railway control systems
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Calderón Torres, Alejandro Josué, Kosmidis, Leonidas, Nicolás Ramírez, Carlos Fernando, Lasala, Javier de, and Larrañaga, Ion
- Abstract
Model-Based Design (MBD) is widely used for the design and simulation of electric traction control systems in the railway industry. Moreover, similar to other transportation industries, railway is moving towards the consolidation of multiple computing systems on fewer and more powerful ones, aiming for the reduction of Size, Weight and Power (SWaP). In that regard, Graphics Processing Units (GPUs) are increasingly considered by critical systems engineers, seeking to satisfy their ever increasing performance requirements. Recently, MBD tools have been enhanced with GPU code generation capabilities for machine learning acceleration; however, there is no indication whether these tools are ready for the design of time-sensitive systems. In this paper we analyse the suitability of commercial MBD toolsets by designing and deploying a model-based parallel control case study on embedded GPU platforms. While our results show promising feasibility evidence, they also reveal shortcomings which should be addressed before these toolsets become fit for developing critical systems. We propose certain improvements that have to be incorporated in these tools to achieve this goal. By implementing our proposals in the generated code, we experimentally show their effectiveness on two NVIDIA-based embedded GPUs., This work was partially supported by the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465), by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 and the HiPEAC Network of Excellence., Peer Reviewed, Postprint (author's final draft)
- Published
- 2021
69. Security, reliability and test aspects of the RISC-V ecosystem
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Abella Ferrer, Jaume, Alcaide Portet, Sergi, Anders, Jens, Bas Jalón, Francisco, Becker, Steffen, De Mulder, Elke, Elhamawy, Nourhan, Gürkaynak, Frank K., Handschuh, Helena, Hernández Luz, Carles, Hutter, Mike, Kosmidis, Leonidas, Polian, Ilia, Sauer, Matthias, Wagner, Stefan, and Regazzoni, Francesco
- Abstract
RISC-V has emerged as a viable solution in academia and industry. However, to use open source hardware for safety-critical applications, we need a deep understanding of the way in which well established mechanisms for testing and reliability could be integrated and deployed in the RISC-V ecosystem, and we need a clear knowledge of how such an ecosystem can be leveraged to improve security. This paper includes four contributions presenting the potential of RISC-V in security research, the way in which RISC-V can be hardened against power analysis attacks, how to implement, using RISC-V, software and hardware/software solutions for dual core lock step, and how to perform system-level testing in the RISC-V ecosystem., Peer Reviewed, Postprint (published version)
- Published
- 2021
70. Comparison of GPU computing methodologies for safety-critical systems: an avionics case study
- Author
-
Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Benito Bermúdez, Marc, Trompouki, Matina Maria, Kosmidis, Leonidas, García Martín, David, Carretero Jiménez, Sergio, and Wenger, Ken
- Abstract
Introducing advanced functionalities in safety-critical systems requires using more powerful architectures such as GPUs. However, software in safety-critical industries is subject to functional certification, which cannot be achieved using standard GPU programming languages such as CUDA and OpenCL. Fortunately, GPUs are already used in certified critical systems for display tasks, using safety-certified solutions such as OpenGL SC 2.0. In this paper, we compare two state-of-the-art graphics-based methodologies, OpenGL SC 2.0 and Brook Auto/BRASIL, for the implementation of a prototype avionics case study. We evaluate both methods on a realistic industrial setup, composed of an avionics-grade GPU and a safety-certified GPU driver, in terms of development metrics and performance, showing their feasibility., This work was funded by the Airbus TANIA-GPU Project ADS (E/200). It was also partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 and HiPEAC., Peer Reviewed, Postprint (author's final draft)
- Published
- 2021
71. Evaluation of the parallel computational capabilities of embedded platforms for critical systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Kosmidis, Leonidas, and Jover-Alvarez, Alvaro
- Abstract
Modern critical systems need higher performance which cannot be delivered by the simple architectures used so far. The latest embedded architectures feature multi-cores and GPUs, which can be used to satisfy this need. In this thesis we parallelise relevant applications from multiple critical domains represented in the GPU4S benchmark suite, and perform a comparison of the parallel capabilities of candidate platforms for use in critical systems. In particular, we port the open source GPU4S Bench benchmarking suite to the OpenMP programming model, and we benchmark the candidate embedded heterogeneous multi-core platforms of the H2020 UP2DATE project, NVIDIA TX2, NVIDIA Xavier and Xilinx Zynq Ultrascale+, in order to drive the selection of the research platform which will be used in the next phases of the project. Our results indicate that in terms of CPU and GPU performance, the NVIDIA Xavier is the highest performing platform.
- Published
- 2021
72. Hardware-software co-design for low-cost AI processing in space processors
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Kosmidis, Leonidas, and Solé i Bonet, Marc
- Abstract
In recent years there has been an increasing interest in artificial intelligence (AI) and machine learning (ML). The advantages of such applications are widespread across many areas and have drawn the attention of different sectors, such as aerospace. However, these applications require much more performance than that provided by space processors. In space the environment is not ideal for high-performance cutting-edge processors, due to radiation. For this reason, radiation hardened or radiation tolerant processors are required, which use older technologies and redundant logic, reducing the available die resources that can be exploited. In order to accelerate demanding AI applications in space processors, this thesis presents SPARROW, a low-cost SIMD accelerator for AI operations. SPARROW has been designed following a hardware-software co-design approach by analyzing the requirements of common AI applications in order to improve the efficiency of the module. The design of this module does not use any existing vector extension, which makes portability one of its key advantages over other implementations. Furthermore, SPARROW reuses the integer register file of the processor, avoiding complex data management while significantly reducing the hardware cost of the module, which is especially interesting in the space domain due to the constraints on processor area. SPARROW operates with 8-bit integer vector components in two different stages, performing parallel computations in the first and reduction operations in the second. This design is integrated within the baseline processor without requiring any additional pipeline stage or a modification of the processor frequency. SPARROW also includes swizzling and masking capabilities for the input vectors as well as saturation to work with 8 bits without overflow. SPARROW has been integrated with the LEON3 and NOEL-V space-grade processors, both distributed by Cobham Gaisler. Since each of the baseline pr
- Published
- 2021
73. The UP2DATE baseline research platforms
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Jover Álvarez, Álvaro, Calderón Torres, Alejandro Josué, Rodríguez Ferrández, Iván, Kosmidis, Leonidas, Asifuzzaman, Kazi, Uven, Patrick, Gruttner, Kim, Poggi, Tomaso, and Agirre, Irune
- Abstract
The UP2DATE H2020 project focuses on high-performance heterogeneous embedded platforms for critical systems. We will develop observability and controllability solutions to support online updates while ensuring safety and security for mixed-criticality tasks. In this paper, we describe the rationale behind the selection of the baseline research platforms which will be used to develop and demonstrate the project concepts, including a performance comparison to identify the most efficient one., This work is funded by the European Commission’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465). It is also partially supported by the Spanish Ministry of Economy and Competitiveness under grants PID2019-107255GB and FJCI-2017-34095 and HiPEAC., Peer Reviewed, Postprint (author's final draft)
- Published
- 2021
74. GPU4S: Major project outcomes, lessons learnt and way forward
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Kosmidis, Leonidas, Rodríguez Ferrández, Iván, Jover Álvarez, Álvaro, Alcaide Portet, Sergi, Lachaize, Jérôme, Notebaert, Olivier, Certain, Antoine, and Steenari, David
- Abstract
Embedded GPUs have been identified by both private and government space agencies as promising hardware technologies to satisfy the increased needs of payload processing. The GPU4S (GPU for Space) project, funded by the European Space Agency (ESA), has explored in detail the feasibility and the benefit of using them for space workloads. With the project currently in its closing phases, in this paper we describe the main project outcomes and explain the lessons we learnt. In addition, we provide some guidelines for the next steps towards their adoption in space., This work is funded by ESA under the GPU4S (GPU for Space) project (ITT AO/1-9010/17/NL/AF). It is also partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grants PID2019-107255GB and FJCI-2017-34095 and HiPEAC., Peer Reviewed, Postprint (author's final draft)
- Published
- 2021
75. On the definition of resource sharing levels to understand and control the impact of contention in multicore processors
- Author
-
Barcelona Supercomputing Center, Mezzetti, Enrico, Abella Ferrer, Jaume, Cazorla Almeida, Francisco Javier, Tabani, Hamid, and Kosmidis, Leonidas
- Abstract
The trend toward the adoption of a multiprocessor system on a chip (MPSoC) in critical real-time domains, like avionics or automotive, responds to the demand for increased computing performance to support advanced software functionalities. The other side of the coin is that MPSoCs challenge software timing analysis. This is so as co-running applications affect each other’s timing behavior on account of the interference incurred when accessing shared hardware resources, with the latter steadily increasing in number and complexity in every new generation of MPSoCs. For a solid and cost-contained software-timing validation approach, we contend that a taxonomy has to be developed to capture the different levels at which processors’ resources can be shared. Those levels are to be related to the conventional run-time software abstractions (e.g., task, thread, runnable) and the particular abstraction used to carry out contention analysis. From the standpoint of contention analysis, only the resources in those levels shared by the different run-time software entities need to be mastered and addressed by timing analysis, whereas the remaining resources can be safely disregarded. We tailor this approach to two of NVIDIA’s embedded platforms, TX2 and AGX Xavier, of particular relevance for the automotive domain. For the identified shared resources, we also characterize the contention that tasks can suffer and discuss the limitations and early approaches for modeling timing interference in shared hardware resources., This work has been partially supported by the Spanish Ministry of Science and Innovation under grants PID2019-107255GB and FJCI-2017-34095; and the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 878752 (MASTECS) and the European Research Council (ERC) grant agreement No. 772773 (SuPerCom)., Peer Reviewed, Postprint (published version)
- Published
- 2021
76. An On-board Algorithm Implementation on an Embedded GPU: A Space Case Study
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Kosmidis, Leonidas, and Rodríguez Ferrandez, Ivan
- Abstract
On-board processing requirements of future space missions are constantly increasing, calling for hardware other than that traditionally used in space. Embedded GPUs are an attractive candidate offering both high performance capabilities and low power consumption, but there are no complex industrial case studies from the space domain demonstrating these advantages. In this Master Thesis we present the GPU parallelization of an on-board algorithm in multiple GPU programming languages, as well as its performance and energy efficiency on a selection of promising embedded GPU COTS platforms.
- Published
- 2021
77. On the Definition of Resource Sharing Levels to Understand and Control the Impact of Contention in Multicore Processors
- Author
-
Tabani, Hamid, primary, Kosmidis, Leonidas, additional, Mezzetti, Enrico, additional, Abella, Jaume, additional, and Cazorla, Francisco J., additional
- Published
- 2021
78. Security, Reliability and Test Aspects of the RISC-V Ecosystem
- Author
-
Abella, Jaume, primary, Alcaide, Sergi, additional, Anders, Jens, additional, Bas, Francisco, additional, Becker, Steffen, additional, De Mulder, Elke, additional, Elhamawy, Nourhan, additional, Gurkaynak, Frank K., additional, Handschuh, Helena, additional, Hernandez, Carles, additional, Hutter, Mike, additional, Kosmidis, Leonidas, additional, Polian, Ilia, additional, Sauer, Matthias, additional, Wagner, Stefan, additional, and Regazzoni, Francesco, additional
- Published
- 2021
79. Generating and Exploiting Deep Learning Variants to Increase Utilization of the Heterogeneous Resources in Autonomous Driving Platforms
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Hamid Tabani, Kosmidis, Leonidas, and Pujol Torramorell, Roger
- Abstract
Nowadays, deep learning-based solutions and, in particular, deep neural networks (DNNs) are getting into several core functionalities in critical real-time embedded systems (CRTES), like those in planes, cars, and satellites, from vision-based perception (object detection and object tracking) systems to trajectory planning. As a result, several deep learning instances are running simultaneously at any time on the same computing platform. However, while modern computing platforms offer a variety of computing elements (e.g., CPUs, GPUs, and specific accelerators) in which those DNN instances can be executed depending on their computational requirements and temporal constraints, most DNNs are currently programmed to exploit one particular computing element, the regular cores of the GPU. This lack of variety causes a resource imbalance and under-utilization of the various computing element resources when executing several DNN instances, causing an increase in DNN tasks' execution time requirements. In this Thesis, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving software for each of the computing elements of the latest NVIDIA Xavier system-on-chip. Each variant is configured to balance resource requirements and performance: the regular CPU core implementation that can run on 2, 4, and 6 cores (always leaving 2 cores free for other computations); the GPU with regular and Tensor cores variants that can run on 4 or 8 of the GPU's Streaming Multiprocessors (SMs); and 1 or 2 NVIDIA Deep Learning Accelerators (NVDLA); (b) we show that each particular variant/configuration offers a different resource utilization/performance point; and (c) we show how those heterogeneous computing elements can be exploited by a static scheduler to sustain the execution of multiple and diverse DNN variants on the same platform.
- Published
- 2020
80. An academic RISC-V silicon implementation based on open-source components
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Universitat Politècnica de Catalunya. HIPICS - Grup de Circuits i Sistemes Integrats d'Altes Prestacions, Abella Ferrer, Jaume, Bulla, Calvin, Cabo Pitarch, Guillem, Cazorla Almeida, Francisco Javier, Cristal Kestelman, Adrián, Doblas Font, Max, Figueras Bagué, Roger, González Trejo, Alberto, Hernández Luz, Carles, Hernández Calderón, César Alejandro, Jiménez Arador, Víctor, Kosmidis, Leonidas, Kostalampros, Ioannis-Vatistas, Langarita Benítez, Rubén, Leyva Santes, Neiel, López Paradís, Guillem, Marimon Illana, Joan, Martínez Martínez, Ricardo, Mendoza Escobar, Jonnatan, Moll Echeto, Francisco de Borja, Moretó Planas, Miquel, Pavón Rivera, Julián, Ramírez Lazo, Cristóbal, Ramírez Salinas, Marco Antonio, Rojas Morales, Carlos, Rubio Sola, Jose Antonio, Ruiz, Abraham Josafat, Sonmez, Nehir, Soria Pardos, Víctor, Teres Teres, Lluis, Unsal, Osman Sabri, Valero Cortés, Mateo, Vargas Valdivieso, Iván, and Villa Vargas, Luis Alfonso
- Abstract
©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works., The design presented in this paper, called preDRAC, is a RISC-V general purpose processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The preDRAC processor is the first RISC-V processor designed and fabricated by a Spanish or Mexican academic institution, and will be the basis of future RISC-V designs jointly developed by these institutions. This paper summarizes the design tasks, for FPGA first and for SoC later, from high architectural level descriptions down to RTL and then going through logic synthesis and physical design to get the layout ready for its final tapeout in CMOS 65nm technology., The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politécnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
81. UP2DATE: Safe and secure over-the-air software updates on high-performance mixed-criticality systems
- Author
-
Barcelona Supercomputing Center, Agirre, Irene, Onaindia, Peio, Poggi, Tomasso, Yarza, Irune, Cazorla Almeida, Francisco Javier, Kosmidis, Leonidas, Grüttner, Kim, Abuteir, Mohammed, Loewe, Jan, Orbegozo, Juan M., and Botta, Stefania
- Abstract
Following the same trend as consumer electronics, safety-critical industries are starting to adopt Over-The-Air Software Updates (OTASU) on their embedded systems. The motivation behind this trend is twofold. On the one hand, OTASU offer several benefits to the product makers and users by improving or adding new functionality and services to the product without a complete redesign. On the other hand, the increasing connectivity trend makes OTASU a crucial cyber-security demand to download the latest security patches. However, the application of OTASU in the safety-critical domain is not free of challenges, especially when considering the dramatic increase of software complexity and the resulting high computing performance demands. Addressing these challenges is the mission of UP2DATE, a recently launched project funded within the European H2020 programme focused on new software update architectures for heterogeneous high-performance mixed-criticality systems. This paper gives an overview of UP2DATE and its foundations, which seeks to improve existing OTASU solutions by considering safety, security and availability from the ground up in an architecture that builds around composability and modularity., The research presented throughout this paper has received funding from the European Community’s Horizon 2020 programme under the UP2DATE project (grant agreement 871465)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
82. Modeling contention interference in crossbar-based systems via sequence-aware pairing (SeAP)
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Kosmidis, Leonidas, Mezzet, Enrico, and Giesen León, Jeremy Jens
- Abstract
Critical Real-time Embedded Systems encompass an increasingly relevant class of embedded systems for which the timely execution of a functionality is as important as its functional correctness. The derivation of trustworthy timing bounds, an inescapable requirement for that class of systems, is challenged by the inherent parallelism in multicore platforms. When shifting from single-core to multicore systems, some hardware resources become shared among available cores. Under such a scenario, contention may arise when two or more cores send requests to the same shared hardware resource at the same time. Contention causes potential delays in the time required to serve each request, which in turn affects the overall execution time requirements of an application and hence its Worst-Case Execution Time (WCET). The computation of trustworthy bounds to the impact of contention in multicore systems is further challenged by the increasing complexity of modern cutting-edge multicore and manycore hardware solutions, which are increasingly adopted in the Critical Real-time Embedded Systems domain to respond to increasing computational and performance requirements. Contention bounds are required to account for the worst-case scenario and, at the same time, to be tight, avoiding unnecessary pessimism and ultimately the development costs and system over-dimensioning. Under the above considerations, this Thesis aims at improving (reducing) the bounds on contention delay when accessing shared resources. In particular, we focus on systems featuring interconnects that allow some form of parallelism such as crossbars and the like. We differentiate from state-of-the-art solutions, which only address bus-like interconnects and only exploit access counts, by exploiting information on the sequence of accesses performed by contending tasks. Specifically, we exploit the sequence of requests to the different target resources produced by each core to produce tighter bounds by discarding contention sce
- Published
- 2020
83. GPU4S: Embedded GPUs in Space - Latest project updates
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Kosmidis, Leonidas, Rodríguez Ferrandez, Iván, Jover Álvarez, Álvaro, Alcaide Portet, Sergi, Lachaize, Jérôme, Abella Ferrer, Jaume, Notebaert, Olivier, Cazorla Almeida, Francisco Javier, and Steenari, David
- Abstract
Following the trend of other safety-critical industries like automotive and avionics, the space domain is witnessing an increase in the on-board computing performance demands. This rise in performance needs comes from both control and payload parts of the spacecraft and calls for advanced electronics systems able to provide high computational power under the constraints of the harsh space environment. On the non-technical side, for strategic reasons it is mandatory to achieve European independence in the computing technology used. In this project, we study the applicability of embedded GPUs in space, which have shown a dramatic improvement of their performance per-watt ratio coming from their proliferation in consumer markets based on competitive European technology. To that end, we perform an analysis of the existing space application domains to identify which software domains can benefit from their use. Moreover, we survey the embedded GPU domain in order to assess whether embedded GPUs can provide the required computational power and identify the challenges which need to be addressed for their adoption in space. In this paper, we describe the steps followed in the project, as well as a summary of results obtained from our analyses so far in the project., This work has received funding from the European Space Agency (ESA) under the GPU4S (GPU for Space) Project, answer to the ESA ITT AO/1-9010/17/NL/AF tender with title “Low Power GPU Solutions For High Performance On-Board Data Processing” and from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773). This work has also been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P and the HiPEAC Network of Excellence. MINECO partially supported Leonidas Kosmidis under Juan de la Cierva Formación postdoctoral fellowship (FJCI-2017-34095) and Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
84. Software-only triple diverse redundancy on GPUs for autonomous driving platforms
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Alcaide Portet, Sergi, Kosmidis, Leonidas, Hernández Luz, Carles, and Abella Ferrer, Jaume
- Abstract
Autonomous driving (AD) imposes the need for safe computations in high-performance computing (HPC) components such as GPUs, thus with capabilities to detect and recover from errors since a safe state may not exist anymore. This can be achieved with Triple Modular Redundancy (TMR) for computation components. Furthermore, error detection capabilities need to provide some form of diversity to avoid the case where a single fault leads all redundant executions to the same error, which would go undetected. In our past work, we assessed GPUs against dual modular redundancy (DMR) with diversity, showing their potential and limitations to provide diverse redundancy building on reset and restart for recovery. However, such a recovery scheme may be too slow for some applications. This paper proposes a software-only solution to deliver diverse TMR on commercial off-the-shelf (COTS) GPUs. Our work details how staggered execution can be achieved and assesses the performance of TMR on COTS GPUs. Moreover, we identify those elements where diversity cannot be guaranteed and provide some discussion comparing the case of DMR and TMR for those elements., This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871467 (SELENE). Leonidas Kosmidis has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under a Juan de la Cierva Formación postdoctoral fellowship with number FJCI-2017-34095., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
85. Timing of autonomous driving software: problem analysis and prospects for future solutions
- Author
-
Barcelona Supercomputing Center, Alcon, Miguel, Tabani, Hamid, Kosmidis, Leonidas, Mezzetti, Enrico, Abella Ferrer, Jaume, and Cazorla Almeida, Francisco Javier
- Abstract
The software used to implement advanced functionalities in critical domains (e.g. autonomous operation) impairs software timing. This is not only due to the complexity of the underlying high-performance hardware deployed to provide the required levels of computing performance, but also due to the complexity, non-deterministic nature, and huge input space of the artificial intelligence (AI) algorithms used. In this paper, we focus on Apollo, an industrial-quality Autonomous Driving (AD) software framework: we statistically characterize its observed execution time variability and reason on the sources behind it. We discuss the main challenges and limitations in finding a satisfactory software timing analysis solution for Apollo and also show the main traits for the acceptability of statistical timing analysis techniques as a feasible path. While providing a consolidated solution for the software timing analysis of Apollo is a huge effort far beyond the scope of a single research paper, our work aims to set the basis for future and more elaborate techniques for the timing analysis of AD software., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Enrico Mezzetti under Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-2016-27396), and Leonidas Kosmidis under Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
86. IntPred: flexible, fast, and accurate object detection for autonomous driving systems
- Author
-
Barcelona Supercomputing Center, Tabani, Hamid, Fusi, Matteo, Kosmidis, Leonidas, Abella Ferrer, Jaume, and Cazorla Almeida, Francisco Javier
- Abstract
Deep Neural-Network (DNN) based Object Detection is one of the most important and time-consuming stages of Autonomous Driving software in cars. In non-critical domains, the performance and energy requirements of object detection can be reduced at the cost of accuracy in the detected objects. This is not the case in a critical domain like automotive, for which a delicate balance between performance/energy overheads and accuracy of object detection must be found. We propose IntPred to achieve such a balance by leveraging the fact that, with high frame rates, objects do not move significantly across frames. IntPred tailors object interpolation for the case of object detection in autonomous driving frameworks, in line with approaches devised for other domains, thus heavily reducing the performance requirements of full-fledged DNN-based object prediction. IntPred results in comparable accuracy to the original object detection, while saving more than 70% of the computations. The latter allows using lower-performance and cheaper platforms, resulting in saving energy and reducing heat dissipation: for instance, on an NVIDIA Jetson TX2 platform, specific for autonomous driving systems, our technique increases the frame processing rate by 4.6x. IntPred also allows consolidating additional applications onto the same platform., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717) and Leonidas Kosmidis under Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095), Postprint (author's final draft)
- Published
- 2020
87. On the use of probabilistic worst-case execution time estimation for parallel applications in high performance systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. VIRTUOS - Virtualisation and Operating Systems, Fusi, Matteo, Mazzocchetti, Fabio, Farres, Albert, Kosmidis, Leonidas, Canal Corretger, Ramon, Cazorla Almeida, Francisco Javier, and Abella Ferrer, Jaume
- Abstract
Some high performance computing (HPC) applications exhibit increasing real-time requirements, which call for effective means to predict the distribution of their high execution times. This is a new challenge for HPC applications but a well-known problem for real-time embedded applications where solutions already exist, although they target low-performance systems running single-threaded applications. In this paper, we show how some performance validation and measurement-based practices for real-time execution time prediction can be leveraged in the context of HPC applications on high-performance platforms, thus enabling reliable means to obtain real-time guarantees for those applications. In particular, the proposed methodology combines techniques that randomly explore the potential timing behavior of the application with Extreme Value Theory (EVT) to predict rare (and high) execution times and, eventually, derive probabilistic Worst-Case Execution Time (pWCET) curves. We demonstrate the effectiveness of this approach for an acoustic wave inversion application used for geophysical exploration., This research was funded by the Horizon 2020 Framework Programme, grant number 801137, project RECIPE, Peer Reviewed, Postprint (published version)
- Published
- 2020
88. En-route: on enabling resource usage testing for autonomous driving frameworks
- Author
-
Barcelona Supercomputing Center, Alcon, Miguel, Tabani, Hamid, Abella Ferrer, Jaume, Kosmidis, Leonidas, and Cazorla Almeida, Francisco Javier
- Abstract
Software resource usage testing, including execution time bounds and memory, is a mandatory validation step during the integration of safety-related real-time systems. However, the inherent complexity of Autonomous Driving (AD) systems challenges current practice for resource usage testing. This paper exposes the difficulties to perform resource usage testing for AD frameworks by analyzing a complex and critical module of an AD framework, and provides some guidelines and practical evidence on how resource usage testing can be effectively performed, thus enabling end users to validate their safety-related real-time AD frameworks., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the UP2DATE European Union’s Horizon 2020 (H2020) research and innovation programme under grant agreement No 871465, and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717) and Leonidas Kosmidis under Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095), Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
89. Software-only based diverse redundancy for ASIL-D automotive applications on embedded HPC platforms
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Alcaide Portet, Sergi, Kosmidis, Leonidas, Hernández Luz, Carles, and Abella Ferrer, Jaume
- Abstract
©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works., High-Performance Computing (HPC) platforms have become a must in automotive systems to enable autonomous driving. However, automotive platforms must avoid Common Cause Failures (CCFs), as indicated by the ISO 26262 automotive safety standard. CCFs can be avoided by enforcing diverse redundancy. Unfortunately, HPC platforms fail to provide such support. This paper proposes a flexible and efficient software-based scheme to implement diverse redundancy on HPC platforms. A software implementation on a Commercial Off-The-Shelf ARM multicore proves the effectiveness of this scheme to guarantee diverse redundancy with negligible performance degradation. Our solution is the first step towards an automotive-compliant HPC platform., This work has been supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB and the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No 826647 (EPI)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
90. On the analysis of hardware event monitors accuracy in MPSoCs for real-time computing systems
- Author
-
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Kosmidis, Leonidas, Tabani, Hamid, and Barrera Herrera, Javier Enrique
- Abstract
The number of mechanical subsystems enhanced or completely replaced by electrical/electronic components is on the rise in critical real-time embedded systems (CRTES) like those in cars, planes, trains, and satellites. In this line, software is increasingly used to control (safety-related) critical aspects of CRTES. More complex software requires unprecedented computing performance, which can only be achieved by deploying aggressive processor designs such as multiprocessor systems-on-chip (SoCs or MPSoCs). The other side of the coin is that MPSoCs make software timing analysis -- a mandatory pre-requisite for CRTES -- more complex. Performance Monitoring Units (PMUs) are at the heart of most advanced software timing analysis techniques to control and bound the impact of contention in Commercial Off-The-Shelf (COTS) Systems-on-Chip (SoCs) with shared resources (e.g., GPUs and multicore CPUs). However, PMUs are designed with an assurance level below the role they assume in software timing analysis. In this Thesis, we aim at taking an initial step toward reconciling PMU verification with its key role for timing analysis. In particular, this Thesis covers the analysis of the correctness of hardware event monitors (HEMs) in embedded processors for CRTES domains. This Thesis illustrates that some event monitors do not behave as expected in their specification, which can in turn invalidate the software timing analysis process performed building on those HEMs. For three real processors used in different CRTES domains, we report discrepancies between the values obtained from the PMU's HEMs and the number of events expected based on the HEM description in the processor's official documentation. Discrepancies, which may be either due to actual errors or inaccurate specifications, make PMU readings unreliable. This is particularly problematic in consideration of the critical role played by event monitors for timing analysis in domains such as automotive and avionics. This Thesis propo
- Published
- 2020
91. GMAI: Understanding and exploiting the internals of GPU resource allocation in critical systems
- Author
-
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Calderón Torres, Alejandro Josué, Kosmidis, Leonidas, Nicolás Ramírez, Carlos Fernando, Cazorla Almeida, Francisco Javier, and Onaindia, Peio
- Abstract
Critical real-time systems require strict resource provisioning in terms of memory and timing. The constant need for higher performance in these systems has led industry to recently include GPUs. However, GPU software ecosystems are by their nature closed source, forcing system engineers to consider them as black boxes, complicating resource provisioning. In this work, we reverse engineer the internal operations of the GPU system software to increase the understanding of their observed behaviour and how resources are internally managed. We present our methodology that is incorporated in GMAI (GPU Memory Allocation Inspector), a tool that allows system engineers to accurately determine the exact amount of resources required by their critical systems, avoiding underprovisioning. We first apply our methodology on a wide range of GPU hardware from different vendors showing its generality in obtaining the properties of the GPU memory allocators. Next, we demonstrate the benefits of such knowledge in resource provisioning of two case studies from the automotive domain, where the actual memory consumption is up to 5.6× more than the memory requested by the application., This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P, the HiPEAC Network of Excellence and the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation programme (grant agreement No. 772773). Leonidas Kosmidis is also funded by the Spanish Ministry of Economy and Competitiveness (MINECO) under a Juan de la Cierva Formación postdoctoral fellowship (FJCI-2017-34095)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
92. An on-board algorithm implementation on an embedded GPU: A space case study
- Author
-
Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Rodríguez Ferrandez, Iván, Kosmidis, Leonidas, Notebaert, Olivier, Cazorla Almeida, Francisco Javier, and Steenari, David
- Abstract
On-board processing requirements of future space missions are constantly increasing, calling for hardware different from that traditionally used in space. Embedded GPUs are an attractive candidate offering both high performance capabilities and low power consumption, but there are no complex industrial case studies from the space domain demonstrating these advantages. In this paper we present the GPU parallelisation of an on-board algorithm, as well as its performance on a promising embedded GPU COTS platform targeting critical systems., This work is funded by ESA under the GPU4S (GPU for Space) project (ITT AO/1-9010/17/NL/AF) and an ERC grant (No. 772773). It is also partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grants TIN2015-65316-P and FJCI-2017-34095 and HiPEAC., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
93. On the reliability of hardware event monitors in MPSoCs for critical domains
- Author
-
Barcelona Supercomputing Center, Barrera Herrera, Javier Enrique, Kosmidis, Leonidas, Tabani, Hamid, Mezzetti, Enrico, Abella Ferrer, Jaume, Fernández, Mikel, Bernat Nicolau, Guillem Joan, and Cazorla Almeida, Francisco Javier
- Abstract
Performance Monitoring Units (PMUs) are at the heart of most advanced timing analysis techniques to control and bound the impact of contention in Commercial Off-The-Shelf (COTS) SoCs with shared resources (e.g. GPUs and multicore CPUs). In this paper, we report discrepancies between the values obtained from the PMU event monitors and the number of events expected based on the PMU event description in the processor's official documentation. Discrepancies, which may be either due to actual errors or inaccurate specifications, make PMU readings unreliable. This is particularly problematic in consideration of the critical role played by event monitors for timing analysis in domains such as automotive and avionics. This paper proposes a systematic procedure for event monitor validation. We apply it to validate event monitors in the NVIDIA Xavier and TX2, and the Zynq UltraScale+ MPSoC. We show that, while some event monitors count as expected, this is not the case for others, whose discrepancies with expected values we analyze., This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the SELENE European Union’s Horizon 2020 (H2020) research and innovation programme under grant agreement No 871467, and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717), Enrico Mezzetti under Juan de la Cierva-Incorporación postdoctoral fellowship (IJCI-2016-27396), and Leonidas Kosmidis under Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095)., Peer Reviewed, Postprint (author's final draft)
- Published
- 2020
94. Comparison of GPU Computing Methodologies for Safety-Critical Systems: An Avionics Case Study
- Author
-
Benito, Marc, primary, Trompouki, Matina Maria, additional, Kosmidis, Leonidas, additional, Garcia, Juan David, additional, Carretero, Sergio, additional, and Wenger, Ken, additional
- Published
- 2021
95. GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward
- Author
-
Kosmidis, Leonidas, primary, Rodriguez, Ivan, additional, Jover-Alvarez, Alvaro, additional, Alcaide, Sergi, additional, Lachaize, Jerome, additional, Notebaert, Olivier, additional, Certain, Antoine, additional, and Steenari, David, additional
- Published
- 2021
96. The UP2DATE Baseline Research Platforms
- Author
-
Jover-Alvarez, Alvaro, primary, Calderon, Alejandro J., additional, Rodriguez, Ivan, additional, Kosmidis, Leonidas, additional, Asifuzzaman, Kazi, additional, Uven, Patrick, additional, Gruttner, Kim, additional, Poggi, Tomaso, additional, and Agirre, Irune, additional
- Published
- 2021
97. Achieving diverse redundancy for GPU Kernels
- Author
-
Alcaide, Sergi, primary, Kosmidis, Leonidas, additional, Hernandez, Carles, additional, and Abella, Jaume, additional
- Published
- 2021
98. Compiler Support for an AI-oriented SIMD Extension of a Space Processor.
- Author
-
Solé, Marc and Kosmidis, Leonidas
- Subjects
-
*COMPILERS (Computer programs), *WORK experience (Employment)
- Abstract
In this ongoing research paper, we present our work on the compiler support for an AI-oriented SIMD extension called SPARROW. The SPARROW hardware design was developed during a recently defended, award-winning Master Thesis and targets Cobham Gaisler's space processors Leon3 and NOEL-V. We present the compiler support we have included in two compiler toolchains, gcc and llvm, as well as a SIMD intrinsics library for easy programmability. Compiler modifications are kept to a minimum in order to enable incremental qualification of the toolchains. We present our experience working with the two compilers and performance results for the two compilers on top of an FPGA implementation of the target space processor. [ABSTRACT FROM AUTHOR]
- Published
- 2022
99. Space Compression Algorithms Acceleration on Embedded Multi-core and GPU Platforms.
- Author
-
Jover-Alvarez, Alvaro, Rodriguez, Ivan, Kosmidis, Leonidas, and Steenari, David
- Subjects
DATA compression, ALGORITHMS, ELECTRONIC data processing, COMPUTER performance, IMAGE compression
- Abstract
Future space missions will require increased on-board computing power to process and compress massive amounts of data. Consequently, embedded multi-core and GPU platforms, which have been shown to be beneficial for data processing, are being considered. However, the acceleration of data compression - an inherently sequential task - has not been explored. In this ongoing research paper, we parallelize two space compression standards on both CPUs and GPUs using two candidate embedded GPU platforms for space, showing that despite the challenging nature of CCSDS algorithms, their parallelization is possible and can provide significant performance benefits. [ABSTRACT FROM AUTHOR]
- Published
- 2022
100. An Academic RISC-V Silicon Implementation Based on Open-Source Components
- Author
-
Abella, Jaume, primary, Bulla, Calvin, additional, Cabo, Guillem, additional, Cazorla, Francisco J., additional, Cristal, Adrian, additional, Doblas, Max, additional, Figueras, Roger, additional, Gonzalez, Alberto, additional, Hernandez, Carles, additional, Hernandez, Cesar, additional, Jimenez, Victor, additional, Kosmidis, Leonidas, additional, Kostalabros, Vatistas, additional, Langarita, Ruben, additional, Leyva, Neiel, additional, Lopez-Paradis, Guillem, additional, Marimon, Joan, additional, Martinez, Ricardo, additional, Mendoza, Jonnatan, additional, Moll, Francesc, additional, Moreto, Miquel, additional, Pavon, Julian, additional, Ramirez, Cristobal, additional, Ramirez, Marco A., additional, Rojas, Carlos, additional, Rubio, Antonio, additional, Ruiz, Abraham, additional, Sonmez, Nehir, additional, Soria, Victor, additional, Teres, Lluis, additional, Unsal, Osman, additional, Valero, Mateo, additional, Vargas, Ivan, additional, and Villa, Luis, additional
- Published
- 2020