Combining PREM compilation and static scheduling for high-performance and predictable MPSoC execution.
- Source :
- Parallel Computing. Jul 2019, Vol. 85, p27-44. 18p.
- Publication Year :
- 2019
Abstract
- • We present a compiler capable of transforming code to be suitable for predictable execution and real-time scheduling. The compiler generates programs that adhere to the Predictable Execution Model (PREM).
- • We shed light on compiler optimizations for prefetching-based systems, and their impact on the ARM Cortex-A57.
- • We extend the state of the art in scheduling heuristics to support multiple so-called take-give resources, and are able to solve complex scheduling problems, infeasible for optimal solvers, in a few seconds. The heuristics create schedules that are close (about 10%) to the optimal schedule.
- • We provide insights on the effects of memory contention in MPSoC systems, and how active memory scheduling can greatly reduce the pessimism in worst-case execution time and scheduling jitter in real-time systems.
- Many applications require both high performance and predictable timing. High performance can be provided by commercial off-the-shelf (COTS) multi-core systems-on-chip (MPSoCs); however, because the cores in these systems share main memory, they are susceptible to interference from each other, which is a problem for timing predictability. We achieve predictability on multi-cores by employing the Predictable Execution Model (PREM), which splits execution into a sequence of memory and compute phases, and schedules these such that only a single core is executing a memory phase at a time. We present a toolchain consisting of a compiler and a scheduling tool. Our compiler uses region- and loop-based analysis and performs tiling to transform application code into PREM-compliant binaries. In addition to enabling predictable execution, the compiler transformation optimizes accesses to the shared main memory. The scheduling tool uses a state-of-the-art heuristic algorithm and is able to schedule industrial-size instances. For smaller instances, we compare the results of the algorithm with optimal solutions found by solving an integer linear programming model.
Furthermore, we solve the problem of scheduling execution on multiple cores while preventing interference of memory phases. We evaluate our toolchain on Advanced Driver Assistance System (ADAS) application workloads running on an NVIDIA Tegra X1 embedded system-on-chip (SoC). The results show that our approach maintains similar average performance to the original (unmodified) program code and execution, while reducing variance of completion times by a factor of 9 with the identified optimal solutions and by a factor of 5 with schedules generated by our heuristic scheduler. [ABSTRACT FROM AUTHOR]
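The PREM constraint described in the abstract (each core alternates memory and compute phases, and at most one core is in a memory phase at any time) can be illustrated with a small sketch. The code below is not the authors' heuristic or their integer linear programming model; it is a simple greedy scheduler written only for illustration. The names `Phase`, `schedule_prem`, and `memory_phases_overlap` are hypothetical, as are the phase durations used in the usage note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    core: int
    kind: str   # "mem" for a memory phase, "comp" for a compute phase
    start: int
    end: int


def schedule_prem(intervals_per_core):
    """Greedy sketch: serialize memory phases across all cores.

    intervals_per_core[c] is the ordered list of (mem_time, comp_time)
    pairs that core c must execute. Durations are illustrative units.
    """
    n = len(intervals_per_core)
    core_ready = [0] * n   # time at which each core finishes its last phase
    next_idx = [0] * n     # index of the next interval to place per core
    bus_free = 0           # time at which the shared memory becomes free
    schedule = []
    while True:
        pending = [c for c in range(n)
                   if next_idx[c] < len(intervals_per_core[c])]
        if not pending:
            break
        # Pick the core that becomes ready first (ties: lowest core id).
        c = min(pending, key=lambda k: (core_ready[k], k))
        mem, comp = intervals_per_core[c][next_idx[c]]
        # A memory phase waits for both its core and the shared memory.
        mem_start = max(core_ready[c], bus_free)
        mem_end = mem_start + mem
        comp_end = mem_end + comp
        schedule.append(Phase(c, "mem", mem_start, mem_end))
        schedule.append(Phase(c, "comp", mem_end, comp_end))
        bus_free = mem_end
        core_ready[c] = comp_end
        next_idx[c] += 1
    return schedule


def memory_phases_overlap(schedule):
    """Return True if any two memory phases overlap in time."""
    mem = sorted((p.start, p.end) for p in schedule if p.kind == "mem")
    return any(a_end > b_start
               for (_, a_end), (b_start, _) in zip(mem, mem[1:]))
```

For example, `schedule_prem([[(2, 3), (2, 3)], [(2, 3), (2, 3)]])` places the two cores' memory phases back to back rather than concurrently, and `memory_phases_overlap` on the result returns `False`. A greedy rule like this one makes no claim of optimality; the abstract's comparison against optimal solutions concerns the authors' own heuristic.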
Details
- Language :
- English
- ISSN :
- 01678191
- Volume :
- 85
- Database :
- Academic Search Index
- Journal :
- Parallel Computing
- Publication Type :
- Academic Journal
- Accession number :
- 136581229
- Full Text :
- https://doi.org/10.1016/j.parco.2018.11.002