1. Embedded Multi-Core Code Generation with Cross-Layer Parallelization
- Author
- Oliver Oey, Michael Huebner, Timo Stripf, and Juergen Becker
- Abstract
In this paper, we present a method for optimizing C code for embedded multi-core systems using cross-layer parallelization. The method has two phases. In the first, the algorithm is developed without any optimization for the target platform. In the second, the code is optimized and parallelized across four defined layers (algorithm, code, task, and data) for efficient execution on the target hardware. Each layer focuses on selected hardware characteristics. Through an iterative approach, individual kernels and composite algorithms can be closely adapted to the hardware without further changes to the algorithm itself. The cross-layer parallelization is realized through algorithm recognition, code transformations, task distribution, and the insertion of synchronization and communication statements. The method is evaluated first on a common kernel and then on a sample image processing algorithm to showcase its benefits. Compared to methods that rely on only two or three of these layers, an additional performance gain of 20 to 30% can be achieved.
- Published
- 2024