Start Over

A methodology for speeding up loop kernels by exploiting the software information and the memory architecture

Authors :: Angeliki Kritikakou
Vasilios Kelefouras
Costas E. Goutis
Department of Electrical and Computer Engineering [Patras] (ECE)
University of Patras
Energy Efficient Computing ArchItectures with Embedded Reconfigurable Resources (CAIRN)
Inria Rennes – Bretagne Atlantique
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-ARCHITECTURE (IRISA-D3)
Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)
Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)
Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)
University of Patras [Patras]
CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1)
Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA)
Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)
Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)
Source :: Computer Languages, Systems and Structures, Computer Languages, Systems and Structures, 2015, 41, pp.21-41. ⟨10.1016/j.cl.2015.01.003⟩, Computer Languages, Systems and Structures, Elsevier, 2015, 41, pp.21-41. ⟨10.1016/j.cl.2015.01.003⟩
Publication Year :: 2015
Publisher :: Elsevier, 2015.
Abstract: It is well-known that today?s compilers and state of the art libraries have three major drawbacks. First, the compiler sub-problems are optimized separately; this is not efficient because the separate sub-problems optimization gives a different schedule for each sub-problem and these schedules cannot coexist as the refining of one, causes the degradation of another. Second, they take into account only part of the specific algorithm?s information. Third, they take into account only a few hardware architecture parameters. These approaches cannot give an optimal solution.In this paper, a new methodology/pre-compiler is introduced, which speeds up loop kernels, by overcoming the above problems. This methodology solves four of the major scheduling sub-problems, together as one problem and not separately; these are the sub-problems of finding the schedules with the minimum numbers of (i) L1 data cache accesses, (ii) L2 data cache accesses, (iii) main memory data accesses, (iv) addressing instructions. First, the exploration space (possible solutions) is found according to the algorithm?s information, e.g. array subscripts. Then, the exploration space is decreased by orders of magnitude, by applying constraint propagation to the software and hardware parameters.We take the C-code and the memory architecture parameters as input and we automatically produce a new faster C-code; this code cannot be obtained by applying the existing compiler transformations to the original code. The proposed methodology has been evaluated for five well-known algorithms in both general and embedded processors; it is compared with gcc and clang compilers and also with iterative compilation. HighlightsFor the first time, the software optimization problem is addressed theoretically.Exploit the software structure and the hardware architecture parameters.Solve the major scheduling subproblems together as one problem and not separately.

Subjects :: Hardware architecture
Schedule
Memory hierarchy
Computer Networks and Communications
Computer science
020207 software engineering
02 engineering and technology
Parallel computing
Loop tiling
Program optimization
computer.software_genre
020202 computer hardware & architecture
Memory architecture
0202 electrical engineering, electronic engineering, information engineering
[INFO]Computer Science [cs]
Compiler
computer
Software
Register allocation

Details

Language :: English
ISSN :: 14778424
Database :: OpenAIRE
Journal :: Computer Languages, Systems and Structures, Computer Languages, Systems and Structures, 2015, 41, pp.21-41. ⟨10.1016/j.cl.2015.01.003⟩, Computer Languages, Systems and Structures, Elsevier, 2015, 41, pp.21-41. ⟨10.1016/j.cl.2015.01.003⟩
Accession number :: edsair.doi.dedup.....16c504322c6752bc0b791e15ccb04905

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

A methodology for speeding up loop kernels by exploiting the software information and the memory architecture

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

A methodology for speeding up loop kernels by exploiting the software information and the memory architecture

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources