Back to Search Start Over

Locality-aware task scheduling for homogeneous parallel computing systems

Authors :
Isil Oz
Konstantin Popov
Muhammad Khurram Bhatti
Maria Mushtaq
Sarah Amin
Umer Farooq
Mats Brorsson
COMSATS Institute of Information Technology [Islamabad] (CIIT)
Information Technology University [Lahore] (ITU)
Izmir Institute of Technology (IZTECH)
Lab-STICC_UBS_CACS_MOCS
Laboratoire des sciences et techniques de l'information, de la communication et de la connaissance (Lab-STICC)
École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique)
Institut Mines-Télécom [Paris] (IMT)-École Nationale d'Ingénieurs de Brest (ENIB)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique)
Institut Mines-Télécom [Paris] (IMT)
Department of Electrical and Computer Engineering, Dhofar University, Salalah
SICS
Swedish Institute of Computer Science [Stockholm] (SICS)
Royal Institute of Technology [Stockholm] (KTH )
Institut Mines-Télécom [Paris] (IMT)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique)
Institut Mines-Télécom [Paris] (IMT)-École Nationale d'Ingénieurs de Brest (ENIB)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)-Institut Mines-Télécom [Paris] (IMT)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique)
Institut Mines-Télécom [Paris] (IMT)-École Nationale d'Ingénieurs de Brest (ENIB)-École Nationale Supérieure de Techniques Avancées Bretagne (ENSTA Bretagne)-Université de Bretagne Sud (UBS)-Université de Brest (UBO)-Centre National de la Recherche Scientifique (CNRS)-Université Bretagne Loire (UBL)
Dhofar University
Öz, Işıl
Izmir Institute of Technology. Computer Engineering
Source :
Computing, Computing, Springer Verlag, 2017, ⟨10.1007/s00607-017-0581-6⟩
Publication Year :
2017
Publisher :
Springer Science and Business Media LLC, 2017.

Abstract

In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce execution time and energy consumption of parallel applications. Locality can be exploited at various hardware and software layers. For instance, by implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimised for locality. However, this would all be useless if the software scheduling does not cast the execution in a manner that promotes locality available in the programs themselves. Since programs for parallel systems consist of tasks executed simultaneously, task scheduling becomes crucial for the performance in multi-level cache architectures. This paper presents a heuristic algorithm for homogeneous multi-core systems called locality-aware task scheduling (LeTS). The LeTS heuristic is a work-conserving algorithm that takes into account both locality and load balancing in order to reduce the execution time of target applications. The working principle of LeTS is based on two distinctive phases, namely; working task group formation phase (WTG-FP) and working task group ordering phase (WTG-OP). The WTG-FP forms groups of tasks in order to capture data reuse across tasks while the WTG-OP determines an optimal order of execution for task groups that minimizes the reuse distance of shared data between tasks. We have performed experiments using randomly generated task graphs by varying three major performance parameters, namely: (1) communication to computation ratio (CCR) between 0.1 and 1.0, (2) application size, i.e., task graphs comprising of 50-, 100-, and 300-tasks per graph, and (3) number of cores with 2-, 4-, 8-, and 16-cores execution scenarios. We have also performed experiments using selected real-world applications. The LeTS heuristic reduces overall execution time of applications by exploiting inter-task data locality. Results show that LeTS outperforms state-of-the-art algorithms in amortizing inter-task communication cost.

Details

ISSN :
14365057 and 0010485X
Volume :
100
Database :
OpenAIRE
Journal :
Computing
Accession number :
edsair.doi.dedup.....82c44d054e69d2cea164f0d3003e5ccc