1. Mitigating the NUMA effect on task-based runtime systems
- Author
Maroñas Bravo, Marcos; Navarro Muñoz, Antoni; Ayguadé Parra, Eduard; Beltran Querol, Vicenç; Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. PM - Programming Models
- Subjects
Application program interfaces (Computer software), Task-aware, Parallel processing (Electronic computers), Scheduling, Parallel programming model, Processament en paral·lel (Ordinadors), Parallel programming (Computer science), Interfícies de programació d'aplicacions (Programari), Programació en paral·lel (Informàtica), Theoretical Computer Science, Hardware and Architecture, Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC], NUMA-awareness, OmpSs-2, Software, Information Systems
- Abstract
Processors with multiple sockets or chiplets are becoming increasingly common. Such processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a non-uniform memory access (NUMA) design, in which each processor accesses its local memory faster than remote memory. Reducing data motion is therefore crucial to overall performance, so computations must run as close as possible to where their data resides. We propose a new approach that mitigates the NUMA effect on such systems. Our solution is based on OmpSs-2, a task-based parallel programming model similar to OpenMP. We first provide a simple API to allocate memory on NUMA systems using different policies. Then, by combining user-provided information that specifies dependences between tasks with information collected in a global directory when data is allocated, we extend our runtime library to perform NUMA-aware work scheduling. Our heuristic considers data location, the distance between NUMA nodes, and the load of each NUMA node to seamlessly minimize data motion costs and load imbalance. Our evaluation shows that our NUMA support significantly mitigates the NUMA effect by reducing the number of remote accesses, which improves performance on most benchmarks, reaching up to 2x speedup on a 2-NUMA machine and up to 7.1x on an 8-NUMA machine.
This research has received funding from the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No 955606 (DEEP-SEA), project PCI2021121958 financed by the Spanish State Research Agency - Ministry of Science and Innovation, Generalitat de Catalunya (contract 2021-SGR-01007), the Spanish Ministry of Science and Technology (contract PID2019-107255GB), and Severo Ochoa (CEX2021-001148-S / MCIN/AEI/10.13039/501100011033).
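The abstract names three inputs to the scheduling heuristic: data location, distance between NUMA nodes, and per-node load. The record does not give the actual formula, so the following is only a minimal sketch of how such inputs could be combined. The function name `pick_numa_node`, the linear cost, and the `load_weight` parameter are hypothetical and are not taken from OmpSs-2 or its runtime. The distance values follow the common convention in which a node's distance to itself is 10.

```python
from typing import Dict, List


def pick_numa_node(
    bytes_per_node: Dict[int, int],
    distance: List[List[int]],
    load: List[int],
    load_weight: float = 1.0,
) -> int:
    """Return the NUMA node with the lowest estimated cost for one task.

    Hypothetical cost model (an assumption, not the paper's heuristic):
    cost(node) = byte-weighted average distance to the task's data
                 + load_weight * number of tasks already queued on node.
    """
    total_bytes = sum(bytes_per_node.values())
    best_node, best_cost = 0, float("inf")
    for node in range(len(load)):
        if total_bytes:
            # Data-motion term: each block of data is weighted by the
            # distance from the candidate node to the node that holds it.
            motion = sum(
                size * distance[node][home]
                for home, size in bytes_per_node.items()
            ) / total_bytes
        else:
            motion = 0.0
        cost = motion + load_weight * load[node]
        if cost < best_cost:  # ties keep the lowest node index
            best_node, best_cost = node, cost
    return best_node


if __name__ == "__main__":
    # Two nodes: local distance 10, remote distance 21.
    distance = [[10, 21], [21, 10]]
    data = {0: 4096, 1: 1024}  # most of the task's data lives on node 0
    print(pick_numa_node(data, distance, load=[0, 0]))   # 0
    print(pick_numa_node(data, distance, load=[20, 0]))  # 1
```

In this sketch an idle system places the task next to most of its data, while a long queue on that node pushes the task to the other node. That trade-off between data motion and load imbalance is the one the abstract describes, though the real runtime may weigh the terms quite differently.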
- Published
- 2023