Descriptor: "task-based programming model" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "task-based programming model" returned 21 results.

1. Task-Level Checkpointing System for Task-Based Parallel Workflows

Author: Vergés, Pere, Lordan, Francesc, Ejarque, Jorge, Badia, Rosa M., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Singer, Jeremy, editor, Elkhatib, Yehia, editor, Blanco Heras, Dora, editor, Diehl, Patrick, editor, Brown, Nick, editor, and Ilic, Aleksandar, editor
Published: 2023
Full Text: View/download PDF

2. Post-cloud Computing: Addressing Resource Management in the Resource Continuum

Author: Zanella, Michele and Riva, Carlo G., editor
Published: 2023
Full Text: View/download PDF

3. Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals

Author: Dimić, Vladimir, Moretó, Miquel, Casas, Marc, Valero, Mateo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Rivera, Francisco F., editor, Pena, Tomás F., editor, and Cabaleiro, José C., editor
Published: 2017
Full Text: View/download PDF

4. Abstraction Layer For Standardizing APIs of Task-Based Engines.

Author: Alomairy, Rabab, Ltaief, Hatem, Abduljabbar, Mustafa, and Keyes, David
Subjects: *DYNAMICAL systems, *ENGINES, *APPLICATION program interfaces, *TASK analysis, *COMPILERS (Computer programs)
Abstract: We introduce AL4SAN, a lightweight library for abstracting the APIs of task-based runtime engines. AL4SAN unifies the expression of tasks and their data dependencies. It supports various dynamic runtime systems relying on compiler technology and user-defined APIs. It enables a single application to employ different runtimes and their respective scheduling components, while keeping the user oblivious to the underlying hardware configuration. AL4SAN exposes common front-end APIs and connects to different back-end runtimes. Experiments on performance and overhead assessments are reported on various shared- and distributed-memory systems, possibly equipped with hardware accelerators. A range of workloads, from compute-bound to memory-bound regimes, is employed as proxies for current scientific applications. The low overhead (less than 10 percent) achieved across a variety of workloads enables AL4SAN to be deployed for fast development of task-based numerical algorithms. More interestingly, AL4SAN enables runtime interoperability by switching runtimes at runtime. Blending runtime systems makes it possible to achieve a twofold speedup on a task-based generalized symmetric eigenvalue solver, relative to state-of-the-art implementations. The ultimate goal of AL4SAN is not to create a new runtime, but to strengthen the co-design of existing runtimes and applications while facilitating user productivity and code portability. The code of AL4SAN is freely available at https://github.com/ecrc/al4san, with extensions in progress. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
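
A minimal Python sketch of the idea of one front-end task API connected to interchangeable back-end runtimes, including switching back-ends while the program runs. All names here (`Backend`, `SequentialBackend`, `ThreadPoolBackend`, `TaskRuntime`, `insert_task`, `wait_all`) are hypothetical and are not AL4SAN's actual API; expressing a data dependency by passing an earlier task's `Future` as an argument is also an assumption of this sketch.

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List


def _resolve(args: tuple) -> tuple:
    # A Future argument is a data dependency: wait for the producing task's value.
    return tuple(a.result() if isinstance(a, Future) else a for a in args)


class Backend(ABC):
    """Hypothetical contract that every back-end runtime implements."""

    @abstractmethod
    def insert_task(self, fn: Callable[..., Any], *args: Any) -> Future: ...

    @abstractmethod
    def wait_all(self) -> None: ...


class SequentialBackend(Backend):
    """Runs each task immediately, in submission order."""

    def insert_task(self, fn, *args):
        fut: Future = Future()
        fut.set_result(fn(*_resolve(args)))
        return fut

    def wait_all(self):
        pass  # nothing is ever pending


class ThreadPoolBackend(Backend):
    """Runs tasks asynchronously on a pool of worker threads."""

    def __init__(self, workers: int = 4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: List[Future] = []

    def insert_task(self, fn, *args):
        fut = self._pool.submit(lambda: fn(*_resolve(args)))
        self._pending.append(fut)
        return fut

    def wait_all(self):
        for fut in self._pending:
            fut.result()  # re-raises a task's exception, if any
        self._pending.clear()


class TaskRuntime:
    """Single front-end API; the back-end can be switched while the program runs."""

    def __init__(self, backend: Backend):
        self.backend = backend

    def switch(self, backend: Backend) -> None:
        self.backend.wait_all()  # drain the old runtime before handing over
        self.backend = backend

    def insert_task(self, fn, *args) -> Future:
        return self.backend.insert_task(fn, *args)

    def wait_all(self) -> None:
        self.backend.wait_all()


rt = TaskRuntime(SequentialBackend())
a = rt.insert_task(lambda x: x * x, 3)
rt.switch(ThreadPoolBackend(workers=2))
b = rt.insert_task(lambda x: x * x, 4)
c = rt.insert_task(lambda x, y: x + y, a, b)  # depends on tasks a and b
rt.wait_all()
print(c.result())  # 25
```

Because this sketch resolves dependencies by blocking inside a worker thread, it relies on every task being submitted after the tasks it depends on; a real task-based runtime instead tracks the dependency graph and schedules a task only once its inputs are ready.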

5. Enabling Model-Centric Debugging for Task-Based Programming Models—A Tasking Control Interface

Author: Nachtmann, Mathias, Gracia, José, Knüpfer, Andreas, editor, Hilbrich, Tobias, editor, Niethammer, Christoph, editor, Gracia, José, editor, Nagel, Wolfgang E., editor, and Resch, Michael M., editor
Published: 2016
Full Text: View/download PDF

6. Unified fault-tolerance framework for hybrid task-parallel message-passing applications.

Author: Subasi, Omer, Martsinkevich, Tatiana, Zyulkyarov, Ferad, Unsal, Osman, Labarta, Jesus, and Cappello, Franck
Subjects: *APPLICATION software, *FAULT-tolerant computing, *MESSAGE passing (Computer science), *COMPUTER network protocols, *PARALLEL programs (Computer programs)
Abstract: We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the task that experienced the error and transparently handles any message passing interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Secondly, we develop a mathematical model to unify task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
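
The abstract mentions closed formulas for the optimal checkpointing interval but does not state them. As a point of reference only, the sketch below computes Young's classic first-order approximation for periodic system-wide checkpointing, T_opt = sqrt(2 * C * M), where C is the cost of one checkpoint and M the mean time between failures. This is a swapped-in textbook formula, not the unified model derived in the paper, and the values of `C` and `M` are assumed.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal interval between checkpoints: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def waste_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order fraction of time lost: checkpoint overhead C/T plus, on average,
    half an interval of recomputation per failure, T/(2M)."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


C = 60.0          # assumed cost of one system-wide checkpoint, in seconds
M = 24 * 3600.0   # assumed mean time between failures: one day
T = young_interval(C, M)
print(f"checkpoint every {T:.0f} s, waste {waste_fraction(T, C, M):.1%}")
# checkpoint every 3220 s, waste 3.7%
```

The two waste terms pull in opposite directions (frequent checkpoints cost more overhead, rare ones lose more work per failure); setting the derivative of C/T + T/(2M) to zero gives T^2 = 2CM, which is where the square root comes from.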

7. Task-level checkpointing system for task-based parallel workflows

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Vergés Boncompte, Pere, Lordan Gomis, Francesc, Ejarque Artigas, Jorge, and Badia Sala, Rosa Maria
Abstract: Scientific applications are large and complex; task-based programming models are a popular approach to developing these applications due to their ease of programming and their ability to handle complex workflows and distribute the workload across large infrastructures. In these environments, either the hardware or the software may lead to failures from a myriad of origins: application logic, system software, memory, network, or disk. Re-executing a failed application can take hours, days, or even weeks, thus dragging out the research. This article proposes a recovery system for dynamic task-based models to reduce the re-execution time of failed runs. The design encapsulates the automatic checkpointing of the execution in a checkpointing manager, leveraging different mechanisms that can be arbitrarily defined and tuned to fit the needs of each execution. Additionally, it offers an API call to establish snapshots of the execution from the application code. The experiments executed on a prototype implementation reached a speedup of 1.9× after re-execution and showed no overhead on the execution time of successful first runs of specific applications.
Funding: This work has been supported by the Spanish Government (PID2019-107255GB), by Generalitat de Catalunya (contract 2017-SGR-01414), and by the European Commission through the Horizon 2020 Research and Innovation program under Grant Agreement No. 955558 (eFlows4HPC project). This work has partially been co-funded with 50% by the European Regional Development Fund under the framework of the ERDF Operative Programme for Catalunya 2014-2020.
Notes: Peer Reviewed, Postprint (author's final draft)
Published: 2022
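
A minimal Python sketch of task-level checkpointing in the spirit of this record: each task's result is persisted when the task finishes, and on re-execution completed tasks are loaded instead of recomputed, so only the failed task and its successors run again. `TaskCheckpointer`, `run`, and the one-pickle-file-per-task layout are assumptions for illustration, not the API of the prototype described above; the sketch also assumes each task has a stable name and ignores changes to its inputs between runs.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, List, Optional


class TaskCheckpointer:
    """Hypothetical task-level checkpoint manager: one file per finished task."""

    def __init__(self, directory: str):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def run(self, task_id: str, fn: Callable[..., Any], *args: Any) -> Any:
        path = os.path.join(self.directory, f"{task_id}.pkl")
        if os.path.exists(path):            # finished in an earlier run: reuse result
            with open(path, "rb") as f:
                return pickle.load(f)
        result = fn(*args)
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:          # write, then rename atomically, so a crash
            pickle.dump(result, f)          # never leaves a half-written checkpoint
        os.replace(tmp, path)
        return result


executed: List[int] = []


def step(i: int, x: int, fail_at: Optional[int]) -> int:
    executed.append(i)
    if i == fail_at:
        raise RuntimeError(f"task {i} failed")
    return x + i


def workflow(ckpt: TaskCheckpointer, fail_at: Optional[int] = None) -> int:
    x = 0
    for i in range(5):
        x = ckpt.run(f"step{i}", step, i, x, fail_at)
    return x


ckpt = TaskCheckpointer(tempfile.mkdtemp())
try:
    workflow(ckpt, fail_at=3)               # first run crashes inside task 3
except RuntimeError:
    pass
executed.clear()
result = workflow(ckpt)                     # re-execution skips tasks 0-2
print(result, executed)  # 10 [3, 4]
```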

8. Performance testing of ML and HDC: parallelized applications on top of RISC-V architecture

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Badia Sala, Rosa Maria, Nicolau, Alexandru, Veidenbaum, Alex, and Vergés Boncompte, Pere
Abstract: The economic impact that proprietary ISAs have on the market has increased interest in open-source ISAs. More specifically, RISC-V has been gaining a lot of traction in the research community. The open-source environment has allowed the development of software and hardware stacks for Exascale computing. To take advantage of these resources and enable the execution of large and complex applications, task-based programming models have become more popular thanks to their ease of handling composite workflows that require large amounts of data and computation time. Moreover, most of the applications being developed nowadays are related to Machine Learning in general, and in the context of RISC-V there is a lot of interest in developing applications for Embedded Systems, where the framework of Hyperdimensional Computing is becoming more popular. For these reasons, we present this study in the scope of the MareNostrum Experimental Exascale Platform (MEEP), a flexible FPGA-based emulation platform designed for future RISC-V supercomputers. This study evaluates Machine Learning algorithms, classical Linear Algebra algorithms used for ML, and Hyperdimensional Computing algorithms using COMPSs, a task-based programming model for developing applications for distributed infrastructures, on different RISC-V boards being developed in the MEEP project and with different mathematical libraries.
Published: 2022

9. Task-based Runtime Optimizations Towards High Performance Computing Applications

Author: Cao, Qinglei
Subjects: Low-rank approximations, Mixed-precision, Cholesky factorization, Data redistribution, Task-based programming model, Dynamic runtime system, Numerical Analysis and Scientific Computing, Programming Languages and Compilers, Software Engineering
Abstract: The last decades have witnessed a rapid improvement in the computational capabilities of high-performance computing (HPC) platforms thanks to hardware technology scaling. HPC architectures have followed mainstream hardware advances, with many-core systems, deep hierarchical memory subsystems, non-uniform memory access, and an ever-increasing gap between computational power and memory bandwidth. This has necessitated continuous adaptations across the software stack to maintain high hardware utilization. In this HPC landscape of potentially million-way parallelism, task-based programming models associated with dynamic runtime systems are becoming more popular, fostering developers' productivity at extreme scale by abstracting the underlying hardware complexity. In this context, this dissertation highlights how a software bundle powered by a task-based programming model can address the heterogeneous workloads engendered by HPC applications, i.e., data redistribution, geospatial modeling, and 3D unstructured mesh deformation. Data redistribution aims to reshuffle data to optimize some objective for an algorithm; the objective can be multi-dimensional, such as improving computational load balance or decreasing communication volume or cost, with the ultimate goal of increasing efficiency and therefore reducing the algorithm's time-to-solution. Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Meshing the deformable contour of moving 3D bodies is an expensive operation that can pose huge computational challenges in fluid-structure interaction (FSI) applications.
Therefore, in this dissertation, Redistribute-PaRSEC, ExaGeoStat-PaRSEC, and HiCMA-PaRSEC are proposed to efficiently tackle these HPC applications, respectively, at extreme scale. They are evaluated on multiple HPC clusters, including AMD-based, Intel-based, and Arm-based CPU systems and an IBM-based multi-GPU system. This multidisciplinary work emphasizes the need for runtime systems to go beyond their primary responsibility of task scheduling on massively parallel hardware systems in order to serve next-generation scientific applications.
Published: 2022
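
Data redistribution, as described above, moves tiles between two data layouts. A minimal Python sketch, assuming a 2D block-cyclic tile distribution with ranks numbered row-major on each process grid (a common layout in dense linear algebra, but an assumption here), computes which tiles change owner and the resulting communication volume. It does not reproduce the optimizing algorithm of Redistribute-PaRSEC; the tile size is also assumed.

```python
from typing import Dict, List, Tuple

Grid = Tuple[int, int]  # (rows P, columns Q) of the process grid


def block_cyclic_owner(i: int, j: int, grid: Grid) -> int:
    """Rank owning tile (i, j) under a 2D block-cyclic layout, ranks row-major."""
    p, q = grid
    return (i % p) * q + (j % q)


def redistribution_plan(nt: int, src: Grid, dst: Grid) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Map (sending rank, receiving rank) -> tiles that must move between layouts."""
    plan: Dict[Tuple[int, int], List[Tuple[int, int]]] = {}
    for i in range(nt):
        for j in range(nt):
            a, b = block_cyclic_owner(i, j, src), block_cyclic_owner(i, j, dst)
            if a != b:  # tiles that keep their owner need no communication
                plan.setdefault((a, b), []).append((i, j))
    return plan


plan = redistribution_plan(nt=4, src=(2, 2), dst=(1, 4))
moved = sum(len(tiles) for tiles in plan.values())
tile_bytes = 512 * 512 * 8  # assumed 512 x 512 tiles of float64
print(moved, moved * tile_bytes)  # 8 16777216
```

Here half of the 16 tiles keep their owner and need no transfer; an optimizing redistribution would additionally choose the destination layout or rank mapping to shrink this volume or balance it across links.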

10. RICH: implementing reductions in the cache hierarchy

Author: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Dimic, Vladimir, Moretó Planas, Miquel, Casas, Marc, Ciesko, Jan, and Valero Cortés, Mateo
Abstract: Reductions constitute a frequent algorithmic pattern in high-performance and scientific computing. Sophisticated techniques are needed to ensure their correct and scalable concurrent execution on modern processors. Reductions on large arrays represent the most demanding case, where traditional approaches are not always applicable due to low performance scalability. To address these challenges, we propose RICH, a runtime-assisted solution that relies on architectural and parallel programming model extensions. RICH updates the reduction variable directly in the cache hierarchy with the help of added in-cache functional units. Our programming model extensions fit with the most relevant parallel programming solutions for shared memory environments, such as OpenMP. RICH does not modify the ISA, which allows the use of algorithms with reductions from pre-compiled external libraries. Experiments show that our solution achieves performance improvements of 11.2% on average compared to state-of-the-art hardware-based approaches, while introducing 2.4% area and 3.8% power overhead.
Funding: This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). V. Dimić has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Ajuts per a la contractació de personal investigador novell fellowship number 2017 FI_B 00855. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104. M. Casas has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2017-23269. This manuscript has been co-authored by National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy/National Nuclear Security Administration.
Notes: Peer Reviewed, Postprint (author's final draft)
Published: 2020
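
RICH itself is a hardware and runtime co-design and cannot be shown in software. For contrast, the Python sketch below shows a common software approach to array reductions, privatization: every task reduces into its own copy of the array, and the copies are combined at the end. The memory for the private copies and the cost of the combine step both grow with the array size, which is the kind of scalability limit the abstract refers to. The histogram workload and all function names are hypothetical, and Python threads illustrate the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def histogram_task(values: List[int], bins: int) -> List[int]:
    """One task: reduce into a private copy of the whole array (no data races)."""
    local = [0] * bins
    for v in values:
        local[v % bins] += 1
    return local


def parallel_histogram(values: List[int], bins: int, num_tasks: int = 4) -> List[int]:
    if not values:
        return [0] * bins
    chunk = -(-len(values) // num_tasks)  # ceiling division
    chunks = [values[k:k + chunk] for k in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(histogram_task, chunks, [bins] * len(chunks)))
    # Combine step: touches bins * len(chunks) entries, so its cost and the
    # private copies' memory grow with the size of the reduction array.
    return [sum(column) for column in zip(*partials)]


data = [i * 7 % 101 for i in range(10_000)]
print(parallel_histogram(data, bins=16) == histogram_task(data, 16))  # True
```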

11. RICH: implementing reductions in the cache hierarchy

Author: Jan Ciesko, Mateo Valero, Marc Casas, Vladimir Dimić, Miquel Moreto, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Subjects: Computer science, Parallel programming (Computer science), 02 engineering and technology, Parallel computing, Programació en paral·lel (Informàtica), 01 natural sciences, Reduction (complexity), Shared memory, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), Task-based programming model, Cache hierarchy, Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC], 010302 applied physics, Gestió de memòria (Informàtica), Caches, 020202 computer hardware & architecture, Variable (computer science), Memory management (Computer science), Parallel programming model, Scalability, Programming paradigm, Superordinadors, High performance computing, Reductions
Abstract: Reductions constitute a frequent algorithmic pattern in high-performance and scientific computing. Sophisticated techniques are needed to ensure their correct and scalable concurrent execution on modern processors. Reductions on large arrays represent the most demanding case, where traditional approaches are not always applicable due to low performance scalability. To address these challenges, we propose RICH, a runtime-assisted solution that relies on architectural and parallel programming model extensions. RICH updates the reduction variable directly in the cache hierarchy with the help of added in-cache functional units. Our programming model extensions fit with the most relevant parallel programming solutions for shared memory environments, such as OpenMP. RICH does not modify the ISA, which allows the use of algorithms with reductions from pre-compiled external libraries. Experiments show that our solution achieves performance improvements of 11.2% on average compared to state-of-the-art hardware-based approaches, while introducing 2.4% area and 3.8% power overhead.
Funding: This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328). V. Dimić has been partially supported by the Agency for Management of University and Research Grants (AGAUR) of the Government of Catalonia under Ajuts per a la contractació de personal investigador novell fellowship number 2017 FI_B 00855. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2016-21104. M. Casas has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship number RYC-2017-23269. This manuscript has been co-authored by National Technology & Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy/National Nuclear Security Administration.
Published: 2020

12. Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry

Author: I. Said, Samuel Thibault, Amani AlOnazi, David E. Keyes, Hatem Ltaief, King Abdullah University of Science and Technology (KAUST), NVIDIA, STatic Optimizations, Runtime Methods (STORM), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Centre National de la Recherche Scientifique (CNRS), Inria Bordeaux - Sud-Ouest, and Institut National de Recherche en Informatique et en Automatique (Inria)
Funding: This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Subjects: Instruction prefetch, Out-Of-Core Algorithms, Overlapping I/O with Computation, Memory hierarchy, business.industry, Computer science, Asynchronous Executions, 010103 numerical & computational mathematics, Parallel computing, Task-Based Programming Model, 010502 geochemistry & geophysics, 01 natural sciences, Reverse Time Migration, STARPU OOC, Runtime system, High memory, Asynchronous communication, Computer data storage, Scalability, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], 0101 mathematics, business, Massively parallel, 0105 earth and related environmental sciences
Abstract: We propose a new framework for deploying Reverse Time Migration (RTM) simulations on distributed-memory systems equipped with multiple GPUs. Our software infrastructure, TB-RTM, relies on the STARPU dynamic runtime system to orchestrate the asynchronous scheduling of RTM computational tasks on the underlying resources. Besides dealing with the challenging hardware heterogeneity, TB-RTM supports tasks with different workload characteristics, which stress disparate components of the hardware system. RTM is challenging in that it operates intensively at both ends of the memory hierarchy: compute kernels run at the highest level of the memory system, possibly in GPU main memory, while I/O kernels save solution data to fast storage. We consider how to span the wide performance gap between the two extreme ends of the memory system, i.e., GPU memory and fast storage, on which large-scale RTM simulations routinely execute. To maximize hardware occupancy while maintaining high memory bandwidth throughout the memory subsystem, our framework uses the new out-of-core (OOC) feature of STARPU to prefetch solution data in and out, not only from/to the GPU/CPU main memory but also from/to the fast storage system. The OOC technique may create opportunities to overlap expensive data movement with computation. The TB-RTM framework addresses this challenging problem of heterogeneity with a systematic approach that is oblivious to the targeted hardware architectures. The resulting RTM framework can be deployed effectively on massively parallel GPU-based systems, delivering performance scalability up to 500 GPUs.
Published: 2019
Full Text: View/download PDF
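
The out-of-core idea in this record relies on overlapping data movement with computation. A minimal Python sketch of that overlap, assuming a generic double-buffering scheme rather than STARPU's out-of-core API: a background thread loads upcoming chunks into a bounded queue while the main thread computes on the current one. `pipelined`, `load`, and `compute` are hypothetical; `load` stands in for reading solution data from storage and `compute` for an RTM kernel.

```python
import queue
import threading
from typing import Callable, List


def pipelined(load: Callable[[int], List[float]],
              compute: Callable[[List[float]], float],
              n_chunks: int, depth: int = 2) -> List[float]:
    """Compute chunk k while a background thread already loads chunks k+1, k+2, ..."""
    buf: "queue.Queue[List[float]]" = queue.Queue(maxsize=depth)

    def prefetcher() -> None:
        for k in range(n_chunks):
            buf.put(load(k))  # blocks once `depth` loaded chunks are waiting

    worker = threading.Thread(target=prefetcher, daemon=True)
    worker.start()
    results = [compute(buf.get()) for _ in range(n_chunks)]
    worker.join()
    return results


# Hypothetical stand-ins for a storage read and a compute kernel.
out = pipelined(load=lambda k: [float(k)] * 4, compute=sum, n_chunks=5)
print(out)  # [0.0, 4.0, 8.0, 12.0, 16.0]
```

The bounded queue caps how many prefetched chunks are resident at once, the same trade-off between memory footprint and latency hiding that an out-of-core scheme must manage. As written, an exception inside `load` would leave the consumer waiting forever; production code would forward errors through the queue.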

13. Optimizing computation-communication overlap in asynchronous task-based programs

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Castillo, Emilio, Jain, Nikhil, Casas, Marc, Moretó Planas, Miquel, Schulz, Martin, Beivide Palacio, Julio Ramon, Valero Cortés, Mateo, and Bhatele, Abhinav
Abstract: Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, we find that inefficient interactions between these programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We address this challenge by exposing and exploiting information about MPI internals in a task-based runtime system to make better task-creation and scheduling decisions. In particular, we present two mechanisms for exchanging information between MPI and a task-based runtime, and analyze their trade-offs. Further, we present a detailed evaluation of the proposed mechanisms implemented in MPI and a task-based runtime. We show performance improvements of up to 16.3% and 34.5% for proxy applications with point-to-point and collective communication, respectively.
Notes: Peer Reviewed, Postprint (author's final draft)
Published: 2019
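
A toy Python simulation of why scheduling decisions informed by message arrival improve overlap: with one worker, a scheduler that blocks on tasks in creation order idles while a late message is in flight, whereas one that first runs tasks whose messages have already arrived hides that wait. The arrival times stand in for the MPI-internal information the paper exposes to the runtime; the task list and both policies are assumptions for illustration, not the two mechanisms evaluated in the paper.

```python
from typing import List, Tuple

# (task name, arrival time of its input message, compute time), in creation order
Task = Tuple[str, float, float]


def blocking_in_order(tasks: List[Task]) -> float:
    """Run tasks in creation order, blocking on each task's message."""
    clock = 0.0
    for _, arrival, cost in tasks:
        clock = max(clock, arrival) + cost
    return clock


def arrival_aware(tasks: List[Task]) -> float:
    """Run any task whose message has already arrived; idle only if none has."""
    clock, pending = 0.0, list(tasks)
    while pending:
        ready = [t for t in pending if t[1] <= clock]
        if not ready:
            clock = min(t[1] for t in pending)  # wait for the next message
            continue
        task = min(ready, key=lambda t: t[1])   # earliest-arrived ready task
        pending.remove(task)
        clock += task[2]
    return clock


tasks = [("halo", 10.0, 2.0), ("interior_a", 0.0, 3.0), ("interior_b", 1.0, 3.0)]
print(blocking_in_order(tasks), arrival_aware(tasks))  # 18.0 12.0
```

In the blocking schedule the worker idles for 10 time units waiting on the `halo` message; the arrival-aware schedule fills 6 of those units with the interior tasks.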

14. PureMEM: A Structured Programming Model for Transiently Powered Computers

Author: Geylani Kardas, Kasim Yildirim, Caglar Durmaz, and Ege Üniversitesi
Subjects: Data consistency, Computer science, Distributed computing, Control (management), 020207 software engineering, Structured Programming Model, 02 engineering and technology, Transiently Powered Computers, Structured programming, Task-Based Programming Model, Task (project management), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Embedded Systems and Software
Abstract: Advances in energy harvesting circuits and energy-efficient processor architectures create the potential for batteryless computing and sensing systems called transiently powered computers. These computers can only operate intermittently due to the fluctuating nature of ambient energy. Intermittent operation requires a new programming model that preserves forward progress and maintains data consistency, both of which are challenging. We propose a structured task-based programming model, namely PureMEM, to cope with these challenges. We discuss how PureMEM prevents interdependencies caused by the unstructured control flow encountered in intermittent operation, enables reusability of tasks, provides dynamic memory management, and supports error handling. We also present intermittent programs to exemplify the features of PureMEM.
Sponsor: Association for Computing Machinery Special Interest Group on Applied Computing (ACM SIGAPP)
Published: 2019
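
A minimal Python simulation of the two requirements named in this abstract, forward progress and data consistency, using task-granularity commits: each task works on a volatile private copy, and its output is committed to (simulated) non-volatile memory together with a next-task marker only when the task completes. If power fails mid-task, the partial update is discarded and execution resumes at that task. The classes and functions are hypothetical and do not reproduce PureMEM's programming interface; on real hardware the commit itself must also be made failure-atomic, for example by double buffering.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]


class NonVolatileMemory:
    """Simulated persistent memory that survives power failures (hypothetical)."""

    def __init__(self, initial: State):
        self.data: State = dict(initial)
        self.next_task = 0  # forward-progress marker, kept alongside the data


class PowerFailure(Exception):
    """Raised to simulate the energy buffer running out mid-task."""


def run_intermittent(tasks: List[Callable[[State], None]], nvm: NonVolatileMemory,
                     fail_at: Optional[int] = None) -> None:
    while nvm.next_task < len(tasks):
        idx = nvm.next_task
        scratch = dict(nvm.data)     # the task only touches a volatile private copy
        tasks[idx](scratch)
        if idx == fail_at:
            raise PowerFailure(idx)  # power lost before the commit below
        nvm.data = scratch           # commit the task's output...
        nvm.next_task = idx + 1      # ...then advance the progress marker


def increment(s: State) -> None:
    s["x"] += 1


def times_ten(s: State) -> None:
    s["x"] *= 10


def add_five(s: State) -> None:
    s["x"] += 5


program = [increment, times_ten, add_five]
nvm = NonVolatileMemory({"x": 1})
try:
    run_intermittent(program, nvm, fail_at=1)
except PowerFailure:
    pass                             # device reboots; volatile state is lost
run_intermittent(program, nvm)       # resumes at task 1, not at task 0
print(nvm.data["x"])  # 25
```

Had `times_ten` updated the persistent state in place, the re-executed task would start from 20 instead of 2 and the program would end with 205, the kind of inconsistency that intermittent execution introduces.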

15. Optimizing computation-communication overlap in asynchronous task-based programs

Author: Ramon Beivide, Marc Casas, Emilio Castillo, Miquel Moreto, Abhinav Bhatele, Nikhil Jain, Mateo Valero, Martin Schulz, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, and Barcelona Supercomputing Center
Subjects: 020203 distributed computing, Exploit, Parallel processing (Electronic computers), Computer science, Computation-communication overlap, Distributed computing, Computation, Processament en paral·lel (Ordinadors), Parallel programming (Computer science), 02 engineering and technology, Programació en paral·lel (Informàtica), Supercomputer, Popularity, 020202 computer hardware & architecture, Scheduling (computing), Runtime system, Asynchronous communication, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, Mpi, High performance computing, Task-based programming model, Informàtica::Arquitectura de computadors::Arquitectures paral·leles [Àrees temàtiques de la UPC], Càlcul intensiu (Informàtica)
Abstract: Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, we find that inefficient interactions between these programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We address this challenge by exposing and exploiting information about MPI internals in a task-based runtime system to make better task-creation and scheduling decisions. In particular, we present two mechanisms for exchanging information between MPI and a task-based runtime, and analyze their trade-offs. Further, we present a detailed evaluation of the proposed mechanisms implemented in MPI and a task-based runtime. We show performance improvements of up to 16.3% and 34.5% for proxy applications with point-to-point and collective communication, respectively.
Published: 2019
Full Text: View/download PDF

16. Graph partitioning applied to DAG scheduling to reduce NUMA effects

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Sánchez Barrera, Isaac, Casas, Marc, Moretó Planas, Miquel, Ayguadé Parra, Eduard, Labarta Mancho, Jesús José, and Valero Cortés, Mateo
Abstract: The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices containing the data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are typically applied by the system software. We propose techniques at the runtime system level to reduce NUMA effects on parallel applications. We leverage runtime system metadata in the form of a task dependency graph. Our approach, based on graph partitioning methods, provides parallel performance improvements of 1.12× on average with respect to the state of the art.
Funding: This work has been partially supported by the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence, and the Spanish Government (contract TIN2015-65316-P). I. Sánchez Barrera has been supported by the Spanish Government under Formación del Profesorado Universitario fellowship number FPU15/03612.
Notes: Peer Reviewed, Postprint (published version)
Published: 2018
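
A minimal Python sketch of the underlying idea of using the task dependency graph to place tasks on NUMA domains: each task is assigned to the domain holding most of its predecessors' data, under a balance cap, and placement quality is measured by the edge cut (dependencies whose data crosses domains). This is a simple greedy heuristic swapped in for illustration, not the graph partitioning method evaluated in the paper; it assumes task ids are already in topological order, and the example graph is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def greedy_numa_partition(num_tasks: int, edges: List[Tuple[int, int]],
                          num_domains: int) -> Dict[int, int]:
    """Assign each task to the NUMA domain where most of its predecessors
    (producers of its input data) live, without exceeding a balance cap.
    Assumes task ids 0..num_tasks-1 are already in topological order."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    capacity = math.ceil(num_tasks / num_domains)
    load = [0] * num_domains
    domain_of: Dict[int, int] = {}
    for t in range(num_tasks):
        votes = [0] * num_domains
        for p in preds[t]:
            votes[domain_of[p]] += 1
        open_domains = [d for d in range(num_domains) if load[d] < capacity]
        # Most predecessor data first; among ties, the least-loaded domain.
        best = max(open_domains, key=lambda d: (votes[d], -load[d]))
        domain_of[t] = best
        load[best] += 1
    return domain_of


def edge_cut(edges: List[Tuple[int, int]], domain_of: Dict[int, int]) -> int:
    """Number of dependencies whose data must cross NUMA domains."""
    return sum(1 for u, v in edges if domain_of[u] != domain_of[v])


# Two independent chains with interleaved ids: 0->2->4 and 1->3->5.
edges = [(0, 2), (2, 4), (1, 3), (3, 5)]
parts = greedy_numa_partition(6, edges, num_domains=2)
print(parts, edge_cut(edges, parts))  # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1} 0
```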

17. Graph partitioning applied to DAG scheduling to reduce NUMA effects

Author: Marc Casas, Jesús Labarta, Miquel Moreto, Eduard Ayguadé, Mateo Valero, Isaac Sánchez Barrera, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, and Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Subjects: graph partitioning, Computer science, Parallel computing, Thread (computing), 02 engineering and technology, 01 natural sciences, Scheduling (computing), Runtime system, NUMA, Shared memory, 020204 information systems, 0103 physical sciences, Informàtica::Sistemes d'informació::Emmagatzematge i recuperació de la informació [Àrees temàtiques de la UPC], 0202 electrical engineering, electronic engineering, information engineering, Task-based programming model, 010302 applied physics, Scheduling, Graph partition, 020207 software engineering, Gestió de memòria (Informàtica), Computer Graphics and Computer-Aided Design, Metadata, Memory management (Computer science), Graph (abstract data type), 020201 artificial intelligence & image processing, Software, System software
Abstract: The complexity of shared memory systems is becoming more relevant as the number of memory domains increases, with different access latencies and bandwidth rates depending on the proximity between the cores and the devices containing the data. In this context, techniques to manage and mitigate non-uniform memory access (NUMA) effects consist of migrating threads, memory pages, or both, and are typically applied by the system software. We propose techniques at the runtime system level to reduce NUMA effects on parallel applications. We leverage runtime system metadata in the form of a task dependency graph. Our approach, based on graph partitioning methods, provides parallel performance improvements of 1.12× on average with respect to the state of the art.
Funding: This work has been partially supported by the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence, and the Spanish Government (contract TIN2015-65316-P). I. Sánchez Barrera has been supported by the Spanish Government under Formación del Profesorado Universitario fellowship number FPU15/03612.
Published: 2018

18. Runtime-assisted shared cache insertion policies based on re-reference intervals

Author: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Dimic, Vladimir, Moretó Planas, Miquel, Casas, Marc, and Valero Cortés, Mateo
Abstract: Processor speed is improving at a faster rate than the speed of main memory, which makes memory accesses increasingly expensive. One way to address this problem is to reduce the miss ratio of the processor’s last-level cache by improving its replacement policy. We approach the problem by co-designing the runtime system and hardware and exploiting the semantics of applications written in data-flow task-based programming models to provide the hardware with information about task types and task data dependencies. We propose the Task-Type aware Insertion Policy, TTIP, which uses the runtime system to dynamically determine the best probability per task type for bimodal insertion in the recency stack, and the static Dependency-Type aware Insertion Policy, DTIP, which inserts cache lines in the optimal position taking into account the dependency types of the current task. TTIP and DTIP perform similarly to or better than state-of-the-art replacement policies, while requiring less hardware. This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). V. Dimic has been partially supported by AGAUR of the Government of Catalonia (contract 2017 FI B 00855). M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas has been supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (contract 2013 BP B 00243). Peer reviewed. Postprint (author's final draft).
Published: 2017

19. Unified fault-tolerance framework for hybrid task-parallel message-passing applications

Author: Barcelona Supercomputing Center, Subasi, Omer, Martsinkevich, Tatiana, Zyulkyarov, Ferad, Unsal, Osman Sabri, Labarta Mancho, Jesús José, and Cappello, Franck
Abstract: We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that requires restarting only the task that experienced the error and transparently handles any Message Passing Interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Second, we develop a mathematical model to unify task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the FI-DGR 2013 scholarship and the European Community’s Seventh Framework Programme [FP7/2007-2013] under the Mont-blanc 2 Project (www.montblanc-project.eu), grant agreement no. 610402 and TIN2015-65316-P. Peer reviewed. Postprint (author's final draft).
Published: 2016

20. Runtime assisted cache memory optimizations

Author: Dimic, Vladimir, Moreto Planas, Miquel, Valero Cortés, Mateo, and Casas Guix, Marc
Subjects: Memory hierarchy (Computer science), runtime, processor design, operating system, cache, processor cache, computer architecture, task-based programming model, Computer science::Computer architecture [UPC subject areas]
Published: 2015

21. Runtime assisted cache memory optimizations

Author: Moretó Planas, Miquel, Valero Cortés, Mateo, Casas, Marc, and Dimic, Vladimir
Published: 2015
