1. Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance
- Author
- Phil Miller, Philippe Pierre Pebay, Matthew Tyler Bettencourt, Nicole Lemaster Slattengren, Francesco Rizzi, and Jonathan Lifflander
- Subjects
Dynamic programming, Load management, Speedup, Computer science, Distributed computing, Scalability, Programming paradigm, Load balancing (computing), Protocol (object-oriented programming), Domain (software engineering)
- Abstract
This paper explores dynamic load balancing algorithms used by asynchronous many-task (AMT), or ‘task-based’, programming models to optimize task placement for scientific applications with dynamic workload imbalances. AMT programming models use overdecomposition of the computational domain. Overdecomposition provides a natural mechanism for domain developers to expose concurrency and break their computational domain into pieces that can be remapped to different hardware. This paper explores fully distributed load balancing strategies that have shown great promise for exascale-level computing but are challenging to reason about theoretically and to implement effectively. We present a novel theoretical analysis of a gossip-based load balancing protocol and use it to build an efficient implementation with fast convergence rates and high load balancing quality. We demonstrate our algorithm in a next-generation plasma physics application (EMPIRE) that induces time-varying workload imbalance due to spatial non-uniformity in particle density across the domain. Our highly scalable, novel load balancing algorithm achieves over a 3x speedup (particle work) compared to a bulk-synchronous MPI implementation without load balancing.
- Published
- 2021