Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Ortega Carrasco, Cristobal, Alvarez Martí, Lluc, Buyuktosunoglu, Alper, Bertran Monfort, Ramon, Rosedahl, Todd Jon, Bose, Pradip, Moretó Planas, Miquel, Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions, Ortega Carrasco, Cristobal, Alvarez Martí, Lluc, Buyuktosunoglu, Alper, Bertran Monfort, Ramon, Rosedahl, Todd Jon, Bose, Pradip, and Moretó Planas, Miquel
The number and heterogeneity of compute devices, even within a single compute node, has been steadily on the rise. Since all systems must operate under a power cap, the number of discrete devices that can run simultaneously at their highest frequency is limited by the globally-imposed power cap. Current systems incorporate a centralized power management unit that statically controls the distribution of power among the devices within the node. However, such static distribution policies are unaware of the dynamic utilization profile across the devices, which leads to unfair power allocations that end up degrading system throughput performance. The problem is particularly acute in the presence of heterogeneity since type-specific performance-boost capabilities cannot be leveraged via utilization-agnostic static power allocations. This paper proposes Adaptive Power Shifting for multi-accelerator heterogeneous systems (APS), a technique that leverages system utilization information to dynamically allocate and re-distribute power budgets across multiple discrete devices. Democratizing the power allocation based on dynamic needs results in dramatic speedup over a need-agnostic static allocation. We use APS in a real OpenPOWER compute node with 2 CPUs and 4 GPUs to demonstrate the value of on-demand, equitable power allocations. Overall, the proposed solution increases performance with respect to two state-of-the-art techniques by up to 14.9% and 13.8%., This work has been partially supported by the European Union’s Horizon 2020 research and innovation program under the Mont-Blanc 2020 project (grant agreement 779877), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C22), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the IBM/BSC Deep Learning Center initiative. Ll. Alvarez has been supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under the Juan de la Cierva Formacion fellowship No. FJCI-2016- 30984. M. Moreto has been supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104., Peer Reviewed, Postprint (author's final draft)