Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors, Jha, Sudhanshu Shekhar, Heirman, Wim, Falcón Samper, Ayose Jesus, Carlson, Trevor E., Van Craeynest, Kenzo, Tubella Murgadas, Jordi, González Colás, Antonio María, and Eeckhout, Lieven
Modern microprocessors are increasingly power-constrained as a result of slowed supply voltage scaling (the end of Dennard scaling) combined with continued transistor density scaling (Moore's Law). Existing many-core power management techniques, such as chip-wide or per-core DVFS and core and cache adaptation, are quite effective in isolation at moderate to high power budgets. However, for future many-core chips, these techniques do not scale well to large core counts, short time slices, and stringent power budgets. We need a new solution that combines different adaptation and reconfiguration techniques. In this paper, we present Chrysso, an integrated, scalable, and low-overhead power management framework. Chrysso follows a three-step process: it leverages simple analytical performance and power models, prunes the search space early by generating local Pareto fronts, and then performs global utility-based power allocation. This ensures scalable and effective dynamic adaptation of many-core processors at short time scales along multiple axes, including core, cache, and per-core DVFS adaptation. By integrating multiple power management techniques into a common methodology, Chrysso provides significant performance improvements over isolated mechanisms within a given power budget, without power-gating cores. On a 64-core system, Chrysso improves system throughput by 1.6× and 1.9× over core-gating at stringent power envelopes for multi-program (SPEC) and multi-threaded (PARSEC) workloads, respectively.
Peer Reviewed. Postprint (author's final draft).
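The abstract names the last two steps of the process (local Pareto front pruning, then global utility-based allocation) without giving their details. The Python sketch below illustrates the general idea under stated assumptions: the per-configuration performance and power numbers stand in for the outputs of the analytical models and are invented for illustration, configuration names such as `big@high` are hypothetical, and the global step is a simple greedy marginal-utility (extra throughput per extra watt) heuristic, not necessarily the allocation Chrysso itself uses. `Config`, `pareto_front`, and `allocate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Config:
    """One candidate operating point for a core (hypothetical fields)."""
    name: str     # e.g. core size + DVFS level; cache ways could be encoded too
    perf: float   # predicted throughput (assumed output of a performance model)
    power: float  # predicted power in watts (assumed output of a power model)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Drop every config that some cheaper-or-equal config matches or beats.

    The result is sorted by strictly increasing power and strictly increasing perf.
    """
    # Sort by power; on equal power, put the fastest first so slower ties are dropped.
    ordered = sorted(configs, key=lambda c: (c.power, -c.perf))
    front: List[Config] = []
    best_perf = float("-inf")
    for c in ordered:
        if c.perf > best_perf:  # faster than every cheaper point seen so far
            front.append(c)
            best_perf = c.perf
    return front


def allocate(cores: Dict[str, List[Config]], budget: float) -> Dict[str, Config]:
    """Greedy marginal-utility allocation over the per-core Pareto fronts.

    Each core starts at its lowest-power Pareto point. Repeatedly, the core whose
    next Pareto point gives the most extra perf per extra watt, and still fits
    within the budget, is upgraded, until no upgrade fits.
    """
    fronts = {core: pareto_front(cfgs) for core, cfgs in cores.items()}
    level = {core: 0 for core in fronts}
    used = sum(front[0].power for front in fronts.values())
    if used > budget:
        raise ValueError("budget too small for the minimum configuration")
    while True:
        best_core: Optional[str] = None
        best_gain = 0.0
        for core, front in fronts.items():
            i = level[core]
            if i + 1 >= len(front):
                continue  # this core is already at its fastest Pareto point
            d_power = front[i + 1].power - front[i].power  # > 0 on a Pareto front
            d_perf = front[i + 1].perf - front[i].perf
            if used + d_power > budget:
                continue  # upgrade would exceed the power budget
            gain = d_perf / d_power
            if gain > best_gain:
                best_core, best_gain = core, gain
        if best_core is None:
            break
        i = level[best_core]
        used += fronts[best_core][i + 1].power - fronts[best_core][i].power
        level[best_core] = i + 1
    return {core: fronts[core][level[core]] for core in fronts}


# Usage with made-up model predictions for two cores and a 12 W budget.
cores = {
    "c0": [Config("small@low", 1.0, 2.0), Config("small@high", 1.5, 4.0),
           Config("big@low", 1.4, 5.0), Config("big@high", 2.5, 8.0)],
    "c1": [Config("small@low", 0.8, 2.0), Config("big@high", 1.2, 8.0)],
}
plan = allocate(cores, budget=12.0)
print({core: cfg.name for core, cfg in plan.items()})
# prints {'c0': 'big@high', 'c1': 'small@low'}
```

Here `big@low` on core `c0` is pruned locally because `small@high` is both cheaper and faster, so the global step only considers three points for that core instead of four. Pruning per core keeps the global search small, which is the scalability argument the abstract makes for large core counts.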