1. Design of an energy aware petaflops class high performance cluster based on Power Architecture
- Author
- Wissam Abu Ahmad, Andrea Bartolini, Daniele Gregori, Andrea Borghesi, Francesco Beneventi, Marco Cicala, Cosimo Gianfreda, Antonio Libri, Privato Forestieri, Filippo Spiga, Luca Benini, Simone Tinti (affiliations: Dipartimento di Informatica - Scienza e Ingegneria; Dipartimento di Ingegneria dell'Energia Elettrica e dell'Informazione 'Guglielmo Marconi'; Facoltà di Ingegneria)
- Subjects
FOS: Computer and information sciences; Engineering and technology; Electrical engineering, electronic engineering, information engineering; Distributed computing; Artificial intelligence & image processing; Computer science; Computer Science - Distributed, Parallel, and Cluster Computing; Distributed, Parallel, and Cluster Computing (cs.DC); Power management; Power Architecture; Power monitor; Power demand; Energy Aware; HPC; Supercomputer; Liquid cooling; NVLink; InfiniBand; Backplane; Interface (computing); Bandwidth; Hardware; Software; Middleware; Porting; Embedded system; Component-based software engineering; Operating system
- Abstract
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM’s POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100 Gb/s networking) plus custom hardware and innovative system middleware. D.A.V.I.D.E. features (i) a dedicated power monitor interface, built around the BeagleBone Black board, that allows high-frequency sampling directly from the power backplane and scalable integration with the internal node telemetry and system-level power management software; (ii) a custom-built chassis, based on the OpenRack form factor, with liquid cooling that allows the system to be used in modern, energy-efficient datacenters; (iii) software components designed to enable fine-grained power monitoring, power management (i.e. power capping and energy-aware job scheduling) and application power profiling, based on dedicated machine learning components. Software APIs are offered to developers and users to tune the computing node performance and power consumption around the application requirements. The first pilot system, which we will deploy at the beginning of 2017, will demonstrate key HPC applications from different fields ported and optimized for this innovative platform.
- Published
- 2023