1. MUSA: a multi-level simulation approach for next-generation HPC machines
- Author
Thomas Grass, Cesar Allande, Adria Armejach, Alejandro Rico, Eduard Ayguade, Jesus Labarta, Mateo Valero, Marc Casas, Miquel Moreto; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions; Barcelona Supercomputing Center
- Subjects
020203 distributed computing; Multi-level simulation infrastructure; Simulation Time Cost Analysis; Informàtica [Àrees temàtiques de la UPC]; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 02 engineering and technology; High performance computing; High Performance Computing (HPC); Càlcul intensiu (Informàtica)
- Abstract
The complexity of High Performance Computing (HPC) systems is increasing, both in the number of components and in their heterogeneity. Interactions between software and hardware involve many different aspects that are typically not transparent to scientific programmers and system architects. Therefore, predicting the behavior of current scientific applications on future HPC infrastructures is a challenging task. In this paper we present MUSA, an end-to-end methodology that employs a multi-level simulation infrastructure. By combining different levels of abstraction, MUSA is able to model the communication network, microarchitectural details and system software interactions, providing different trade-offs in terms of simulation cost and accuracy. We compare detailed MUSA simulations with native executions of up to 2,048 cores and find relative errors that are within 10% in the common case. In addition, we use MUSA to simulate up to 16,384 cores and successfully identify scalability bottlenecks due to different factors, e.g., memory contention or load imbalance. We also compare different system configurations, showing how MUSA can help system designers assess the usefulness of future technologies in next-generation HPC machines.