SOMA: Observability, monitoring, and in situ analytics for exascale applications.

Authors :
Yokelson, Dewi
Lappi, Oskar
Ramesh, Srinivasan
Väisälä, Miikka S.
Huck, Kevin
Puro, Touko
Norris, Boyana
Korpi‐Lagg, Maarit
Heljanko, Keijo
Malony, Allen D.
Source :
Concurrency & Computation: Practice & Experience; 8/30/2024, Vol. 36 Issue 19, p1-19, 19p
Publication Year :
2024

Abstract

Summary: With the rise of exascale systems and large, data‐centric workflows, the need to observe and analyze high‐performance computing (HPC) applications during their execution is becoming increasingly important. HPC applications are typically not designed with online monitoring in mind; the observability challenge therefore lies in accessing and analyzing interesting events with low overhead while seamlessly integrating such capabilities into existing and new applications. We explore how our service‐based observation, monitoring, and analytics (SOMA) approach to collecting and aggregating both application‐specific diagnostic data and performance data addresses these needs. We present our SOMA framework and demonstrate its viability with LULESH, a hydrodynamics proxy application. We then focus on Astaroth, a multi‐GPU library for stencil computations, highlighting the integration of the TAU and APEX performance tools and SOMA for application and performance data monitoring. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
15320626
Volume :
36
Issue :
19
Database :
Complementary Index
Journal :
Concurrency & Computation: Practice & Experience
Publication Type :
Academic Journal
Accession number :
178592004
Full Text :
https://doi.org/10.1002/cpe.8141