System-Level Resource Monitoring in High-Performance Computing Environments.

Authors :
Agarwala, Sandip
Poellabauer, Christian
Kong, Jiantao
Schwan, Karsten
Wolf, Matthew
Source :
Journal of Grid Computing; Sep 2003, Vol. 1 Issue 3, p273-289, 17p
Publication Year :
2003

Abstract

Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, utilizing the familiar /proc virtual filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availabilities (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality, and (c) by performing monitoring at kernel-level, the information captured enables decision making that takes into account the multiple resources used by applications. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1570-7873
Volume :
1
Issue :
3
Database :
Complementary Index
Journal :
Journal of Grid Computing
Publication Type :
Academic Journal
Accession number :
49625360
Full Text :
https://doi.org/10.1023/B:GRID.0000035189.80518.5d