1. HBTM: A Heartbeat-based Behavior Detection Mechanism for POSIX Threads and OpenMP Applications
- Author
Wang, Weidong, Liao, Chunhua, Wang, Liqiang, Quinlan, Daniel J., and Lu, Wei
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Extreme-scale computing involves hundreds of millions of threads with multi-level parallelism running on large-scale hierarchical and heterogeneous hardware. In POSIX threads and OpenMP applications, key runtime behaviors such as thread failure, busy waiting, and exit need to be detected accurately and in a timely manner. However, most of these applications lack a unified and efficient mechanism for doing so. In this paper, a heartbeat-based behavior detection mechanism for POSIX threads (Pthreads) and OpenMP applications (HBTM) is proposed. Two implementations are provided, one centralized and one decentralized. Both expose a unified API to guarantee the generality of the mechanism. In addition, a ring-based detection algorithm is designed to ease the burden on the central thread at runtime. The NAS Parallel Benchmarks (NPB) are used to evaluate the performance of HBTM. The experimental results show that HBTM supports detection of behaviors of POSIX threads and OpenMP applications while achieving short latency and an overhead near 1%.
- Comment
7 pages, 7 figures
- Published
- 2015