Exploring the interplay of resilience and energy consumption for a task-based partial differential equations preconditioner
- Source :
- Parallel Computing. 73:16-27
- Publication Year :
- 2018
- Publisher :
- Elsevier BV, 2018.
Abstract
- We discuss algorithm-based resilience to silent data corruptions (SDCs) in a task-based domain-decomposition preconditioner for partial differential equations (PDEs). The algorithm exploits a reformulation of the PDE as a sampling problem, followed by a solution update through data manipulation that is resilient to SDCs. The implementation is based on a server-client model in which all state information is held by the servers, while clients serve solely as computational units. Scalability tests run on up to ~51K cores show a parallel efficiency greater than 90%. We use a 2D elliptic PDE and a fault model based on random single and double bit-flips to demonstrate the resilience of the application to synthetically injected SDCs. We discuss two fault scenarios: one based on the corruption of all data of a target task, and the other involving the corruption of a single data point. We show that for our application, given the test problem considered, a four-fold increase in the number of faults raises the overhead needed to overcome them by only 2 percentage points, from 7% to 9%. We then discuss potential savings in energy consumption via dynamic voltage/frequency scaling (DVFS), and its interplay with fault rates and application overhead.
- Subjects :
- 020203 distributed computing
- Partial differential equation
- Computer Networks and Communications
- Computer science
- Preconditioner
- Domain decomposition methods
- 010103 numerical & computational mathematics
- 02 engineering and technology
- Energy consumption
- Parallel computing
- 01 natural sciences
- Computer Graphics and Computer-Aided Design
- Theoretical Computer Science
- Artificial Intelligence
- Hardware and Architecture
- Server
- Scalability
- 0202 electrical engineering, electronic engineering, information engineering
- Overhead (computing)
- 0101 mathematics
- Fault model
- Frequency scaling
- Software
Details
- ISSN :
- 0167-8191
- Volume :
- 73
- Database :
- OpenAIRE
- Journal :
- Parallel Computing
- Accession number :
- edsair.doi...........ac919b319441a99fd02e1e469f69a198
- Full Text :
- https://doi.org/10.1016/j.parco.2017.05.005