1. Domain decomposition, irregular applications, and parallel computers.
- Author
-
Tomko, Karen Arnold
- Subjects
- Computer Science
- Abstract
Many large-scale computational problems are based on irregular (unstructured) domains. Some examples are finite element methods in structural analysis, finite volume methods in fluid dynamics, and circuit simulation for VLSI design. Domain decomposition is a common technique for distributing the data and work of irregular scientific applications across a distributed memory parallel machine. To obtain efficiency, subdomains must be constructed such that the work is divided with a reasonable balance among the processors while the communication-causing subdomain boundary is kept small. Application and machine specific information can be used in conjunction with domain decomposition to achieve a level of performance not possible with traditional domain decomposition methods. Application profiling characterizes the performance of an application on a specific machine. We present a method that uses curve-fitting of application profile data to calculate vertex and edge weights for use with weighted graph decomposition algorithms. We demonstrate its potential on two routines from a production finite element application running on the IBM SP2. Our method combined with a multilevel spectral algorithm reduced load imbalance from 52% to less than 10% for one routine in our study. Many irregular applications have several phases, that must be load balanced individually to achieve high overall application performance. We propose finding one decomposition that can be used effectively for each phase of the application, and introduce a decomposition that can be used effectively for each phase of the application, and introduce a decomposition algorithm which load balances according to two vertex weight sets for use on two-phase applications. We show that this dual weight algorithm call be as successful at load balancing two individual routines together as the traditional single weight algorithm is at load balancing each routine independently. Domain decomposition algorithms take a simplistic view of multiprocessor communication. Higher performance can be achieved by considering the communication characteristics of the target multiprocessor in conjunction with decomposition techniques. We provide a methodology for tuning an application for a shared-address space multiprocessor by using intelligent layout of the application data to reduce coherence traffic and employing latency hiding mechanisms to overlap communication with useful work. These techniques have been applied to a finite element radar application running on the Kendall Square KSR1.
- Published
- 1995