Using Compiler Directives to Port Large Scientific Applications to GPUs: An Example from Atmospheric Science.

Authors :
Lapillonne, Xavier
Fuhrer, Oliver
Source :
Parallel Processing Letters. Mar2014, Vol. 24 Issue 1, p-1. 18p.
Publication Year :
2014

Abstract

For many scientific applications, Graphics Processing Units (GPUs) can be an interesting alternative to conventional CPUs, as they can deliver higher memory bandwidth and computing power. While it is conceivable to rewrite the most execution-time-intensive parts using a low-level API for accelerator programming, it may not be feasible to do so for the entire application. However, having only selected parts of the application running on the GPU requires repeatedly transferring data between the GPU and the host CPU, which may lead to a serious performance penalty. In this paper we assess the potential of compiler directives, based on the OpenACC standard, for porting large parts of code and thus achieving a full GPU implementation. As an illustrative and relevant example, we consider the climate and numerical weather prediction code COSMO (Consortium for Small Scale Modeling) and focus on the physical parametrizations, a part of the code which describes all physical processes not accounted for by the fundamental equations of atmospheric motion. We show, by porting three of the dominant parametrization schemes (radiation, microphysics and turbulence), that compiler directives are an efficient tool in terms of both final execution time and implementation effort. Compiler directives make it possible to port large sections of the existing code with minor modifications while still allowing further optimizations of the most performance-critical parts. Using the radiation parametrization, which contains the solution of a block tri-diagonal linear system, as an example, the required code modifications and key optimizations are discussed in detail. Performance tests for the three physical parametrizations show a speedup of between 3× and 7× in execution time on a GPU compared with a multi-core CPU of an equivalent generation. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0129-6264
Volume :
24
Issue :
1
Database :
Academic Search Index
Journal :
Parallel Processing Letters
Publication Type :
Academic Journal
Accession number :
95123382
Full Text :
https://doi.org/10.1142/S0129626414500030