1. Performance optimizations for scalable implicit RANS calculations with SU2
- Author
- Jongsoo Park, Mikhail Smelyanskiy, Alexander Heinecke, Pradeep Dubey, Gaurav Bansal, Thomas D. Economon, Dheevatsa Mudigere, Francisco Palacios, and Juan J. Alonso
- Subjects
020301 aerospace & aeronautics, Speedup, General Computer Science, Computer science, General Engineering, Parallel algorithm, 02 engineering and technology, Supercomputer, 01 natural sciences, Generalized minimal residual method, 010305 fluids & plasmas, Computational science, Multigrid method, 0203 mechanical engineering, Shared memory, 0103 physical sciences, Vectorization (mathematics), Single-core
- Abstract
In this paper, we present single- and multi-node optimizations of SU2, a widely used, open-source Computational Fluid Dynamics application, aimed at improving performance and scalability for implicit Reynolds-averaged Navier–Stokes calculations on unstructured grids. Typical industry-standard implementations are currently limited by unstructured accesses, variable degrees of parallelism, as well as the global synchronizations inherent in traditionally used Krylov linear solvers. Therefore, we rely on aggressive single-node optimizations, such as hierarchical parallelism, dynamic threading, compacted memory layout, and vectorization, along with a communication-friendly agglomeration (geometric) linear multigrid solver. Based on results with the well-known ONERA M6 geometry, our single-core and shared-memory optimizations result in a speedup of 2.6X on the latest 14-core Intel® Xeon™ E5-2697v3 processor when compared to the baseline SU2 implementation with 14 MPI ranks. In multi-node settings, the hybrid OpenMP+MPI multigrid implementation achieves 2X higher parallel efficiency on 256 nodes over conventional Krylov-based (GMRES) methods.
- Published
- 2016