7 results on '"Andrey Gorobets"'
Search Results
2. Heterogeneous CPU+GPU parallelization for high-accuracy scale-resolving simulations of compressible turbulent flows on hybrid supercomputers
- Author
Pavel Alexeevisch Bakhvalov and Andrey Gorobets
- Subjects
Computer science; Computation; Numerical analysis; Parallel algorithm; General Physics and Astronomy; Parallel computing; Computer Science::Performance; Stream processing; Hardware and Architecture; Computer Science::Mathematical Software; Polygon mesh; Enhanced Data Rates for GSM Evolution; General-purpose computing on graphics processing units; Computer Science::Distributed, Parallel, and Cluster Computing; Xeon Phi
- Abstract
A heterogeneous parallel algorithm for the simulation of compressible turbulent flows and its portable software implementation are presented. The underlying numerical method is based on a family of higher-accuracy edge-based reconstruction schemes on unstructured mixed-element meshes. The proposed parallel solution can engage a large number of computing devices of most of the architectures used in modern supercomputers, including manycore CPUs and GPUs, and it can co-execute on CPUs and accelerators simultaneously. The multilevel parallel algorithm combines MPI for distributing the workload among hybrid cluster nodes and between devices inside nodes; OpenMP for manycore CPUs and other supporting devices, such as Intel Xeon Phi; and OpenCL for massively-parallel accelerators, such as GPUs from various vendors, including NVIDIA, AMD and Intel. The main focus is on adapting the numerical method and its computational algorithm to the stream processing parallel paradigm. The very limited device memory inherent in GPU computing is also taken into account. A detailed description of the parallel algorithm is presented, together with the techniques used for its efficient parallel implementation. Special attention is paid to implicit time integration with its linear solver and to the calculation of convective fluxes and viscous terms. The use of mixed floating-point precision and the overlap of communications and computations are also discussed. Parallel performance is demonstrated in practical applications on different kinds of supercomputers using up to 10,000 cores and multiple GPUs of comparable overall performance.
- Published
- 2022
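Edge-based schemes like the one in the abstract above loop over mesh edges and add each edge flux to the two nodes the edge connects. The minimal sketch below assumes a toy antisymmetric flux and a plain list-based mesh (both hypothetical, not the paper's scheme or data structures). It contrasts a serial scatter loop with one common race-free reformulation for stream processing: first compute all edge fluxes independently, then let each node gather its incident fluxes through a precomputed node-to-edge map.

```python
# Hypothetical sketch: accumulating edge-based fluxes into node residuals.
# The flux function, mesh and data layout are illustrative assumptions.

def edge_flux(u_a, u_b):
    """Toy antisymmetric flux across an edge: F(a, b) == -F(b, a)."""
    return 0.5 * (u_b - u_a)

def residual_scatter(u, edges):
    """Serial edge loop: each edge adds +F to one node and -F to the other.
    Run in parallel, two edges sharing a node would race on the same write."""
    r = [0.0] * len(u)
    for a, b in edges:
        f = edge_flux(u[a], u[b])
        r[a] += f
        r[b] -= f
    return r

def build_node_to_edges(n_nodes, edges):
    """For each node, list its incident edges with the sign of their contribution."""
    incident = [[] for _ in range(n_nodes)]
    for e, (a, b) in enumerate(edges):
        incident[a].append((e, 1.0))
        incident[b].append((e, -1.0))
    return incident

def residual_gather(u, edges, incident):
    """Stream-processing style: (1) every edge flux is computed independently,
    (2) every node sums only its own incident fluxes. Neither loop has write
    conflicts, so each iteration could map to one GPU work-item."""
    fluxes = [edge_flux(u[a], u[b]) for a, b in edges]
    return [sum(sign * fluxes[e] for e, sign in incident[i])
            for i in range(len(u))]

# Tiny example mesh: 4 nodes, 4 edges.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
u = [1.0, 2.0, 4.0, 8.0]
incident = build_node_to_edges(len(u), edges)
print(residual_scatter(u, edges))            # [2.0, 0.5, -0.5, -2.0]
print(residual_gather(u, edges, incident))   # [2.0, 0.5, -0.5, -2.0]
```

Both versions give the same residual, and the residuals sum to zero because the flux is antisymmetric. The gather form spends extra memory on the node-to-edge map in exchange for conflict-free writes, a trade-off that is relevant when device memory is limited.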
3. A hierarchical parallel implementation for heterogeneous computing. Application to algebra-based CFD simulations on hybrid supercomputers
- Author
F. Xavier Trias, Xavier Álvarez-Farré, Andrey Gorobets, Universitat Politècnica de Catalunya. Doctorat en Enginyeria Tèrmica, Universitat Politècnica de Catalunya. Departament de Màquines i Motors Tèrmics, and Universitat Politècnica de Catalunya. CTTC - Centre Tecnològic de la Transferència de Calor
- Subjects
General Computer Science; MPI+OpenMP+OpenCL; Computer science; CUDA; Multiprocessing; Symmetric multiprocessor system; Parallel CFD; Computational fluid dynamics; 01 natural sciences; 010305 fluids & plasmas; Software portability; SpMV; 0103 physical sciences; Overhead (computing); 0101 mathematics; Hybrid supercomputer; General Engineering; Dot product; Supercomputers; Supercomputer; Data structure; 010101 applied mathematics; Algebra; CPU+GPU; Heterogeneous computing; Mechanical engineering::Fluid mechanics [UPC subject areas]
- Abstract
The quest for new portable implementations of simulation algorithms is motivated by the increasing variety of computing architectures. Moreover, the hybridization of high-performance computing systems imposes additional constraints, since heterogeneous computations are needed to efficiently engage both processors and massively-parallel accelerators. This, in turn, involves different parallel paradigms and computing frameworks and requires complex data exchanges between computing units. Typically, simulation codes rely on sophisticated data structures and computing subroutines, so-called kernels, which makes portability extremely cumbersome. Thus, a natural way to achieve portability is to dramatically reduce the complexity of both data structures and computing kernels. In our algebra-based approach, the scale-resolving simulation of incompressible turbulent flows on unstructured meshes relies on three fundamental kernels: the sparse matrix-vector product, the linear combination of vectors and the dot product. Notably, this approach is not limited to a particular kind of numerical method or set of governing equations. In our code, an auto-balanced multilevel partitioning distributes the workload among computing devices of various architectures. The overlap of computations and multistage communications efficiently hides the data exchange overhead in large-scale supercomputer simulations. In addition to computing on accelerators, special attention is paid to efficiency on manycore processors in multiprocessor nodes with a significant non-uniform memory access factor. Parallel efficiency and performance are studied in detail for different execution modes on various supercomputers using up to 9,600 processor cores and up to 256 graphics processing units. The heterogeneous implementation model described in this work is a general-purpose approach that is well suited for various subroutines in numerical simulation codes. The work of A. G.
has been funded by the Russian Science Foundation, project 19-11-00299. The work of X. Á. F. and F. X. T. has been financially supported by the ANUMESOL project (ENE2017-88697-R) by the Spanish Research Agency, and the FusionCAT project (001-P-001722) by the Government of Catalonia RIS3CAT FEDER. X. Á. F. is supported by a predoctoral contract (2019FI_B2-00076) by the Government of Catalonia. The work has been carried out using the MareNostrum 4 supercomputer of the Barcelona Supercomputing Center; the TSUBAME3.0 supercomputer of the Global Scientific Information and Computing Center at Tokyo Institute of Technology; the Lomonosov-2 supercomputer of the shared research facilities of HPC computing resources at Lomonosov Moscow State University; the K-60 hybrid cluster of the Collective Usage Centre of KIAM RAS. The authors thankfully acknowledge these institutions.
- Published
- 2021
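The abstract above states that the simulation relies on only three kernels: sparse matrix-vector product (SpMV), linear combination of vectors, and dot product. As a minimal sketch of why that is enough for a linear solver, the code below writes a conjugate gradient method using nothing but those three operations. The CSR storage format, the choice of conjugate gradient, and all function names are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: an iterative solver built only from the three
# algebraic kernels (SpMV, linear combination, dot product).

def spmv(row_ptr, col_idx, vals, x):
    """y = A x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y

def axpy(a, x, y):
    """Linear combination a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cg(A, b, tol=1e-10, max_iter=100):
    """Conjugate gradient for a symmetric positive-definite CSR matrix A,
    given as a (row_ptr, col_idx, vals) tuple. Starts from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                # residual r = b - A*0
    p = list(r)                # initial search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = spmv(*A, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)              # x += alpha * p
        r = axpy(-alpha, Ap, r)            # r -= alpha * A p
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)        # p = r + beta * p
        rr = rr_new
    return x

# 1D Laplacian [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form; exact solution is [1, 1, 1].
A = ([0, 2, 5, 7],
     [0, 1, 0, 1, 2, 1, 2],
     [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
b = [1.0, 0.0, 1.0]
x = cg(A, b)
print(x)
```

Because the solver only touches the matrix through `spmv`, porting it to a new device reduces to porting these three small kernels, which mirrors the portability argument made in the abstract.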
4. Direct numerical simulation of a fully developed turbulent square duct flow up to Reτ=1200
- Author
Andrey Gorobets, F. Xavier Trias, Hao Zhang, Assensi Oliva, and Yuanqiang Tan
- Subjects
Fluid Flow and Transfer Processes; Physics; Computer simulation; Meteorology; Turbulence; Mechanical Engineering; Direct numerical simulation; Laminar sublayer; Reynolds number; Mechanics; Condensed Matter Physics; symbols.namesake; symbols; Mean flow; Duct (flow); Large eddy simulation
- Abstract
Various fundamental studies based on turbulent duct flow have gained popularity, including heat transfer, magnetohydrodynamics and particle-laden transport. An accurate prediction of the turbulent flow field is critical for this research. However, databases of mean flow and turbulence statistics remain insufficient because of the enormous cost of numerical simulation at high Reynolds numbers. This paper aims to provide such data by conducting several Direct Numerical Simulations (DNS) of turbulent duct flows at Reτ = 300, 600, 900 and 1200. A quantitative comparison between the current and previous DNS results showed good agreement at Reτ = 300. However, further comparison with previous DNS results at Reτ = 600, obtained on much coarser meshes, revealed some discrepancies that can be explained by insufficient mesh resolution. Finally, the mean flow and turbulence statistics at higher Reτ are presented, and the effect of Reτ on the mean flow and flow dynamics is discussed.
- Published
- 2015
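The abstract above uses the friction Reynolds number Reτ without defining it. For reference, a common convention in square-duct DNS defines it from the perimeter-averaged wall shear stress and the duct half-width. This is an assumption here, since the abstract does not state which length scale the paper uses:

```latex
u_\tau = \sqrt{\frac{\langle \tau_w \rangle}{\rho}}, \qquad
Re_\tau = \frac{u_\tau \, h}{\nu}
```

Here ⟨τ_w⟩ is the wall shear stress averaged over the duct perimeter, ρ is the density, ν is the kinematic viscosity and h is the half-width of the duct.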
5. An OpenCL-based Parallel CFD Code for Simulations on Hybrid Systems with Massively-parallel Accelerators
- Author
F. Xavier Trias, Assensi Oliva, Andrey Gorobets, Universitat Politècnica de Catalunya. Departament de Màquines i Motors Tèrmics, and Universitat Politècnica de Catalunya. CTTC - Centre Tecnològic de la Transferència de Calor
- Subjects
Structured mesh; Computer science; GPU; Parallel CFD; Parallel computing; Computational fluid dynamics; structured mesh; Computational science; Algorithmic skeleton; Computer Science::Operating Systems; Massively parallel; Engineering(all); Computer Science::Distributed, Parallel, and Cluster Computing; Finite-volume; Multi-core processor; OpenCL; OpenMP; General Medicine; Supercomputer; Computer Science::Performance; Hybrid system; Computer Science::Mathematical Software; MPI; Node (circuits); Distributed memory; Xeon Phi; Mechanical engineering::Fluid mechanics [UPC subject areas]; finite-volume
- Abstract
A parallel finite-volume CFD algorithm for modeling of incompressible flows on hybrid supercomputers is presented. It is based on a symmetry-preserving high-order numerical scheme for structured meshes. A multilevel approach that combines different parallel models is used for large-scale simulations on computing systems with massively-parallel accelerators. MPI is used on the first level within the distributed memory model to couple computing nodes of a supercomputer. On the second level, OpenMP is used to engage multiple CPU cores of a computing node. The third level exploits the computing potential of massively-parallel accelerators such as GPUs (Graphics Processing Units) of AMD and NVIDIA, or Intel Xeon Phi accelerators of the MIC (Many Integrated Core) architecture. The hardware-independent OpenCL standard is used to compute on accelerators of different architectures within a general model for a combination of a central processor and a math co-processor.
- Published
- 2013
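The multilevel approach in the abstract above assigns work first to supercomputer nodes (MPI) and then to the devices inside each node (CPU cores via OpenMP, accelerators via OpenCL). A minimal sketch of such a two-level block decomposition is shown below, assuming a structured mesh split into contiguous slabs of planes along one axis and fixed relative device speeds. The slab layout, the weights and the helper names are hypothetical illustrations, not the paper's partitioning scheme.

```python
# Hypothetical sketch of a two-level block decomposition of a structured mesh:
# level 1 splits mesh planes evenly among MPI processes (one per node);
# level 2 splits each node's slab between a host CPU and an accelerator
# in proportion to assumed integer speed weights.

def split_blocks(n, weights):
    """Split range(n) into contiguous half-open (start, end) blocks whose sizes
    are proportional to the integer weights (boundaries rounded down)."""
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(n * acc // total)   # last bound is exactly n
    return list(zip(bounds[:-1], bounds[1:]))

def two_level_partition(n_planes, n_nodes, device_weights):
    """Return, per node, the global plane ranges assigned to each device."""
    layout = []
    for start, end in split_blocks(n_planes, [1] * n_nodes):
        local = split_blocks(end - start, device_weights)
        layout.append([(start + a, start + b) for a, b in local])
    return layout

# Example: 100 planes, 2 nodes, accelerator assumed 3x faster than the CPU.
layout = two_level_partition(n_planes=100, n_nodes=2, device_weights=[1, 3])
print(layout)   # [[(0, 12), (12, 50)], [(50, 62), (62, 100)]]
```

Contiguous slabs keep each device's data local and limit inter-device exchanges to the planes at slab interfaces; a real code would also have to handle halo planes and choose the weights from measured device performance.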
6. OpenCL Implementation of Basic Operations for a High-order Finite-volume Polynomial Scheme on Unstructured Hybrid Meshes
- Author
S. A. Soukov, Andrey Gorobets, and P. B. Bogdanov
- Subjects
Scheme (programming language); Polynomial; Finite volume method; OpenCL; Computer science; GPU; Byte; OpenMP; Memory bandwidth; Parallel CFD; General Medicine; Parallel computing; FLOPS; Computational science; unstructured mesh; Computer Science::Mathematical Software; MPI; Polygon mesh; Implementation; computer; Engineering(all); finite-volume; computer.programming_language
- Abstract
A parallel finite-volume algorithm based on a cell-centered high-order polynomial scheme for unstructured hybrid meshes is considered. The work focuses on the adaptation and optimization of the basic operations of the algorithm for different architectures of massively-parallel accelerators, including AMD and NVIDIA GPUs. Such an algorithm is especially problematic for GPU architectures because it has a very low FLOP-per-byte ratio, meaning that performance is limited by the memory bandwidth rather than by the computing performance of a device. It also has an irregular memory access pattern, since unstructured meshes are used. The calculation of polynomial coefficients and the calculation of convective fluxes through cell faces are the most interesting and time-consuming operations of the algorithm. OpenCL implementations of these operations for accelerators are considered here in detail. Ways to improve the computational efficiency are proposed, and performance measurements reaching up to 160 GFLOPS on a single GPU device are demonstrated.
- Published
- 2013
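The abstract above notes that a low FLOP-per-byte ratio makes performance depend on memory bandwidth rather than on peak compute. The standard roofline model (a general performance model, not something taken from the paper) makes this concrete: attainable throughput is the lower of peak FLOP/s and arithmetic intensity times memory bandwidth. The kernel and hardware figures below are hypothetical placeholders chosen only to show the arithmetic.

```python
# Roofline-style estimate (standard model; all numbers are hypothetical).

def arithmetic_intensity(flops, bytes_moved):
    """FLOP performed per byte transferred to or from memory."""
    return flops / bytes_moved

def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, intensity * memory bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOP/byte) above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs

# Hypothetical face-flux kernel: 20 FLOP and 80 bytes of traffic per face.
ai = arithmetic_intensity(flops=20, bytes_moved=80)
# Hypothetical GPU: 3000 GFLOP/s peak, 200 GB/s memory bandwidth.
perf = attainable_gflops(ai, peak_gflops=3000.0, bandwidth_gbs=200.0)
print(ai, perf, ridge_point(3000.0, 200.0))   # 0.25 50.0 15.0
```

With these placeholder numbers the kernel's intensity (0.25 FLOP/byte) is far below the ridge point (15 FLOP/byte), so the bound is set entirely by bandwidth; optimizations for such kernels therefore aim to reduce or better coalesce memory traffic rather than the FLOP count.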
7. Direct Numerical Simulation of Incompressible Flows on Unstructured Meshes Using Hybrid CPU/GPU Supercomputers
- Author
Assensi Oliva, Guillermo Oyarzun, Oriol Lehmkuhl, Andrey Gorobets, and R. Borrell
- Subjects
Computer science; business.industry; CPU/GPU hybrid supercomputers; Computation; Direct numerical simulation; General Medicine; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Computational fluid dynamics; Computational science; CUDA; Scalability; Code (cryptography); direct numerical simulation; MPI; Polygon mesh; Navier-Stokes equations; Navier–Stokes equations; business; Engineering(all); ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
This paper describes a hybrid MPI-CUDA parallelization strategy for the direct numerical simulation of incompressible flows on unstructured meshes. Our in-house MPI-based unstructured CFD code has been extended to use GPU co-processors in order to increase its performance. The main goal of this work is thus to take advantage of current hybrid supercomputers to increase our computing capabilities. CUDA is used to perform the calculations on the GPU devices, and MPI handles the communications between them. The main performance drawback is the slowdown produced by MPI communication episodes. Consequently, overlapping strategies that hide MPI communication costs behind GPU computations are studied in detail, with the aim of achieving scalability when executing the code on multiple nodes.
- Published
- 2013
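The overlapping strategies in the abstract above hide communication behind computation. One common way to structure this, assumed here rather than taken from the paper, is to split each subdomain into interior cells, which need no remote data, and interface cells, which need halo values from a neighbor. The exchange starts first, the interior is computed while it is in flight, and the interface cells are finished after waiting. The minimal sketch below imitates this ordering in a single Python process. Two lists stand in for two MPI ranks, a thread-pool task stands in for non-blocking sends and receives, and a 1D Jacobi smoothing step stands in for the real solver; all of these stand-ins are hypothetical.

```python
# Hypothetical sketch of overlapping a halo exchange with interior computation.
from concurrent.futures import ThreadPoolExecutor

def exchange_halos(left, right):
    """Stand-in for non-blocking MPI transfers: each side receives the
    neighbour's edge value as its ghost cell."""
    return right[0], left[-1]          # (ghost for left, ghost for right)

def jacobi_point(a, b):
    return 0.5 * (a + b)

def overlapped_step(left, right, pool):
    """One smoothing step on two subdomains (each of length >= 2)."""
    fut = pool.submit(exchange_halos, left, right)    # 1. start the exchange
    new_left, new_right = list(left), list(right)
    for i in range(1, len(left) - 1):                 # 2. interior cells only
        new_left[i] = jacobi_point(left[i - 1], left[i + 1])
    for i in range(1, len(right) - 1):
        new_right[i] = jacobi_point(right[i - 1], right[i + 1])
    ghost_l, ghost_r = fut.result()                   # 3. wait for the halos
    new_left[-1] = jacobi_point(left[-2], ghost_l)    # 4. interface cells
    new_right[0] = jacobi_point(ghost_r, right[1])
    return new_left, new_right     # left[0], right[-1] stay fixed (boundaries)

def serial_step(u):
    """Reference: the same step on the undivided array."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = jacobi_point(u[i - 1], u[i + 1])
    return v

left, right = [0.0] * 4, [0.0] * 3 + [1.0]
u = left + right
with ThreadPoolExecutor(max_workers=1) as pool:
    for _ in range(5):
        left, right = overlapped_step(left, right, pool)
        u = serial_step(u)
print(left + right == u)   # True
```

Python threads give no real speedup for this arithmetic; the sketch only demonstrates the ordering (start exchange, compute interior, wait, compute interface cells) and checks that it reproduces the non-overlapped result exactly. In an MPI-CUDA code, the same ordering would let a GPU kernel run on interior cells while halo data is in transit.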
Catalog
Discovery Service for Jio Institute Digital Library