30 results on '"A. R. Brodtkorb"'
Search Results
2. VISUALIZATION OF MARINE SAND DUNE DISPLACEMENTS UTILIZING MODERN GPU TECHNIQUES
- Author
-
T. Gierlinger, A. R. Brodtkorb, A. Stumpf, M. Weiler, and F. Michel
- Subjects
Technology; Engineering (General). Civil engineering (General); TA1-2040; Applied optics. Photonics; TA1501-1820
- Abstract
Quantifying and visualizing deformation and material fluxes is an indispensable tool for many geoscientific applications at different scales comprising for example global convective models (Burstedde et al., 2013), co-seismic slip (Leprince et al., 2007) or local slope deformation (Stumpf et al., 2014b). Within the European project IQmulus (http://www.iqmulus.eu) a special focus is laid on the efficient detection and visualization of submarine sand dune displacements. In this paper we present our approaches on the visualization of the calculated displacements utilizing modern GPU techniques to enable the user to interactively analyze intermediate and final results within the whole workflow.
- Published
- 2015
- Full Text
- View/download PDF
3. Performance and Energy Efficiency of CUDA and OpenCL for GPU Computing Using Python.
- Author
-
Håvard H. Holm, André R. Brodtkorb, and Martin Lilleeng Sætra
- Published
- 2019
- Full Text
- View/download PDF
4. Estimating volcanic ash emissions using retrieved satellite ash columns and inverse ash transport modelling.
- Author
-
André R. Brodtkorb, Anna Benedictow, Heiko Klein, Arve Kylling, Agnes Nyiri, Alvaro Valdebenito, and Espen Sollum
- Published
- 2020
5. Simplified Ocean Models on the GPU.
- Author
-
André R Brodtkorb
- Published
- 2018
6. Real-World Oceanographic Simulations on the GPU using a Two-Dimensional Finite-Volume Scheme.
- Author
-
André R. Brodtkorb and Håvard Heitlo Holm
- Published
- 2019
7. GPU Computing with Python: Performance, Energy Efficiency and Usability.
- Author
-
Håvard H. Holm, André R. Brodtkorb, and Martin Lilleeng Sætra
- Published
- 2019
8. Estimating volcanic ash emissions using retrieved satellite ash columns and inverse ash transport modelling using VolcanicAshInversion v1.2.1, within the operational eEMEP volcanic plume forecasting system (version rv4_17)
- Author
-
André R. Brodtkorb, Anna Benedictow, Heiko Klein, Arve Kylling, Agnes Nyiri, Alvaro Valdebenito, Espen Sollum, and Nina Kristiansen
- Abstract
Accurate modelling of ash clouds from volcanic eruptions requires knowledge about the eruption source parameters including eruption onset, duration, mass eruption rates, particle size distribution, and vertical emission profiles. However, most of these parameters are unknown and must be estimated somehow. Some are estimated based on observed correlations and known volcano parameters. However, a more accurate estimate is often needed to bring the model into closer agreement with observations. This paper describes the inversion procedure implemented at the Norwegian Meteorological Institute for estimating ash emission rates from retrieved satellite ash column amounts and a priori knowledge. The overall procedure consists of five stages: (1) generate a priori emission estimates; (2) run forward simulations with a set of unit emission profiles; (3) collocate/match observations with emission simulations; (4) build system of linear equations; and (5) solve overdetermined system. We go through the mathematical foundations for the inversion procedure, performance for synthetic cases, and performance for real-world cases. The novelties of this paper include a memory efficient formulation of the inversion problem, a detailed description and illustrations of the mathematical formulations, evaluation of the inversion method using synthetic known truth data as well as real data, and inclusion of observations of ash cloud-top height. The source code used in this work is freely available under an open source license, and can be used for other similar applications.
- Published
- 2023
- Full Text
- View/download PDF
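Note on result 8: stages (4) and (5) of the abstract above (build a system of linear equations, solve the overdetermined system) can be illustrated with a small regularised least-squares sketch. This is not the VolcanicAshInversion code. The function names, the matrix `m` of unit-emission responses, the scalar weight `eps`, and the use of dense normal equations are all assumptions made for illustration only.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def invert_emissions(m, y, x_prior, eps):
    """Minimise |m x - y|^2 + eps * |x - x_prior|^2 (hypothetical helper).

    m[k][i] is the simulated observation k produced by unit emission i,
    y[k] is the observed value, and x_prior is the a priori emission estimate.
    """
    n_obs, n_src = len(m), len(x_prior)
    # Normal equations: (m^T m + eps I) x = m^T y + eps x_prior.
    lhs = [[sum(m[k][i] * m[k][j] for k in range(n_obs)) + (eps if i == j else 0.0)
            for j in range(n_src)] for i in range(n_src)]
    rhs = [sum(m[k][i] * y[k] for k in range(n_obs)) + eps * x_prior[i]
           for i in range(n_src)]
    return solve_linear(lhs, rhs)
```

With three observations and two unknown emission rates the system is overdetermined; a small `eps` lets the observations dominate, while a large `eps` pulls the solution towards the a priori estimate.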
9. GPU Computing with Python: Performance, Energy Efficiency and Usability.
- Author
-
Håvard H. Holm, André R. Brodtkorb, and Martin Lilleeng Sætra
- Published
- 2020
- Full Text
- View/download PDF
10. GPU computing in discrete optimization. Part I: Introduction to the GPU.
- Author
-
André R. Brodtkorb, Trond R. Hagen, Christian Schulz, and Geir Hasle
- Published
- 2013
- Full Text
- View/download PDF
11. GPU computing in discrete optimization. Part II: Survey focused on routing problems.
- Author
-
Christian Schulz, Geir Hasle, André R. Brodtkorb, and Trond R. Hagen
- Published
- 2013
- Full Text
- View/download PDF
12. Performance and Energy Efficiency of CUDA and OpenCL for GPU Computing Using Python
- Author
-
Håvard Heitlo Holm, Martin L. Sætra, and André R. Brodtkorb
- Subjects
CUDA; Computer science; Parallel computing; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Software_PROGRAMMINGTECHNIQUES; General-purpose computing on graphics processing units; Python (programming language); computer; ComputingMethodologies_COMPUTERGRAPHICS; Efficient energy use; computer.programming_language
- Abstract
In this work, we examine the performance and energy efficiency when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. Our findings show that for some combinations of GPU and GPU code, there is a significant speedup for CUDA over OpenCL, but that this does not hold in general. Our experiments show that performance in general varies more between different GPUs than between using CUDA and OpenCL. Finally, we show that tuning for performance is a good way of tuning for energy efficiency.
- Published
- 2020
- Full Text
- View/download PDF
13. Data Assimilation for Ocean Drift Trajectories Using Massive Ensembles and GPUs
- Author
-
Håvard Heitlo Holm, André R. Brodtkorb, and Martin L. Sætra
- Subjects
Work (thermodynamics); Computer science; Finite volume methods; Shallow water simulations; Nonlinear system; Data assimilation; Resampling; Particle filters; Particle filter; Algorithm; Trajectory (fluid mechanics); Scaling; Physics::Atmospheric and Oceanic Physics
- Abstract
In this work, we perform fully nonlinear data assimilation of ocean drift trajectories using multiple GPUs. We use an ensemble of up to 10000 members and the sequential importance resampling algorithm to assimilate observations of drift trajectories into the underlying shallow-water simulation model. Our results show an improved drift trajectory forecast using data assimilation for a complex and realistic simulation scenario, and the implementation exhibits good weak and strong scaling. This work is supported by the Research Council of Norway (RCN) through grant number 250935 (GPU Ocean). The computations in this paper were performed on equipment provided by the Experimental Infrastructure for Exploration of Exascale Computing (eX3), which is financially supported by the RCN under contract 270053. The source code for the methods and experiments described in this paper is available under a GNU free and open source license released under https://doi.org/10.5281/zenodo.3591850.
- Published
- 2020
- Full Text
- View/download PDF
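Note on result 13: the sequential importance resampling step named in the abstract above can be sketched for a toy case. The sketch assumes scalar particle states, a Gaussian observation likelihood, and multinomial resampling; the function name `sir_update` and its parameters are hypothetical, and nothing here reflects the paper's GPU implementation or its shallow-water model.

```python
import math
import random


def sir_update(particles, observation, obs_std, rng):
    """One importance-weighting and resampling step for scalar states."""
    # Weight each ensemble member by a Gaussian likelihood of the observation.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in particles]
    if sum(weights) == 0.0:
        # Every member is too far from the observation to get a nonzero
        # weight in floating point; keep the ensemble unchanged.
        return list(particles)
    # Resample with replacement, with probability proportional to the weights.
    return rng.choices(particles, weights=weights, k=len(particles))
```

Members close to the observation are duplicated and distant members tend to drop out, so the ensemble size stays constant while its spread contracts around the observed value. A call such as `sir_update(ensemble, 1.0, 0.5, random.Random(0))` performs one update on a list of floats.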
14. Comparison Between Algebraic Multigrid and Multilevel Multiscale Methods for Reservoir Simulation
- Author
-
André R. Brodtkorb, K. Bao, Halvor Møll Nilsen, Olav Møyner, Arthur Moncorgé, and Knut-Andreas Lie
- Subjects
CUDA; Multigrid method; Rate of convergence; Preconditioner; Computer science; Domain decomposition methods; Basis function; Parallel computing; Solver; Smoothing
- Abstract
Summary: Multiscale methods for solving strongly heterogeneous systems in reservoirs have a long history from the early ideas used on incompressible flow to the newly released version in commercial simulation. Much effort has been put into making the MsFV method work for fully unstructured multiphase problems. The MsRSB version is a newly developed version, which tackles most of the "real" world problems. It is, to our knowledge, the only multiscale method that has been released in a commercial simulator. You can alternatively see the method as a variant of smoothed aggregation or as an iterative approach to AMG with energy minimizing basis functions. This will be discussed in detail. So far, most work on comparing MsRSB with AMG methods has been on qualitative performance measures like iteration number rather than on pure runtime on fair code implementation. We discuss the theoretical performance and show the practical performance for our implementation. Here, we compare performance of pure AMG, standard two-level MsRSB with pure AMG as coarse solver, as well as a new truly multilevel MsRSB scheme. Our implementation uses the DUNE-ISTL framework. To limit the scope of the discussion we restrict our assessment to AMG with aggregation and smoothed aggregation and the MsRSB method. These three methods are closely related and are primarily distinguished in a preconditioner setting by the coarsening factors used, and the degree of smoothing applied to the basis. We also compare with other state-of-the-art AMG implementations, but do not investigate combinations of them with the MsRSB method. For the MsRSB method, we also discuss practical considerations in different parallelization regimes including domain decomposition using MPI, shared memory using OpenMP, and GPU acceleration with CUDA. All comparisons will focus on the setting in which many similar systems should be solved, e.g. during a large-scale, multiphase flow simulation.
That is, our emphasis is on the performance of updating a preconditioner and on the apply time for the preconditioner relative to the convergence rate. Performance of the solvers will be tested for pure parabolic/elliptic problems that either arise as part of a sequential splitting procedure or as a pseudo-elliptic preconditioner/solver as a part of a CPR preconditioner for a multiphase system, for which block ILU0 is used as the outer smoother.
- Published
- 2020
- Full Text
- View/download PDF
15. Coastal ocean forecasting on the GPU using a two-dimensional finite-volume scheme
- Author
-
Håvard Heitlo Holm and André R. Brodtkorb
- Subjects
Scheme (programming language); Atmospheric Science; Finite volume method; High-resolution finite-volume methods; Computer science; Realistic use cases; Ocean forecasting; realistic use cases; GC1-1581; GPU computing; Oceanography; high-resolution finite-volume methods; shallow-water equations; Computational science; Work (electrical); Meteorology. Climatology; QC851-999; General-purpose computing on graphics processing units; Shallow-water equations; Shallow water equations; computer; Physics::Atmospheric and Oceanic Physics; gpu computing; ComputingMethodologies_COMPUTERGRAPHICS; computer.programming_language
- Abstract
In this work, we take a modern high-resolution finite-volume scheme for solving the rotational shallow-water equations and extend it with features required to run real-world ocean simulations. Our contributions include a spatially varying north vector and Coriolis term required for large scale domains, moving wet-dry fronts, a static land mask, bottom shear stress, wind forcing, boundary conditions for nesting in a global model, and an efficient model reformulation that makes it well-suited for massively parallel implementations. Our model order is verified using a grid convergence test, and we show numerical experiments using three different sections along the coast of Norway based on data originating from operational forecasts run at the Norwegian Meteorological Institute. Our simulation framework shows perfect weak scaling on a modern P100 GPU, and is capable of providing tidal wave forecasts that are very close to the operational model at a fraction of the cost. All source code and data used in this work are publicly available under open licenses. This research has mainly been funded by the Research Council of Norway under grant number 250935 (GPU Ocean), and partly by grant number 310515 (Havvarsel). The GPU Ocean project has received support in form of compute time on UNINETT Sigma2 - the National Infrastructure for High Performance Computing and Data Storage in Norway under project number nn9550k.
- Published
- 2021
- Full Text
- View/download PDF
16. GPU Computing with Python: Performance, Energy Efficiency and Usability
- Author
-
Håvard Heitlo Holm, Martin L. Sætra, and André R. Brodtkorb
- Subjects
FOS: Computer and information sciences; General Computer Science; Computer science; CUDA; 010103 numerical & computational mathematics; 02 engineering and technology; Parallel computing; Graphic processing units; Software_PROGRAMMINGTECHNIQUES; 01 natural sciences; lcsh:QA75.5-76.95; Theoretical Computer Science; Computing unified device architecture; Shallow water simulations; Software portability; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Power efficiency; 0101 mathematics; computer.programming_language; OpenCL; business.industry; Applied Mathematics; 65Y05, 68U20; high-performance computing; Usability; Python (programming language); shallow-water simulation; GPU computing; ComputerSystemsOrganization_PROCESSORARCHITECTURES; Power efficiency; Computer Science::Performance; Computer Science::Graphics; Computer Science - Distributed, Parallel, and Cluster Computing; Modeling and Simulation; Shallow-water simulation; Computer Science::Mathematical Software; Open compute languages; lcsh:Electronic computers. Computer science; High performance computing; Distributed, Parallel, and Cluster Computing (cs.DC); General-purpose computing on graphics processing units; business; power efficiency; computer; Efficient energy use
- Abstract
In this work, we examine the performance, energy efficiency, and usability when using Python for developing high-performance computing codes running on the graphics processing unit (GPU). We investigate the portability of performance and energy efficiency between Compute Unified Device Architecture (CUDA) and Open Compute Language (OpenCL), between GPU generations, and between low-end, mid-range, and high-end GPUs. Our findings showed that the impact of using Python is negligible for our applications, and furthermore, CUDA and OpenCL applications tuned to an equivalent level can in many cases obtain the same computational performance. Our experiments showed that performance in general varies more between different GPUs than between using CUDA and OpenCL. We also show that tuning for performance is a good way of tuning for energy efficiency, but that specific tuning is needed to obtain optimal energy efficiency.
- Published
- 2019
- Full Text
- View/download PDF
17. VISUALIZATION OF MARINE SAND DUNE DISPLACEMENTS UTILIZING MODERN GPU TECHNIQUES
- Author
-
M. Weiler, A. Stumpf, André R. Brodtkorb, F. Michel, and T. Gierlinger
- Subjects
lcsh:Applied optics. Photonics; lcsh:T; modern GPU techniques; Submarine; lcsh:TA1501-1820; geological displacements; Slip (materials science); lcsh:Technology; Sand dune stabilization; Visualization; Geography; Workflow; lcsh:TA1-2040; Computer graphics (images); lcsh:Engineering (General). Civil engineering (General); Interactive visualization; interactive visualization
- Abstract
Quantifying and visualizing deformation and material fluxes is an indispensable tool for many geoscientific applications at different scales comprising for example global convective models (Burstedde et al., 2013), co-seismic slip (Leprince et al., 2007) or local slope deformation (Stumpf et al., 2014b). Within the European project IQmulus (http://www.iqmulus.eu) a special focus is laid on the efficient detection and visualization of submarine sand dune displacements. In this paper we present our approaches on the visualization of the calculated displacements utilizing modern GPU techniques to enable the user to interactively analyze intermediate and final results within the whole workflow.
- Published
- 2015
18. Efficient GPU-Implementation of Adaptive Mesh Refinement for the Shallow-Water Equations
- Author
-
Knut-Andreas Lie, Martin L. Sætra, and André R. Brodtkorb
- Subjects
Numerical Analysis; Adaptive mesh refinement; Computer science; Applied Mathematics; General Engineering; Domain decomposition methods; Parallel computing; Grid; Stencil; Theoretical Computer Science; Computational Mathematics; CUDA; Computational Theory and Mathematics; General-purpose computing on graphics processing units; Shallow water equations; Software; ComputingMethodologies_COMPUTERGRAPHICS; Block (data storage)
- Abstract
The shallow-water equations model hydrostatic flow below a free surface for cases in which the ratio between the vertical and horizontal length scales is small and are used to describe waves in lakes, rivers, oceans, and the atmosphere. The equations admit discontinuous solutions, and numerical solutions are typically computed using high-resolution schemes. For many practical problems, there is a need to increase the grid resolution locally to capture complicated structures or steep gradients in the solution. An efficient method to this end is adaptive mesh refinement (AMR), which recursively refines the grid in parts of the domain and adaptively updates the refinement as the simulation progresses. Several authors have demonstrated that the explicit stencil computations of high-resolution schemes map particularly well to many-core architectures seen in hardware accelerators such as graphics processing units (GPUs). Herein, we present the first full GPU-implementation of a block-based AMR method for the second-order Kurganov-Petrova central scheme. We discuss implementation details, potential pitfalls, and key insights, and present a series of performance and accuracy tests. Although it is only presented for a particular case herein, we believe our approach to GPU-implementation of AMR is transferable to other hyperbolic conservation laws, numerical schemes, and architectures similar to the GPU.
- Published
- 2014
- Full Text
- View/download PDF
19. Graphics processing unit (GPU) programming strategies and trends in GPU computing
- Author
-
Martin L. Sætra, Trond Runar Hagen, and André R. Brodtkorb
- Subjects
Profiling (computer programming); Multimedia; Computer Networks and Communications; Computer science; Process (engineering); media_common.quotation_subject; Graphics processing unit; Symmetric multiprocessor system; Parallel computing; computer.software_genre; Theoretical Computer Science; Stream processing; Debugging; Artificial Intelligence; Hardware and Architecture; Graphics; General-purpose computing on graphics processing units; Metaheuristic; computer; Software; ComputingMethodologies_COMPUTERGRAPHICS; media_common
- Abstract
Over the last decade, there has been a growing interest in the use of graphics processing units (GPUs) for non-graphics applications. From early academic proof-of-concept papers around the year 2000, the use of GPUs has now matured to a point where there are countless industrial applications. Together with the expanding use of GPUs, we have also seen a tremendous development in the programming languages and tools, and getting started programming GPUs has never been easier. However, whilst getting started with GPU programming can be simple, being able to fully utilize GPU hardware is an art that can take months or years to master. The aim of this article is to simplify this process, by giving an overview of current GPU programming strategies, profile-driven development, and an outlook to future trends.
- Published
- 2013
- Full Text
- View/download PDF
20. Efficient shallow water simulations on GPUs: Implementation, visualization, verification, and validation
- Author
-
Martin L. Sætra, Mustafa S. Altinakar, and André R. Brodtkorb
- Subjects
CUDA; General Computer Science; Real-time simulation; Computer science; General Engineering; Symmetric multiprocessor system; Graphics; Shallow water equations; Single-precision floating-point format; Computational science; Visualization; Verification and validation
- Abstract
In this paper, we present an efficient implementation of a state-of-the-art high-resolution explicit scheme for the shallow water equations on graphics processing units. The selected scheme is well-balanced, supports dry states, and is particularly suitable for implementation on graphics processing units. We verify and validate our implementation, and show that use of efficient single precision hardware is sufficiently accurate for real-world simulations. Our framework further supports real-time visualization with both photorealistic and non-photorealistic display of the physical quantities. We present performance results showing that we can accurately simulate the first 4000 s of the Malpasset dam break case in 27 s using over 480,000 cells (dx = dy = 15 m), in which our simulator runs at an average of 530 megacells per second.
- Published
- 2012
- Full Text
- View/download PDF
21. Simulation and visualization of the Saint-Venant system using GPUs
- Author
-
Trond Runar Hagen, Knut-Andreas Lie, Jostein R. Natvig, and André R. Brodtkorb
- Subjects
State variable; Computer science; General Engineering; Bilinear interpolation; Grid; Theoretical Computer Science; Visualization; Rendering (computer graphics); Computational science; CUDA; Computational Theory and Mathematics; Modelling and Simulation; Modeling and Simulation; Scalability; Computer Vision and Pattern Recognition; High-resolution scheme; Algorithm; Engineering(all); Software; ComputingMethodologies_COMPUTERGRAPHICS
- Abstract
We consider three high-resolution schemes for computing shallow-water waves as described by the Saint-Venant system and discuss how to develop highly efficient implementations using graphical processing units (GPUs). The schemes are well-balanced for lake-at-rest problems, handle dry states, and support linear friction models. The first two schemes handle dry states by switching variables in the reconstruction step, so that bilinear reconstructions are computed using physical variables for small water depths and conserved variables elsewhere. In the third scheme, reconstructed slopes are modified in cells containing dry zones to ensure non-negative values at integration points. We discuss how single and double-precision arithmetics affect accuracy and efficiency, scalability and resource utilization for our implementations, and demonstrate that all three schemes map very well to current GPU hardware. We have also implemented direct and close-to-photo-realistic visualization of simulation results on the GPU, giving visual simulations with interactive speeds for reasonably-sized grids.
- Published
- 2010
- Full Text
- View/download PDF
22. State-of-the-art in Heterogeneous Computing
- Author
-
Jon M. Hjelmervik, Christopher Dyken, Trond Runar Hagen, André R. Brodtkorb, and Olaf O. Storaasli
- Subjects
Workstation; Cost efficiency; business.industry; Computer science; Symmetric multiprocessor system; Computer Science Applications; law.invention; QA76.75-76.765; Software; Computer architecture; law; Parallelism (grammar); Computer software; State (computer science); Graphics; Field-programmable gate array; business
- Abstract
Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.
- Published
- 2010
- Full Text
- View/download PDF
23. Real-time online camera synchronization for volume carving on GPU
- Author
-
Anna Kim, Torkel Andreas Haufmann, André R. Brodtkorb, and A. Berge
- Subjects
Carving; Computer science; business.industry; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Volume (computing); Iterative reconstruction; Synchronization; Set (abstract data type); Computer graphics; Computer graphics (images); Calibration; Computer vision; Artificial intelligence; State (computer science); business
- Abstract
Volume carving is a well-known technique for reconstructing a 3D scene from a set of 2D images, using features detected in individual cameras, and camera parameters. Spatial calibration of the cameras is well understood, but the resulting carved volume is very sensitive to temporal offsets between the cameras. Automatic synchronization between the cameras is therefore desirable. In this paper, we present a highly efficient implementation of volume carving and synchronization on a heterogeneous system fitted with commodity GPUs using an improved version of the algorithm in [1]. An online, real-time synchronization system is described and evaluated on surveillance video of an indoor scene. Improvements to the state of the art CPU-based algorithms are described.
- Published
- 2013
- Full Text
- View/download PDF
24. GPU Computing in Discrete Optimization Part I: Introduction to the GPU
- Author
-
Trond Runar Hagen, André R. Brodtkorb, Geir Hasle, and Christian Schulz
- Subjects
Exploit; Computer science; Transportation; Symmetric multiprocessor system; Parallel computing; Management Science and Operations Research; Supercomputer; Stream processing; CUDA; Computer architecture; Modeling and Simulation; Discrete optimization; Vehicle routing problem; General-purpose computing on graphics processing units
- Abstract
In many cases there is still a large gap between the performance of current optimization technology and the requirements of real world applications. As in the past, performance will improve through a combination of more powerful solution methods and a general performance increase of computers. These factors are not independent. Due to physical limits, hardware development no longer results in higher speed for sequential algorithms, but rather in increased parallelism. Modern commodity PCs include a multi-core CPU and at least one GPU, providing a low cost, easily accessible heterogeneous environment for high performance computing. New solution methods that combine task parallelization and stream processing are needed to fully exploit modern computer architectures and profit from future hardware developments. This paper is the first part of a series of two, where the goal of this first part is to give a tutorial style introduction to modern PC architectures and GPU programming. We start with a short historical account of modern mainstream computer architectures, and a brief description of parallel computing. This is followed by the evolution of modern GPUs, before a GPU programming example is given. Strategies and guidelines for program development are also discussed. Part II gives a broad survey of the existing literature on parallel computing targeted at modern PCs in discrete optimization, with special focus on papers on routing problems. We conclude with lessons learnt, directions for future research, and prospects.
- Published
- 2013
25. Shallow Water Simulations on Multiple GPUs
- Author
-
Martin L. Sætra and André R. Brodtkorb
- Subjects
Waves and shallow water; CUDA; Finite volume method; Computer science; Computation; Domain decomposition methods; Parallel computing; Scaling; Shallow water equations; Computational science; Flooding (computer networking)
- Abstract
We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computations, and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near perfect weak and strong scaling, and enables simulation of domains consisting of up to 235 million cells at a rate of over 1.2 gigacells per second using four Fermi-generation GPUs. The code is thoroughly benchmarked using three different systems, both high-performance and commodity-level systems.
- Published
- 2012
- Full Text
- View/download PDF
26. A Comparison of Three Commodity-Level Parallel Architectures: Multi-core CPU, Cell BE and GPU
- Author
-
Trond Runar Hagen and André R. Brodtkorb
- Subjects
Multi-core processor; Motion JPEG; Shared memory; Computer science; Discrete cosine transform; Inpainting; Parallel computing; Mandelbrot set; Graphics; Direct memory access
- Abstract
We explore three commodity parallel architectures: multi-core CPUs, the Cell BE processor, and graphics processing units. We have implemented four algorithms on these three architectures: solving the heat equation, inpainting using the heat equation, computing the Mandelbrot set, and MJPEG movie compression. We use these four algorithms to exemplify the benefits and drawbacks of each parallel architecture.
- Published
- 2010
- Full Text
- View/download PDF
27. The Graphics Processor as a Mathematical Coprocessor in MATLAB
- Author
-
André R. Brodtkorb
- Subjects
Numerical linear algebra; Coprocessor; Computer science; Interface (computing); MathematicsofComputing_NUMERICALANALYSIS; Graphics processing unit; Parallel computing; computer.software_genre; Single-precision floating-point format; Computational science; Central processing unit; Graphics; MATLAB; Computer Science::Operating Systems; computer; computer.programming_language
- Abstract
We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorithms from numerical linear algebra available through this interface: matrix-matrix multiplication, Gauss-Jordan elimination, PLU factorization, and tridiagonal Gaussian elimination. In addition to being a high level abstraction to the GPU, the interface offers background processing, enabling computations to be executed on the CPU simultaneously. The algorithms are shown to be up to 31 times faster than highly optimized CPU code. The algorithms have only been tested on single precision hardware, but will easily run on new double precision hardware.
- Published
- 2008
- Full Text
- View/download PDF
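Note on result 27: of the four algorithms listed in the abstract above, tridiagonal Gaussian elimination is the simplest to show in a few lines. The following is a plain serial CPU sketch (often called the Thomas algorithm), not the paper's GPU or MATLAB code. The function name and argument layout are assumptions, and the sketch does no pivoting, so it assumes a well-conditioned system such as a diagonally dominant one.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal row by row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution from the last unknown upwards.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The work is linear in the number of unknowns, compared with cubic cost for dense elimination. The forward sweep is inherently sequential within one system, so parallel variants generally need a different formulation or many independent systems.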
28. Erratum to 'Efficient shallow water simulations on GPUs: Implementation, visualization, verification and validation' [Comp Fluids 55 (2012) 1–12]
- Author
-
André R. Brodtkorb, Mustafa S. Altinakar, and Martin L. Sætra
- Subjects
Waves and shallow water; General Computer Science; Computer science; General Engineering; Visualization; Computational science; Verification and validation
- Published
- 2012
- Full Text
- View/download PDF
29. Simulating the Euler equations on multiple GPUs using Python
- Author
-
André R. Brodtkorb and Martin L. Sætra
- Subjects
GPU computing; CFD; conservation laws; finite-volume methods; Python; CUDA; Physics; QC1-999
- Abstract
GPUs have become a household name in High Performance Computing (HPC) systems over the last 15 years. However, programming GPUs is still largely a manual and arduous task, which requires expert knowledge of the physics, mathematics, and computer science involved. Even though there have been large advances in automatic parallelization and GPU execution of serial code, it is still difficult to fully utilize the GPU hardware with such approaches. Many core numeric GPU codes are therefore still mostly written using low level C/C++ or Fortran for the host code. Several studies have shown that using higher level languages, such as Python, can make software development faster and with fewer bugs. We have developed a simulator based on PyCUDA and mpi4py in Python for solving the Euler equations on Cartesian grids. Our framework utilizes the GPU, and can automatically run on clusters using MPI as well as on shared-memory systems. Our framework allows the programmer to implement low-level details in CUDA C/C++, which is important to achieve peak performance, whilst still benefiting from the productivity of Python. We show that our framework achieves good weak and strong scaling. Our weak scaling achieves more than 94% efficiency on a shared-memory GPU system and more than 90% efficiency on a distributed-memory GPU system, and our strong scaling is close to perfect on both shared-memory and distributed-memory GPU systems.
- Published
- 2022
- Full Text
- View/download PDF
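Note on result 29: the weak and strong scaling efficiencies quoted in the abstract above are usually computed from wall-clock timings as sketched below. These are the common textbook definitions; the abstract does not state the exact formulas used in the paper, and the function names and timings here are hypothetical.

```python
def weak_scaling_efficiency(time_one_gpu, time_n_gpus):
    """Problem size grows with GPU count, so the ideal runtime stays constant."""
    return time_one_gpu / time_n_gpus


def strong_scaling_efficiency(time_one_gpu, time_n_gpus, n_gpus):
    """Problem size is fixed, so the ideal runtime shrinks as 1 / n_gpus."""
    return time_one_gpu / (n_gpus * time_n_gpus)
```

For example, a run that takes 10.0 s on one GPU and 10.5 s on four GPUs with four times the cells has a weak scaling efficiency of about 95%, and a fixed-size run that drops from 10.0 s to 2.5 s on four GPUs has perfect strong scaling.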
30. Coastal ocean forecasting on the GPU using a two-dimensional finite-volume scheme
- Author
-
André R. Brodtkorb and Håvard Heitlo Holm
- Subjects
shallow-water equations; oceanography; gpu computing; realistic use cases; high-resolution finite-volume methods; Oceanography; GC1-1581; Meteorology. Climatology; QC851-999
- Abstract
In this work, we take a modern high-resolution finite-volume scheme for solving the rotational shallow-water equations and extend it with features required to run real-world ocean simulations. Our contributions include a spatially varying north vector and Coriolis term required for large scale domains, moving wet-dry fronts, a static land mask, bottom shear stress, wind forcing, boundary conditions for nesting in a global model, and an efficient model reformulation that makes it well-suited for massively parallel implementations. Our model order is verified using a grid convergence test, and we show numerical experiments using three different sections along the coast of Norway based on data originating from operational forecasts run at the Norwegian Meteorological Institute. Our simulation framework shows perfect weak scaling on a modern P100 GPU, and is capable of providing tidal wave forecasts that are very close to the operational model at a fraction of the cost. All source code and data used in this work are publicly available under open licenses.
- Published
- 2021
- Full Text
- View/download PDF