Algorithmic and language-based optimization of Marsa-LFIB4 pseudorandom number generator using OpenMP, OpenACC and CUDA.
- Source :
- Journal of Parallel & Distributed Computing. Mar 2020, Vol. 137, p238-245. 8p.
- Publication Year :
- 2020
Abstract
- The aim of this paper is to present new high-performance implementations of Marsa-LFIB4, an example of a high-quality multiple recursive pseudorandom number generator. We propose an algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer parallel method that exploits the special sparse structure of the matrix obtained from the recursive formula that defines the generator. Our portable OpenACC implementation achieves performance comparable to that of our CUDA-based and OpenMP-based implementations on GPUs and multicore CPUs, respectively.
• New algorithmic approach for vectorization and parallelization of recursive PRNGs.
• Methodology and guidelines for optimizing recursive computations on GPUs.
• Case study on how to use shared memory to reduce references to global memory.
• Comparison of language-based optimization techniques available in CUDA and OpenACC.
• Our CUDA and OpenACC implementations achieve excellent performance on modern GPUs.
[ABSTRACT FROM AUTHOR]
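To make the idea of vectorizing a recursive generator concrete, the following C sketch may help. It assumes the commonly cited form of the LFIB4 recurrence, x[n] = x[n-256] + x[n-179] + x[n-119] + x[n-55] mod 2^32; this record does not state the formula, so the lags are an assumption. The function names `lfib4_scalar` and `lfib4_blocked` are hypothetical. The blocked variant only illustrates why the recurrence admits vectorization (the shortest lag is 55, so 55 consecutive values do not depend on one another); it is not the divide-and-conquer method proposed in the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed lags of the LFIB4 recurrence (not stated in this record). */
#define LAG_A ((size_t)256)
#define LAG_B ((size_t)179)
#define LAG_C ((size_t)119)
#define LAG_D ((size_t)55)

/* Reference version. x[0..255] must hold the seed words; fills x[256..n-1].
 * uint32_t arithmetic wraps, which gives the reduction mod 2^32 for free. */
void lfib4_scalar(uint32_t *x, size_t n)
{
    assert(n >= LAG_A);
    for (size_t i = LAG_A; i < n; ++i)
        x[i] = x[i - LAG_A] + x[i - LAG_B] + x[i - LAG_C] + x[i - LAG_D];
}

/* Blocked version. Each chunk holds at most 55 values, so every read in the
 * inner loop refers to an element produced before the chunk started. The
 * inner loop therefore has no loop-carried dependence and is a candidate for
 * compiler or directive-based vectorization. */
void lfib4_blocked(uint32_t *x, size_t n)
{
    assert(n >= LAG_A);
    for (size_t base = LAG_A; base < n; base += LAG_D) {
        size_t len = (n - base < LAG_D) ? (n - base) : LAG_D;
        for (size_t j = 0; j < len; ++j) {
            size_t i = base + j;
            x[i] = x[i - LAG_A] + x[i - LAG_B] + x[i - LAG_C] + x[i - LAG_D];
        }
    }
}
```

Both functions expect a caller-allocated array of `n` words whose first 256 entries are the seed, and under that assumption they produce the same sequence; the blocked form simply exposes the independent work within each chunk.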
Details
- Language :
- English
- ISSN :
- 07437315
- Volume :
- 137
- Database :
- Academic Search Index
- Journal :
- Journal of Parallel & Distributed Computing
- Publication Type :
- Academic Journal
- Accession number :
- 141197294
- Full Text :
- https://doi.org/10.1016/j.jpdc.2019.12.004