1. C2CU: a CUDA C program generator for bulk execution of a sequential algorithm.
- Author
-
Takafuji, Daisuke, Nakano, Koji, Ito, Yasuaki, and Bordim, Jacir
- Subjects
PARALLEL computers ,COMPUTER software ,SEQUENTIAL analysis ,ALGORITHMS ,SIGNAL processing ,SORTING (Electronic computers) ,DYNAMIC programming - Abstract
Several important tasks, including matrix computation, signal processing, sorting, dynamic programming, encryption, and decryption, can be performed by oblivious sequential algorithms. A sequential algorithm is oblivious if an address accessed at each time does not depend on the input data. A bulk execution of a sequential algorithm is to execute it for many independent inputs in turn or in parallel. A number of works have been devoted to design and implement parallel algorithms for a single input. However, none of these works evaluated the bulk execution performance of these algorithms. The first contribution of this paper is to present a time-optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is to develop a tool, named C2CU, which automatically generates a CUDA C program for a bulk execution of an oblivious sequential algorithm. The C2CU has been used to generate CUDA C programs for the bulk execution of the bitonic sorting, Floyd-Warshall, and Montgomery modulo multiplication algorithms. Compared to a sequential implementation on a single CPU, the generated CUDA C programs for the above algorithms run, respectively, 199, 54, and 78 times faster. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF