1. Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication
- Author
Ray C. C. Cheung, Çetin Kaya Koç, Wangchen Dai, and Donald Donglong Chen
- Subjects
Multiplication algorithm, Modular arithmetic, 020208 electrical & electronic engineering, Prime-factor FFT algorithm, Fast Fourier transform, 02 engineering and technology, Parallel computing, 020202 computer hardware & architecture, Theoretical Computer Science, Cyclotomic fast Fourier transform, Computational Theory and Mathematics, Split-radix FFT algorithm, Hardware and Architecture, Rader's FFT algorithm, 0202 electrical engineering, electronic engineering, information engineering, Hardware_ARITHMETICANDLOGICSTRUCTURES, Software, Twiddle factor, Mathematics
- Abstract
Modular multiplication is the most time-consuming operation in number-theoretic cryptographic algorithms that work on large integers, such as RSA and Diffie-Hellman. Implementations show that more than 75 percent of RSA's running time is spent in the modular multiplication function for moduli longer than 1,024 bits. Fast multiplier architectures exist that minimize delay and increase throughput through parallelism and pipelining. However, such designs are large in area and low in efficiency. In this paper, we integrate the fast Fourier transform (FFT) method into McLaughlin's framework and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm that achieves high area-time efficiency. Unlike previous FFT-based designs, we avoid the zero-padding operation by computing the modular multiplication steps directly with cyclic and nega-cyclic convolutions, which halves the convolution length. Furthermore, supported by the number-theoretic weighted transform, the FFT algorithm provides fast convolution computation. We also introduce a general method for efficient parameter selection for the proposed algorithm. We design architectures with single and double butterfly structures that yield low area-latency solutions, and we implement them on Xilinx Virtex-6 FPGAs. The results show that our work offers better area-latency efficiency than the state-of-the-art FFT-based MMM architectures for operand sizes of 1,024 bits and above. We obtain area-latency efficiency improvements of up to 50.9 percent for 1,024-bit, 41.9 percent for 2,048-bit, 37.8 percent for 4,096-bit, and 103.2 percent for 7,680-bit operands. The design also outperforms them in latency, running at a high clock frequency, for transform lengths of 64 and above.
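The abstract's point about zero-padding can be illustrated with a small sketch. It is not the authors' algorithm or hardware design, and every function name in it is hypothetical. It only demonstrates the standard identity that a length-N cyclic convolution and a length-N nega-cyclic convolution together carry the same information as the length-2N linear convolution that zero-padding would otherwise require. The convolutions are computed directly in O(N^2) for clarity; an FFT-based design would compute them with transforms instead.

```python
# Illustrative sketch only: function names are assumptions, not from the paper.

def linear_convolution(a, b):
    """Full product of two length-n digit sequences (length 2n - 1)."""
    n = len(a)
    out = [0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cyclic_convolution(a, b):
    """Length-n convolution where index i + j wraps around modulo n."""
    n = len(a)
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % n] += x * y
    return out


def negacyclic_convolution(a, b):
    """Length-n convolution where wrapped-around terms are subtracted."""
    n = len(a)
    out = [0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n:
                out[i + j] += x * y
            else:
                out[i + j - n] -= x * y
    return out


def recover_linear(cyc, neg):
    """Rebuild the linear convolution from the two length-n results.

    With the linear result split into halves low and high:
    cyc = low + high and neg = low - high, so both sums are even.
    """
    n = len(cyc)
    low = [(c + d) // 2 for c, d in zip(cyc, neg)]
    high = [(c - d) // 2 for c, d in zip(cyc, neg)]
    return low + high[: n - 1]  # the last high term is always zero


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
full = recover_linear(cyclic_convolution(a, b), negacyclic_convolution(a, b))
```

Here `full` matches `linear_convolution(a, b)` even though neither length-4 convolution was padded to length 8.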
- Published
- 2017