1. An Efficient Hardware/Software Co-Design for FALCON on Low-End Embedded Systems
- Author
- Yongseok Lee, Jonghee Youn, Kevin Nam, Heon Hui Jung, Myunghyun Cho, Jimyung Na, Jong-Yeon Park, Seungsu Jeon, Bo Gyeong Kang, Hyunyoung Oh, and Yunheung Paek
- Subjects
Post quantum cryptography, digital signature algorithm, cryptography, SW/HW co-design, FALCON, accelerator, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In this paper we propose EFX, an efficient FALCON accelerator based on a HW/SW co-design. FALCON is a post-quantum cryptographic (PQC) digital signature algorithm (DSA). Our findings reveal that FALCON exhibits unique characteristics and structures that distinguish it from other PQC-DSAs. A key finding is that, unlike its counterparts, FALCON is not dominated by a single time-consuming task; instead, it executes a variety of tasks with comparable execution times. Consequently, the conventional approach of accelerating a few dominant tasks, which is generally effective for other algorithms, proves less efficient for FALCON, especially with respect to minimizing silicon area. To overcome this, we focus on fine-grained optimization of lower-level operations rather than broader functional blocks, aiming to boost performance while conserving hardware area. Moreover, to mitigate the performance degradation caused by limited hardware resources, we implement a pipelined execution strategy for the FALCON functions and refine the sampling function, a critical task that is hard to accelerate because the algorithm is inherently sequential, so that it runs concurrently on both software and hardware, reducing latency. Our hardware design, synthesized at 300 MHz using Samsung's 28 nm and 45 nm process technologies, generates FALCON signatures with a $3.58\times$ improvement in clock cycles over an existing hardware accelerator. EFX occupies 38K $\mu\mathrm{m}^{2}$ and 74K $\mu\mathrm{m}^{2}$ in the 28 nm and 45 nm processes, respectively, which is small compared with other PQC accelerators.
- Published
- 2024