1. OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection
- Authors
Fan Cui, Chenyang Yin, Kexing Zhou, Youwei Xiao, Guangyu Sun, Qiang Xu, Qipeng Guo, Demin Song, Dahua Lin, Xingcheng Zhang, and Yun Liang
- Subjects
Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Recent studies have demonstrated the significant potential of Large Language Models (LLMs) in generating Register Transfer Level (RTL) code, with notable advancements showcased by commercial models such as GPT-4 and Claude3-Opus. However, these proprietary LLMs often raise concerns regarding privacy and security. While open-source LLMs offer solutions to these concerns, they typically underperform commercial models in RTL code generation tasks, primarily due to the scarcity of high-quality open-source RTL datasets. To address this challenge, we introduce OriGen, a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology for generating high-quality, large-scale RTL code. Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets. Furthermore, OriGen can rectify syntactic errors through a self-reflection process that leverages compiler feedback. Experimental results demonstrate that OriGen significantly outperforms other open-source alternatives in RTL code generation. It surpasses the previous best-performing open-source LLM by 12.8% and even exceeds GPT-4 Turbo in the pass@1 metric on the VerilogEval-Human benchmark. Moreover, OriGen exhibits superior capabilities in self-reflection and error correction, outperforming GPT-4 by 19.9% on a benchmark designed to evaluate self-reflection capabilities.
- Published
2024
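
The abstract describes a self-reflection loop in which the model repairs syntactic errors in its own RTL output using compiler diagnostics. The sketch below illustrates that general idea only; it is not the paper's implementation. It assumes Icarus Verilog (`iverilog`) as a stand-in compiler and a placeholder `llm_generate` callable in place of the fine-tuned model.

```python
import os
import subprocess
import tempfile


def compile_feedback(verilog_code: str) -> str:
    """Compile Verilog with iverilog and return its error output, or "" if it compiles cleanly."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".v", delete=False) as f:
        f.write(verilog_code)
        path = f.name
    try:
        result = subprocess.run(
            ["iverilog", "-o", os.devnull, path],
            capture_output=True, text=True,
        )
        return "" if result.returncode == 0 else result.stderr
    finally:
        os.remove(path)


def self_reflect(llm_generate, spec: str, max_rounds: int = 3) -> str:
    """Generate RTL for a spec, then iteratively ask the model to repair it using compiler errors."""
    code = llm_generate(f"Write Verilog RTL for the following specification:\n{spec}")
    for _ in range(max_rounds):
        errors = compile_feedback(code)
        if not errors:
            break  # syntactically clean; stop reflecting
        # Feed the compiler diagnostics back to the model for correction.
        code = llm_generate(
            "The following Verilog fails to compile.\n"
            f"Compiler errors:\n{errors}\n"
            f"Code:\n{code}\n"
            "Return a corrected version of the code."
        )
    return code
```

Bounding the number of reflection rounds keeps the loop from cycling indefinitely when the model cannot resolve a reported error.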