Back to Search Start Over

Enhancing Software Code Vulnerability Detection Using GPT-4o and Claude-3.5 Sonnet: A Study on Prompt Engineering Techniques.

Authors :
Bae, Jaehyeon
Kwon, Seoryeong
Myeong, Seunghwan
Source :
Electronics (2079-9292); Jul2024, Vol. 13 Issue 13, p2657, 16p
Publication Year :
2024

Abstract

This study investigates the efficacy of advanced large language models, specifically GPT-4o, Claude-3.5 Sonnet, and GPT-3.5 Turbo, in detecting software vulnerabilities. Our experiment utilized vulnerable and secure code samples from the NIST Software Assurance Reference Dataset (SARD), focusing on C++, Java, and Python. We employed three distinct prompting techniques as follows: Concise, Tip Setting, and Step-by-Step. The results demonstrate that GPT-4o and Claude-3.5 Sonnet significantly outperform GPT-3.5 Turbo in vulnerability detection. GPT-4o showed the highest improvement with the Step-by-Step prompt, achieving an F1 score of 0.9072. Claude-3.5 Sonnet exhibited consistent high performance across all prompt types, with its Step-by-Step prompt yielding the best overall results (F1 score: 0.8933, AUC: 0.74). In contrast, GPT-3.5 Turbo showed minimal performance changes across prompts, with the Tip Setting prompt performing best (AUC: 0.65, F1 score: 0.6772), yet significantly lower than the other models. Our findings highlight the potential of advanced models in enhancing software security and underscore the importance of prompt engineering in optimizing their performance. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
20799292
Volume :
13
Issue :
13
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
178412758
Full Text :
https://doi.org/10.3390/electronics13132657