Back to Search Start Over

Investigating large language models capabilities for automatic code repair in Python.

Authors :
Omari, Safwan
Basnet, Kshitiz
Wardat, Mohammad
Source :
Cluster Computing; Nov2024, Vol. 27 Issue 8, p10717-10731, 15p
Publication Year :
2024

Abstract

Developers often encounter challenges with their introductory programming tasks as part of the development process. Unfortunately, rectifying these mistakes manually can be time-consuming and demanding. Automated program repair (APR) techniques offer a potential solution by synthesizing fixes for such errors. Previous research has investigated the utilization of both symbolic and neural techniques within the APR domain. However, these approaches typically demand significant engineering efforts or extensive datasets and training. In this paper, we explore the potential of using a large language model trained on code, specifically, we assess ChatGPT's capability to detect and repair bugs in simple Python programs. The experimental evaluation encompasses two benchmarks: QuixBugs and Textbook. Each benchmark consists of simple Python functions that implement well-known algorithms and each function contains a single bug. To gauge repair performance in various settings, several benchmark variations were introduced including addition of plain English documentation and code obfuscation. Based on thorough experiments, we found that ChatGPT was able to correctly detect and fix about 50% of the methods, when code is documented. Repair performance drops to 25% when code is obfuscated, and 15% when documentation is removed and code is obfuscated. Furthermore, when compared to existing APR systems, ChatGPT considerably outperformed them. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
13867857
Volume :
27
Issue :
8
Database :
Complementary Index
Journal :
Cluster Computing
Publication Type :
Academic Journal
Accession number :
179535440
Full Text :
https://doi.org/10.1007/s10586-024-04490-8