
Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach.

Authors :
Hussain, Yasir
Huang, Zhiqiu
Zhou, Yu
Khan, Izhar Ahmed
Source :
International Journal of Software Engineering & Knowledge Engineering; May 2024, Vol. 34 Issue 5, p705-727, 23p
Publication Year :
2024

Abstract

Integrated Development Environments (IDEs) are pivotal to productivity in modern software development, with features such as code completion. Recent advances in Natural Language Processing (NLP) have enabled neural language models for code completion. In this study, we present an extensive investigation of the impact of open- and closed-vocabulary systems on the task of code completion. Specifically, we compare open- and closed-vocabulary systems across various vocabulary sizes to observe their effect on code completion performance. We experiment with three open-vocabulary systems, byte pair encoding (BPE), WordPiece and Unigram, and compare their modeling performance with that of closed-vocabulary systems. We also experiment with different context sizes to study their impact on code completion performance. Our experiments use several prominent language models: one recurrent neural network and five transformers. Our results indicate that vocabulary size significantly affects modeling performance and can artificially boost the accuracy of code completion models, especially in a closed-vocabulary system. Moreover, we find that vocabulary systems differ in their token coverage, with open-vocabulary systems exhibiting better coverage. Our findings offer valuable insights for building effective code completion models, aiding researchers and practitioners in this field. [ABSTRACT FROM AUTHOR]
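
The following minimal Python sketch illustrates, on toy data, what token coverage means for a closed vocabulary and one plausible way that out-of-vocabulary handling could inflate accuracy. It is not the authors' code or evaluation protocol: the `<UNK>` placeholder, the helper names, the toy token lists, and the "always predict `<UNK>`" baseline are all assumptions made for illustration.

```python
from collections import Counter

UNK = "<UNK>"  # assumed placeholder for out-of-vocabulary tokens


def build_closed_vocab(train_tokens, size):
    """Keep the `size` most frequent training tokens (hypothetical helper)."""
    return {tok for tok, _ in Counter(train_tokens).most_common(size)}


def encode(tokens, vocab):
    """Map every token outside the closed vocabulary to UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


def token_coverage(tokens, vocab):
    """Fraction of tokens that the vocabulary can represent directly."""
    return sum(tok in vocab for tok in tokens) / len(tokens)


def accuracy(predictions, targets, count_unk=True):
    """Exact-match accuracy; optionally skip positions whose target is UNK."""
    pairs = [(p, t) for p, t in zip(predictions, targets)
             if count_unk or t != UNK]
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)


# Toy corpora (assumed): identifiers differ between training and test code.
train_tokens = "def add ( a , b ) : return a + b".split()
test_tokens = "def mul ( x , y ) : return x * y".split()

vocab = build_closed_vocab(train_tokens, size=100)  # keeps all training types
targets = encode(test_tokens, vocab)

# A degenerate "model" that always predicts UNK.
predictions = [UNK] * len(targets)

print("coverage:", token_coverage(test_tokens, vocab))
print("accuracy, UNK counted:", accuracy(predictions, targets, count_unk=True))
print("accuracy, UNK skipped:", accuracy(predictions, targets, count_unk=False))
```

In this toy setup, half of the test tokens fall outside the closed vocabulary, so a model that only ever emits `<UNK>` scores 50% when `<UNK>` matches are counted as correct and 0% when they are excluded. An open-vocabulary tokenizer such as BPE would instead split the unseen identifiers into known subword units, which is why its coverage is typically higher.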

Details

Language :
English
ISSN :
0218-1940
Volume :
34
Issue :
5
Database :
Complementary Index
Journal :
International Journal of Software Engineering & Knowledge Engineering
Publication Type :
Academic Journal
Accession number :
177481364
Full Text :
https://doi.org/10.1142/S0218194023500687