Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach.
- Source :
- International Journal of Software Engineering & Knowledge Engineering; May 2024, Vol. 34 Issue 5, p705-727, 23p
- Publication Year :
- 2024
Abstract
- Integrated Development Environments (IDEs) are pivotal in enhancing productivity in modern software development through features such as code completion. Recent advancements in Natural Language Processing (NLP) have empowered neural language models for code completion. In this study, we present an extensive investigation of the impact of open- and closed-vocabulary systems on the task of code completion. Specifically, we compare open- and closed-vocabulary systems with various vocabulary sizes to observe their impact on code completion performance. We experiment with three open-vocabulary systems, byte pair encoding (BPE), WordPiece and Unigram, and compare them with closed-vocabulary systems to analyze their modeling performance. We also conduct experiments with different context sizes to study their impact on code completion performance. We experiment with several prominent language models: one recurrent neural network (RNN) model and five transformer-based models. Our results indicate that vocabulary size significantly impacts modeling performance and can artificially boost the accuracy of code completion models, especially in the case of a closed-vocabulary system. Moreover, we find that different vocabulary systems have varying impacts on token coverage, with open-vocabulary systems exhibiting better token coverage. Our findings offer valuable insights for building effective code completion models, aiding researchers and practitioners in this field. [ABSTRACT FROM AUTHOR]
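The abstract contrasts closed vocabularies (a fixed token list, with every other token mapped to an unknown `<unk>` symbol) with open vocabularies built from subword units such as BPE, and reports that open vocabularies give better token coverage. The following is a minimal illustrative sketch of that difference, not the authors' code or experimental setup. It assumes a toy list of whitespace-split code tokens, a simplified character-level BPE learner (WordPiece and Unigram are not shown), and hypothetical helper names (`closed_vocab`, `coverage`, `learn_bpe`, `apply_bpe`, `bpe_vocab`).

```python
from collections import Counter

# Illustrative sketch only: toy tokens and hypothetical helpers, not the paper's pipeline.

def closed_vocab(train_tokens, size):
    """Closed vocabulary: keep the `size` most frequent training tokens; others become <unk>."""
    return {tok for tok, _ in Counter(train_tokens).most_common(size)}

def coverage(test_tokens, vocab):
    """Fraction of test tokens that are in-vocabulary (i.e. not replaced by <unk>)."""
    return sum(tok in vocab for tok in test_tokens) / len(test_tokens)

def _merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(train_tokens, num_merges):
    """Simplified BPE: repeatedly merge the most frequent adjacent symbol pair."""
    corpus = Counter(tuple(tok) for tok in train_tokens)  # each token starts as characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every token is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        corpus = Counter({_merge(s, best): f for s, f in corpus.items()})
    return merges

def apply_bpe(token, merges):
    """Segment a (possibly unseen) token by replaying the learned merges in order."""
    symbols = tuple(token)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)

def bpe_vocab(train_tokens, merges):
    """Open vocabulary: base characters plus every merged symbol."""
    return {ch for tok in train_tokens for ch in tok} | {a + b for a, b in merges}

train = ["def", "get_name", "(", "self", ")", ":", "return", "self", ".", "name",
         "def", "get_age", "(", "self", ")", ":", "return", "self", ".", "age"]
test = ["def", "get_date", "(", "self", ")", ":", "return", "self", ".", "date"]

vocab = closed_vocab(train, size=50)  # size exceeds the 11 distinct tokens, so all are kept
print("closed-vocabulary coverage:", coverage(test, vocab))  # 0.8: get_date and date -> <unk>

merges = learn_bpe(train, num_merges=20)
open_vocab = bpe_vocab(train, merges)
pieces = apply_bpe("get_date", merges)
print("BPE pieces for get_date:", pieces)
print("all pieces in open vocabulary:", all(p in open_vocab for p in pieces))
```

The unseen identifiers `get_date` and `date` fall out of the closed vocabulary, while BPE can still spell them from learned subwords, provided every character was seen in training (byte-level BPE variants sidestep that assumption). Real WordPiece and Unigram tokenizers differ in how they score merges or choose segmentations, so this toy example only illustrates the coverage idea.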
Details
- Language :
- English
- ISSN :
- 0218-1940
- Volume :
- 34
- Issue :
- 5
- Database :
- Complementary Index
- Journal :
- International Journal of Software Engineering & Knowledge Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 177481364
- Full Text :
- https://doi.org/10.1142/S0218194023500687