COCLUBERT : Clustering Machine Learning Source Code
- Publication Year :
- 2021
Abstract
- Machine learning (ML) applications now appear in nearly every aspect of modern life, and more developers are engaged in the field than ever. To facilitate the development of new ML applications, it would be beneficial to provide services that let developers share, access, and search for source code easily. A step towards such a service is to cluster source code by functionality. In this work, we present COCLUBERT, a BERT-based model that embeds source code according to its functionality and clusters it accordingly. We build COCLUBERT on CuBERT, a variant of BERT pre-trained on source code, and present three ways to fine-tune it for the clustering task. In the experiments, we compare COCLUBERT with a baseline model that clusters source code using CuBERT embeddings without fine-tuning. We show that COCLUBERT significantly outperforms the baseline, increasing the Dunn Index by a factor of 141, the Silhouette Score by a factor of two, and the Adjusted Rand Index by a factor of 11.
- QC 20220530. Part of proceedings: ISBN 978-1-6654-4337-1
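For readers unfamiliar with the Dunn Index named in the abstract, the following is a minimal sketch of one common formulation: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The choice of this variant, the function names, and the toy 2-D "embeddings" are all assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter.

    Assumed variant: both quantities are taken over raw pairwise point
    distances. Higher values indicate compact, well-separated clusters.
    """
    min_between = math.inf
    max_within = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = euclidean(p, q)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    if max_within == 0.0 or math.isinf(min_between):
        raise ValueError(
            "need at least two clusters and one cluster with two distinct points"
        )
    return min_between / max_within


# Hypothetical toy embeddings: two tight groups far apart on the x-axis.
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]

good = dunn_index(embeddings, [0, 0, 1, 1])  # groups nearby points together
bad = dunn_index(embeddings, [0, 1, 0, 1])   # groups distant points together
print(good, bad)
```

On this toy data the labeling that groups nearby points scores higher than the one that groups distant points, which is the direction of improvement the abstract reports for the fine-tuned model over the baseline.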
Details
- Database :
- OAIster
- Notes :
- English
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1372233639
- Document Type :
- Electronic Resource
- Full Text :
- https://doi.org/10.1109/ICMLA52953.2021.00031