Enhanced Multi-Label Question Tagging on Stack Overflow: A Two-Stage Clustering and DeBERTa-Based Approach

Authors :
Isun Chehreh
Farzaneh Saadati
Ebrahim Ansari
Bahram Sadeghi Bigham
Source :
Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 36, Iss 2, Pp 858-863 (2024)
Publication Year :
2024
Publisher :
FRUCT, 2024.

Abstract

This paper introduces a novel method for automatic multi-label classification of questions, using data sourced from Stack Overflow. Traditional tagging methods frequently struggle with the complexity and semantic diversity of these questions, which leads to inconsistent and sometimes inaccurate results. The process starts with preprocessing to remove unwanted elements. Next, we convert the questions into semantic vector representations using SMPNet. These vectors are then reduced in dimensionality with UMAP, which helps reveal the overall structure of the data and makes it easier to cluster similar items. We then apply K-Means to group the questions into clusters, with the number of clusters chosen by the Silhouette Score. Finally, a DeBERTa model is fine-tuned for each cluster to predict the appropriate tags. Training one model per cluster narrows each model's focus to a specific subset of the data, which improves model efficiency. Our approach significantly outperforms traditional methods, achieving a 2% improvement over the best baseline.
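
The cluster-count selection mentioned in the abstract (K-Means evaluated by Silhouette Score) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The 2-D toy points stand in for UMAP-reduced question embeddings, and all function names (`kmeans`, `silhouette`, `best_k`, `make_toy_points`) as well as the farthest-point initialization are assumptions made so the example is small and deterministic. In practice one would more likely rely on library routines such as scikit-learn's `KMeans` and `silhouette_score`.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k):
    """Deterministic init (an assumption, not from the paper): start at the
    first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; returns -1.0 if fewer than two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Score every candidate k and return the one with the highest silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


def make_toy_points(seed=1):
    """Three well-separated blobs of 20 points each (toy stand-in data)."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in blob_centers for _ in range(20)]


points = make_toy_points()
chosen_k, scores = best_k(points, range(2, 7))
print("chosen k:", chosen_k)
```

In a per-cluster setup like the one the abstract describes, the labels produced for the chosen k would decide which cluster-specific tagging model each question is routed to. That routing is not shown here.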

Details

Language :
English
ISSN :
2305-7254 and 2343-0737
Volume :
36
Issue :
2
Database :
Directory of Open Access Journals
Journal :
Proceedings of the XXth Conference of Open Innovations Association FRUCT
Publication Type :
Academic Journal
Accession number :
edsdoj.52d7bf9ad4dc49158dbc791d3d248fa3
Document Type :
article
Full Text :
https://doi.org/10.5281/zenodo.14166315