
Code Comment Inconsistency Detection Based on Confidence Learning

Authors :
Xu, Zhengkang
Guo, Shikai
Wang, Yumiao
Chen, Rong
Li, Hui
Li, Xiaochen
Jiang, He
Source :
IEEE Transactions on Software Engineering; 2024, Vol. 50, Issue 3, pp. 598-617, 20p
Publication Year :
2024

Abstract

Code comments are a crucial source of software documentation, capturing various aspects of the code. Such comments play a vital role in understanding source code and facilitating communication between developers. However, as software is released iteratively, projects grow larger and more complex, and issues such as mismatched, incomplete, or outdated code comments increase correspondingly. These inconsistent comments can misguide developers and lead to potential bugs, and reports of such inconsistencies have risen steadily over time. Although numerous methods have been proposed for detecting code comment inconsistencies, their effectiveness remains limited because they overlook issues such as characterization noise and labeling errors in datasets. To overcome these limitations, we propose a novel approach, MCCL, that first removes noise from the dataset and then detects inconsistent code comments in a timely manner, thereby enhancing the model's learning ability. The proposed model facilitates better matching between code and comments, supporting the development of software engineering projects. MCCL comprises two components: method comment detection and confidence learning denoising. The method comment detection component captures the intricate relationships between code and comments by learning their syntactic and semantic structures, correlating the two through an attention mechanism to identify how changes in the code affect the comments. The confidence learning denoising component identifies and removes characterization noise and labeling errors to improve the quality of the datasets. It does so by pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence.
By effectively eliminating noise from the dataset, our model is able to more accurately learn inconsistencies between comments and source code. Our experiments on 1,518 open-source projects demonstrate that MCCL can accurately detect inconsistencies, achieving an average F1-score of 82.6%. This result outperforms state-of-the-art methods by 2.4% to 28.0%. Therefore, MCCL is more effective in identifying inconsistent comments based on code changes compared to existing approaches.
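The confidence-learning denoising step described in the abstract follows the general recipe of confidence learning: estimate a per-class probability threshold from the model's own predictions, then flag examples whose given label contradicts a confident prediction. The sketch below is a minimal, hypothetical illustration of that recipe, not the authors' implementation; the function names and the simple list-based representation are assumptions for clarity.

```python
# Minimal sketch of confidence-learning denoising (illustrative, not MCCL's code).
# probs[i] is the model's predicted probability distribution for example i;
# labels[i] is the (possibly noisy) given label of example i.

def class_thresholds(probs, labels, n_classes):
    # t_j = mean predicted probability of class j over examples labeled j
    return [
        sum(p[j] for p, y in zip(probs, labels) if y == j)
        / max(1, sum(1 for y in labels if y == j))
        for j in range(n_classes)
    ]

def find_noisy(probs, labels, n_classes):
    t = class_thresholds(probs, labels, n_classes)
    noisy = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # classes whose predicted probability clears the class threshold
        confident = [j for j in range(n_classes) if p[j] >= t[j]]
        # flag the example when a confident prediction contradicts its label
        if confident and y not in confident:
            noisy.append(i)
    return noisy

# Toy run: example 2 is labeled "consistent" (1) but the model is
# confident it is "inconsistent" (0), so it is flagged for pruning.
probs = [[0.90, 0.10], [0.20, 0.80], [0.95, 0.05], [0.30, 0.70]]
labels = [0, 1, 1, 1]
print(find_noisy(probs, labels, 2))  # → [2]
```

Pruning the flagged indices before retraining is what lets the detector learn from a cleaner dataset, which is the effect the abstract attributes to this component.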

Details

Language :
English
ISSN :
00985589
Volume :
50
Issue :
3
Database :
Supplemental Index
Journal :
IEEE Transactions on Software Engineering
Publication Type :
Periodical
Accession number :
ejs65922136
Full Text :
https://doi.org/10.1109/TSE.2024.3358489