On Precision of Code Clone Detection Tools

Authors :
Farima Farmahinifarahani
Di Yang
Vaibhav Saini
Cristina V. Lopes
Hitesh Sajnani
Source :
SANER
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

Precision and recall are the main metrics used to measure the correctness of clone detectors. These metrics require the existence of labeled datasets containing the ground truth – samples of clone and non-clone pairs. For source code clone detectors, in particular, there are some techniques, as well as a concrete framework, for automatically evaluating recall, down to different types of clones. However, evaluating precision is still challenging, because of the intensive and specialized manual effort required to accomplish the task. Moreover, when precision is reported, it is typically done over all types of clones, making it hard to assess the strengths and weaknesses of the corresponding clone detectors. This paper presents systematic experiments to evaluate precision of eight code clone detection tools. Three judges independently reviewed 12,800 clone pairs to compute the undifferentiated and type-based precision of these tools. Besides providing a useful baseline for future research in code clone detection, another contribution of our work is to unveil important considerations to take into account when doing precision measurements and reporting the results. Specifically, our work shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account. It also stresses, once again, the importance of reporting inter-rater agreement.

Details

Database :
OpenAIRE
Journal :
2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)
Accession number :
edsair.doi...........32e69df794b5051aa13ae4ba1e6d3924
Full Text :
https://doi.org/10.1109/saner.2019.8668015