On Precision of Code Clone Detection Tools
- Source :
- SANER
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- Precision and recall are the main metrics used to measure the correctness of clone detectors. These metrics require the existence of labeled datasets containing the ground truth – samples of clone and non-clone pairs. For source code clone detectors, in particular, there are some techniques, as well as a concrete framework, for automatically evaluating recall, down to different types of clones. However, evaluating precision is still challenging, because of the intensive and specialized manual effort required to accomplish the task. Moreover, when precision is reported, it is typically done over all types of clones, making it hard to assess the strengths and weaknesses of the corresponding clone detectors. This paper presents systematic experiments to evaluate precision of eight code clone detection tools. Three judges independently reviewed 12,800 clone pairs to compute the undifferentiated and type-based precision of these tools. Besides providing a useful baseline for future research in code clone detection, another contribution of our work is to unveil important considerations to take into account when doing precision measurements and reporting the results. Specifically, our work shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account. It also stresses, once again, the importance of reporting inter-rater agreement.
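As a rough illustration of the distinction the abstract draws between undifferentiated and type-based precision, the following Python sketch computes both from judged clone pairs. It is only a sketch under stated assumptions: the majority-vote rule, the type labels `T1` and `T3`, the function names, and the sample data are all hypothetical and are not taken from the paper, which may aggregate the judges' verdicts differently. The `unanimous_fraction` helper is a crude raw-agreement figure, not a chance-corrected inter-rater statistic.

```python
from collections import defaultdict


def majority(votes):
    """Return True when more than half of the judges accepted the pair as a clone."""
    return sum(votes) * 2 > len(votes)


def precision_by_type(judged_pairs):
    """Compute precision per clone type and overall (key "all").

    judged_pairs: iterable of (clone_type, votes), where votes is a tuple of
    booleans, one per judge. Majority voting is an assumption of this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for clone_type, votes in judged_pairs:
        accepted = majority(votes)
        for key in (clone_type, "all"):
            totals[key] += 1
            if accepted:
                hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def unanimous_fraction(judged_pairs):
    """Fraction of pairs on which every judge gave the same verdict (raw agreement)."""
    pairs = list(judged_pairs)
    unanimous = sum(1 for _, votes in pairs if len(set(votes)) == 1)
    return unanimous / len(pairs)


# Hypothetical sample: five reported clone pairs, three judges each.
sample = [
    ("T1", (True, True, True)),
    ("T1", (True, True, False)),
    ("T3", (True, False, False)),
    ("T3", (False, False, False)),
    ("T3", (True, True, True)),
]

if __name__ == "__main__":
    print(precision_by_type(sample))
    print(unanimous_fraction(sample))
```

In this made-up sample the overall precision hides a large gap between the two types, which is the kind of effect the abstract says undifferentiated reporting can mask.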
- Subjects :
- Measure (data warehouse)
Ground truth
Source code
Correctness
Cloning (programming)
Computer science
020207 software engineering
02 engineering and technology
Software metric
Clone (algebra)
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Data mining
Precision and recall
Details
- Database :
- OpenAIRE
- Journal :
- 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)
- Accession number :
- edsair.doi...........32e69df794b5051aa13ae4ba1e6d3924
- Full Text :
- https://doi.org/10.1109/saner.2019.8668015