229 results for "Duplicate code"
Search Results
2. Design Patterns: Applications and Open Issues
- Author
-
Lano, K., Blackwell, Clive, editor, and Zhu, Hong, editor
- Published
- 2014
3. Systematic Translation of Formalizations of Type Theory from Intrinsic to Extrinsic Style
- Author
-
Florian Rabe and Navid Roux
- Subjects
FOS: Computer and information sciences ,Computer Science - Logic in Computer Science ,Modularity (networks) ,Operator (computer programming) ,Type theory ,Theoretical computer science ,Computer science ,Duplicate code ,Type (model theory) ,Translation (geometry) ,Logic in Computer Science (cs.LO) ,Style (sociolinguistics) - Abstract
Type theories can be formalized in either the intrinsically typed (hard) or the extrinsically typed (soft) style. In large libraries of type-theoretical features, both styles are often present, which can lead to code duplication and integration issues. We define an operator that systematically translates a hard-typed formulation into the corresponding soft-typed one. Even though this translation is known in principle, a number of subtleties make it more difficult than naively expected. Importantly, our translation preserves modularity, i.e., it maps structured sets of hard-typed features to correspondingly structured soft-typed ones. We implement our operator in the MMT system and apply it to a library of type-theoretical features. (A minimal sketch contrasting the two styles follows this record.) Comment: In Proceedings LFMTP 2021, arXiv:2107.07376
- Published
- 2021
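To make the distinction in result 3 concrete, here is a small sketch in LF/MMT-style declarations; it is illustrative only, the constant names and the function-type example are not taken from the paper. The translation the authors automate maps declarations of the first kind into the second, introducing a separate typing judgment.

```latex
% Illustrative only: common LF-style encodings of a function type former.
% Intrinsic ("hard-typed"): terms are indexed by their type.
\[
\mathsf{tp} : \mathsf{type} \qquad
\mathsf{tm} : \mathsf{tp} \to \mathsf{type} \qquad
\mathsf{lam} : \prod_{A,B : \mathsf{tp}} (\mathsf{tm}\,A \to \mathsf{tm}\,B) \to \mathsf{tm}\,(A \Rightarrow B)
\]
% Extrinsic ("soft-typed"): terms are untyped; typing is a separate judgment.
\[
\mathsf{tm} : \mathsf{type} \qquad
\mathsf{of} : \mathsf{tm} \to \mathsf{tp} \to \mathsf{type} \qquad
\mathsf{lam} : (\mathsf{tm} \to \mathsf{tm}) \to \mathsf{tm}
\]
\[
\mathsf{lam}' : \prod_{A,B : \mathsf{tp}} \prod_{f : \mathsf{tm} \to \mathsf{tm}}
  \Bigl(\prod_{x : \mathsf{tm}} \mathsf{of}\,x\,A \to \mathsf{of}\,(f\,x)\,B\Bigr)
  \to \mathsf{of}\,(\mathsf{lam}\,f)\,(A \Rightarrow B)
\]
```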
4. Search of clones in program code
- Author
-
Alisa O. Osadchaya and Ilia V. Isaev
- Subjects
clones in program code ,Programming language ,Computer science ,Mechanical Engineering ,Code reuse ,code duplication ,Static program analysis ,Program code ,computer.software_genre ,lcsh:QA75.5-76.95 ,Atomic and Molecular Physics, and Optics ,Computer Science Applications ,Electronic, Optical and Magnetic Materials ,Code refactoring ,Duplicate code ,code analysis ,lcsh:QC350-467 ,lcsh:Electronic computers. Computer science ,duplicated fragments ,code clone types ,refactoring ,computer ,lcsh:Optics. Light ,Information Systems - Abstract
Subject of Research. The paper presents research on existing approaches and methods for finding clones in program code. As a result of the study, a method is developed that implements a semantic approach to the search for duplicated fragments and targets all kinds of clones. Method. The developed method is based on the analysis of program dependence graphs built from the source code files. To detect duplicated fragments, a program dependence graph is generated for each source code file, with the nodes hashed on the basis of their content properties. A pair of nodes is selected from each equivalence class, and two isomorphic subgraphs that include the pair are identified. If a pair of clones is included in another pair, it is removed from the set of found pairs of duplicated fragments. A set of clones is then generated from the pairs of duplicated fragments that share the same isomorphic subgraphs, that is, the pairs of clones are expanded. Main Results. To evaluate the efficiency of the developed method, files were compared to determine which clone types the system detects, and testing was performed on real system components. The results of the developed system were compared with the actual ones. Practical Relevance. The proposed algorithm makes it possible to automate the analysis of source files. Detecting clones in program code is a priority direction in code analysis, since the detection of duplicated fragments helps combat unscrupulous copying of program code. (A simplified sketch of the node-hashing step follows this record.)
- Published
- 2020
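The node-hashing step described in result 4 can be approximated as follows. This is a simplified sketch, not the authors' implementation: nodes of a program dependence graph are hashed by a couple of invented content properties (operation label and arity), and nodes with equal hashes form the equivalence classes from which candidate clone pairs are drawn.

```python
from collections import defaultdict
from itertools import combinations

def hash_node(node):
    """Hash a PDG node by content properties (illustrative: operation and arity)."""
    return hash((node["op"], len(node.get("operands", []))))

def clone_pair_candidates(pdg_nodes):
    """Group nodes into equivalence classes by hash and emit candidate pairs."""
    classes = defaultdict(list)
    for node in pdg_nodes:
        classes[hash_node(node)].append(node)
    for bucket in classes.values():
        # Every pair inside a class seeds a search for isomorphic subgraphs.
        yield from combinations(bucket, 2)

nodes = [
    {"id": 1, "op": "assign", "operands": ["a", "b"]},
    {"id": 2, "op": "assign", "operands": ["x", "y"]},
    {"id": 3, "op": "call",   "operands": ["f"]},
]
print(list(clone_pair_candidates(nodes)))  # the two 'assign' nodes pair up
```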
5. How C++ Templates Are Used for Generic Programming
- Author
-
Baowen Xu, Lin Chen, Yuming Zhou, Hareton Leung, Di Wu, and Wanwangying Ma
- Subjects
Generic programming ,business.industry ,Computer science ,Programming language ,media_common.quotation_subject ,020207 software engineering ,02 engineering and technology ,Construct (python library) ,computer.software_genre ,01 natural sciences ,010104 statistics & probability ,Software ,Template ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,0101 mathematics ,business ,Function (engineering) ,computer ,Language construct ,media_common - Abstract
Generic programming is a key paradigm for developing reusable software components. The inherent support for generic constructs is therefore important in programming languages. As for C++, the generic construct, templates, has been supported since the language was first released. However, little is currently known about how C++ templates are actually used in developing real software. In this study, we conduct an experiment to investigate the use of templates in practice. We analyze 1,267 historical revisions of 50 open source systems, consisting of 566 million lines of C++ code, to collect data on the practical use of templates. We perform statistical analyses on the collected data and produce many interesting results. We uncover the following important findings: (1) templates are practically used to prevent code duplication, but this benefit is largely confined to a few highly used templates; (2) function templates do not effectively replace C-style generics, and developers with a C background do not show significant preference between the two language constructs; (3) developers seldom convert dynamic polymorphism to static polymorphism by using CRTP (Curiously Recurring Template Pattern); (4) the use of templates follows a power-law distribution in most cases, and C++ developers who prefer using templates are those without other language background; (5) C developer background seems to override C++ project guidelines. These findings are helpful not only for researchers to understand the tendency of template use but also for tool builders to implement better tools to support generic programming.
- Published
- 2020
6. Using machine learning to predict the code size impact of duplication heuristics in a dynamic compiler
- Author
-
Raphael Mosaner, Hanspeter Mössenböck, David Leopoldseder, and Lukas Stadler
- Subjects
Artificial neural network ,Computer science ,Heuristic ,business.industry ,Context (language use) ,computer.software_genre ,Machine learning ,Mode (computer interface) ,Duplicate code ,Code (cryptography) ,Compiler ,Artificial intelligence ,Heuristics ,business ,computer - Abstract
Code duplication is a major opportunity to enable optimizations in subsequent compiler phases. However, duplicating code prematurely or too liberally can result in tremendous code size increases. Thus, modern compilers use trade-offs between estimated costs in terms of code size increase and benefits in terms of performance increase. In the context of this ongoing research project, we propose the use of machine learning to provide trade-off functions with accurate predictions for code size impact. To evaluate our approach, we implemented a neural network predictor in the GraalVM compiler and compared its performance against a human-crafted, highly tuned heuristic. First results show promising performance improvements, leading to code size reductions of more than 10% for several benchmarks. Additionally, we present an assistance mode for finding flaws in the human-crafted heuristic, leading to improvements for the duplication optimization itself.
- Published
- 2021
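As an illustration of the idea in result 6 (and only that; the GraalVM feature set, training data, and model are far richer), a hand-written duplication trade-off function can be replaced by a small learned regressor that predicts code-size increase from simple features of a duplication candidate. The features and training data below are invented.

```python
# Sketch: replace a hand-written code-size heuristic with a learned predictor.
# Requires scikit-learn; features and training data are made up for illustration.
from sklearn.neural_network import MLPRegressor

# features: [block_size, num_call_sites, loop_depth]
X = [[10, 2, 0], [40, 3, 1], [5, 1, 0], [80, 4, 2], [25, 2, 1]]
y = [12, 55, 4, 130, 30]   # observed code-size increase after duplication

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X, y)

def should_duplicate(features, benefit_estimate, size_budget):
    """Duplicate only if the predicted size cost stays within the budget."""
    predicted_size_increase = model.predict([features])[0]
    return benefit_estimate > 0 and predicted_size_increase <= size_budget

print(should_duplicate([30, 2, 1], benefit_estimate=1.2, size_budget=50))
```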
7. On the Interplay of Smells Large Class, Complex Class and Duplicate Code
- Author
-
Elder Vicente de Paulo Sobrinho and Marcelo de Almeida Maia
- Subjects
FOS: Computer and information sciences ,Source code ,business.industry ,Computer science ,media_common.quotation_subject ,Association (object-oriented programming) ,Software maintenance ,computer.software_genre ,Software Engineering (cs.SE) ,Comprehension ,Computer Science - Software Engineering ,Empirical research ,Code refactoring ,Duplicate code ,Code (cryptography) ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
Bad smells have been defined to describe potential problems in code, possibly pointing out refactoring opportunities. Several empirical studies have highlighted that smells have a negative impact on comprehension and maintainability. Consequently, several approaches have been proposed to detect and restructure them. However, studies on the inter-relationship of occurrence of different types of smells in source code are still lacking, especially those focused on the quantification of this inter-relationship. In this work, we aim to understand and quantify the possible inter-relation of the smells Large Class (LC), Complex Class (CC) and Duplicate Code (DC). In particular, we investigate patterns of LC and CC regarding the presence or absence of duplicate code. We conduct a quantitative study on five open source projects, and also a qualitative analysis to measure and understand the association of specific smells. As one of the main results, we highlight that there are "occurrence patterns" among these smells; for example, both for Complex Class alone and for the co-occurrence of Large Class and Complex Class, clones tend to be more prevalent in highly complex classes than in less complex ones. The found patterns could be used to improve the performance of detection tools or even help in refactoring tasks. (An illustrative sketch of one way to quantify such an association follows this record.) Comment: 10 pages
- Published
- 2021
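One simple way to quantify the kind of smell inter-relationship studied in result 7 is a 2x2 co-occurrence table with an odds ratio per pair of smells. The sketch below is illustrative only and is not the statistical analysis used by the authors.

```python
# Sketch: association between two smells (e.g., Complex Class and Duplicate Code)
# across a set of classes, via a 2x2 contingency table and odds ratio.
def odds_ratio(classes, smell_a, smell_b):
    both    = sum(1 for c in classes if smell_a in c and smell_b in c)
    only_a  = sum(1 for c in classes if smell_a in c and smell_b not in c)
    only_b  = sum(1 for c in classes if smell_a not in c and smell_b in c)
    neither = sum(1 for c in classes if smell_a not in c and smell_b not in c)
    # Haldane-Anscombe correction avoids division by zero on sparse tables.
    return ((both + 0.5) * (neither + 0.5)) / ((only_a + 0.5) * (only_b + 0.5))

smells_per_class = [
    {"ComplexClass", "DuplicateCode"},
    {"ComplexClass"},
    {"LargeClass", "ComplexClass", "DuplicateCode"},
    set(),
    {"DuplicateCode"},
]
print(odds_ratio(smells_per_class, "ComplexClass", "DuplicateCode"))
```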
8. Security Requirements as Code: Example from VeriDevOps Project
- Author
-
Khaled Ismaeel, Eduard Paul Enoiu, Andrey Sadovykh, Dragos Truscan, Alexandr Naumchev, and Cristina Seceleanu
- Subjects
Object-oriented programming ,Software ,Requirements engineering ,business.industry ,Duplicate code ,Computer science ,Software development ,Code (cryptography) ,Software requirements specification ,Reuse ,business ,Software engineering - Abstract
This position paper presents and illustrates the concept of security requirements as code – a novel approach to security requirements specification. The aspiration to minimize code duplication and maximize its reuse has always been driving the evolution of software development approaches. Object-oriented programming (OOP) takes these approaches to a state in which the resulting code conceptually maps to the problem that the code is supposed to solve. People nowadays start learning to program in primary school. On the other hand, requirements engineers still heavily rely on natural-language-based techniques to specify requirements. The key idea of this paper is that artifacts produced by the requirements process should be treated as input to regular object-oriented analysis. The contribution of this paper is therefore a presentation of the major concepts of the security-requirements-as-code method, illustrated with a real industry example from the VeriDevOps project.
- Published
- 2021
9. Selective Code Duplication for Soft Error Protection on VLIW Architectures
- Author
-
Hyunchoong Kim, Kyoungwoo Lee, Yohan Ko, and Soo-Hwan Kim
- Subjects
Source code ,TK7800-8360 ,Computer Networks and Communications ,Computer science ,media_common.quotation_subject ,Reliability (computer networking) ,02 engineering and technology ,Parallel computing ,soft error ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,VLIW ,Electrical and Electronic Engineering ,Digital signal processing ,media_common ,reliability ,business.industry ,020208 electrical & electronic engineering ,020206 networking & telecommunications ,Fault tolerance ,fault-tolerance ,Soft error ,Hardware and Architecture ,Control and Systems Engineering ,Very long instruction word ,Duplicate code ,Signal Processing ,Electronics ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,business - Abstract
Very Long Instruction Word (VLIW) architectures have received much attention in specific-purpose applications such as scientific computation, digital signal processing, and even safety-critical systems. Several compilation techniques for VLIW architectures have been proposed to improve performance, but there is a lack of research on improving reliability against soft errors. Instruction duplication techniques have been proposed that exploit unused instruction slots (i.e., NOPs) in VLIW architectures, but not all instructions can be replicated without additional code lines; additional code lines are required to increase the number of duplicated instructions. Our experimental results show a 52% performance overhead compared to unprotected source code when all instructions are duplicated. This considerable overhead can be unacceptable for resource-constrained embedded systems, so we limit the number of additional NOP instructions for selective protection. However, the previous static scheme duplicates instructions merely in sequential order. In this work, we propose packing-oriented duplication to maximize the number of duplicated instructions within the same performance overhead bounds. Our packing-oriented approach can duplicate up to 18% more instructions within the same performance overheads compared to the previous static duplication techniques. (A toy model of slot-filling duplication follows this record.)
- Published
- 2021
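A drastically simplified model of the selective-duplication idea in result 9: given VLIW instruction bundles with some empty (NOP) slots, duplicate instructions into the free slots first and add extra bundles only while an overhead budget allows. The bundle representation and budget model here are invented for illustration and do not reflect the paper's packing algorithm in detail.

```python
def duplicate_selectively(bundles, width, max_extra_bundles):
    """Duplicate instructions into free slots; spill to new bundles within budget."""
    out, extra_used = [], 0
    for bundle in bundles:
        new_bundle = list(bundle)
        pending = [ins for ins in bundle if ins is not None]   # instructions to duplicate
        for i in range(width):                                  # use existing NOP slots first
            if new_bundle[i] is None and pending:
                new_bundle[i] = "dup:" + pending.pop(0)
        out.append(new_bundle)
        while pending and extra_used < max_extra_bundles:       # spill under the overhead bound
            spill = [None] * width
            for i in range(width):
                if pending:
                    spill[i] = "dup:" + pending.pop(0)
            out.append(spill)
            extra_used += 1
    return out

bundles = [["add", "mul", None, None], ["ld", None, None, None]]
for b in duplicate_selectively(bundles, width=4, max_extra_bundles=1):
    print(b)
```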
10. Pluto: High-Performance IoT-Aware Stream Processing
- Author
-
Taegeon Um, Byung-Gon Chun, and Gyewon Lee
- Subjects
Stream processing ,Computer science ,business.industry ,Duplicate code ,Node (networking) ,Distributed computing ,Server ,Process (computing) ,Code (cryptography) ,Throughput ,Cloud computing ,business - Abstract
Nowadays, large numbers of small IoT stream queries are created from diverse IoT applications and executed on cloud backend servers. However, existing distributed stream processing systems such as Storm and Flink do not efficiently handle the large numbers of IoT stream queries because of their tightly-coupled query/code submission layer and inefficient query execution layer. In this paper, we propose Pluto, a new IoT-aware stream processing system. As a first step for IoT stream processing, this paper focuses on optimizing the execution of many IoT stream queries on a node. Pluto optimizes the end-to-end query processing with a three-phase execution, harnessing IoT-query characteristics. First, Pluto minimizes bottlenecks in the IoT query submission by decoupling the code registration from the query submission process with new APIs, which eliminates duplicate code registration and enables code sharing across queries. Second, in the execution phase, Pluto shares system resources as much as possible and minimizes resource bottlenecks in a machine by exploiting commonalities among IoT stream queries and information exposed in the API. Our evaluations show that Pluto improves the throughput by an order of magnitude compared to other stream processing systems on a 24-core machine, keeping P99 latency less than one second.
- Published
- 2021
11. Code similarity detection through control statement and program features
- Author
-
M. Sudhamani and Lalitha Rangarajan
- Subjects
0209 industrial biotechnology ,Source code ,Programming language ,Computer science ,business.industry ,media_common.quotation_subject ,General Engineering ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,020901 industrial engineering & automation ,Software ,Artificial Intelligence ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,020201 artificial intelligence & image processing ,Clone (computing) ,Software system ,business ,computer ,Software evolution ,media_common ,Reusability - Abstract
Software clone detection is an emerging research area in the field of software engineering. Software systems are subjected to continuous modifications of their source code to improve performance, which may lead to code redundancy. Duplicate code, or a code clone, is a piece of code reworked several times in software programs due to copy-paste activity or reuse of existing software. Code clones are a prime subject in software evolution: detecting them during software evolution may improve performance and reduce maintenance cost and effort. This paper proposes metric-based methods to detect code clones, as cloning is a universal problem in large-scale programming environments. It introduces two metric-based approaches that detect code clones by comparing (i) control statement features and (ii) program features such as different types of statements, operators and operands. To demonstrate the effectiveness of the proposed approaches, extensive experiments are conducted on two datasets, the C projects of Bellon's benchmark dataset and student lab programs (SLP). The methods efficiently identify similar functional clones. The proposed models not only find the similarity of whole programs but are also able to highlight similar code segments across program files. (A sketch of the feature-vector comparison follows this record.)
- Published
- 2019
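The metric-based comparison in result 11 boils down to building a count vector per program (control statements, operator and operand counts) and comparing the vectors. The sketch below uses an invented, very crude tokenizer and feature set; it only illustrates the shape of such a detector.

```python
import math, re
from collections import Counter

CONTROL = ("if", "for", "while", "switch", "case", "return")

def feature_vector(source):
    """Count control statements, operators and operands (illustrative features)."""
    tokens = re.findall(r"[A-Za-z_]\w*|[+\-*/%=<>!&|]+", source)
    counts = Counter(tokens)
    features = {kw: counts[kw] for kw in CONTROL}
    features["operators"] = sum(v for t, v in counts.items() if not (t[0].isalpha() or t[0] == "_"))
    features["operands"] = sum(v for t, v in counts.items() if (t[0].isalpha() or t[0] == "_") and t not in CONTROL)
    return features

def cosine_similarity(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

p1 = "for (i = 0; i < n; i++) { s = s + a[i]; } return s;"
p2 = "for (j = 0; j < m; j++) { t = t + b[j]; } return t;"
print(round(cosine_similarity(feature_vector(p1), feature_vector(p2)), 3))
```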
12. An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems
- Author
-
Mehdi Bagherzadeh, Raffi Khatchadourian, Ajani Stewart, Rhia Singh, Anita Raja, and Yiming Tang
- Subjects
business.industry ,Computer science ,media_common.quotation_subject ,Software maintenance ,Machine learning ,computer.software_genre ,Variety (cybernetics) ,Empirical research ,Code refactoring ,Duplicate code ,Technical debt ,Debt ,Artificial intelligence ,business ,computer ,Software evolution ,media_common - Abstract
Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they are composed of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap in knowledge of how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
- Published
- 2021
13. The Prevalence of Code Smells in Machine Learning projects
- Author
-
Luis Cruz, Maurício Aniche, Arie van Deursen, and Bart van Oort
- Subjects
Computer Science - Machine Learning ,Source code ,Computer Science - Artificial Intelligence ,Computer science ,media_common.quotation_subject ,Static program analysis ,Machine learning ,computer.software_genre ,Machine Learning ,Computer Science - Software Engineering ,dependency management ,Artificial Intelligence ,Code (cryptography) ,media_common ,computer.programming_language ,business.industry ,Code smell ,Python (programming language) ,static code analysis ,Identifier ,Code refactoring ,Duplicate code ,code smells ,Artificial intelligence ,business ,computer ,Python - Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet, there still exists a lack of software engineering experience and best practices in this field. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. Our research set out to discover the most prevalent code smells in ML projects. We gathered a dataset of 74 open-source ML projects, installed their dependencies and ran Pylint on them. This resulted in a top 20 of all detected code smells, per category. Manual analysis of these smells mainly showed that code duplication is widespread and that the PEP8 convention for identifier naming style may not always be applicable to ML code due to its resemblance with mathematical notation. More interestingly, however, we found several major obstructions to the maintainability and reproducibility of ML projects, primarily related to the dependency management of Python projects. We also found that Pylint cannot reliably check for correct usage of imported dependencies, including prominent ML libraries such as PyTorch., Comment: Submitted and accepted to 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN)
- Published
- 2021
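The pipeline in result 13 (run Pylint over projects and aggregate the reported smells per category) can be reproduced in miniature as below. The target path is a placeholder, Pylint must be installed, and the exact set of message symbols depends on the Pylint version.

```python
# Sketch: run Pylint on a project and tally reported messages by symbol.
# Requires `pip install pylint`; the target path is a placeholder.
import json, subprocess
from collections import Counter

def pylint_smell_counts(target_path):
    result = subprocess.run(
        ["pylint", "--output-format=json", "--exit-zero", target_path],
        capture_output=True, text=True,
    )
    messages = json.loads(result.stdout or "[]")
    return Counter(m["symbol"] for m in messages)  # e.g. 'duplicate-code', 'invalid-name'

counts = pylint_smell_counts("path/to/ml_project")
for symbol, n in counts.most_common(20):
    print(f"{symbol:30s} {n}")
```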
14. Analysis of Source Code Duplication in Ethereum Smart Contracts
- Author
-
Roberto Tonelli, Giuseppe Antonio Pierro, Analyses and Languages Constructs for Object-Oriented Application Evolution (RMOD), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Universita degli Studi di Cagliari [Cagliari], and Università degli Studi di Cagliari = University of Cagliari (UniCa)
- Subjects
Source code ,Blockchain ,[INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL] ,Cloning (programming) ,Computer science ,media_common.quotation_subject ,code duplication ,020206 networking & telecommunications ,020207 software engineering ,02 engineering and technology ,Reuse ,Computer security ,computer.software_genre ,Constant (computer programming) ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,Solidity ,Code (cryptography) ,smart contract ,Ethereum blockchain ,computer ,media_common - Abstract
The practice of writing smart contracts for the Ethereum blockchain is quite recent and still in development. A blockchain developer should expect constant changes in the security software field, as new bugs and security risks are discovered and new good practices are developed. Following the security practices accepted in the blockchain community is not enough to ensure the writing of secure smart contracts. The paper aims to study the practice of code cloning among smart contracts by analyzing two corpora. The first corpus, the "Smart-Corpus", includes smart contracts already deployed on the Ethereum blockchain. The second corpus, "OpenZeppelin's Solidity Library", is supervised by a community of developers who constantly take care to increase the security and efficiency of the smart contracts included in the corpus. From the comparative analysis of the corpora, we observe that smart contract developers frequently duplicate code by cloning already existing smart contracts that are not part of the OpenZeppelin corpus. In particular, we found that 79.1% of smart contracts contain duplicated code and only 18.4% of smart contracts reuse code from contracts belonging to the OpenZeppelin repository. The paper discusses the advantages and the disadvantages of code duplication in the Ethereum blockchain ecosystem, and suggests referring to the smart contracts of OpenZeppelin's Solidity Library. The Ethereum blockchain community can indeed benefit from using the tested code presented in OpenZeppelin's Solidity Library to increase its security.
- Published
- 2021
15. The Role of Duplicated Code in Software Readability and Comprehension
- Author
-
Liao, Xuan and Jiang, Linyao
- Abstract
Background. Readability and comprehension are critical aspects of software development and maintenance. Many researchers point out that duplicate code, as a code smell, affects software maintainability, but there is a lack of research on how duplicate code affects software readability and comprehension, which are parts of maintainability. Objectives. In this thesis, we aim to briefly summarize the impact of duplicate code and the typical types of duplicate code according to current work; our goal is then to explore whether duplicate code is a factor that influences readability and comprehension. Methods. We conducted a background survey in which forty-two subjects answered background questions that helped us classify them, and then conducted an experiment with the subjects to explore the role of duplicate code in perceived readability and comprehension. Perceived readability and comprehension are measured by a perceived readability scale, reading time, and the accuracy of a cloze test. Results. The experimental data show that code with duplication has higher perceived readability and better comprehension; however, the differences are not significant. Code with duplication costs less reading time than code without duplication, and this difference is significant. Duplication type is strongly associated with perceived readability. Reading time is significantly associated with duplication type and size of code. There is no significant correlation between the programming experience of subjects and perceived readability or comprehension, and no significant relation between perceived readability and comprehension, size, and CC according to our data. Conclusions. Code with duplication has higher software readability according to the reading-time results, which are significant. Code with duplication also shows higher comprehension than code without duplication, but the difference is not statistically significant according to our data.
- Published
- 2020
16. Cross-Language Code Search using Static and Dynamic Analyses
- Author
-
George Mathew and Kathryn T. Stolee
- Subjects
FOS: Computer and information sciences ,Information retrieval ,Computer science ,Language code ,Python (programming language) ,Static analysis ,computer.software_genre ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Code refactoring ,Duplicate code ,Code (cryptography) ,Language translation ,Precision and recall ,computer ,computer.programming_language - Abstract
As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches such as the comparison of tokens and abstract syntax trees (ASTs) to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective compared to single or weighted measures; and 2) COSAL has better precision and recall compared to state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search. (A sketch of non-dominated ranking follows this record.) Comment: Accepted at FSE 2021; 13 pages, 4 figures, 8 tables
- Published
- 2021
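The core ranking idea in result 16 is to order candidates by Pareto non-domination over several similarity measures rather than by a single weighted score. The sketch below illustrates that idea only; the similarity values are made up, whereas COSAL computes token, structural and behavioral similarity from real analyses.

```python
# Sketch: rank code-search candidates by Pareto non-domination over
# (token, structural, behavioral) similarity scores. Higher is better.
def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_rank(candidates):
    """Return candidates grouped into Pareto fronts (front 0 = best)."""
    remaining = dict(candidates)           # name -> (tok, struct, behav)
    fronts = []
    while remaining:
        front = [n for n, s in remaining.items()
                 if not any(dominates(o, s) for m, o in remaining.items() if m != n)]
        fronts.append(front)
        for n in front:
            del remaining[n]
    return fronts

candidates = {
    "snippet_a": (0.9, 0.7, 0.8),
    "snippet_b": (0.6, 0.9, 0.4),
    "snippet_c": (0.5, 0.5, 0.3),   # dominated by snippet_a
}
print(non_dominated_rank(candidates))   # [['snippet_a', 'snippet_b'], ['snippet_c']]
```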
17. Preparing Ginkgo for AMD GPUs – A Testimonial on Porting CUDA Code to HIP
- Author
-
Terry Cojean, Tobias Ribizel, Hartwig Anzt, and Yuhsiang M. Tsai
- Subjects
Computer science ,020206 networking & telecommunications ,02 engineering and technology ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,computer.software_genre ,Supercomputer ,Porting ,CUDA ,Software portability ,Duplicate code ,Mathematical software ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Operating system ,020201 artificial intelligence & image processing ,computer ,Scope (computer science) - Abstract
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base.
- Published
- 2021
18. Why Developers Refactor Source Code: A Mining-based Study
- Author
-
Massimiliano Di Penta, Fiorella Zampetti, Valentina Piantadosi, Gabriele Bavota, Jevgenija Pantiuchina, Simone Scalabrino, and Rocco Oliveto
- Subjects
FOS: Computer and information sciences ,Source code ,Refactoring ,Computer science ,Process (engineering) ,media_common.quotation_subject ,02 engineering and technology ,computer.software_genre ,Computer Science - Software Engineering ,Software_SOFTWAREENGINEERING ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Quality (business) ,Product (category theory) ,media_common ,business.industry ,020207 software engineering ,Software Engineering (cs.SE) ,empirical software engineering ,Code refactoring ,Duplicate code ,020201 artificial intelligence & image processing ,Software engineering ,business ,computer ,Software - Abstract
Refactoring aims at improving code non-functional attributes without modifying its external behavior. Previous studies investigated the motivations behind refactoring by surveying developers. With the aim of generalizing and complementing their findings, we present a large-scale study quantitatively and qualitatively investigating why developers perform refactoring in open source projects. First, we mine 287,813 refactoring operations performed in the history of 150 systems. Using this dataset, we investigate the interplay between refactoring operations and process (e.g., previous changes/fixes) and product (e.g., quality metrics) metrics. Then, we manually analyze 551 merged pull requests implementing refactoring operations and classify the motivations behind the implemented refactorings (e.g., removal of code duplication). Our results led to (i) quantitative evidence of the relationship existing between certain process/product metrics and refactoring operations and (ii) a detailed taxonomy, generalizing and complementing the ones existing in the literature, of motivations pushing developers to refactor source code., Comment: Accepted to the ACM Transactions on Software Engineering and Methodology
- Published
- 2021
19. Jupyter Notebooks on GitHub: Characteristics and Code Clones
- Author
-
Källén, Malin, Sigvardsson, Ulf, and Wrigstad, Tobias
- Subjects
FOS: Computer and information sciences ,Source lines of code ,Software Engineering ,Computer science ,Computational Mathematics ,computer.software_genre ,Jupyter notebooks ,Computer Science - Software Engineering ,System programming ,Code (cryptography) ,Mining software repositories ,computer.programming_language ,Information retrieval ,Computer Science - Programming Languages ,Cloning (programming) ,Code cloning ,Software analytics ,Snippet ,Python (programming language) ,Software Engineering (cs.SE) ,Duplicate code ,Scripting language ,computer ,Programming Languages (cs.PL)
Jupyter notebooks have emerged as a standard tool for data science programming. Programs in Jupyter notebooks are different from typical programs as they are constructed from a collection of code snippets interleaved with text and visualisation. This allows interactive exploration, and snippets may be executed in different orders, which may give rise to different results due to side effects between snippets. Previous studies have shown the presence of considerable code duplication -- code clones -- in the sources of traditional programs, in both so-called systems programming languages and so-called scripting languages. In this paper we present the first large-scale study of code cloning in Jupyter notebooks. We analyse a corpus of 2.7 million Jupyter notebooks hosted on GitHub, representing 37 million individual snippets and 227 million lines of code. We study clones at the level of individual snippets, and study the extent to which snippets recur across multiple notebooks. We study both identical clones and approximate clones and conduct a small-scale ocular inspection of the most common clones. We find that code cloning is common in Jupyter notebooks -- more than 70% of all code snippets are exact copies of other snippets (with possible differences in white space), and around 50% of all notebooks do not have any unique snippet, but consist solely of snippets that are also found elsewhere. In notebooks written in Python, at least 80% of all snippets are approximate clones, and the prevalence of code cloning is higher in Python than in other languages. We further find that clones between different repositories are far more common than clones within the same repository. However, the most common individual repository from which a Jupyter notebook contains clones is the repository in which the notebook itself resides. (A sketch of whitespace-insensitive snippet hashing follows this record.)
- Published
- 2021
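The "exact copies with possible differences in white space" measurement from result 19 reduces to hashing whitespace-normalized code cells across notebooks. A minimal sketch, with placeholder notebook paths and no claim to match the study's tooling:

```python
# Sketch: find exact (whitespace-insensitive) snippet clones across Jupyter notebooks.
# The notebook paths at the bottom are placeholders.
import json, hashlib
from collections import defaultdict

def normalized_hash(source):
    text = "".join(source) if isinstance(source, list) else source
    return hashlib.sha1("".join(text.split()).encode()).hexdigest()  # drop all whitespace

def snippet_clones(notebook_paths):
    occurrences = defaultdict(list)                  # hash -> [(notebook, cell_index), ...]
    for path in notebook_paths:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        for i, cell in enumerate(nb.get("cells", [])):
            if cell.get("cell_type") == "code":
                occurrences[normalized_hash(cell.get("source", ""))].append((path, i))
    # A snippet is a clone if its normalized hash occurs more than once.
    return {h: locs for h, locs in occurrences.items() if len(locs) > 1}

print(snippet_clones(["analysis.ipynb", "copy_of_analysis.ipynb"]))
```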
20. Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks
- Author
-
Rahul Purandare, Nikita Mehrotra, Navdha Agarwal, David Lo, Saket Anand, and Piyush Gupta
- Subjects
FOS: Computer and information sciences ,Source code ,business.industry ,Computer science ,media_common.quotation_subject ,Deep learning ,Software maintenance ,computer.software_genre ,Semantics ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Code refactoring ,Duplicate code ,Abstract syntax ,Code (cryptography) ,Artificial intelligence ,business ,computer ,Software ,Natural language processing ,media_common - Abstract
Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures like abstract syntax trees (ASTs) to detect code clones. However, they do not make sufficient use of the available structural and semantic information, hence limiting their capabilities. This paper addresses the problem of semantic code clone detection using program dependency graphs and geometric neural networks, leveraging the structured syntactic and semantic information. We have developed a prototype tool HOLMES, based on our novel approach, and empirically evaluated it on popular code clone benchmarks. Our results show that HOLMES performs considerably better than the other state-of-the-art tool, TBCCD. We also evaluated HOLMES on unseen projects and performed cross dataset experiments to assess the generalizability of HOLMES. Our results affirm that HOLMES outperforms TBCCD since most of the pairs that HOLMES detected were either undetected or suboptimally reported by TBCCD. Comment: Under review at IEEE Transactions on Software Engineering
- Published
- 2020
21. A Code Similarity Detection Algorithm Based on Maximum Common Subtree Optimization
- Author
-
Li Lin and Zhikai Lin
- Subjects
Longest common subsequence problem ,Tree structure ,Semantic similarity ,Similarity (network science) ,Duplicate code ,Computer science ,String (computer science) ,Code (cryptography) ,Graph (abstract data type) ,Algorithm - Abstract
Code similarity detection is different from traditional text duplication checking, because code contains a large amount of identical syntactic content. There are two classes of code duplication detection algorithms. One works by extracting and counting characteristic attributes, which loses the logical relationships between code structures. The other works by abstracting code into a string, tree, or graph structure, which can lose the code's semantic features. To rectify these deficiencies, an optimization algorithm based on the maximum common subtree is proposed. First, structural information based on the largest common subtree is extracted to calculate the structural similarity of the code. Then, semantic information based on the longest common subsequence is extracted to calculate the semantic similarity. Finally, the semantic similarity and structural similarity are assigned different weights using the TF-IDF algorithm. Experimental results show that the optimization based on the maximum common subtree reduces the similarity reported for non-plagiarized code and keeps more feature information than traditional code duplication checking algorithms. (A sketch of the longest-common-subsequence step follows this record.)
- Published
- 2020
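The semantic half of the algorithm in result 21 rests on the longest common subsequence of the two codes' token streams. Below is a small sketch of that step and of combining it with a precomputed structural score; the paper derives the weights via TF-IDF, whereas here they are fixed parameters for brevity, and the token sequences are invented.

```python
# Sketch: LCS-based semantic similarity, combined with a structural score.
def lcs_length(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def semantic_similarity(tokens_a, tokens_b):
    if not tokens_a or not tokens_b:
        return 0.0
    return 2 * lcs_length(tokens_a, tokens_b) / (len(tokens_a) + len(tokens_b))

def combined_similarity(sem, struct, w_sem=0.4, w_struct=0.6):
    # The paper derives these weights with TF-IDF; fixed here for illustration.
    return w_sem * sem + w_struct * struct

a = ["int", "sum", "=", "0", "for", "i", "sum", "+=", "a", "[", "i", "]"]
b = ["int", "total", "=", "0", "for", "j", "total", "+=", "b", "[", "j", "]"]
print(combined_similarity(semantic_similarity(a, b), struct=0.8))
```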
22. ASPDup: AST-Sequence-based Progressive Duplicate Code Detection Tool for Onsite Programming Code
- Author
-
Shao Yichao, Zhiqiu Huang, Yu Zhou, Yu Yaoshen, and Li Weiwei
- Subjects
Sequence ,Source code ,Computer science ,Programming language ,media_common.quotation_subject ,computer.software_genre ,Microsoft Visual Studio ,Fragment (logic) ,Duplicate code ,Code (cryptography) ,Plug-in ,Abstract syntax tree ,computer ,media_common - Abstract
Duplicate code is an example of a bad smell; such code is usually refactored after detection to improve program quality. Locating duplicate code already at the programming phase may reduce maintenance cost, but the challenge is that it requires detecting duplication between an incomplete code fragment and complete files, a scenario to which existing tools are hard to apply. In this paper, we propose an AST-sequence-based duplicate code detection approach for onsite programming code. The abstract syntax tree (AST) is extracted from the source code and then transformed into an encoded sequence. A local sequence alignment algorithm is used to find highly similar subsequences. After post-processing, similar regions are found between two code fragments according to the subsequences. We have developed a prototype tool as a plugin for Visual Studio Code. Experimental results indicate that our approach is effective in finding highly similar regions between cross-granularity code fragments, which can facilitate duplicate code detection for incomplete onsite programming code. (A sketch of the local alignment step follows this record.)
- Published
- 2020
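The alignment step of result 22 (find highly similar subsequences between the encoded sequence of an incomplete fragment and that of a complete file) corresponds to classic Smith-Waterman-style local alignment. The scoring scheme and the encoded AST-node names below are invented; the paper does not specify them here.

```python
# Sketch: Smith-Waterman local alignment over encoded AST-node sequences.
def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dp = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i-1][j-1] + (match if seq_a[i-1] == seq_b[j-1] else mismatch)
            dp[i][j] = max(0, diag, dp[i-1][j] + gap, dp[i][j-1] + gap)
            best = max(best, dp[i][j])
    return best   # high scores flag similar regions worth reporting as clones

fragment = ["FuncDef", "Param", "If", "Return", "Return"]     # incomplete, being typed
file_seq = ["Class", "FuncDef", "Param", "If", "Return", "Return", "FuncDef"]
print(local_alignment_score(fragment, file_seq))   # 10: the fragment matches fully
```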
23. LCCSS
- Author
-
Lucas Pereira da Silva and Patricia Vilain
- Subjects
Identification (information) ,Similarity (network science) ,Code refactoring ,Computer science ,Duplicate code ,Metric (mathematics) ,Maintainability ,Code (cryptography) ,Data mining ,computer.software_genre ,computer ,Task (project management) - Abstract
Test code maintainability is a common concern in software testing. In order to achieve good maintainability, test methods should be clearly structured, well named, small in size, and, above all, test code duplication should be avoided. Several strategies exist to avoid test code duplication, such as implicit setup and delegated setup. However, prior to applying these strategies, it is first necessary to identify the duplicate code, which can be a time-consuming task. To address this problem, we automate the identification of duplicate test code through the application of code similarity metrics. We propose a novel similarity metric, the Longest Common Contiguous Start Sub-Sequence (LCCSS), which measures the similarity between pairs of tests and identifies refactoring candidates. The most similar pairs are reported as strong candidates to be refactored through the implicit setup strategy. We also develop a framework, called Roza, that can use different similarity metrics to identify test code duplication. An experiment shows that LCCSS and Simian, a clone detection tool, both identified pairs of tests to be refactored through the implicit setup strategy with maximum precision at all eleven standard recall levels. But, unlike Simian, LCCSS does not need to be calibrated for each project. (A sketch of the metric follows this record.)
- Published
- 2020
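As the name suggests, the LCCSS metric of result 23 measures how long two test methods stay identical from their first token onward; the sketch below is one plausible reading on token sequences, and the exact definition in the paper may differ in detail.

```python
# Sketch: Longest Common Contiguous Start Sub-Sequence between two token sequences.
def lccss(tokens_a, tokens_b):
    length = 0
    for x, y in zip(tokens_a, tokens_b):
        if x != y:
            break
        length += 1
    return length

test_a = ["service", "=", "Service", "(", ")", "result", "=", "service", ".", "run", "(", "1", ")"]
test_b = ["service", "=", "Service", "(", ")", "result", "=", "service", ".", "run", "(", "2", ")"]
# Pairs with a long common start are strong candidates for an implicit setup refactoring.
print(lccss(test_a, test_b))   # 11
```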
24. Reducing accidental clones using instant clone search in automatic code review
- Author
-
Vipin Balachandran
- Subjects
Code review ,Computer science ,Programming language ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Workflow ,Software_SOFTWAREENGINEERING ,Duplicate code ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,Clone (computing) ,computer ,Codebase
Accidental clones occur when developers are not familiar with the codebase. We propose changes in the developer code review workflow to leverage online clone detection to identify duplicate code during code review. A developer survey’s responses indicate that the proposed workflow change will increase the usage of clone detection tools and can reduce accidental clones.
- Published
- 2020
25. Feature-Oriented Control Programming
- Author
-
Niklas Fors, Alfred Theorin, Görel Hedin, and Sven Gestegard Robertz
- Subjects
Function block diagram ,Computer science ,Programming language ,business.industry ,020208 electrical & electronic engineering ,020206 networking & telecommunications ,02 engineering and technology ,Reuse ,Modular design ,Wizard ,computer.software_genre ,Feature model ,Inheritance (object-oriented programming) ,Feature (computer vision) ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,business ,computer - Abstract
Managing variability in control programs often requires code duplication or that all variants are anticipated in advance. In this paper, we present a new approach to obtaining modular functionality reuse across variants. Using the language mechanisms diagram inheritance and connection interception, a feature model and an interactive feature-selection wizard can be automatically derived from the control program.
- Published
- 2020
26. Code Duplication and Reuse in Jupyter Notebooks
- Author
-
Andreas P. Koenzen, Margaret-Anne Storey, and Neil A. Ernst
- Subjects
FOS: Computer and information sciences ,Data exploration ,business.industry ,Computer science ,05 social sciences ,Code reuse ,Computer Science - Human-Computer Interaction ,020207 software engineering ,Sample (statistics) ,02 engineering and technology ,Reuse ,Human-Computer Interaction (cs.HC) ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Software ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,0501 psychology and cognitive sciences ,Exploratory programming ,Jupyter, computational notebooks, code duplication, code clones, code reuse, data analysis, data exploration, exploratory programming ,Software engineering ,business ,050107 human factors - Abstract
Duplicating one's own code makes it faster to write software. This expediency is particularly valuable for users of computational notebooks. Duplication allows notebook users to quickly test hypotheses and iterate over data. In this paper, we explore how much, how and from where code duplication occurs in computational notebooks, and identify potential barriers to code reuse. Previous work in the area of computational notebooks describes developers' motivations for reuse and duplication but does not show how much reuse occurs or which barriers they face when reusing code. To address this gap, we first analyzed GitHub repositories for code duplicates contained in a repository's Jupyter notebooks, and then conducted an observational user study of code reuse, where participants solved specific tasks using notebooks. Our findings reveal that repositories in our sample have a mean self-duplication rate of 7.6%. However, in our user study, few participants duplicated their own code, preferring to reuse code from online sources., Accepted as a full paper at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) 2020
- Published
- 2020
27. A Fast Detecting Method for Clone Functions Using Global Alignment of Token Sequences
- Author
-
SangUn Park, Uram Ko, Hwan-Gue Cho, Ibrahim Aitkazin, Haesung Tak, and Da-Young Lee
- Subjects
Source code ,business.industry ,Computer science ,media_common.quotation_subject ,020208 electrical & electronic engineering ,Code reuse ,Static program analysis ,Sequence alignment ,02 engineering and technology ,Security token ,computer.software_genre ,Software ,Duplicate code ,Clone (algebra) ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Data mining ,business ,computer ,media_common - Abstract
In large software projects, proper source code reuse can make development more efficient, but extensive duplicate code and the reuse of erroneous code can be a major cause of difficult system maintenance. Efficient clone detection in large projects can help manage them. However, most clone detection methods do not support adaptive analysis that adjusts specificity or sensitivity according to the type of clone to be detected. Therefore, when users want to find a particular type of clone in a large project, they must analyze it repeatedly with various tools while adjusting the options. In this study, we propose a clone detection system based on global sequence alignment. A lex-based token analysis model and a global-alignment-based clone detection model can detect not only exact matches but also various types of clones by setting lower-bound scores. By using features of the global alignment score calculation to eliminate, in advance, functions that cannot be clone candidates, alignment analysis becomes possible even for large projects, and the execution time can be predicted. For clone functions, we visualize the matching area resulting from the alignment analysis to represent clone information more efficiently. (A sketch of global alignment with a lower-bound score follows this record.)
- Published
- 2020
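In contrast to the local alignment used in result 22, the detector in result 27 aligns whole token sequences of candidate functions globally and keeps only pairs whose score clears a lower bound. A compact sketch with an invented scoring scheme and toy token sequences:

```python
# Sketch: global (Needleman-Wunsch) alignment of token sequences with a
# lower-bound score used to keep or discard a candidate clone pair.
def global_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i-1][0] + gap
    for j in range(1, cols):
        dp[0][j] = dp[0][j-1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i-1][j-1] + (match if seq_a[i-1] == seq_b[j-1] else mismatch)
            dp[i][j] = max(diag, dp[i-1][j] + gap, dp[i][j-1] + gap)
    return dp[-1][-1]

def is_clone_pair(tokens_a, tokens_b, lower_bound):
    # Raising the bound favours exact clones; lowering it admits looser clone types.
    return global_alignment_score(tokens_a, tokens_b) >= lower_bound

f1 = ["int", "id", "(", ")", "{", "return", "id", ";", "}"]
f2 = ["int", "id", "(", ")", "{", "return", "id", "+", "num", ";", "}"]
print(is_clone_pair(f1, f2, lower_bound=10))
```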
28. Code Duplication on Stack Overflow
- Author
-
Sebastian Baltes and Christoph Treude
- Subjects
FOS: Computer and information sciences ,Computer science ,business.industry ,Code reuse ,Maintainability ,020207 software engineering ,02 engineering and technology ,Software maintenance ,Software Engineering (cs.SE) ,Computer Science - Software Engineering ,Software ,Duplicate code ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Clone (computing) ,Software engineering ,business ,Software evolution - Abstract
Despite the unarguable importance of Stack Overflow (SO) for the daily work of many software developers and despite existing knowledge about the impact of code duplication on software maintainability, the prevalence and implications of code clones on SO have not yet received the attention they deserve. In this paper, we motivate why studies on code duplication within SO are needed and how existing studies on code reuse differ from this new research direction. We present similarities and differences between code clones in general and code clones on SO and point to open questions that need to be addressed to be able to make data-informed decisions about how to properly handle clones on this important platform. We present results from a first preliminary investigation, indicating that clones on SO are common and diverse. We further point to specific challenges, including incentives for users to clone successful answers and difficulties with bulk edits on the platform, and conclude with possible directions for future work., Comment: 4 pages, 2 figures, 42nd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER 2020), ACM, 2020
- Published
- 2020
29. Optimizing Program Size Using Multi-result Supercompilation
- Author
-
Dimitur Nikolaev Krustev
- Subjects
FOS: Computer and information sciences ,Theoretical computer science ,Computer Science - Programming Languages ,Generalization ,Computer science ,Duplicate code ,Program transformation ,Programming Languages (cs.PL) - Abstract
Supercompilation is a powerful program transformation technique with numerous interesting applications. Existing methods of supercompilation, however, are often very unpredictable with respect to the size of the resulting programs. We consider an approach for controlling result size, based on a combination of multi-result supercompilation and a specific generalization strategy, which avoids code duplication. The current early experiments with this method show promising results - we can keep the size of the result small, while still performing powerful optimizations., Comment: In Proceedings VPT/HCVS 2020, arXiv:2008.02483. arXiv admin note: identical to arXiv:2006.02204, which has added appendices
- Published
- 2020
30. Deep learning application on code clone detection: A review of current knowledge
- Author
-
Hao Li, Namrata Aundhkar, Maggie Lei, Dae-Kyoo Kim, and Ji Li
- Subjects
Computer science ,business.industry ,Deep learning ,Existential quantification ,Maintainability ,Software quality ,Software ,Hardware and Architecture ,Duplicate code ,Code (cryptography) ,Artificial intelligence ,Software engineering ,business ,Information Systems ,Reusability - Abstract
Bad smells in code are indications of low code quality representing potential threats to the maintainability and reusability of software. A code clone is a type of bad smell caused by code fragments that have the same functional semantics with syntactic variations. In recent years, research on duplicate code has been dramatically accelerated by deep learning techniques powered by advances in computing power. However, little work has studied the current state of the art and future prospects in the area of applying deep learning to code clone detection. In this paper, we present a systematic review of the literature on the application of deep learning to code clone detection. We aim to find and study the most recent work on the subject, discuss its limitations and challenges, and provide insights for future work.
- Published
- 2022
31. LDMBL: An architecture for reducing code duplication in heavyweight binary instrumentations
- Author
-
Mehdi Kharrazi and Behnam Momeni
- Subjects
Duplicate code ,Computer science ,0202 electrical engineering, electronic engineering, information engineering ,Binary number ,020207 software engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Parallel computing ,Architecture ,Software - Published
- 2018
32. Instruction duplication
- Author
-
Lucian Cojocar, Kostas Papagiannopoulos, Niek Timmers, Computer Systems, Systems and Network Security, and Network Institute
- Subjects
Computer science ,business.industry ,Fault tolerance ,02 engineering and technology ,Fault injection ,Hardware_PERFORMANCEANDRELIABILITY ,Software countermeasures ,020202 computer hardware & architecture ,Software ,Duplicate code ,Gene duplication ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,Computer network - Abstract
Fault injection attacks alter the intended behavior of micro-controllers, compromising their security. These attacks can be mitigated using software countermeasures. A widely-used software-based solution to deflect fault attacks is instruction duplication and n-plication. We explore two main limitations of these approaches: first, we examine the effect of instruction duplication under fault attacks, demonstrating that, as a fault tolerance mechanism, code duplication does not provide strong protection in practice. Second, we show that instruction duplication increases side-channel leakage of sensitive code regions, using a multivariate exploitation technique both in theory and in practice.
- Published
- 2018
33. Understanding the use of lambda expressions in Java
- Author
-
Nikolaos Tsantalis, Danny Dig, Davood Mazinanian, and Ameya Ketkar
- Subjects
Functional programming ,Source code ,Java ,Programming language ,Computer science ,media_common.quotation_subject ,020207 software engineering ,02 engineering and technology ,computer.software_genre ,Empirical research ,Duplicate code ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Currying ,Lazy evaluation ,Safety, Risk, Reliability and Quality ,computer ,Know-how ,Software ,media_common ,computer.programming_language - Abstract
Java 8 retrofitted lambda expressions, a core feature of functional programming, into a mainstream object-oriented language with an imperative paradigm. However, we do not know how Java developers have adapted to the functional style of thinking, and more importantly, what are the reasons motivating Java developers to adopt functional programming. Without such knowledge, researchers miss opportunities to improve the state of the art, tool builders use unrealistic assumptions, language designers fail to improve upon their designs, and developers are unable to explore efficient and effective use of lambdas. We present the first large-scale, quantitative and qualitative empirical study to shed light on how imperative programmers use lambda expressions as a gateway into functional thinking. Particularly, we statically scrutinize the source code of 241 open-source projects with 19,770 contributors, to study the characteristics of 100,540 lambda expressions. Moreover, we investigate the historical trends and adoption rates of lambdas in the studied projects. To get a complementary perspective, we seek the underlying reasons on why developers introduce lambda expressions, by surveying 97 developers who are introducing lambdas in their projects, using the firehouse interview method. Among others, our findings revealed an increasing trend in the adoption of lambdas in Java: in 2016, the ratio of lambdas introduced per added line of code increased by 54% compared to 2015. Lambdas were used for various reasons, including but not limited to (i) making existing code more succinct and readable, (ii) avoiding code duplication, and (iii) simulating lazy evaluation of functions. Interestingly, we found out that developers are using Java's built-in functional interfaces inefficiently, i.e., they prefer to use general functional interfaces over the specialized ones, overlooking the performance overheads that might be imposed. Furthermore, developers are not adopting techniques from functional programming, e.g., currying. Finally, we present the implications of our findings for researchers, tool builders, language designers, and developers.
- Published
- 2017
34. Enhancing Abstraction in App Inventor with Generic Event Handlers
- Author
-
Evan W. Patton, Audrey Seo, and Franklyn Turbak
- Subjects
Code refactoring ,Programming language ,Computer science ,Duplicate code ,Component (UML) ,Programming patterns ,Code (cryptography) ,Target audience ,Code smell ,computer.software_genre ,computer ,Abstraction (linguistics) - Abstract
Work on code smells (undesirable programming patterns) in blocks languages has found that programmers often duplicate blocks code rather than abstracting over common patterns of computation using procedure-like features. For example, previous analyses of over a million MIT App Inventor projects have revealed that procedures are used surprisingly rarely in the wild and that many users miss opportunities for using procedural abstraction to avoid code duplication in their projects. In this work, we use data analysis to explain how particular features of App Inventor create barriers to abstracting over event handlers. In many cases, duplicated code in event handlers cannot be extracted into a procedure without using so-called generic blocks that abstract over a particular component (e.g., a label). Generic blocks are rarely used in practice, possibly because programmers do not know about them or find them difficult to use. But even proceduralization with generic blocks does not remove the need for duplicating the event handlers themselves. We address these issues with two enhancements to App Inventor. First, we add generic event handlers, a new form of abstraction that allows specifying a single handler for all components of a particular type. Second, we add a way to easily convert between specific and generic blocks to facilitate genericization, that is, abstracting actions over a particular component to apply to a group of components of that type. We also discuss related design choices and ways to encourage programmers to use the new features to avoid code duplication. Our work is an example of data-informed programming language design, in which the creation or modification of features is informed by the analysis of large datasets of programs from the language's target audience.
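App Inventor is a blocks language, so the following Java Swing snippet is only a rough, hedged analogy (all names are illustrative): one shared listener is attached to every button and dispatches on the event source, which is the same idea as a generic event handler for all components of a type.

```java
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JPanel;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

public class SharedHandlerDemo {
    public static void main(String[] args) {
        JFrame frame = new JFrame("Shared handler");
        JPanel panel = new JPanel();

        // One handler for all components of the same type, instead of one
        // duplicated handler per button.
        ActionListener shared = (ActionEvent e) -> {
            JButton source = (JButton) e.getSource();
            System.out.println("Clicked: " + source.getText());
        };

        for (String label : new String[]{"Red", "Green", "Blue"}) {
            JButton button = new JButton(label);
            button.addActionListener(shared);
            panel.add(button);
        }

        frame.add(panel);
        frame.pack();
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}
```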
- Published
- 2019
35. Refactoring Code Clone Detection
- Author
-
Zhala Sarkawt Othman and Mehmet Kaya
- Subjects
Source code ,Cloning (programming) ,Computer science ,Programming language ,media_common.quotation_subject ,Software maintenance ,computer.software_genre ,Software quality ,Code refactoring ,Software_SOFTWAREENGINEERING ,Duplicate code ,Code (cryptography) ,Clone (computing) ,computer ,media_common - Abstract
Duplicate code is one of the most important code smells in software maintenance, and refactoring it is an important issue. There is an important relationship between clones and code quality. Many programmers create clones because copying is cheaper and faster than writing new program code. A code clone is created by copying and pasting existing fragments of the source code, with or without slight modifications. A major part (5% to 10%) of the source code of large computer programs consists of copied code. Since cloning is believed to hinder software maintenance, many techniques and clone detection tools have been proposed for this purpose. The basic goal of clone detection is to identify the cloned code and replace it with a single call to a function, where the function reproduces the behavior of one instance of the clone group. This research provides an overview of a refactoring IDE. The aspects of cloning and clone detection are explained. In the clone detection algorithm, the source code is represented in XML format.
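As a minimal, hedged sketch of the stated goal of clone refactoring (identifier names are invented for the example), the snippet below shows two formerly copied fragments replaced by calls to a single extracted method.

```java
public class InvoiceReport {

    // Before the refactoring, this formatting logic was copied into both callers.
    // After the refactoring, both callers delegate to one extracted method.
    static String formatLine(String item, int quantity, double unitPrice) {
        double total = quantity * unitPrice;
        return String.format("%-20s x%3d = %8.2f", item, quantity, total);
    }

    static void printOrder() {
        System.out.println(formatLine("Keyboard", 2, 49.90));   // was duplicated code
    }

    static void printRefund() {
        System.out.println(formatLine("Keyboard", -1, 49.90));  // was duplicated code
    }

    public static void main(String[] args) {
        printOrder();
        printRefund();
    }
}
```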
- Published
- 2019
36. Design and Development of Software Tool for Code Clone Search, Detection, and Analysis
- Author
-
Uma Maheswari B and Dethe Tukaram
- Subjects
Source code ,Cloning (programming) ,Syntax (programming languages) ,Computer science ,Programming language ,media_common.quotation_subject ,computer.software_genre ,Data type ,Duplicate code ,Code (cryptography) ,Pattern matching ,Programmer ,computer ,media_common - Abstract
Replicating a piece of code, with or without minor changes, during project development is a common practice adopted by developers to reduce production time, and it results in code/program clones. Clones can also occur in an existing file without the programmer's knowledge. Although code/program cloning may initially increase productivity, it can later increase maintenance costs. Hence, code/program cloning needs to be identified thoroughly during development at frequent intervals. The main objective of this work is an in-depth analysis of the source code produced by the programmer for code cloning, indicating what percentage of code is copied within a file. The proposed work detects clones in terms of syntax, functions, methods, variables, data types, keywords, spaces, author name, imports, commented lines, number of lines, and last-modified date, and shows the similarity percentage graphically.
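The following is a hedged sketch, not the proposed tool: it estimates what percentage of one file's lines also appear in another file, which is one crude way to report a "copied code" percentage.

```java
import java.util.HashSet;
import java.util.Set;

public class CopiedPercentage {

    // Collects non-empty, whitespace-trimmed lines of a source string.
    static Set<String> normalizedLines(String source) {
        Set<String> lines = new HashSet<>();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) lines.add(trimmed);
        }
        return lines;
    }

    // Percentage of the suspect file's lines that also occur in the original file.
    static double copiedPercent(String original, String suspect) {
        Set<String> originalLines = normalizedLines(original);
        Set<String> suspectLines = normalizedLines(suspect);
        if (suspectLines.isEmpty()) return 0.0;
        long shared = suspectLines.stream().filter(originalLines::contains).count();
        return 100.0 * shared / suspectLines.size();
    }

    public static void main(String[] args) {
        String original = "int sum = 0;\nfor (int i = 0; i < n; i++)\n  sum += a[i];";
        String suspect  = "int sum = 0;\nfor (int i = 0; i < n; i++)\n  sum += a[i];\nreturn sum;";
        System.out.printf("Copied: %.1f%%%n", copiedPercent(original, suspect)); // 75.0%
    }
}
```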
- Published
- 2019
37. Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials
- Author
-
Agnieszka Ciborowska, Kostadin Damevski, and Manziba Akanda Nishi
- Subjects
Information retrieval ,business.industry ,Computer science ,Software development ,020207 software engineering ,02 engineering and technology ,Reuse ,Duplicate code ,Ask price ,Block (programming) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Web resource ,business ,License - Abstract
Developers are usually unaware of the quality and lineage of information available on popular Web resources, leading to potential maintenance problems and license violations when reusing code snippets from these resources. In this paper, we study the duplication of code snippets between two popular sources of software development information: the Stack Overflow Q&A site and online tutorials. A significant number (31%) of answers that contained a duplicate code block were chosen as the accepted answer. Qualitative analysis reveals that developers commonly use Stack Overflow to ask clarifying questions about code they reused from tutorials, and copy code snippets from tutorials to provide answers to questions.
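A hedged sketch of how such cross-source duplication could be detected in principle (this is not the study's pipeline; all names and snippets are illustrative): code blocks from both collections are normalized and indexed so that blocks appearing in both can be flagged.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrossSourceDuplicates {

    // Collapses whitespace so that trivially reformatted copies still match.
    static String normalize(String block) {
        return block.replaceAll("\\s+", " ").trim();
    }

    static void reportDuplicates(List<String> stackOverflowBlocks, List<String> tutorialBlocks) {
        Map<String, String> index = new HashMap<>();
        for (String block : tutorialBlocks) {
            index.put(normalize(block), block);
        }
        for (String block : stackOverflowBlocks) {
            if (index.containsKey(normalize(block))) {
                System.out.println("Duplicate block found:\n" + block);
            }
        }
    }

    public static void main(String[] args) {
        reportDuplicates(
            List.of("int x = 1;\nSystem.out.println(x);"),
            List.of("int x = 1; System.out.println(x);"));
    }
}
```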
- Published
- 2019
38. A Review on Software Code Clones and Tools and Techniques Available to Handle Them
- Author
-
Nishtha Kesswani, Amita Sharma, and U. Devi
- Subjects
business.industry ,Programming language ,Computer science ,Code smell ,computer.software_genre ,Software ,Fragment (logic) ,Code refactoring ,Software_SOFTWAREENGINEERING ,Duplicate code ,Factor (programming language) ,Code (cryptography) ,Clone (computing) ,business ,computer ,computer.programming_language - Abstract
Code duplication, i.e., copying a code fragment and reusing it either with or without modification, is a well-known form of code smell in software maintenance. Such duplicated fragments are known as code clones, and they are a factor that makes software maintenance considerably more difficult. A major drawback of duplicated fragments is that, if a bug is found in one fragment of code, all similar fragments must be investigated to verify whether the same bug exists in them. Refactoring duplicated code is another prime problem in software maintenance, although several studies claim that refactoring some clones is not beneficial and may even carry risk. In this paper, we survey the state of the art in clone detection research: the various ideas, approaches, and tools for clone detection, illustrated by a case study on code clones.
- Published
- 2019
39. Clone Detection vs. Pattern Mining: The Battle
- Author
-
Deknop, Céline, Baars, Simon, Mens, Kim, Oprescu, Ana, Fabry, Johan, The 18th Belgium-Netherlands Software Evolution Workshop, and UCL - SST/ICTM/INGI - Pôle en ingénierie informatique
- Subjects
type 3 clones ,clone detection ,frequent subtree mining ,duplicate code ,code clones ,pattern mining - Abstract
In this paper we compare two approaches to discovering recurrent fragments in source code: clone detection and frequent subtree mining. We apply both approaches to a medium-sized Java case study and compare their results qualitatively and quantitatively in terms of what types of code fragments are detected, as well as their size, relevance, coverage, and level of detail. We conclude that both approaches are complementary, and that the existing overlap may be used for cross-validation of the approaches.
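As a hedged sketch of the frequent-subtree-mining side of the comparison (a toy encoding, not the workshop tooling), the snippet below canonically encodes every subtree of a small AST and reports encodings that occur at least a minimum number of times.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FrequentSubtrees {

    record Node(String label, List<Node> children) {}

    static Node node(String label, Node... children) {
        return new Node(label, List.of(children));
    }

    // Returns the canonical encoding of 'root' and records every subtree encoding.
    static String collect(Node root, Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder(root.label()).append("(");
        for (Node child : root.children()) {
            sb.append(collect(child, counts)).append(",");
        }
        String encoding = sb.append(")").toString();
        counts.merge(encoding, 1, Integer::sum);
        return encoding;
    }

    public static void main(String[] args) {
        Node ast = node("block",
            node("assign", node("id"), node("lit")),
            node("assign", node("id"), node("lit")),
            node("call", node("id")));
        Map<String, Integer> counts = new HashMap<>();
        collect(ast, counts);
        int minSupport = 2;
        counts.forEach((encoding, n) -> {
            if (n >= minSupport) System.out.println(n + "x " + encoding);
        });
    }
}
```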
- Published
- 2019
40. A Novel Unsupervised Learning Approach for Assessing Web Services Refactoring
- Author
-
Cristian Mateos, Brian Hammer, Sanjay Misra, Luciano Listorti, and Guillermo Rodríguez
- Subjects
Computer science ,business.industry ,Maintainability ,020206 networking & telecommunications ,020207 software engineering ,Cohesion (computer science) ,02 engineering and technology ,computer.software_genre ,Machine learning ,Software ,Code refactoring ,Duplicate code ,0202 electrical engineering, electronic engineering, information engineering ,Unsupervised learning ,Artificial intelligence ,Web service ,Cluster analysis ,business ,computer - Abstract
In recent years, the development of service-oriented applications has become a trend. Given the characteristics and challenges posed by current systems, adopting this solution has become essential, since it performs well in distributed and heterogeneous environments. At the same time, the need for flexibility and a great capacity for adaptation introduces a process of constant modification and growth. Thus, developers easily make mistakes such as duplicating code or adding unnecessary code, generating a negative impact on quality attributes such as performance and maintainability. Refactoring is considered a technique that greatly improves the quality of software and provides a solution to this issue. In this context, our work proposes an approach for comparing manual service groupings with automatic groupings, which allows analyzing, evaluating, and validating clustering techniques applied to improve service cohesion and fragmentation. We used V-Measure, with homogeneity and completeness, as the evaluation metrics. Additionally, we have improved the existing clustering techniques of a previous work, VizSOC, achieving a 20% gain with respect to the aforementioned metrics. Moreover, we added an implementation of the COBWEB clustering algorithm, yielding fruitful results.
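For reference, a hedged sketch of the V-Measure metric mentioned above, following its standard entropy-based definition rather than the authors' implementation; the example grouping data at the end is invented.

```java
import java.util.HashMap;
import java.util.Map;

public class VMeasure {

    static double entropy(Map<Integer, Integer> counts, int n) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / n;
            h -= p * Math.log(p);
        }
        return h;
    }

    // H(target | given), computed from the contingency counts of the two labelings.
    static double conditionalEntropy(int[] target, int[] given, int n) {
        Map<Integer, Map<Integer, Integer>> table = new HashMap<>();
        Map<Integer, Integer> givenCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            table.computeIfAbsent(given[i], k -> new HashMap<>())
                 .merge(target[i], 1, Integer::sum);
            givenCounts.merge(given[i], 1, Integer::sum);
        }
        double h = 0.0;
        for (Map.Entry<Integer, Map<Integer, Integer>> e : table.entrySet()) {
            int nk = givenCounts.get(e.getKey());
            for (int nck : e.getValue().values()) {
                h -= ((double) nck / n) * Math.log((double) nck / nk);
            }
        }
        return h;
    }

    static double vMeasure(int[] classes, int[] clusters) {
        int n = classes.length;
        Map<Integer, Integer> classCounts = new HashMap<>();
        Map<Integer, Integer> clusterCounts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(classes[i], 1, Integer::sum);
            clusterCounts.merge(clusters[i], 1, Integer::sum);
        }
        double hC = entropy(classCounts, n), hK = entropy(clusterCounts, n);
        double homogeneity = hC == 0 ? 1.0 : 1.0 - conditionalEntropy(classes, clusters, n) / hC;
        double completeness = hK == 0 ? 1.0 : 1.0 - conditionalEntropy(clusters, classes, n) / hK;
        double sum = homogeneity + completeness;
        return sum == 0 ? 0.0 : 2 * homogeneity * completeness / sum;  // harmonic mean
    }

    public static void main(String[] args) {
        // Manual grouping (classes) vs. automatic clustering of six services.
        int[] manual    = {0, 0, 0, 1, 1, 1};
        int[] automatic = {0, 0, 1, 1, 1, 1};
        System.out.printf("V-Measure: %.3f%n", vMeasure(manual, automatic));
    }
}
```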
- Published
- 2019
41. Classes of arbitrary kind
- Author
-
Serrano, Alejandro, Miraldo, Victor Cacciari, Alferes, José Júlio, Johansson, Moa, Sub Softw.Techn. for Learning and Teach., Sub Software Technology, Software Technology for Learning and Teaching, and Software Technology
- Subjects
050101 languages & linguistics ,Generic programming ,Programming language ,Computer science ,Serialization ,05 social sciences ,02 engineering and technology ,Type (model theory) ,computer.software_genre ,Theoretical Computer Science ,Consistency (database systems) ,Duplicate code ,Haskell ,Type classes ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,0501 psychology and cognitive sciences ,Programmer ,computer ,Abstraction (linguistics) ,computer.programming_language ,Computer Science(all) - Abstract
The type class system in the Haskell programming language provides a useful abstraction for a wide range of types, such as those that support comparison, serialization, or ordering, among others. This system can be extended by the programmer by providing custom instances for one's custom types. Yet, this is often a monotonous task. Some notions, such as equality, are very regular regardless of whether they are being encoded for a ground type or a type constructor. In this paper we present a technique that unifies the treatment of ground types and type constructors whenever possible. This reduces code duplication and improves consistency. We discuss the encoding of several classes in this form, including the generic programming facility in GHC.
- Published
- 2019
42. Slicing Based Code Recommendation for Type Based Instance Retrieval
- Author
-
Rui Sun, Hui Liu, and Leping Li
- Subjects
Information retrieval ,Computer science ,business.industry ,Duplicate code ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Code (cryptography) ,The Internet ,Reuse ,Precision and recall ,business ,Base (topology) ,Slicing ,Implementation - Abstract
It is common for developers to retrieve an instance of a certain type from an instance of another type. However, it is quite often the case that developers do not know exactly how to retrieve the instance, although they know exactly what they need (the instance to be retrieved, also known as the target instance) and where it could be retrieved from (i.e., the source instance). Such instance retrieval is popular, and thus its implementations, in different forms, are often publicly available on the Internet. Consequently, a number of approaches have been proposed to retrieve such implementations (code snippets) and relieve developers from reinventing them. However, the performance of such approaches deserves further improvement. To this end, in this paper, we propose a slicing-based approach to recommending code snippets that retrieve the target instance from the source instance. The approach works as follows. First, from a large code base, it retrieves methods that contain both the source instance and the target instance. Second, for each of these methods, it locates the target instances and extracts the code snippets that generate them by backward code slicing. Third, from the extracted code snippets, it removes those that do not contain the source instance. Fourth, it merges code snippets whose corresponding target instances are on parallel execution paths. Fifth, it removes duplicate code snippets. Finally, it ranks the resulting code snippets and presents the top ones. We implement the approach as an Eclipse plugin called TIRSnippet. We also evaluate it with real type-based instance retrieval queries. Evaluation results suggest that, compared to the state-of-the-art approaches, the proposed approach improves precision and recall by 8.8% and 25%, respectively.
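A hedged sketch of the backward-slicing step in this approach (not TIRSnippet itself; statement, variable, and type names are invented): walking the method backwards, a statement is kept when it defines a variable that is still needed, and its own inputs then become needed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BackwardSlice {

    // A statement with the variables it defines and the variables it uses.
    record Stmt(String code, Set<String> defs, Set<String> uses) {}

    static List<Stmt> slice(List<Stmt> method, String targetVar) {
        Set<String> needed = new HashSet<>(Set.of(targetVar));
        List<Stmt> result = new ArrayList<>();
        for (int i = method.size() - 1; i >= 0; i--) {        // walk backwards
            Stmt s = method.get(i);
            boolean relevant = s.defs().stream().anyMatch(needed::contains);
            if (relevant) {
                result.add(0, s);
                needed.addAll(s.uses());                      // its inputs become needed too
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Stmt> method = List.of(
            new Stmt("File f = new File(path);", Set.of("f"), Set.of("path")),
            new Stmt("int unrelated = 42;", Set.of("unrelated"), Set.of()),
            new Stmt("FileInputStream in = new FileInputStream(f);", Set.of("in"), Set.of("f")),
            new Stmt("Reader target = new InputStreamReader(in);", Set.of("target"), Set.of("in")));
        slice(method, "target").forEach(s -> System.out.println(s.code()));
    }
}
```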
- Published
- 2019
43. Identifying Redundancies in Fork-based Development
- Author
-
Luyao Ren, Shurui Zhou, Andrzej Wasowski, and Christian Kästner
- Subjects
Software bug ,Computer science ,Duplicate code ,business.industry ,Software maintainer ,Redundancy (engineering) ,Redundant code ,Software engineering ,business ,Maintenance engineering ,Fork (software development) - Abstract
Fork-based development is popular and easy to use, but it makes it difficult to maintain an overview of the whole community as the number of forks increases. This may lead to redundant development, where multiple developers solve the same problem in parallel without being aware of each other. Redundant development wastes effort for both maintainers and developers. In this paper, we design an approach to identify redundant code changes in forks as early as possible by extracting clues indicating similarities between code changes and building a machine learning model to predict redundancies. We evaluated its effectiveness from both the maintainer's and the developer's perspectives. The results show that we achieve 57–83% precision for detecting duplicate code changes from the maintainer's perspective, and that we could save developers an average of 1.9–3.0 commits of effort. We also show that our approach significantly outperforms the existing state of the art.
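As a hedged illustration of the kind of similarity clue such an approach might extract (this is not the paper's feature set or model; the patches are invented), the sketch below computes the Jaccard overlap of the tokens changed by two fork patches, a value a classifier could combine with other features.

```java
import java.util.HashSet;
import java.util.Set;

public class ChangeSimilarity {

    // Tokens appearing on added or removed lines of a unified-diff-like patch.
    static Set<String> changedTokens(String patch) {
        Set<String> tokens = new HashSet<>();
        for (String line : patch.split("\n")) {
            if (line.startsWith("+") || line.startsWith("-")) {
                for (String t : line.substring(1).split("\\W+")) {
                    if (!t.isEmpty()) tokens.add(t);
                }
            }
        }
        return tokens;
    }

    static double jaccard(String patchA, String patchB) {
        Set<String> a = changedTokens(patchA);
        Set<String> b = changedTokens(patchB);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        a.retainAll(b);                          // a now holds the intersection
        return union.isEmpty() ? 0.0 : (double) a.size() / union.size();
    }

    public static void main(String[] args) {
        String fork1 = "+ if (user == null) return;\n+ log.warn(\"missing user\");";
        String fork2 = "+ if (user == null) { log.warn(\"missing user\"); return; }";
        System.out.printf("Similarity clue: %.2f%n", jaccard(fork1, fork2));
    }
}
```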
- Published
- 2019
44. A Liskov Substitution Based Hoarse Rules for Efficient Code Duplication on Object Oriented Systems
- Author
-
R. V. Sivabalan and R. S. Anoop Sreekumar
- Subjects
Computational Mathematics ,Object-oriented programming ,Theoretical computer science ,Computer science ,Programming language ,Duplicate code ,General Materials Science ,General Chemistry ,Electrical and Electronic Engineering ,Condensed Matter Physics ,Liskov substitution principle ,computer.software_genre ,computer - Published
- 2016
45. Semantic code clone detection for Internet of Things applications using reaching definition and liveness analysis
- Author
-
Rajkumar Tekchandani, Rajesh Bhatia, and Maninder Singh
- Subjects
Dead code ,business.industry ,Computer science ,Programming language ,Liveness ,020207 software engineering ,02 engineering and technology ,Reaching definition ,computer.software_genre ,Theoretical Computer Science ,Software ,Control flow ,Semantic equivalence ,Hardware and Architecture ,Duplicate code ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,Unreachable code ,business ,computer ,Information Systems ,Data-flow analysis - Abstract
Knowledge extraction from existing software resources for maintenance, re-engineering, and bug removal through code clone detection is an integral part of most Internet-enabled devices. Similar code fragments that live at different locations are called code clones. These Internet-enabled devices are used for knowledge sharing and data extraction to execute various applications related to code clone detection. However, most of the existing semantic code clone detection techniques are unable to provide heuristic solutions for problems such as statement reordering, inversion of control predicates, and insertion of irrelevant statements, which may cause a performance bottleneck in this environment. To address these issues, we propose a novel approach that finds semantic code clones in a program or procedure using data flow analysis based on reaching definitions and liveness analysis. The algorithm based on reaching definitions and liveness analysis is designed to find similar code fragments that are structurally divergent but semantically equivalent. The results obtained demonstrate that the proposed approach is effective in detecting semantic code clones for various applications running on Internet-enabled devices. We found 5,831 semantically equivalent clone pairs on subject systems taken from the DaCapo benchmark, after eliminating 29,029 dead code statements, across 216,579 lines of code (LOC).
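A hedged sketch of the liveness analysis this approach builds on, in its textbook backward data-flow form rather than the authors' implementation; the three-statement control-flow graph at the end is invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Liveness {

    // A statement with the variables it defines, the variables it uses,
    // and the ids of its successor statements in the control-flow graph.
    record Stmt(int id, Set<String> def, Set<String> use, List<Integer> successors) {}

    static Map<Integer, Set<String>> liveIn(List<Stmt> cfg) {
        Map<Integer, Set<String>> in = new HashMap<>();
        cfg.forEach(s -> in.put(s.id(), new HashSet<>()));
        boolean changed = true;
        while (changed) {                                    // iterate to a fixed point
            changed = false;
            for (Stmt s : cfg) {
                Set<String> out = new HashSet<>();
                for (int succ : s.successors()) out.addAll(in.get(succ));
                Set<String> newIn = new HashSet<>(out);      // in = use ∪ (out − def)
                newIn.removeAll(s.def());
                newIn.addAll(s.use());
                if (!newIn.equals(in.get(s.id()))) {
                    in.put(s.id(), newIn);
                    changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        // 0: a = read();  1: b = a + 1;  2: print(b);
        List<Stmt> cfg = List.of(
            new Stmt(0, Set.of("a"), Set.of(), List.of(1)),
            new Stmt(1, Set.of("b"), Set.of("a"), List.of(2)),
            new Stmt(2, Set.of(), Set.of("b"), List.of()));
        System.out.println(liveIn(cfg));   // e.g. {0=[], 1=[a], 2=[b]}
    }
}
```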
- Published
- 2016
46. AUTOMATIC DETECTING AND REMOVAL DUPLICATE CODES CLONES
- Author
-
Shahenda Sarhan, Samir Elmougy, and Z. Al-Saffar
- Subjects
Source code ,Source lines of code ,Programming language ,Computer science ,media_common.quotation_subject ,Code smell ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Software maintenance ,computer.software_genre ,Code refactoring ,Duplicate code ,Plagiarism detection ,Software system ,computer ,media_common - Abstract
Removing code clones is now considered an important part of improving the overall design of software and of software maintenance, as it makes the source code more readable and easier to maintain. To remove code clones from written code, refactoring techniques can be used. Copying and pasting fragments of code is a type of code cloning that should be handled, and its detection has many practical applications, such as software and project plagiarism detection and identifying copyright infringements. To address this problem, we propose a computerized refactoring system to remove duplicate code clones. The simulation results of applying the proposed system show that it increases the maintainability and quality of a software system, measured by the total lines of code, blank lines, and total method count, for the four Java open-source projects used.
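Purely as a hedged illustration of the size metrics used in the evaluation (total lines, blank lines, and method count), the snippet below computes them for a small source string; the method-detection regular expression is a rough heuristic, not part of the proposed system.

```java
import java.util.regex.Pattern;

public class SizeMetrics {

    // Rough heuristic for a Java method header: modifiers, return type, name, parameter list, '{'.
    private static final Pattern METHOD_HEADER =
        Pattern.compile("(public|protected|private|static|\\s)+[\\w<>\\[\\]]+\\s+\\w+\\s*\\([^;{]*\\)\\s*\\{");

    public static void main(String[] args) {
        String source = "public class A {\n\n  public int inc(int x) {\n    return x + 1;\n  }\n}\n";
        long total = source.lines().count();
        long blank = source.lines().filter(l -> l.isBlank()).count();
        long methods = METHOD_HEADER.matcher(source).results().count();
        System.out.printf("total=%d blank=%d methods=%d%n", total, blank, methods);
    }
}
```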
- Published
- 2016
47. The Evolution of Imperative Programming Paradigms as a Search for New Ways to Reduce Code Duplication
- Author
-
T Avacheva and A Prutzkow
- Subjects
Computer science ,Process (engineering) ,business.industry ,Imperative programming ,Duplicate code ,Paradigm shift ,Factor (programming language) ,Programming paradigm ,Software system ,Impossibility ,Software engineering ,business ,computer ,computer.programming_language - Abstract
The cause of imperative programming paradigm shifts is the impossibility of developing software systems of a new level of complexity. We consider the evolution of programming paradigms: structured, procedural, and object-oriented. We demonstrate that new ways of reducing code duplication have appeared with each paradigm shift, and we conclude that the drive to reduce code duplication determines the direction of programming paradigm evolution. We observe that the constraints introduced by the paradigms simplify the development of software systems, and we conclude that these new constraints allow the development of more complex software systems. A main reason for code duplication is the low qualification of programmers; therefore, in the process of learning programming, attention should be paid to code duplication and to ways of reducing it.
- Published
- 2020
48. Refactorings for replacing dynamic instructions with static ones
- Author
-
Leonardo Montecchi, Ricardo Terra, Raphael Winckler de Bettio, Rafael S. Durelli, and Elder Rodrigues
- Subjects
Flexibility (engineering) ,Computer science ,Programming language ,business.industry ,Maintainability ,computer.software_genre ,Readability ,Software ,Code refactoring ,Duplicate code ,Code (cryptography) ,Programmer ,business ,computer - Abstract
Dynamic features offered by programming languages provide greater flexibility to the programmer (e.g., the dynamic construction of classes and methods) and reduce duplicate code snippets. However, the unnecessary use of dynamic features may degrade the code in many ways, such as in its readability and comprehensibility, and in the maintainability of the software. Therefore, this paper proposes 20 refactorings that replace dynamic instructions with static ones. In an evaluation of 28 open-source Ruby systems, we could refactor 743 of 1,651 dynamic statements (45%).
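The paper targets Ruby's dynamic features, so the following Java snippet is only a hedged analogy of the refactoring direction (all names are illustrative): a reflective, string-driven invocation is replaced by a direct static call, trading flexibility for readability and compile-time checking.

```java
import java.lang.reflect.Method;

public class DynamicToStatic {

    public static class Greeter {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Dynamic version: the method to call is chosen by name at runtime.
    static String callDynamically(Object target, String methodName, String arg) throws Exception {
        Method m = target.getClass().getMethod(methodName, String.class);
        return (String) m.invoke(target, arg);
    }

    // Refactored, static version: the call is spelled out directly.
    static String callStatically(Greeter target, String arg) {
        return target.greet(arg);
    }

    public static void main(String[] args) throws Exception {
        Greeter g = new Greeter();
        System.out.println(callDynamically(g, "greet", "world"));
        System.out.println(callStatically(g, "world"));
    }
}
```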
- Published
- 2018
49. A Study on the Method of Removing Code Duplication Using Code Template
- Author
-
Woochang Shin
- Subjects
Statement (computer science) ,Source code ,Programming language ,Computer science ,business.industry ,media_common.quotation_subject ,Software development ,020207 software engineering ,02 engineering and technology ,Program quality ,computer.software_genre ,Reduction (complexity) ,Code refactoring ,Duplicate code ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Code (cryptography) ,business ,computer ,media_common - Abstract
In software development, it is common to use similar code redundantly in many places. However, source code duplication has been reported to adversely affect program quality and maintenance costs. In particular, when writing a program that reflects various conditions, an excessive number of branch ("if-else" or "switch") statements is used; therefore, many of the statements executed for each condition are duplicated. In the present study, we propose a refactoring method for finding and removing duplicate code in branch statements. Based on the proposed method, we also develop and test a prototype tool. The results of case studies show that refactoring source code written by unskilled developers with the developed tool yields, on average, a 10% reduction in source code size.
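As a hedged illustration of the proposed refactoring (the shipping example is invented), the snippet below hoists statements that were duplicated across the branches of a conditional so that they appear only once.

```java
public class BranchDeduplication {

    // Before: validation and logging are repeated in every branch.
    static double shippingBefore(String region, double weight) {
        if (region.equals("EU")) {
            if (weight <= 0) throw new IllegalArgumentException("weight");
            System.out.println("Calculating shipping for " + region);
            return 4.0 + 0.5 * weight;
        } else {
            if (weight <= 0) throw new IllegalArgumentException("weight");
            System.out.println("Calculating shipping for " + region);
            return 7.0 + 0.8 * weight;
        }
    }

    // After: the duplicated statements appear exactly once, outside the branches.
    static double shippingAfter(String region, double weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight");
        System.out.println("Calculating shipping for " + region);
        return region.equals("EU") ? 4.0 + 0.5 * weight : 7.0 + 0.8 * weight;
    }

    public static void main(String[] args) {
        System.out.println(shippingBefore("EU", 2.0));
        System.out.println(shippingAfter("EU", 2.0));
    }
}
```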
- Published
- 2018
50. Reducing Code Duplication by Identifying Fresh Domain Abstractions
- Author
-
Steven Klusener, Jeroen Ketema, Arjan J. Mooij, and Hans van Wezep
- Subjects
Reverse engineering ,Iterative methods ,Computer science ,Industrial application report ,02 engineering and technology ,computer.software_genre ,Original structures ,Codes ,Software ,Variation points ,Abstracting ,0202 electrical engineering, electronic engineering, information engineering ,Reverse engineering and re engineering ,Implementation ,C++ ,Application programs ,business.industry ,Programming language ,Adapter (computing) ,Reference design ,Software renovation ,Software evolution ,Software component ,Domain abstraction ,020207 software engineering ,020202 computer hardware & architecture ,Model based software engineering ,Duplicate code ,Computer software maintenance ,Component-based software engineering ,Reference designs ,business ,computer - Abstract
When software components are developed iteratively, code frequently evolves in an inductive manner: a unit (class, method, etc.) is created and is then copied and modified many times. Such development often happens when variation points and, hence, proper domain abstractions are initially unclear. As a result, there may be substantial amounts of code duplication, and the code may be difficult to understand and maintain, warranting a redesign. We apply a model-based process to semi-automatically redesign an inductively-evolved industrial adapter component written in C++: we use reverse engineering to obtain models of the component, and generate redesigned code from the models. Based on our experience, we propose to use three models to help recover understanding of inductively-evolved components, and transform the components into redesigned implementations. Guided by a reference design, a component's code is analyzed and a legacy model is extracted that captures the component's functionality in a form close to its original structure. The legacy model is then unfolded, creating a flat model which eliminates design decisions by focusing on functionality in terms of external interfaces. Analyzing the variation points of the flat model yields a redesigned model and fresh domain abstractions to be used in the new design of the component.
- Published
- 2018