Software plagiarism detection in multiprogramming languages using machine learning approach.

Authors :
Ullah, Farhan
Wang, Junfeng
Farhan, Muhammad
Habib, Masood
Khalid, Shehzad
Source :
Concurrency & Computation: Practice & Experience; Feb2021, Vol. 33 Issue 4, p1-12, 12p
Publication Year :
2021

Abstract

Summary: Software plagiarism, which gives rise to software piracy, is a growing concern. It poses a serious risk to the software industry and causes substantial economic damage every year. Customers may develop a modified version of the original software in a different programming language. Detecting plagiarism across source code written in different languages is challenging because each language has its own syntax rules. In this paper, we propose a methodology for software plagiarism detection across multiple programming languages based on machine learning. Principal Component Analysis (PCA) is applied to extract features from the source code without losing the actual information. It extracts features by factor analysis and converts the dataset into normalized linear principal components, which are then used for predictive analysis. A multinomial logistic regression (MLR) model, which generalizes logistic regression to multiclass problems, is then applied to these components to classify the source code documents. The performance of the predictors in the MLR model is evaluated with a two-tailed z test. For the experiments, a dataset was collected in five popular languages, i.e., C, C++, Java, C#, and Python. Each programming language is covered by two case studies, i.e., binary search and stack. [ABSTRACT FROM AUTHOR]
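
The pipeline described in the abstract (PCA, then multinomial logistic regression, then a two-tailed z test) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors are hypothetical token counts, the class labels 0, 1, 2 are placeholders, only the first principal component is kept (found here by power iteration), the regression is fitted by plain gradient descent, and the z statistic is assumed to be a coefficient divided by its standard error, whose computation is left out.

```python
import math

def center(rows):
    """Subtract each column's mean so the features have zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def leading_component(rows, iters=200):
    """First principal direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def normalized_scores(rows, direction):
    """Project centered rows onto the direction and scale to unit variance."""
    scores = [sum(x * w for x, w in zip(row, direction)) for row in rows]
    std = math.sqrt(sum(s * s for s in scores) / (len(scores) - 1))
    return [s / std for s in scores]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_mlr(xs, ys, n_classes, lr=0.5, epochs=5000):
    """Multinomial logistic regression on one feature, by gradient descent."""
    n = len(xs)
    w = [0.0] * n_classes
    b = [0.0] * n_classes
    for _ in range(epochs):
        gw = [0.0] * n_classes
        gb = [0.0] * n_classes
        for x, y in zip(xs, ys):
            p = softmax([w[k] * x + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                gw[k] += err * x
                gb[k] += err
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b = [bk - lr * g / n for bk, g in zip(b, gb)]
    return w, b

def predict(x, w, b):
    """Return the index of the most probable class."""
    probs = softmax([wk * x + bk for wk, bk in zip(w, b)])
    return probs.index(max(probs))

def two_tailed_p(z):
    """Two-tailed p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical token-count features for nine source files (assumption).
features = [[1, 1, 0], [2, 1, 0], [1, 2, 1],
            [5, 5, 1], [6, 5, 0], [5, 6, 1],
            [10, 10, 0], [11, 10, 1], [10, 11, 1]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # placeholder class labels

centered = center(features)
scores = normalized_scores(centered, leading_component(centered))
w, b = fit_mlr(scores, labels, n_classes=3)
predicted = [predict(x, w, b) for x in scores]
print(predicted)
print(round(two_tailed_p(1.96), 3))
```

In practice a library such as scikit-learn would likely replace the hand-written PCA and regression steps; the standard-library version is used here only to keep the sketch self-contained.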

Details

Language :
English
ISSN :
15320626
Volume :
33
Issue :
4
Database :
Complementary Index
Journal :
Concurrency & Computation: Practice & Experience
Publication Type :
Academic Journal
Accession number :
148305897
Full Text :
https://doi.org/10.1002/cpe.5000