51. Meta-Heuristic Guided Feature Optimization for Enhanced Authorship Attribution in Java Source Code
- Author
- Bilal Al-Ahmad, Nailah Al-Madi, Abdullah Alzaqebah, Rami S. Alkhawaldeh, Khaled Aldebei, Md. Faisal Kabir, Ismail Altaharwa, Mua'ad Abu-Faraj, and Ibrahim Aljarah
- Subjects
Evolutionary computation, data mining, feature selection, java, source code, authorship attribution, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Source code authorship attribution is the task of identifying who developed a piece of code by learning the programmer's style. It is a critical activity used extensively in areas such as computer security, computer law, and plagiarism detection. This paper investigates source code authorship attribution by capturing natural language aspects of the code rather than using only a minimal set of syntactic and stylistic code features, as explored in the previous literature. It proposes an evolutionary feature selection model to improve the accuracy of authorship attribution by implementing two language models (uni-gram and bi-gram). The proposed approach uses K-Nearest Neighbor as a classifier and Genetic Algorithm as a feature selection technique. Two experiments were conducted on a public Authorship Attribution dataset on GitHub; the experiments include various evolutionary feature selection models. The results of both experiments were compared with related studies and show a significant improvement in accuracy.
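The pipeline outlined in the abstract (uni-gram and bi-gram features, a Genetic Algorithm choosing a feature subset, K-Nearest Neighbor scoring each subset) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex tokenizer, leave-one-out fitness, truncation selection, one-point crossover, the parameter values, and the toy snippets with the author labels "alice" and "bob" are all hypothetical choices made for the example.

```python
import random
import re
from collections import Counter

def ngram_counts(code, n):
    """Count token n-grams in a source string (simplified tokenizer, an assumption)."""
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def vectorize(samples, n_values=(1, 2)):
    """Build a shared uni-gram + bi-gram vocabulary and one count vector per sample."""
    counts = [sum((ngram_counts(s, n) for n in n_values), Counter()) for s in samples]
    vocab = sorted(set().union(*counts))
    return [[c[g] for g in vocab] for c in counts], vocab

def knn_predict(train_X, train_y, x, mask, k=1):
    """Majority vote of the k nearest neighbors, using only the masked-in features."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b, m in zip(row, x, mask) if m), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def loo_accuracy(X, y, mask, k=1):
    """Leave-one-out KNN accuracy for the feature subset selected by mask."""
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]
    return hits / len(X)

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Search boolean feature masks with a small GA; requires at least 2 features."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return loo_accuracy(X, y, mask) if any(mask) else 0.0

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [True] * n  # include the "all features" mask as a baseline individual
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            children.append([bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]])
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy Java-like snippets from two hypothetical authors.
samples = [
    "for (int i = 0; i < n; i++) { sum += i; }",
    "for (int j = 0; j < m; j++) { total += j; }",
    "int k = 0; while (k < n) { k = k + 1; }",
    "int idx = 0; while (idx < m) { idx = idx + 1; }",
]
labels = ["alice", "alice", "bob", "bob"]

X, vocab = vectorize(samples)
mask, acc = ga_select(X, labels)
print(f"kept {sum(mask)} of {len(vocab)} n-gram features, LOO accuracy {acc:.2f}")
```

Because the all-features mask is part of the initial population and the best individual is carried over each generation, the selected subset never scores below the unselected feature set on this fitness measure. A real study would evaluate on held-out data rather than the same samples used for selection.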
- Published
- 2023