1. Code Plagiarism Detection Method Based on Code Similarity and Student Behavior Characteristics
- Author
- Guozheng Fang, Xuezhi Song, and Qiubo Huang
- Subjects
- Computer science, Feature extraction, Value (computer science), Similarity (network science), Code (cryptography), Plagiarism detection, Artificial intelligence, Natural language processing
- Abstract
- We propose a plagiarism detection approach for educational settings based on code similarity and student behavior characteristics. Traditional plagiarism checks rely on the code alone, which allows students to evade detection by modifying a small amount of code. We argue that suspected plagiarism can be identified more accurately if students' behavioral characteristics when submitting code are also considered. Drawing on the idea of the Gini coefficient, we introduce the concept of code similarity concentration (SCD). SCD reflects how the similarity between all the code submitted by a student and the code of other students is distributed. A large SCD value means that a student's code is consistently most similar to the code of a few particular classmates. We also extract additional features to aid detection. Finally, we frame plagiarism detection as a binary classification problem and use LightGBM to make decisions. Experimental results show an accuracy close to 99% and an F1-score close to 98%.
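The abstract does not give the exact SCD formula, so the following is only a rough sketch of the Gini-style idea it refers to. It assumes that, for each assignment, we already know which classmate's submission is most similar to the student's, and it measures how concentrated those "most similar" hits are across the class. The function names `gini` and `similarity_concentration` and the input layout are hypothetical, not taken from the paper.

```python
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means the values are spread evenly; values approaching 1.0
    mean the total is concentrated in a few entries.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def similarity_concentration(most_similar_peers, classmates):
    """Illustrative stand-in for SCD (an assumption, not the paper's definition).

    most_similar_peers: one entry per assignment, naming the classmate whose
                        code was most similar to this student's submission.
    classmates:         all other students in the class.
    """
    counts = Counter(most_similar_peers)
    return gini([counts.get(c, 0) for c in classmates])
```

Under these assumptions, a student whose four submissions are each closest to a different classmate scores 0.0, while a student whose four submissions are always closest to the same classmate scores 0.75 in a class of four peers. A score like this could then be one input feature, alongside the behavioral features, to a binary classifier such as LightGBM.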
- Published
- 2020