HyCode: A Code Similarity Assessment Tool Utilizing Recurrent Neural Networks
DOI: https://doi.org/10.58459/icce.2024.4877
Abstract
Academic dishonesty, particularly source-code plagiarism, poses significant ethical challenges for educational institutions and online coding platforms. It undermines the integrity of teaching and learning as well as the credibility of students and institutions. To address these challenges, this study developed a code similarity assessment tool that uses deep neural networks, specifically a character-level recurrent neural network (char-RNN) and a long short-term memory network (LSTM), to detect source-code plagiarism. The resulting hybrid architecture combines the strengths of both models while mitigating their weaknesses, allowing it to learn low-level, short-term patterns as well as complex, long-term dependencies in source code. The dataset was drawn mainly from the "IR Plag Dataset", and data augmentation and several preprocessing techniques were applied. The final configuration of the hybrid neural network achieved training and validation accuracies of 99% and 90%, respectively. The model was evaluated using metrics including precision, recall, and F1-score, achieving a precision of 0.94, a recall of 0.935, an F1-score of 0.94, and a final accuracy of 93.75%. In addition, the tool was evaluated on real-world data and was found to identify a range of code similarities, suggesting that it can effectively distinguish authentic, original work from work that may have been plagiarized. However, the evaluation also revealed false positives and false negatives, which leaves room for improvement.
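As a quick consistency check on the reported figures, recall that the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal sketch below (plain Python; the function names are illustrative and are not taken from the HyCode tool) shows how these metrics follow from confusion-matrix counts for a binary "plagiarized vs. original" classifier. It also confirms that P = 0.94 and R = 0.935 give F1 ≈ 0.9375, which rounds to the reported 0.94. Accuracy cannot be derived from precision and recall alone, because it also depends on the true negatives.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int):
    """Precision, recall, F1, and accuracy from binary confusion-matrix counts.

    tp: plagiarized pairs correctly flagged
    fp: original pairs wrongly flagged (false positives)
    fn: plagiarized pairs missed (false negatives)
    tn: original pairs correctly left unflagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Check the reported precision/recall against the reported F1-score.
f1 = f1_from_precision_recall(0.94, 0.935)
print(round(f1, 2))  # 0.94
```

Because the F1-score penalizes an imbalance between precision and recall, it is a natural companion to accuracy when false positives and false negatives both matter, as they do for plagiarism flags.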