HyCode: A Code Similarity Assessment Tool Utilizing Recurrent Neural Networks

James Marcel A. ABAWAG; Aleczia S. TORDILLA; Joshua C. MARTINEZ

doi:10.58459/icce.2024.4877

Authors

James Marcel A. ABAWAG, Department of Computer Science, Ateneo de Naga University
Aleczia S. TORDILLA, Department of Computer Science, Ateneo de Naga University
Joshua C. MARTINEZ, Department of Computer Science, Ateneo de Naga University

Abstract

Academic dishonesty, particularly source-code plagiarism, poses significant ethical challenges in educational institutions and on online coding platforms. It undermines the integrity of teaching and learning as well as the credibility of students and institutions. To address these challenges, this study developed a code similarity assessment tool that uses deep neural networks, specifically character-level recurrent neural networks (char-RNN) and long short-term memory networks (LSTM), to detect source-code plagiarism. The tool leverages the strengths of both models while minimizing their weaknesses, allowing it to capture low-level, short-term patterns as well as complex, long-term dependencies in source code. The dataset was drawn mainly from the "IR Plag Dataset", to which data augmentation and various preprocessing techniques were applied. The final configuration of the hybrid neural network architecture reached training and validation accuracies of 99% and 90%, respectively. The model was evaluated using precision, recall, and F1-score. The hybrid architecture achieved a precision of 0.94, a recall of 0.935, an F1-score of 0.94, and a final accuracy of 93.75%. In addition, the tool was evaluated on real-world data and found to be capable of identifying a range of code similarities, providing assurance that it can effectively differentiate original work from work that may have been plagiarized. However, the evaluation also revealed false positives and false negatives, which leaves room for improvement.
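
As a consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the abstract does not say how the metrics were averaged, and the function name `f1_from` is hypothetical and not part of HyCode.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract.
reported_precision = 0.94
reported_recall = 0.935

f1 = f1_from(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Under this assumption the result is roughly 0.9375, which rounds to the reported F1-score of 0.94.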
