Learning Algorithm Implementation Structures for Multilabel Classification via CodeBERT

Authors

  • Karl Frederick ROLDAN, Undergraduate, Ateneo de Naga University, Philippines
  • Gerd Lowell JANA, Undergraduate, Ateneo de Naga University, Philippines
  • John Kenneth LESABA, Undergraduate, Ateneo de Naga University, Philippines
  • Joshua MARTINEZ, Thesis Advisor, Ateneo de Naga University, Philippines

Abstract

Task constraint feedback is the collective name for any feedback system that checks whether a student's submitted work satisfies the constraints defined by the problem. The check can be as simple as verifying that certain programming constructs are present, or that a specific algorithm or data structure required by the problem is used. Most of these systems generate feedback using static analysis (Fischer, 2006; Gotel, 2008) or natural language processing techniques (Lane, 2005). A transformer is a neural network architecture for processing sequences such as natural-language text. Previous work has shown that transformers generalize to programming language tasks such as code summarization. In this study, we used the CodeBERT transformer to classify, or tag, the algorithms implemented in code snippets in order to check constraint satisfaction. Using a custom dataset of source code that aims to implement algorithms, we show that CodeBERT can learn the structure of an implementation regardless of the names a programmer chooses in the code. Averaged across labels, the model obtained an F1-score of 0.85, a promising result on this dataset.
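The per-label averaging mentioned in the abstract can be illustrated with a minimal sketch of a macro-averaged F1-score for multilabel tags. This is an illustration only, not the authors' evaluation code: the label names (`binary_search`, `bubble_sort`, `recursion`), the sample predictions, and the helper names `per_label_f1` and `macro_f1` are all hypothetical, and the toy numbers are unrelated to the reported 0.85.

```python
from typing import Dict, List, Set


def per_label_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute one F1-score per label, treating each label as a binary task."""
    scores: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> float:
    """Average the per-label F1-scores with equal weight per label."""
    scores = per_label_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Hypothetical algorithm tags and toy data (not from the paper's dataset).
LABELS = ["binary_search", "bubble_sort", "recursion"]
Y_TRUE = [{"binary_search", "recursion"}, {"bubble_sort"}, {"recursion"}, {"binary_search"}]
Y_PRED = [{"binary_search"}, {"bubble_sort"}, {"recursion", "bubble_sort"}, {"binary_search"}]

if __name__ == "__main__":
    print(per_label_f1(Y_TRUE, Y_PRED, LABELS))
    print(round(macro_f1(Y_TRUE, Y_PRED, LABELS), 4))
```

Because each snippet may carry several tags at once, every label is scored as its own yes/no decision before the scores are averaged, so a rarely used algorithm counts as much as a common one.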

Published

2022-11-28

How to Cite

Learning Algorithm Implementation Structures for Multilabel Classification via CodeBERT. (2022). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/4457