Learning Algorithm Implementation Structures for Multilabel Classification via CodeBERT

Karl Frederick ROLDAN; Gerd Lowell JANA; John Kenneth LESABA; Joshua MARTINEZ

Authors

Karl Frederick ROLDAN, Undergraduate, Ateneo de Naga University, Philippines
Gerd Lowell JANA, Undergraduate, Ateneo de Naga University, Philippines
John Kenneth LESABA, Undergraduate, Ateneo de Naga University, Philippines
Joshua MARTINEZ, Thesis Advisor, Ateneo de Naga University, Philippines

Abstract

Task constraint feedback is the collective name for feedback systems that check whether a student's submitted work fulfills the constraints defined by the problem. The check can be as simple as verifying that certain programming constructs are present, or that the submission uses a specific algorithm or data structure required by the problem. Most of these systems generate feedback using static analysis (Fischer, 2006; Gotel, 2008) or natural language processing techniques (Lane, 2005). A transformer is a neural network architecture for processing sequences, such as natural-language text. Previous work has shown that transformers generalize to programming-language tasks such as code summarization. In this study, we used the CodeBERT transformer to classify, or tag, the algorithms implemented in code snippets in order to check constraint satisfaction. Using a custom dataset of source code that attempts to implement algorithms, we show that CodeBERT can learn the structure of an implementation regardless of the names the programmer uses in the code. Averaging the F1-score of each label, the model obtained 0.85, a promising result on this dataset.
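The reported metric, an average of each label's F1-score, can be illustrated with a minimal sketch. The label names, ground truth, and predictions below are hypothetical values chosen for illustration and are not taken from the study's dataset or results. The sketch assumes each snippet carries a binary indicator per algorithm label, which is one common way to represent multilabel targets.

```python
from typing import List, Sequence


def per_label_f1(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Return one F1-score per label column of two binary indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        # Count true positives, false positives, and false negatives for label j.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1-scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores) / len(scores)


# Hypothetical labels and toy data, for illustration only.
LABELS = ["binary_search", "bubble_sort", "dfs"]
y_true = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
y_pred = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
]

if __name__ == "__main__":
    for name, score in zip(LABELS, per_label_f1(y_true, y_pred)):
        print(f"{name}: {score:.3f}")
    print(f"average F1: {macro_f1(y_true, y_pred):.3f}")
```

Because every label contributes equally to this average, a rarely occurring algorithm label affects the score as much as a frequent one.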

