Towards Scalable Annotation of Math Word Problems: Knowledge Component Tagging via LLMs and Sentence Embeddings
Abstract
Mathematical word problems (MWPs) are a crucial component of mathematics education, requiring students to integrate multiple reasoning skills. Annotating these problems with knowledge components (KCs) enables better personalized learning, adaptive tutoring, and AI-driven educational assessment. However, manual annotation is time-consuming and inconsistent, limiting the scalability of KC-based learning systems. In this work, we introduce a new labeled dataset of MWPs, derived from ASDiv and GSM8K, with KCs aligned to the Common Core mathematics framework. Using this dataset, we benchmark two methods for automatic KC annotation without any labeled examples: LLM-based KC tagging and SBERT sentence embedding similarity scoring. Our results highlight key strengths and limitations of LLMs on this task, revealing challenges in consistency and reasoning alignment with human labels. We then show that SBERT-based similarity scoring underperforms LLM KC tagging, but can be significantly improved by combining the two methods, which addresses their respective limitations. This study provides critical insights into the feasibility of automated KC tagging, laying the foundation for future research in AI-assisted curriculum design and intelligent tutoring systems.
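To make the embedding-based baseline concrete, the sketch below shows how SBERT similarity scoring for KC tagging can work in general: embed a word problem and a set of KC descriptions, then propose the KC with the highest cosine similarity. This is a minimal illustration only; the model name, the placeholder KC descriptions, and the single-best-match decision rule are assumptions, not the configuration used in the paper.

```python
# Minimal sketch of SBERT similarity scoring for KC tagging.
# The model choice and KC descriptions below are illustrative assumptions,
# not the authors' actual setup.
from sentence_transformers import SentenceTransformer, util

# Hypothetical Common Core-style KC descriptions (placeholders).
kc_descriptions = {
    "3.OA.A.3": "Use multiplication and division within 100 to solve word problems.",
    "4.NF.B.3": "Understand addition and subtraction of fractions.",
    "6.RP.A.3": "Use ratio and rate reasoning to solve real-world problems.",
}

problem = "A baker sells 8 boxes of 12 cookies each. How many cookies are sold in total?"

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
problem_emb = model.encode(problem, convert_to_tensor=True)
kc_embs = model.encode(list(kc_descriptions.values()), convert_to_tensor=True)

# Cosine similarity between the problem and each KC description;
# the highest-scoring KC is proposed as the tag.
scores = util.cos_sim(problem_emb, kc_embs)[0]
best = max(zip(kc_descriptions.keys(), scores.tolist()), key=lambda kv: kv[1])
print(f"Predicted KC: {best[0]} (similarity={best[1]:.3f})")
```

A hybrid variant of this idea, as described in the abstract, could restrict or rerank the LLM's candidate tags using these similarity scores; the exact combination strategy is detailed in the paper itself.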
Published
2025-12-01
Section
Articles
How to Cite
Towards Scalable Annotation of Math Word Problems: Knowledge Component Tagging via LLMs and Sentence Embeddings. (2025). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5574