LLM-enhanced Math Text Extracting and Keyword-based Hierarchical Labeling for Digital Learning Infrastructure

Taisei YAMAUCHI; Brendan FLANAGAN; Hiroaki OGATA

Authors

Taisei YAMAUCHI, Graduate School of Informatics, Kyoto University, Japan
Brendan FLANAGAN, Institute for Liberal Arts and Sciences, Kyoto University, Japan; Academic Center for Computing and Media Studies, Kyoto University, Japan
Hiroaki OGATA, Academic Center for Computing and Media Studies, Kyoto University, Japan

Abstract

As education becomes increasingly digitized, organizing digital learning materials into curriculum-based units is essential but often burdensome for educators. This study proposes an automatic method for labeling mathematics problems in PDF format using large language models (LLMs) and curriculum keywords. Using OpenAI's o4-mini, we extracted text from MEXT-approved junior high school textbooks and exercise books with high accuracy (98.9% and 99.7%, respectively). Unit labels were then assigned by combining keyword-based filtering with embedding similarity (text-embedding-3-small). Compared with a baseline without keyword filtering, expert evaluation favored the keyword-based method (183 vs. 132 cases), confirming that keywords enhance classification accuracy. These results demonstrate that LLM-based extraction is practical for classroom use, requiring only minor manual corrections, and that unit-specific vocabulary contributes to accurate hierarchical labeling. Future work will extend this framework toward content management of LLM-generated materials and unified log analysis to promote personalized learning pathways.
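The labeling step described above (keyword filtering followed by embedding similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the unit names, keywords, two-dimensional vectors, and the function `label_problem` are all hypothetical, and the vectors stand in for embeddings that would in practice come from a model such as text-embedding-3-small. The sketch is also flat rather than hierarchical, and it assumes a fallback to all units when no keyword matches.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_problem(text, vec, units, filter_by_keywords=True):
    """Return the name of the unit whose vector is most similar to `vec`.

    With filter_by_keywords=True, only units with at least one keyword
    occurring in `text` are considered. If no unit matches, all units
    are considered (an assumption of this sketch).
    """
    candidates = units
    if filter_by_keywords:
        matched = [u for u in units if any(k in text for k in u["keywords"])]
        if matched:
            candidates = matched
    best = max(candidates, key=lambda u: cosine(vec, u["vec"]))
    return best["name"]

# Hypothetical curriculum units with toy vectors in place of real embeddings.
UNITS = [
    {"name": "Linear equations", "keywords": ["equation", "solve"], "vec": [1.0, 0.0]},
    {"name": "Linear functions", "keywords": ["graph", "slope"], "vec": [0.6, 0.8]},
]

# Hypothetical extracted problem text and its toy vector.
TEXT = "Find the slope of the graph of y = 2x + 1."
VEC = [0.9, 0.3]

with_keywords = label_problem(TEXT, VEC, UNITS)
baseline = label_problem(TEXT, VEC, UNITS, filter_by_keywords=False)
print(with_keywords, "|", baseline)
```

In this toy case, similarity alone ranks the wrong unit first, while the keyword filter restricts the candidates to the unit whose vocabulary actually appears in the problem. That is the kind of difference the comparison against the no-filter baseline is designed to measure.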

