Utilization of Japanese Public Educational Data by Retrieval Augmented Generation for Policy Research

Kyosuke TAKAMI

doi:10.58459/icce.2024.4869

Authors

Kyosuke TAKAMI, Education Data Science Center, National Institute for Educational Policy Research

DOI: https://doi.org/10.58459/icce.2024.4869

Abstract

Public educational data, including government-conducted national surveys and research case studies, are widely available to the public and intended for use in municipal policymaking. However, some of these data have been published only as PDF files and remain underutilized. This study therefore leverages tools from the era of generative AI, namely Large Language Models (LLMs) and Retrieval Augmented Generation (RAG), to process 705 Japanese-language public educational documents in PDF format. The process involves extracting the text, vectorizing it, and generating responses, and thereby presents a case study of methods for effectively utilizing public educational data. The study found that without RAG, the outputs of GPT-3.5 and GPT-4 were verbose, whereas with RAG the models gave more specific answers grounded in the retrieval results. Furthermore, GPT-4 can be used to evaluate the quality of the retrieval results. These results demonstrate that LLMs can be applied to local educational knowledge written in local languages such as Japanese, and suggest that previously underutilized educational data can be leveraged to aid in formulating educational policies.
