Measuring Understanding in Video-Based Learning
Abstract
Measuring students' mental states, such as their level of understanding during class, helps improve learning efficiency. Automatic approaches realize this idea without interrupting the class by sensing students' reactions through wearable sensors or cameras and analyzing the data with machine learning models. However, most previous work lacks adequate annotations of understanding derived from students' reactions, relative to the number of concepts conveyed during a lesson. This paper proposes a scalable framework for efficiently constructing and annotating such datasets. In addition, we collected a dataset of posture, facial expression, and eye-movement features and benchmarked it for measuring understanding. The results show a promising accuracy of 80%, even when not all features are available, demonstrating the potential for widespread adoption of the proposed framework.