The Effect of Feature Reliability on the Generalization of Machine Learning Models in Educational Data
DOI: https://doi.org/10.58459/icce.2024.4818
Abstract
Reliability quantifies the extent of measurement error in observed feature scores and is an important indicator of measurement quality in educational research. However, the effect of feature reliability remains underexplored in educational studies that use machine learning techniques. Understanding this effect is critical because most common features in education are contaminated by measurement error. Recent research has shown that low feature reliability damages the prediction accuracy of machine learning models. The current study proposes that feature reliability also influences the generalization of machine learning models. This paper provides a mathematical proof of this claim and further supports it with analyses of two empirical educational datasets. The data analyses also indicated that the effect of feature reliability on model generalization was moderated by model complexity but was not related to model accuracy. Approaches to mitigating the impact of feature reliability are discussed.
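To make the notion of reliability concrete, the following minimal sketch simulates a feature under the classical test theory assumption that an observed score is a true score plus independent error, so that reliability is the ratio of true-score variance to observed-score variance. It is an illustration only and is not the analysis or proof reported in the paper. The function name `simulate_attenuation`, the sample size, the noise levels, and the linear outcome model are all hypothetical choices made for this example. Under these assumptions, the slope obtained by regressing an outcome on the noisy feature should shrink roughly in proportion to the reliability.

```python
import random


def _variance(xs):
    """Sample variance (n - 1 denominator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def _covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_attenuation(reliability, n=20000, seed=0):
    """Return (empirical reliability, slope of outcome on the observed feature).

    Assumption (classical test theory): observed = true + error, with the
    error independent of the true score. All settings here are illustrative.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    rng = random.Random(seed)
    # True scores have unit variance; the error SD is chosen so that
    # var(true) / var(observed) equals the requested reliability.
    error_sd = ((1.0 - reliability) / reliability) ** 0.5
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    # Hypothetical outcome: depends on the TRUE score with slope 1, plus noise.
    outcome = [t + rng.gauss(0.0, 0.5) for t in true_scores]
    empirical_reliability = _variance(true_scores) / _variance(observed)
    # Simple-regression slope of the outcome on the noisy observed feature.
    slope = _covariance(observed, outcome) / _variance(observed)
    return empirical_reliability, slope


if __name__ == "__main__":
    for target in (1.0, 0.8, 0.5):
        rel, slope = simulate_attenuation(target)
        print(f"target reliability={target:.1f}  empirical={rel:.3f}  slope={slope:.3f}")
```

In this sketch the estimated slope tracks the reliability because the true slope is fixed at 1. This shows only the well-known attenuation of a regression coefficient by measurement error; it does not by itself demonstrate the generalization effect that the paper studies.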