A Comparative Study of Missing Value Imputation Methods for Education Data

Authors

  • Phimmarin KEERIN, Faculty of Science and Technology, Pibulsongkram Rajabhat University, Thailand

Abstract

Missing data are often inevitable in real-world problems and affect the overall results of research. As in other domains, missing values in education data require sound imputation to arrive at valid findings. The objective of this paper is therefore to provide a better understanding of this issue and of imputation methods, and to assess the performance of benchmark alternatives on actual data. In particular, it presents a comparative study of mean imputation, K-nearest neighbor (KNN) imputation, Cluster-K-nearest neighbor (CKNN) imputation, Local Least Square (LLS) imputation, Cluster-based Local Least Square (CLLS) imputation, Iterated Local Least Square (ILLS) imputation and Bayesian Principal Component Analysis (BPCA) imputation. The comparison is conducted on five real datasets of the same size, under a missing completely at random (MCAR) assumption, using normalized root mean square error (NRMSE) as the evaluation metric. The results suggest that BPCA and ILLS are the two most effective imputation methods for these small datasets.
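The evaluation protocol described in the abstract (hide known values under MCAR, impute them, score with NRMSE) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the 10% missing rate, the choice of k = 3, the function names, and the NRMSE variant (root of the mean squared error divided by the variance of the true values at the hidden positions). Only mean and a basic KNN imputation are shown; the clustered, least-squares and BPCA methods are not reproduced here.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """Boolean mask; True marks an entry hidden completely at random."""
    return rng.random(shape) < rate

def mean_impute(x_missing):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(x_missing, axis=0)
    filled = x_missing.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def knn_impute(x_missing, k=3):
    """Fill each NaN with the average of that column over the k nearest rows.

    Distance between two rows is the root mean squared difference over the
    columns observed in both rows (an assumed, simple choice).
    """
    filled = x_missing.copy()
    observed = ~np.isnan(x_missing)
    for i in np.where(~observed.all(axis=1))[0]:
        for j in np.where(~observed[i])[0]:
            candidates = []
            for r in range(x_missing.shape[0]):
                if r == i or not observed[r, j]:
                    continue
                shared = observed[i] & observed[r]
                if not shared.any():
                    continue
                diff = x_missing[i, shared] - x_missing[r, shared]
                candidates.append((np.sqrt(np.mean(diff ** 2)), x_missing[r, j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                filled[i, j] = np.mean([value for _, value in candidates[:k]])
            else:
                # No comparable row: fall back to the column mean.
                filled[i, j] = np.nanmean(x_missing[:, j])
    return filled

def nrmse(x_true, x_imputed, mask):
    """Assumed NRMSE variant: sqrt(MSE / variance of true values), hidden entries only."""
    err = x_true[mask] - x_imputed[mask]
    return np.sqrt(np.mean(err ** 2) / np.var(x_true[mask]))

# Synthetic, correlated data standing in for a small complete dataset.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 1))
x = latent @ rng.normal(size=(1, 6)) + 0.1 * rng.normal(size=(60, 6))

mask = mcar_mask(x.shape, rate=0.10, rng=rng)   # entries to hide
x_missing = np.where(mask, np.nan, x)

mean_filled = mean_impute(x_missing)
knn_filled = knn_impute(x_missing, k=3)

print("NRMSE mean:", nrmse(x, mean_filled, mask))
print("NRMSE KNN :", nrmse(x, knn_filled, mask))
```

Lower NRMSE means the imputed values are closer to the hidden true values. Because the score is computed only at the hidden positions, the complete data must be known in advance, which is why such comparisons start from complete datasets and remove values artificially.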

Published

2021-11-22

How to Cite

A Comparative Study of Missing Value Imputation Methods for Education Data. (2021). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/4233