A Comparative Study of Missing Value Imputation Methods for Education Data
Abstract
Missing data are often inevitable in real-world problems and can affect the overall results of research. As in other domains, missing values in education data require sound imputation to arrive at valid findings. The objective of this paper is therefore to provide a better understanding of this issue and of imputation methods, and to assess the performance of benchmark alternatives on actual data. In particular, it presents a comparative study of mean imputation, K-nearest neighbor (KNN) imputation, Cluster-K-nearest neighbor (CKNN) imputation, Local Least Square (LLS) imputation, Cluster-based Local Least Square (CLLS) imputation, Iterated Local Least Square (ILLS) imputation, and Bayesian Principal Component Analysis (BPCA) imputation. The comparison is conducted on five real datasets of the same size, under a missing completely at random (MCAR) assumption, using normalized root mean square error (NRMSE) as the evaluation metric. The results suggest that BPCA and ILLS are the two most effective imputation methods for these small datasets.
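The evaluation protocol described in the abstract (hide known values under MCAR, impute them, score with NRMSE) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `mask_mcar`, `mean_impute`, `knn_impute`, and `nrmse` are hypothetical, only the two simplest methods are shown, and the NRMSE normalization used here (RMSE over the hidden cells divided by the standard deviation of their true values) is an assumption, since the abstract does not state the exact formula.

```python
import math
import random


def mask_mcar(data, rate, seed=0):
    """Hide each cell independently with probability `rate` (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in data]


def mean_impute(masked):
    """Fill each missing cell with the mean of the observed values in its column."""
    col_means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        col_means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def knn_impute(masked, k=3):
    """Fill each missing cell with the column mean over the k nearest rows.

    Distance is the root mean squared difference over the columns that both
    rows have observed. Cells with no usable neighbor keep the column mean.
    """
    result = mean_impute(masked)
    for i, row in enumerate(masked):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for m, other in enumerate(masked):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum(shared) / len(shared))
                    candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = candidates[:k]
                result[i][j] = sum(val for _, val in nearest) / len(nearest)
    return result


def nrmse(truth, imputed, masked):
    """RMSE over the hidden cells, divided by the std. dev. of their true values."""
    hidden = [(i, j) for i, row in enumerate(masked)
              for j, v in enumerate(row) if v is None]
    if not hidden:
        raise ValueError("no hidden cells to evaluate")
    true_vals = [truth[i][j] for i, j in hidden]
    est_vals = [imputed[i][j] for i, j in hidden]
    mse = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / len(hidden)
    mean_true = sum(true_vals) / len(true_vals)
    variance = sum((t - mean_true) ** 2 for t in true_vals) / len(true_vals)
    if variance == 0:
        raise ValueError("true values of hidden cells have zero variance")
    return math.sqrt(mse / variance)
```

In use, a complete table would be passed through `mask_mcar`, each imputation function applied to the masked copy, and `nrmse` computed against the original values. A lower score means the imputed values are closer to the hidden truth, and a score near 1 is roughly what filling with a constant equal to the mean of the hidden values would achieve.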
Published
2021-11-22
Conference Proceedings Volume
Section
Articles
How to Cite
A Comparative Study of Missing Value Imputation Methods for Education Data. (2021). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/4233