Identifying At-risk Students from Course-specific Predictive Analytics
DOI:
https://doi.org/10.58459/icce.2019.524
Abstract
Identifying at-risk students in a large engineering mathematics class while teaching and learning activities are under way is difficult for many instructors, particularly in the first few weeks of the course. In this paper, three course-specific predictive models, namely a multiple linear regression model, a logistic regression model and a classification and regression tree (CART) model, are trained, tested and compared using learning management system (LMS) data from the first semester of the academic year 2017-18. The data comprise the levels of achievement in online class activities, the mini-project, the mid-term test, assignments and the final examination, and the models are used to classify at-risk students as early as possible during the course. A feature selection method is used to select statistically significant variables for the multiple linear regression and logistic regression models in order to enhance the generalizability of both models. Three key variables, the levels of achievement in the 6th online class activity, the mid-term test and assignment 2, which may carry pedagogically meaningful information, are found to be crucial for classifying at-risk students. Although the CART model attains the highest accuracy, the logistic regression model significantly outperforms the multiple linear regression and CART models in terms of recall and F-measure on the testing set. Rather than using all three key variables, the present logistic regression model comprises only two statistically significant variables, the levels of achievement in the 6th online class activity and the mid-term test. It can therefore be employed to identify at-risk students for early intervention once the results of the mid-term test and the 6th online class activity become available at the end of week 7.
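
The contrast drawn above between accuracy on the one hand and recall and F-measure on the other can be illustrated with a minimal Python sketch. The labels and the two sets of predictions below are made up purely for illustration and are not the paper's data or results; the function name `classification_metrics` and the names `model_a` and `model_b` are hypothetical. The sketch only shows how a classifier with higher accuracy can still miss most at-risk students when they are a minority of the class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure for the positive (at-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative, made-up labels: 1 = at-risk, 0 = not at-risk.
# 20 students, of whom 4 are truly at risk.
y_true = [1, 1, 1, 1] + [0] * 16

# Hypothetical model A: flags only 1 of the 4 at-risk students, no false alarms.
model_a = [1, 0, 0, 0] + [0] * 16

# Hypothetical model B: flags 3 of the 4 at-risk students, with 3 false alarms.
model_b = [1, 1, 1, 0] + [1, 1, 1] + [0] * 13

metrics_a = classification_metrics(y_true, model_a)
metrics_b = classification_metrics(y_true, model_b)

for name, m in (("A", metrics_a), ("B", metrics_b)):
    print(name, {k: round(v, 2) for k, v in m.items()})
```

In this toy case model A has the higher accuracy, while model B has the higher recall and F-measure. For early intervention, where a missed at-risk student is costly, recall and F-measure are arguably the more relevant criteria, which is consistent with the comparison made in the abstract.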