Methods of Balancing Model Explainability and Performance in Identifying At-Risk Students
DOI: https://doi.org/10.58459/icce.2024.4972

Abstract
This study explores and tests various combinations of methods for handling data imbalance, in order to address a common problem in at-risk student prediction: too few minority-class (at-risk) samples. We also examine the purpose of applying computational tools to educational problems and argue that learning analytics should rely on models with high transparency and explainability, so that the decision-making process remains transparent and comprehensible. After comparing model performance, we selected a logistic regression model combined with correlation analysis and threshold adjustment, which performed strongly on UAR (unweighted average recall), G-mean, and other evaluation metrics. Using the model's feature importance ranking, we analyze the factors behind students' academic performance, thereby establishing a high-performance, high-transparency benchmark model for the LBLS593 dataset.
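The metrics and threshold adjustment named above can be illustrated with a minimal sketch. It assumes binary labels (1 = at-risk, 0 = not at-risk) and probability scores already produced by some classifier such as logistic regression. UAR is computed as the mean of the per-class recalls and G-mean as their geometric mean. The selection rule used here, choosing the candidate threshold that maximizes G-mean, is an illustrative assumption and not necessarily the criterion used in this study. The data and the helper names (`tune_threshold` and the others) are hypothetical.

```python
import math


def class_recalls(y_true, y_pred):
    """Return (recall on positive class, recall on negative class) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the two class recalls."""
    sens, spec = class_recalls(y_true, y_pred)
    return (sens + spec) / 2


def g_mean(y_true, y_pred):
    """Geometric mean of the two class recalls; 0 if either class is never recalled."""
    sens, spec = class_recalls(y_true, y_pred)
    return math.sqrt(sens * spec)


def tune_threshold(y_true, scores, candidates):
    """Hypothetical helper: pick the decision threshold that maximizes G-mean."""
    best_t, best_g = None, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        g = g_mean(y_true, preds)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g


# Hypothetical imbalanced example: 2 at-risk students out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.40, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02]

# With the default 0.5 cutoff, no student is flagged as at-risk.
default_preds = [1 if s >= 0.5 else 0 for s in scores]
print("default UAR:", uar(y_true, default_preds), "G-mean:", g_mean(y_true, default_preds))

best_t, best_g = tune_threshold(y_true, scores, [0.5, 0.4, 0.3, 0.2, 0.1])
tuned_preds = [1 if s >= best_t else 0 for s in scores]
print("tuned threshold:", best_t, "UAR:", uar(y_true, tuned_preds), "G-mean:", round(best_g, 4))
```

With the default 0.5 cutoff every student is predicted "not at-risk", so accuracy looks high while G-mean is 0. Lowering the threshold to 0.3 recovers both at-risk students at the cost of one false alarm. This is why class-balanced metrics like UAR and G-mean, rather than plain accuracy, are the natural targets when tuning a threshold on imbalanced data.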