A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors
Abstract
Multiple-choice questions (MCQs) are a widely used and effective assessment method, and the quality of their distractors is crucial to their effectiveness. Recent studies have explored the use of large language models (LLMs) to generate distractors. However, generating high-quality distractors remains challenging in subjects such as computer science and mathematics that require strong reasoning. While some research has investigated AI-generated distractors in these fields, evaluating their quality is difficult because existing metrics focus primarily on surface semantics and fail to capture the reasoning involved. This study introduces a human-AI collaborative assessment approach for evaluating distractor quality. We applied this method to compare AI-generated and human-created distractors in two high school courses: programming (349 MCQs) and statistics (576 MCQs). The findings suggest that AI-generated distractors can be competitive with human-created ones in the programming course, but significant differences exist for MCQs at the understand, analyze, and evaluate levels in statistics. This study provides a practical and scalable solution for integrating and evaluating AI-generated distractors in educational assessments.
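The abstract does not specify how the AI-generated and human-created distractors were statistically compared. As a minimal illustrative sketch only, assuming rater judgments are tallied as acceptable/flawed counts per distractor source within one Bloom level, a chi-square test of independence is one way such a comparison could be run; the counts, variable names, and choice of test below are assumptions for illustration, not details from the study.

# Hypothetical sketch: comparing accept/reject counts for AI-generated vs.
# human-created distractors within one Bloom level using a chi-square test.
# The counts below are placeholders, not data from the study.
from scipy.stats import chi2_contingency

# Rows: distractor source (AI, human); columns: rater judgment (acceptable, flawed).
contingency = [
    [120, 30],   # AI-generated distractors (placeholder counts)
    [140, 10],   # human-created distractors (placeholder counts)
]

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Quality judgments differ significantly between sources at this Bloom level.")
else:
    print("No significant difference detected at this Bloom level.")

Repeating such a per-level comparison across the understand, analyze, and evaluate categories would mirror the kind of level-by-level contrast the abstract reports, though the paper's actual procedure may differ.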
Published
2025-12-01
Section
Articles
How to Cite
A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors. (2025). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5621