A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors

Authors

  • Zifeng Liu, University of Florida
  • Priyadharshini Ganapathy Prasad, University of Florida
  • Bach Ngo, The Frazer School
  • Xinyue Jiao, New York University
  • Wanli Xing, University of Florida

Abstract

Multiple-choice questions (MCQs) are a widely used and effective assessment method, and the quality of their distractors is crucial to their effectiveness. Recent studies have explored using large language models (LLMs) to generate distractors. However, generating high-quality distractors remains challenging in subjects that require strong reasoning, such as computer science and mathematics. Although some research has investigated AI-generated distractors in these fields, evaluating their quality is difficult because existing metrics focus primarily on surface semantics and fail to capture the underlying reasoning. This study introduces a human-AI collaborative assessment approach for evaluating distractor quality. We applied this method to compare AI-generated and human-created distractors in two high school courses: programming (349 MCQs) and statistics (576 MCQs). The findings suggest that AI-generated distractors can be competitive with human-created ones in the programming course, whereas significant differences exist for the Understand, Analyze, and Evaluate MCQ types in statistics. This study provides a practical and scalable solution for integrating and evaluating AI-generated distractors in educational assessments.

Published

2025-12-01

How to Cite

A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors. (2025). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5621