A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors

Authors

  • Zifeng Liu, University of Florida
  • Priyadharshini Ganapathy Prasad, University of Florida
  • Bach Ngo, The Frazer School
  • Xinyue Jiao, New York University
  • Wanli Xing, University of Florida

Abstract

Multiple-choice questions (MCQs) are a widely used and effective assessment method, and the quality of their distractors is crucial to their effectiveness. Recent studies have explored using large language models (LLMs) to generate distractors. However, generating high-quality distractors remains challenging in subjects that require strong reasoning, such as computer science and mathematics. Although some research has investigated AI-generated distractors in these fields, evaluating their quality is difficult because existing metrics focus primarily on surface semantics and fail to capture the underlying reasoning. This study introduces a human-AI collaborative assessment approach for evaluating distractor quality. We applied this method to compare AI-generated and human-created distractors in two high school courses: programming (349 MCQs) and statistics (576 MCQs). The findings suggest that AI-generated distractors can be competitive with human-created ones in the programming course, whereas significant differences exist for the Understand, Analyze, and Evaluate MCQ types in statistics. This study provides a practical and scalable solution for integrating and evaluating AI-generated distractors in educational assessments.

Published

2025-12-01

How to Cite

A Human-AI Collaborative Assessment of AI-Generated vs. Human-Created MCQ Distractors. (2025). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5621