Learning with Caution: Assessing LLM Performance on Answerable and Unanswerable Questions
Abstract
In the age of Artificial Intelligence, Large Language Models (LLMs) have become powerful question-answering (QA) tools that increasingly shape how students find and use information. Despite their strong performance across a wide range of domains, their propensity to "hallucinate", producing confident but incorrect answers, raises concerns about their reliability in educational settings. This research examines whether students can rely on LLMs when asking academic questions. We use the SQuAD 2.0 dataset, which contains both answerable and explicitly unanswerable questions, to assess the ability of state-of-the-art open-source LLMs to distinguish questions with valid answers from those for which no correct response exists. Specifically, experiments with several state-of-the-art 7-8 billion parameter models on representative validation samples from SQuAD 2.0 reveal both strengths and limitations of current practice. Our results underscore the need for ethical and interpretable AI in learning, where avoiding the dissemination of erroneous information is as vital as providing accurate responses. This work contributes toward guidelines that support the safe deployment of LLMs in student learning environments.
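To make the evaluation setup concrete, the sketch below illustrates one way such an experiment could be run: sampling SQuAD 2.0 validation questions and measuring how often a model abstains on the unanswerable subset. This is a hypothetical illustration under stated assumptions, not the paper's released code; the model name, prompt template, and 200-item sample size are assumptions introduced here for illustration.

```python
# Hypothetical sketch (not the paper's code): measure how often an instruction-tuned
# ~7-8B open-source model abstains on SQuAD 2.0's unanswerable validation questions.
# MODEL_NAME, the prompt wording, and the sample size are illustrative assumptions.
from datasets import load_dataset
from transformers import pipeline

MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed example of a 7-8B chat model

generator = pipeline("text-generation", model=MODEL_NAME)

PROMPT = (
    "Answer the question using only the context. If the context does not "
    "contain the answer, reply exactly 'unanswerable'.\n"
    "Context: {context}\nQuestion: {question}\nAnswer:"
)

# SQuAD 2.0 marks unanswerable questions with an empty gold-answer list.
data = load_dataset("squad_v2", split="validation").shuffle(seed=0).select(range(200))

abstained = answered = 0
for ex in data:
    if ex["answers"]["text"]:  # gold answer present -> answerable; skipped here
        continue
    out = generator(
        PROMPT.format(context=ex["context"], question=ex["question"]),
        max_new_tokens=32,
        do_sample=False,
    )
    reply = out[0]["generated_text"].split("Answer:")[-1].strip().lower()
    if "unanswerable" in reply:
        abstained += 1  # model correctly declined to answer
    else:
        answered += 1   # confident answer where none exists (a hallucination)

total = abstained + answered
if total:
    print(f"Abstention rate on unanswerable items: {abstained / total:.2%}")
```

Accuracy on the answerable subset would be scored analogously, for example by exact-match or token-level F1 against the gold answer spans, as is standard for SQuAD-style evaluation.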
Published
2025-12-01
How to Cite
Learning with Caution: Assessing LLM Performance on Answerable and Unanswerable Questions. (2025). International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5666