Calculating Test Item Similarity Using Latent Dirichlet Allocation
DOI: https://doi.org/10.58459/icce.2013.450
Abstract
In previous studies, we proposed methods for calculating the similarity between test items in order to automatically retrieve similar test items in e-testing, and we evaluated those methods experimentally. Test item similarity data can be applied to tasks such as automatically retrieving similar test items, automatically constructing item banks, visualizing the structure between test items, optimizing amounts of test information, estimating the difficulty of unanswered test items, conducting computer adaptive testing, and creating test items. To improve the accuracy of retrieving similar test items, we propose a new method for calculating test item similarity that applies latent Dirichlet allocation (LDA), a generative probabilistic document model. Each test item is represented as a vector of topics estimated by LDA, and the similarity between two test items is calculated as the cosine similarity between their vectors. Applying LDA to the similarity calculation reduces the number of dissimilar test items retrieved and yields vectors based on the relations between extracted terms. To estimate the topics in each test item accurately, we preprocess the items by identifying where important terms occur and by enhancing the co-occurrence relations between terms. We use 250 test items from the Systems Administrator Examination to evaluate how effectively similar test items are retrieved. The results indicate that both the preprocessing steps and the application of LDA to calculating test item similarity are effective. We furthermore demonstrate that the proposed method retrieves similar test items more accurately than existing methods.
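The similarity step described above can be illustrated with a minimal Python sketch. It assumes the topic proportions for each test item have already been estimated by LDA; the three-topic vectors, the item identifiers, and the helper names `cosine_similarity` and `rank_similar_items` are hypothetical and chosen only for illustration. They are not taken from the paper, and the sketch does not reproduce the paper's preprocessing or its LDA estimation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def rank_similar_items(query_id, topic_vectors):
    """Return (item_id, similarity) pairs, most similar first, excluding the query."""
    query = topic_vectors[query_id]
    scores = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in topic_vectors.items()
        if item_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic proportions over three topics (made-up values that
# stand in for the per-item topic distributions an LDA model would estimate).
topic_vectors = {
    "item_a": [0.80, 0.15, 0.05],
    "item_b": [0.70, 0.20, 0.10],
    "item_c": [0.05, 0.10, 0.85],
}

ranking = rank_similar_items("item_a", topic_vectors)
for item_id, score in ranking:
    print(f"{item_id}: {score:.3f}")
```

In this toy data, `item_b` shares its dominant topic with `item_a` and is therefore ranked ahead of `item_c`. Because topic proportions are non-negative, the cosine similarity between two such vectors always falls between 0 and 1.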