Multimodal Trait Scoring for Video Interviews Using Neural Models with Handcrafted Features and Trait-Attention

Authors

  • Taichi Kitajima, The University of Electro-Communications
  • Masaki Uto, The University of Electro-Communications

Abstract

Interview examinations are widely used in various educational assessments, including entrance exams, qualification tests, and job placement processes, to evaluate students' interpersonal skills such as communication and expressiveness. However, manual evaluation poses significant challenges, including a dependency on rater characteristics and substantial time and cost requirements. As a result, automated scoring methods that predict scores from video recordings of interviews using artificial intelligence technologies have recently attracted considerable attention. The primary limitations of traditional methods are twofold. First, they depend solely on either handcrafted or neural features, even though these two types of features are potentially complementary. Second, although traditional methods are typically designed as trait-scoring models, they overlook inter-trait correlations that could improve prediction accuracy. To address these limitations, this study proposes a trait-scoring model for interview examinations that predicts multiple trait scores by incorporating inter-trait correlations and combining handcrafted features with neural features derived from pre-trained language and computer vision models.
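
The abstract describes the approach only at a high level. Purely as an illustration, the sketch below shows one way a fusion of handcrafted and neural features with trait-attention could be wired up in PyTorch: the two feature streams are concatenated per time step, and a learnable query per trait attends over the fused sequence so that all trait predictions share a common representation. Every module name, dimension, and the attention layout here are assumptions for illustration, not the authors' implementation.

# Minimal illustrative sketch (assumed architecture, not the paper's code):
# fuse handcrafted and neural interview features, then let each trait attend
# over the fused sequence via its own learnable query.
import torch
import torch.nn as nn


class TraitAttentionScorer(nn.Module):
    def __init__(self, neural_dim=768, handcrafted_dim=64, hidden_dim=256,
                 num_traits=5, num_heads=4):
        super().__init__()
        # Project the concatenated handcrafted + neural features into a shared space.
        self.fusion = nn.Sequential(
            nn.Linear(neural_dim + handcrafted_dim, hidden_dim),
            nn.ReLU(),
        )
        # One learnable query per trait; attending over the same fused sequence
        # lets the trait heads share evidence and capture inter-trait correlations.
        self.trait_queries = nn.Parameter(torch.randn(num_traits, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one regression output per trait

    def forward(self, neural_feats, handcrafted_feats):
        # neural_feats:      (batch, seq_len, neural_dim),   e.g. language/vision embeddings
        # handcrafted_feats: (batch, seq_len, handcrafted_dim), e.g. prosodic or gaze statistics
        fused = self.fusion(torch.cat([neural_feats, handcrafted_feats], dim=-1))
        queries = self.trait_queries.unsqueeze(0).expand(fused.size(0), -1, -1)
        trait_repr, _ = self.attn(queries, fused, fused)  # (batch, num_traits, hidden_dim)
        return self.head(trait_repr).squeeze(-1)          # (batch, num_traits) trait scores


# Toy usage with random tensors standing in for real interview features.
model = TraitAttentionScorer()
scores = model(torch.randn(2, 30, 768), torch.randn(2, 30, 64))
print(scores.shape)  # torch.Size([2, 5])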

Published

2025-12-01

How to Cite

Kitajima, T., & Uto, M. (2025). Multimodal Trait Scoring for Video Interviews Using Neural Models with Handcrafted Features and Trait-Attention. International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5567