Multimodal Trait Scoring for Video Interviews Using Neural Models with Handcrafted Features and Trait-Attention

Authors

  • Taichi Kitajima, The University of Electro-Communications
  • Masaki Uto, The University of Electro-Communications

Abstract

Interview examinations are widely used in various educational assessments, including entrance exams, qualification tests, and job placement processes, to evaluate students' interpersonal skills such as communication and expressiveness. However, manual evaluation poses significant challenges, including a dependency on rater characteristics and substantial time and cost requirements. As a result, automated scoring methods that predict scores from video recordings of interviews using artificial intelligence technologies have recently attracted considerable attention. The primary limitations of traditional methods are twofold. First, they depend solely on either handcrafted or neural features, even though these two types of features are potentially complementary. Second, although traditional methods are typically designed as trait-scoring models, they overlook inter-trait correlations that could improve prediction accuracy. To address these limitations, this study proposes a trait-scoring model for interview examinations that predicts multiple trait scores by incorporating inter-trait correlations and combining handcrafted features with neural features derived from pre-trained language and computer vision models.
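
The abstract describes the approach only at a high level. Purely as an illustration, the sketch below shows one way a fusion of handcrafted and neural features with trait-attention could be wired up in PyTorch: the two feature streams are concatenated per time step, and a learnable query per trait attends over the fused sequence so that all trait predictions share a common representation. Every module name, dimension, and the attention layout here are assumptions for illustration, not the authors' implementation.

# Minimal illustrative sketch (assumed architecture, not the paper's code):
# fuse handcrafted and neural interview features, then let each trait attend
# over the fused sequence via its own learnable query.
import torch
import torch.nn as nn


class TraitAttentionScorer(nn.Module):
    def __init__(self, neural_dim=768, handcrafted_dim=64, hidden_dim=256,
                 num_traits=5, num_heads=4):
        super().__init__()
        # Project the concatenated handcrafted + neural features into a shared space.
        self.fusion = nn.Sequential(
            nn.Linear(neural_dim + handcrafted_dim, hidden_dim),
            nn.ReLU(),
        )
        # One learnable query per trait; attending over the same fused sequence
        # lets the trait heads share evidence and capture inter-trait correlations.
        self.trait_queries = nn.Parameter(torch.randn(num_traits, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one regression output per trait

    def forward(self, neural_feats, handcrafted_feats):
        # neural_feats:      (batch, seq_len, neural_dim),   e.g. language/vision embeddings
        # handcrafted_feats: (batch, seq_len, handcrafted_dim), e.g. prosodic or gaze statistics
        fused = self.fusion(torch.cat([neural_feats, handcrafted_feats], dim=-1))
        queries = self.trait_queries.unsqueeze(0).expand(fused.size(0), -1, -1)
        trait_repr, _ = self.attn(queries, fused, fused)  # (batch, num_traits, hidden_dim)
        return self.head(trait_repr).squeeze(-1)          # (batch, num_traits) trait scores


# Toy usage with random tensors standing in for real interview features.
model = TraitAttentionScorer()
scores = model(torch.randn(2, 30, 768), torch.randn(2, 30, 64))
print(scores.shape)  # torch.Size([2, 5])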

Published

2025-12-01

How to Cite

Kitajima, T., & Uto, M. (2025). Multimodal Trait Scoring for Video Interviews Using Neural Models with Handcrafted Features and Trait-Attention. International Conference on Computers in Education. https://library.apsce.net/index.php/ICCE/article/view/5567