Learning with Virtual Avatars: Insights into Performance and Resource Needs
DOI: https://doi.org/10.58459/icce.2024.5034

Abstract
This paper presents a novel method for generating speaking avatars from text, images, and custom audio, designed to enhance digital communication and virtual interaction. The method involves two primary components: image processing and audio synthesis. It begins with a static image of a person and applies facial recognition and animation algorithms to create a dynamic, lifelike avatar capable of realistic mouth movements and expressions. Concurrently, it uses text-to-speech (TTS) technology to convert written text into natural-sounding speech, tailored to match the avatar's identity and intended emotional tone. By integrating these two elements, the avatar's lip movements are synchronized with the generated audio, resulting in a seamless and engaging user experience. Additionally, by leveraging powerful CPUs and GPUs, it is demonstrated how current technology enables efficient and sophisticated video and audio generation. The findings underscore the need for optimized resource management to balance processing speed and output quality. As a direction for future research, it is proposed to test virtual avatars in real educational environments, with an emphasis on evaluating their effect on student learning, motivation, and engagement.
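The abstract does not include an implementation, but the two-stage pipeline it describes (text-to-speech followed by audio-driven facial animation) can be illustrated with a minimal sketch. The example below assumes gTTS for speech synthesis and a locally cloned Wav2Lip repository for lip synchronization; the checkpoint path, file names, and CLI flags are illustrative assumptions, not the authors' actual setup.

# Minimal sketch of a text-to-speaking-avatar pipeline (not the authors' code).
# Assumes: gTTS for text-to-speech, and a locally cloned Wav2Lip repository
# (https://github.com/Rudrabha/Wav2Lip) providing inference.py for lip sync.
import subprocess
from gtts import gTTS

def synthesize_speech(text: str, audio_path: str = "speech.mp3") -> str:
    """Step 1 (audio synthesis): convert written text to speech with gTTS."""
    tts = gTTS(text=text, lang="en")
    tts.save(audio_path)  # gTTS writes MP3-encoded audio
    return audio_path

def animate_avatar(face_image: str, audio_path: str, out_path: str = "avatar.mp4") -> str:
    """Step 2 (image processing/lip sync): drive a static portrait with the audio.
    The checkpoint file and flags below follow the public Wav2Lip inference
    script and are assumptions, not details taken from the paper."""
    subprocess.run(
        [
            "python", "inference.py",
            "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
            "--face", face_image,
            "--audio", audio_path,
            "--outfile", out_path,
        ],
        check=True,
    )
    return out_path

if __name__ == "__main__":
    audio = synthesize_speech("Welcome to today's lesson on photosynthesis.")
    video = animate_avatar("teacher_portrait.jpg", audio)
    print(f"Generated speaking avatar: {video}")

Running the second stage on a GPU is what makes the resource trade-off discussed in the paper visible: the TTS step is lightweight, while frame-by-frame lip synchronization dominates both processing time and memory.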