
SCA 2022: Voice2Face - Audio-Driven Facial and Tongue Rig Animations with cVAEs

This is a research presentation from the Eurographics Symposium on Computer Animation (SCA 2022). Authors: Mónica Villanueva Aylagas, Héctor Anadon Leon, Mattias Teye, and Konrad Tollmar.

Download the full research paper. (5.9 MB PDF)

In this paper, we present Voice2Face, a tool that generates facial and tongue animations directly from recorded speech using machine learning.

Our approach consists of two stages: a conditional Variational Autoencoder (cVAE) generates mesh animations from speech, and a separate module maps those animations to rig controller space. Our contributions include an automated method for speech style control, a method for training a model on data of multiple quality levels, and a method for animating the tongue.
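
The paper itself does not ship code, but the two-stage design can be illustrated with a minimal PyTorch sketch. Everything here is an illustrative assumption: the class names (SpeechCVAE, MeshToRig), the dimensions, and the plain MLP layers stand in for the paper's actual architecture, which is not reproduced here.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the paper does not publish these exact dimensions.
AUDIO_DIM, STYLE_DIM, QUALITY_DIM, LATENT_DIM = 128, 8, 2, 32
COND_DIM = AUDIO_DIM + STYLE_DIM + QUALITY_DIM
MESH_DIM, RIG_DIM = 3 * 5000, 60  # hypothetical vertex coords / rig controllers

class SpeechCVAE(nn.Module):
    """Stage 1: cVAE that generates mesh animation frames conditioned on speech."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(MESH_DIM + COND_DIM, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT_DIM)
        self.logvar = nn.Linear(256, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + COND_DIM, 256), nn.ReLU(),
            nn.Linear(256, MESH_DIM))

    def forward(self, mesh, cond):
        h = self.enc(torch.cat([mesh, cond], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, cond], dim=-1)), mu, logvar

class MeshToRig(nn.Module):
    """Stage 2: separate module mapping mesh animation to rig controller space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MESH_DIM, 256), nn.ReLU(), nn.Linear(256, RIG_DIM))

    def forward(self, mesh):
        return self.net(mesh)
```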

Unlike previous works, our model generates animations without speaker-dependent characteristics while still allowing control over speech style.
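
Continuing the sketch above, one way speaker-independent inference with style control could look: the latent is sampled from the prior rather than encoded from any particular speaker, and only the hypothetical style slot of the condition vector is varied.

```python
# Hedged inference sketch; variable names continue the SpeechCVAE sketch above.
cvae, rig_mapper = SpeechCVAE(), MeshToRig()
audio_feat = torch.randn(1, AUDIO_DIM)                 # placeholder speech features
style = torch.zeros(1, STYLE_DIM); style[0, 0] = 1.0   # choose a speech style
quality = torch.tensor([[0.0, 1.0]])                   # request high-quality output
cond = torch.cat([audio_feat, style, quality], dim=-1)
z = torch.randn(1, LATENT_DIM)                         # sample latent from the prior
mesh_frame = cvae.dec(torch.cat([z, cond], dim=-1))    # stage 1: mesh animation
rig_controls = rig_mapper(mesh_frame)                  # stage 2: rig controllers
```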

A user study demonstrates that Voice2Face significantly outperforms a state-of-the-art baseline, and our quantitative evaluation suggests that speech style optimization gives Voice2Face more accurate lip closure on bilabial sounds. Both evaluations also show that our data quality conditioning scheme outperforms both an unconditioned model and a model trained only on a smaller high-quality dataset. Finally, the user study shows a preference for animations that include the tongue.
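
The data quality conditioning idea can be sketched as a training step in which each clip's quality label is part of the condition vector, so the model can learn from plentiful lower-quality data while inference requests only the high-quality label (as in the snippet above). Again, this is an assumption about how such conditioning can be wired, not the paper's exact formulation.

```python
import torch.nn.functional as F

def training_step(cvae, mesh, audio_feat, style, quality):
    # Each batch item carries its own quality label in the condition vector.
    cond = torch.cat([audio_feat, style, quality], dim=-1)
    recon, mu, logvar = cvae(mesh, cond)
    rec_loss = F.mse_loss(recon, mesh)                              # reconstruction term
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    return rec_loss + kld                                           # standard VAE (ELBO) loss
```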


Related News

Evaluating Gesture Generation in a Large-Scale Open Challenge

SEED
May 9, 2024
This paper, published in Transactions on Graphics, reports on the second GENEA Challenge, a project to benchmark data-driven automatic co-speech gesture generation.

Filter-Adapted Spatio-Temporal Sampling for Real-Time Rendering

SEED
May 1, 2024
This paper, presented at I3D 2024, discusses how to tailor the frequencies of rendering noise to improve image denoising in real-time rendering.

From Photo to Expression: Generating Photorealistic Facial Rigs

SEED
Apr 25, 2024
This presentation from GDC 2024 discusses how machine learning can improve facial animation.