Abstract
We present a novel approach to speech-driven facial animation using a non-parametric switching state space model based on Gaussian processes. The model is an extension of the shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking head corpus are jointly modelled using the proposed method. The switching states are found using variable length Markov models trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech. © 2010 ACM.
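The sketch below is a minimal, illustrative rendering of the generative structure the abstract describes: a shared latent state driven by switching dynamics, from which both audio and visual frames are generated. All dimensions are toy values, the switching sequence is sampled at random rather than from the paper's variable length Markov models, and simple linear maps stand in for the Gaussian-process-distributed transition and observation functions so the example stays short and runnable; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration (not the paper's settings):
# Q-dimensional shared latent space, MFCC-like audio features, visual
# (facial) features, T frames, and a small set of switching states.
Q, D_AUDIO, D_VISUAL, T, N_STATES = 3, 13, 30, 50, 4

# Per-state transition maps standing in for the dynamics f_s(x); the paper
# places Gaussian process priors on these functions, a linear map per state
# keeps this sketch runnable.
A = [0.9 * np.eye(Q) + 0.05 * rng.standard_normal((Q, Q)) for _ in range(N_STATES)]

# Shared mappings from the latent space to the audio and visual observation
# spaces (also GP-distributed in the model; linear here for brevity).
C_audio = rng.standard_normal((D_AUDIO, Q))
C_visual = rng.standard_normal((D_VISUAL, Q))

# Switching-state sequence; in the paper this comes from variable length
# Markov models trained on labelled phonetic data, here it is random.
states = rng.integers(N_STATES, size=T)

x = np.zeros(Q)
audio, visual = [], []
for t in range(T):
    # x_t = f_{s_t}(x_{t-1}) + process noise
    x = A[states[t]] @ x + 0.1 * rng.standard_normal(Q)
    # Both modalities are generated from the same (shared) latent point.
    audio.append(C_audio @ x + 0.05 * rng.standard_normal(D_AUDIO))
    visual.append(C_visual @ x + 0.05 * rng.standard_normal(D_VISUAL))

audio, visual = np.vstack(audio), np.vstack(visual)
print(audio.shape, visual.shape)  # (50, 13) (50, 30)
```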
Original language | English |
---|---|
Title of host publication | International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 |
Publisher | Association for Computing Machinery |
ISBN (Print) | 9781450304146 |
DOIs | |
Publication status | Published - 2010 |
Event | 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 - Beijing, Duration: 1 Jul 2010 → … |
Conference
Conference | 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 |
---|---|
City | Beijing |
Period | 1/07/10 → … |
Keywords
- artificial talking head
- speech-driven facial animation
- visual speech synthesis