Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model

Salil Deena, Shaobo Hou, Aphrodite Galata

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We present a novel approach to speech-driven facial animation using a non-parametric switching state-space model based on Gaussian processes. The model is an extension of the shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking head corpus are jointly modelled using the proposed method. The switching states are found using variable length Markov models trained on labelled phonetic data. We also propose a synthesis technique that takes into account both previous and future phonetic context, thus accounting for coarticulatory effects in speech. © 2010 ACM.
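    To make the pipeline described in the abstract concrete, the sketch below illustrates the overall idea: a shared latent space over joint audio-visual features, per-phone switching dynamics, and synthesis that blends previous, current and next phonetic context. It is not the authors' implementation: PCA stands in for the shared Gaussian process latent variable model, per-phone linear autoregressive maps stand in for the Gaussian process dynamics, and the data, dimensions and blending weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: audio features (e.g. MFCC-like) and visual features
# (e.g. appearance-model parameters) for one utterance, with a per-frame
# phonetic label acting as the switching state. All sizes are made up.
T, D_audio, D_visual, D_latent = 200, 13, 20, 4
audio = rng.normal(size=(T, D_audio))
visual = rng.normal(size=(T, D_visual))
phones = rng.integers(0, 3, size=T)              # hypothetical 3-phone inventory

# 1) Shared latent space: embed the concatenated audio+visual observations.
#    PCA is a linear stand-in for the shared GP latent variable model.
joint = np.hstack([audio, visual])
mu = joint.mean(axis=0)
_, _, Vt = np.linalg.svd(joint - mu, full_matrices=False)
latent = (joint - mu) @ Vt[:D_latent].T          # T x D_latent latent trajectory

# 2) Switching dynamics: one first-order autoregressive model per phone,
#    a linear stand-in for the per-state GP dynamics.
dynamics = {}
for p in np.unique(phones):
    idx = np.where(phones[:-1] == p)[0]
    A, *_ = np.linalg.lstsq(latent[idx], latent[idx + 1], rcond=None)
    dynamics[p] = A                              # D_latent x D_latent transition

# 3) Synthesis sketch: roll the latent state forward, blending the dynamics
#    of the previous, current and next phone to mimic the bidirectional
#    (coarticulatory) context described in the abstract.
def synthesise(phone_seq, x0, weights=(0.2, 0.6, 0.2)):
    xs = [x0]
    for t in range(1, len(phone_seq)):
        prev_p = phone_seq[t - 1]
        next_p = phone_seq[min(t + 1, len(phone_seq) - 1)]
        A = (weights[0] * dynamics[prev_p]
             + weights[1] * dynamics[phone_seq[t]]
             + weights[2] * dynamics[next_p])
        xs.append(xs[-1] @ A)
    return np.vstack(xs)

traj = synthesise(phones, latent[0])
# Decode back to visual parameters via the transpose of the PCA basis.
visual_hat = (traj @ Vt[:D_latent] + mu)[:, D_audio:]
print(traj.shape, visual_hat.shape)              # (200, 4) (200, 20)
```

    In the paper itself, the latent mappings and dynamics are non-parametric (Gaussian processes) rather than linear, and the switching-state sequence is predicted by variable length Markov models over phonetic labels rather than taken from ground-truth frame labels as assumed above.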
    Original language: English
    Title of host publication: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010
    Publisher: Association for Computing Machinery
    ISBN (Print): 9781450304146
    DOIs
    Publication status: Published - 2010
    Event: 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010 - Beijing
    Duration: 1 Jul 2010 → …

    Conference

    Conference: 1st International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, ICMI-MLMI 2010
    City: Beijing
    Period: 1/07/10 → …

    Keywords

    • artificial talking head
    • speech-driven facial animation
    • visual speech synthesis

