Extracting speaker-specific information with a regularized siamese deep network

Ke Chen, Ahmad Salman

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Speech conveys different yet mixed information, ranging from linguistic to speaker-specific components, and each component should be used exclusively in its own task. However, it is extremely difficult to extract a specific information component because nearly all existing acoustic representations carry all types of speech information. Thus, using the same representation for both speech and speaker recognition limits system performance because of interference from irrelevant information. In this paper, we present a deep neural architecture to extract speaker-specific information from MFCCs. To this end, a multi-objective loss function is proposed for learning speaker-specific characteristics, with regularization that normalizes interference from non-speaker-related information and avoids information loss. With LDC benchmark corpora and a Chinese speech corpus, we demonstrate that the resulting speaker-specific representation is insensitive to the text and language spoken and to environmental mismatches, and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. We discuss relevant issues and relate our approach to previous work.
    Original language: English
    Title of host publication: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011|Adv. Neural Inf. Process. Syst.: Annu. Conf. Neural Inf. Process. Syst., NIPS
    Place of Publication: Cambridge, MA, U.S.A.
    Publisher: MIT Press
    ISBN (Print): 9781618395993
    Publication status: Published - 2011
    Event: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 - Granada
    Duration: 1 Jul 2011 → …

    Conference

    Conference: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
    City: Granada
    Period: 1/07/11 → …
