Exploring speaker-specific characteristics with deep learning

Ahmad Salman, Ke Chen

    Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

    Abstract

    Speech signals convey different types of information, ranging from linguistic to speaker-specific, and each type is relevant to different tasks. However, it is hard to extract one particular type of information, since nearly all acoustic representations of speech encode all kinds of information together. Using the same representation for different tasks makes it difficult to achieve good performance in either speech or speaker recognition. In this paper, we present a deep neural architecture that extracts speaker-specific characteristics from the popular Mel-frequency cepstral coefficients (MFCCs). For learning, we propose an objective function that combines a contrastive cost, defined in terms of speaker similarity and dissimilarity, with a data reconstruction cost used as a regularizer to normalize non-speaker-related information. The deep architecture is learned by greedy layer-wise local unsupervised training for initialization, followed by global supervised discriminative training to extract a speaker-specific representation. On two narrow-band benchmark corpora, we demonstrate that our deep architecture generates a robust overcomplete speech representation that characterizes various speakers, and that using this new representation yields favorable speaker verification performance. © 2011 IEEE.
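    The objective described in the abstract (a contrastive cost over same/different-speaker pairs plus a reconstruction cost acting as a regularizer) can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the margin-based contrastive loss is the generic Hadsell/Chopra/LeCun (2006) form swapped in for illustration, and the weighting `alpha`, the `margin`, the function names, and the toy tied-weight encoder/decoder are all hypothetical assumptions.

```python
import numpy as np

def contrastive_cost(z1, z2, same_speaker, margin=1.0):
    """Generic margin-based contrastive cost (assumed form, not the paper's).
    Same-speaker pairs are pulled together; different-speaker pairs are
    pushed at least `margin` apart in the learned representation space."""
    d = np.linalg.norm(z1 - z2, axis=-1)                   # distance per pair
    same = same_speaker.astype(float)
    pull = same * d ** 2                                   # similarity term
    push = (1.0 - same) * np.maximum(0.0, margin - d) ** 2  # dissimilarity term
    return np.mean(pull + push)

def reconstruction_cost(x, x_hat):
    """Mean squared reconstruction error, used here as a regularizer."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=-1))

def total_cost(x1, x2, x1_hat, x2_hat, z1, z2, same_speaker,
               alpha=0.2, margin=1.0):
    """Hypothetical convex combination of reconstruction and contrastive costs."""
    recon = 0.5 * (reconstruction_cost(x1, x1_hat) + reconstruction_cost(x2, x2_hat))
    contrast = contrastive_cost(z1, z2, same_speaker, margin)
    return alpha * recon + (1.0 - alpha) * contrast

# Toy usage: random data stands in for MFCC frames (no real corpus is used).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(13, 32))   # hypothetical 13 -> 32 overcomplete encoder
x1, x2 = rng.normal(size=(2, 8, 13))       # 8 toy frame pairs, 13-dim "MFCC" vectors
z1, z2 = np.tanh(x1 @ W), np.tanh(x2 @ W)  # encoded representations
x1_hat, x2_hat = z1 @ W.T, z2 @ W.T        # tied-weight linear decoder
labels = rng.random(8) < 0.5               # toy same/different-speaker labels
print(total_cost(x1, x2, x1_hat, x2_hat, z1, z2, labels))
```

    In this sketch, raising `alpha` puts more weight on preserving the input (including non-speaker information), while lowering it favors speaker discrimination; how the paper balances the two terms, and which part of the code layer each term acts on, may differ from this illustration.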
    Original language: English
    Title of host publication: Proceedings of the International Joint Conference on Neural Networks (Proc Int Jt Conf Neural Networks)
    Publisher: IEEE
    Pages: 103-110
    Number of pages: 7
    ISBN (Print): 9781457710865
    DOI: http://dx.doi.org/10.1109/IJCNN.2011.6033393
    Publication status: Published - 2011
    Event: 2011 International Joint Conference on Neural Networks, IJCNN 2011 - San Jose, CA
    Duration: 1 Jul 2011 → …

    Conference

    Conference: 2011 International Joint Conference on Neural Networks, IJCNN 2011
    City: San Jose, CA
    Period: 1/07/11 → …
