Domain adaptation via adversarial learning for speech emotion recognition

  • Hao Zhou

Student thesis: PhD

Abstract

Speech emotion recognition plays an important role in creating more intelligent agents and systems. However, suitable speech emotion data are often lacking, which hinders the building of practical systems. To tackle this issue, knowledge transfer, or domain adaptation, has emerged as a promising solution: it leverages a related, information-rich source to help optimize performance on the target task. Despite the great progress of adversarial-learning-based domain adaptation techniques in computer vision, few works have so far attempted to apply these advanced techniques to speech emotion recognition. This project explores whether and how adversarial learning can be used to eliminate the divergence, or domain shift, that exists in speech emotion data. We particularly address the scenario of supervised domain adaptation (SDA), where only very limited labelled data from the target domain are available. We propose Class-wise Adversarial Domain Adaptation (CADA) to reduce the domain shift for all common classes between the target and source domains via adversarial learning. Unlike general practice, CADA combines the class discriminator and domain discriminator into one architecture, and the training algorithm is straightforward with either multi-layer perceptrons or deep neural networks as the basis of the model. We also extend CADA to the unsupervised scenario, where only a few unlabelled target-domain data are available. We systematically evaluate CADA on real-world speech emotion datasets under many different practical settings and demonstrate its effectiveness, with an advantage over ordinary fine-tuning and the state-of-the-art adversarial-learning-based domain adaptation approach.
Date of Award: 8 Aug 2021
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Xiaojun Zeng (Co Supervisor) & Ke Chen (Main Supervisor)

Keywords

  • adversarial learning
  • domain adaptation
  • speech emotion recognition
  • domain shift
