Abstract
In Emotion Recognition in Conversations (ERC), the emotion of a target utterance depends closely on its context. Existing works therefore train the model to generate the response to the target utterance, aiming to recognise emotions by leveraging contextual information. However, adjacent-response generation ignores long-range dependencies and in many cases provides limited affective information. In addition, most ERC models learn a single unified distributed representation for each utterance, which lacks interpretability and robustness. To address these issues, we propose a VAD-disentangled Variational AutoEncoder (VAD-VAE), which first introduces a target-utterance reconstruction task based on a Variational Autoencoder, then disentangles three affect representations, Valence, Arousal, and Dominance (VAD), from the latent space. We further enhance the disentangled representations by introducing VAD supervision signals from a sentiment lexicon and by minimising the mutual information between the VAD distributions. Experiments show that VAD-VAE outperforms state-of-the-art models on two datasets. Further analysis demonstrates the effectiveness of each proposed module and the quality of the disentangled VAD representations. The code is available at https://github.com/SteveKGYang/VAD-VAE.
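As a minimal illustration of the disentanglement idea described above (not the authors' implementation, which is available at the repository linked in the abstract), the sketch below splits an encoder output into three Gaussian latent subspaces, one each for Valence, Arousal, and Dominance, applies the standard VAE reparameterisation trick, and computes the KL regulariser per subspace. All names, dimensions, and the use of NumPy here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps: the standard VAE reparameterisation trick,
    # which keeps sampling differentiable in a real training setup.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# Hypothetical setup: a stand-in for the encoder output, split into
# (mu, logvar) pairs for three 16-dim affect subspaces (V, A, D).
latent_dim = 16
batch_size = 4
h = rng.standard_normal((batch_size, 6 * latent_dim))
chunks = np.split(h, 6, axis=-1)

vad = {}
for name, mu, logvar in zip(["valence", "arousal", "dominance"],
                            chunks[0::2], chunks[1::2]):
    z = reparameterize(mu, logvar)
    vad[name] = {"z": z, "kl": kl_to_standard_normal(mu, logvar)}

# The total KL term sums over the three disentangled subspaces; a full
# model would add the reconstruction loss, the lexicon-based VAD
# supervision, and the mutual-information penalty between subspaces.
total_kl = sum(v["kl"] for v in vad.values())
```

In the paper's full objective, keeping Valence, Arousal, and Dominance in separate subspaces is what makes the per-dimension supervision signals and the mutual-information minimisation between them well defined.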
Original language | English |
---|---|
Number of pages | 12 |
Journal | IEEE Transactions on Affective Computing |
DOIs | |
Publication status | E-pub ahead of print - 25 May 2023 |
Keywords
- Context modeling
- Decoding
- Disentangled Representations
- Emotion Recognition in Conversations
- Emotion recognition
- Gaussian distribution
- Hidden Markov models
- Oral communication
- Task analysis
- Valence-Arousal-Dominance
- Variational Autoencoder