Facial expressions are one of the most practical and straightforward ways to communicate emotions. Facial expression recognition has been used in many fields, such as human behaviour understanding and health monitoring. Deep learning models can achieve excellent performance in facial expression recognition tasks. However, because these deep neural networks have highly complex nonlinear structures, it is not easy for human users to understand the basis for a model’s prediction. Specifically, we do not know which facial units contribute more or less to the classification. Developing affective computing models that give human interactors more explainable and transparent feedback is essential for trustworthy human-robot interaction. Compared to “white-box” approaches, “black-box” approaches using deep neural networks have advantages in terms of overall accuracy but lack reliability and explainability. In this work, we introduce a multimodal affective human-robot interaction framework with visual-based and verbal-based explanations, generated by Layer-Wise Relevance Propagation (LRP) and Local Interpretable Model-Agnostic Explanations (LIME). The proposed framework has been tested on the KDEF dataset and in human-robot interaction experiments with the Pepper robot. This experimental evaluation shows the benefits of linking deep learning emotion recognition systems with explainable strategies.
|Title of host publication||International Conference on Social Robotics 2022|
|Publication status||Accepted/In press - 14 Oct 2022|