N-MTTL SI Model: Non-Intrusive Multi-Task Transfer Learning-Based Speech Intelligibility Prediction Model with Scenery Classification

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Applying speech enhancement algorithms in hearing aids does not always increase speech intelligibility, so classifying the acoustic environment beforehand could be important. However, previous speech intelligibility models do not provide any additional information about the reason for a decrease in speech intelligibility. We propose a unique non-intrusive multi-task transfer learning-based speech intelligibility prediction model with scenery classification (N-MTTL SI model). The solution combines a Mel-spectrogram analysis of the degraded speech signal with transfer learning and multi-task learning to provide simultaneous speech intelligibility prediction (task 1) and scenery classification of ten real-world noise conditions (task 2). The model utilises a pre-trained ResNet architecture as an encoder for feature extraction. The prediction accuracy of the N-MTTL SI model is high for both tasks. Specifically, the RMSE of the speech intelligibility predictions is 3.76% for seen conditions and 4.06% for unseen conditions, and the classification accuracy is 98%. In addition, the proposed solution demonstrates the potential of using pre-trained deep learning models in the domain of speech intelligibility prediction.
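
The two figures reported above correspond to standard evaluation metrics: root-mean-square error (RMSE) for the intelligibility scores of task 1 and classification accuracy for the scenery labels of task 2. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names, the toy scores, and the scene labels are all hypothetical and chosen only for illustration; the abstract does not name the ten noise conditions.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length score lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(predicted_labels, true_labels):
    """Fraction of labels that were predicted correctly."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical toy data: intelligibility scores in percent (task 1)
# and scene labels (task 2). These are NOT values from the paper.
si_pred = [52.0, 71.5, 88.0, 30.0]
si_true = [50.0, 75.0, 90.0, 27.0]
scene_pred = ["cafe", "street", "car", "cafe"]
scene_true = ["cafe", "street", "car", "office"]

print(f"RMSE: {rmse(si_pred, si_true):.2f} percentage points")
print(f"Accuracy: {accuracy(scene_pred, scene_true):.0%}")
```

When intelligibility is expressed in percent, the RMSE is in percentage points, which is how a figure such as 3.76% would typically be read.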
Original language: English
Title of host publication: Interspeech
Publication status: Published - 1 Sept 2021
