aids may not always help to increase speech
intelligibility. Therefore, classifying the acoustic environment
beforehand could be important. However, previous speech intelligibility
models do not provide any additional information regarding
the reason for a decrease in speech intelligibility. We propose
a unique non-intrusive multi-task transfer learning-based
speech intelligibility prediction model with scenery
classification (N-MTTL SI model). The solution combines a
Mel-spectrogram analysis of the degraded speech signal with
transfer learning and multi-task learning to provide
simultaneous speech intelligibility prediction (task 1) and
scenery classification of ten real-world noise conditions (task
2). The model utilises a pre-trained ResNet architecture as an
encoder for feature extraction. The N-MTTL SI model is accurate
on both tasks: the RMSE of speech intelligibility predictions
is 3.76% for seen conditions and 4.06% for unseen conditions,
and the scenery classification accuracy is 98%.
In addition, the proposed solution demonstrates the
potential of using pre-trained deep learning models in the
domain of speech intelligibility prediction.
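
As an illustration of the two figures reported above, the following minimal sketch computes an RMSE for intelligibility predictions (task 1) and an accuracy for scenery labels (task 2). It is not the authors' evaluation code: the function names, the scene labels and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def accuracy(predicted_labels, true_labels):
    """Fraction of labels that were predicted correctly."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical task 1 data: speech intelligibility scores in percent.
si_pred = [52.0, 78.0, 91.0, 30.0]
si_true = [50.0, 80.0, 90.0, 34.0]

# Hypothetical task 2 data: scenery labels (the abstract does not name
# the ten noise conditions, so these labels are placeholders).
scene_pred = ["cafe", "street", "train", "cafe"]
scene_true = ["cafe", "street", "train", "park"]

print(f"RMSE: {rmse(si_pred, si_true):.2f} percentage points")
print(f"Accuracy: {accuracy(scene_pred, scene_true):.0%}")
```

Under this reading, an RMSE of 3.76% means that predicted intelligibility scores deviate from the reference scores by roughly 3.76 percentage points in the root-mean-square sense.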
|Title of host publication||Interspeech|
|Publication status||Published - 1 Sep 2021|