TY - JOUR
T1 - Acoustic analysis and digital signal processing for the assessment of voice quality
AU - Jalali-najafabadi, Farideh
AU - Gadepalli, Chaitanya
AU - Jarchi, Delaram
AU - Cheetham, Barry M.G.
N1 - Funding Information:
Farideh Jalali-najafabadi’s research is supported by a Medical Research Council (MRC)/University of Manchester Skills Development Fellowship, UK (grant number MR/R016615). The authors acknowledge the contributions of Mr Jarrod Homer, Ms Frances Ascott, the SLT raters and the participants in voice recording sessions at Manchester Royal Infirmary. We also acknowledge help and advice from Prof. Gavin Brown and Prof. Mikel Lujan in the Department of Computer Science, Manchester University, UK.
Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/9/1
Y1 - 2021/9/1
AB - Purpose: This paper addresses the application of digital signal processing (DSP) techniques to the robust measurement of acoustical features of the human voice. It then addresses the use of regression-based techniques for the estimation of grade, roughness, breathiness, asthenia and strain from these acoustical features. These five properties of voice are the basis of the widely used ‘GRBAS’ characterisation of voice disorders. Method: A well-known cross-correlation technique has been enhanced to measure more reliably the fundamental frequency of vowels, which is crucial for the derivation of acoustic features such as the harmonic-to-noise ratio, jitter and shimmer. Regression techniques, including K-Nearest Neighbour Regression and Multiple Linear Regression, are employed to derive the GRBAS properties. Results: The enhanced cross-correlation technique has been validated against well-established published or commercially available techniques by analysing synthetic sustained vowels. In the context of our experiments, the enhanced method was found to produce more reliable and robust measurements than the well-established Praat technique and the Multi-Dimensional Voice Program (MDVP) software, especially when the signal-to-noise ratio is low. Estimates of GRBAS components obtained with our methods were found to be in good agreement with traditional GRBAS scoring by speech and language therapists (SLTs). Conclusion: Voice analysis using DSP to extract acoustic features has the potential to enable objective, computerised GRBAS voice assessment. Such assessment can usefully augment GRBAS assessment as traditionally carried out subjectively by SLTs.
KW - Acoustic
KW - Fundamental frequency (f0)
KW - HNR
KW - Jitter
KW - MDVP
KW - Praat
KW - SNR
KW - Shimmer
KW - Speech
UR - https://doi.org/10.1016/j.bspc.2021.103018
U2 - 10.1016/j.bspc.2021.103018
DO - 10.1016/j.bspc.2021.103018
M3 - Article
SN - 1746-8094
VL - 70
JO - Biomedical Signal Processing and Control
JF - Biomedical Signal Processing and Control
M1 - 103018
ER -