TY - GEN
T1 - Lip Reading for robust speech recognition on embedded devices
AU - Guitarte Pérez, Jesús F.
AU - Frangi, Alejandro F.
AU - Solano, Eduardo Lleida
AU - Lukas, Klaus
PY - 2005
Y1 - 2005
N2 - This article presents a complete audio-visual speech recognition system suitable for embedded devices. Active Shape Models (ASM) and the Discrete Cosine Transform (DCT) are investigated as visual feature extraction algorithms and discussed with a view to an embedded implementation. The audio-visual information integration is also designed to take device limitations into account. It is well known that visual cues improve recognition results, especially in scenarios with high levels of acoustic noise. We compare the performance of Lip Reading and conventional Noise Reduction systems in these degraded scenarios, as well as the combination of both kinds of solutions. Substantial improvements are obtained, especially for non-stationary background noises such as voice interference, car acceleration, or indicator clicks. For this kind of noise, Lip Reading outperforms conventional Noise Reduction technologies.
AB - This article presents a complete audio-visual speech recognition system suitable for embedded devices. Active Shape Models (ASM) and the Discrete Cosine Transform (DCT) are investigated as visual feature extraction algorithms and discussed with a view to an embedded implementation. The audio-visual information integration is also designed to take device limitations into account. It is well known that visual cues improve recognition results, especially in scenarios with high levels of acoustic noise. We compare the performance of Lip Reading and conventional Noise Reduction systems in these degraded scenarios, as well as the combination of both kinds of solutions. Substantial improvements are obtained, especially for non-stationary background noises such as voice interference, car acceleration, or indicator clicks. For this kind of noise, Lip Reading outperforms conventional Noise Reduction technologies.
UR - http://www.scopus.com/inward/record.url?scp=33646794355&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415153
DO - 10.1109/ICASSP.2005.1415153
M3 - Conference contribution
AN - SCOPUS:33646794355
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 473
EP - 476
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - IEEE
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -