TY - GEN
T1 - AV@CAR
T2 - 4th International Conference on Language Resources and Evaluation, LREC 2004
AU - Ortega, Alfonso
AU - Sukno, Federico
AU - Lleida, Eduardo
AU - Frangi, Alejandro
AU - Miguel, Antonio
AU - Buera, Luis
AU - Zacur, Ernesto
PY - 2004
Y1 - 2004
N2 - This paper describes the acquisition of the multichannel, multimodal database AV@CAR for automatic audio-visual speech recognition in cars. Automatic speech recognition (ASR) plays an important role inside vehicles, helping to keep the driver free from distraction. It is also known that visual information (lip-reading) can improve ASR accuracy under adverse conditions such as those found inside a car. The corpus described here is intended to provide training and testing material for several classes of audio-visual speech recognizers, including isolated-word systems, word-spotting systems, vocabulary-independent systems, and speaker-dependent or speaker-independent systems, for a wide range of applications. The audio database is composed of seven audio channels: clean speech (captured with a close-talk microphone), noisy speech from several microphones placed on the ceiling of the cabin, a noise-only signal from the engine compartment, and information about the speed of the car. For the video database, a small video camera sensitive to both the visible and near-infrared bands is placed on the windscreen to capture the driver's face. This is done under different lighting conditions, both during the day and at night. Additionally, the same individuals are recorded in the laboratory, under controlled conditions, to obtain noise-free speech signals, 2D images, and 3D-plus-texture face models.
UR - http://www.scopus.com/inward/record.url?scp=85026319444&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85026319444
T3 - Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
SP - 763
EP - 766
BT - Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
A2 - Xavier, Maria Francisca
A2 - Costa, Rute
A2 - Ferreira, Fatima
A2 - Lino, Maria Teresa
A2 - Silva, Raquel
PB - European Language Resources Association
Y2 - 26 May 2004 through 28 May 2004
ER -