AV@CAR: A Spanish multichannel multimodal corpus for in-vehicle automatic audio-visual speech recognition

Alfonso Ortega, Federico Sukno, Eduardo Lleida, Alejandro Frangi, Antonio Miguel, Luis Buera, Ernesto Zacur

Research output: Chapter in Book/Conference proceeding, Conference contribution, peer-review

Abstract

This paper describes the acquisition of the multichannel multimodal database AV@CAR for automatic audio-visual speech recognition in cars. Automatic speech recognition (ASR) plays an important role inside vehicles in keeping the driver from being distracted. It is also known that visual information (lip-reading) can improve ASR accuracy under adverse conditions such as those within a car. The corpus described here is intended to provide training and testing material for several classes of audio-visual speech recognizers, including isolated word systems, word-spotting systems, vocabulary-independent systems, and speaker-dependent or speaker-independent systems, for a wide range of applications. The audio database is composed of seven audio channels, including clean speech (captured using a close-talk microphone), noisy speech from several microphones placed on the overhead of the cabin, a noise-only signal coming from the engine compartment, and information about the speed of the car. For the video database, a small video camera sensitive to the visible and near-infrared bands is placed on the windscreen and used to capture the face of the driver. This is done under different lighting conditions, both during the day and at night. Additionally, the same individuals are recorded in a laboratory under controlled environmental conditions to obtain noise-free speech signals, 2D images, and 3D + texture face models.

Original language: English
Title of host publication: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004
Editors: Maria Francisca Xavier, Rute Costa, Fatima Ferreira, Maria Teresa Lino, Raquel Silva
Publisher: European Language Resources Association
Pages: 763-766
Number of pages: 4
ISBN (Electronic): 2951740816, 9782951740815
Publication status: Published - 2004
Event: 4th International Conference on Language Resources and Evaluation, LREC 2004 - Lisbon, Portugal
Duration: 26 May 2004 to 28 May 2004

Publication series

Name: Proceedings of the 4th International Conference on Language Resources and Evaluation, LREC 2004

Conference

Conference: 4th International Conference on Language Resources and Evaluation, LREC 2004
Country/Territory: Portugal
City: Lisbon
Period: 26/05/04 to 28/05/04
