Towards Speaker and Environmental Robustness in ASR: The HIWIRE Project

Authors :
Potamianos, Alexandros
Bouselmi, Ghazi
Dimitriadis, Dimitrios
Fohr, Dominique
Gemello, Roberto
Illina, Irina
Mana, Franco
Maragos, Petros
Matassoni, M.
Pitsikalis, Vassilis
Ramirez, J.
Sanchez-Soto, E.
Segura, J.
Svaizer, P.
Department of Electronic and Computer Engineering [Crete] (E.C.E)
Technical University of Crete [Chania]
Analysis, perception and recognition of speech (PAROLE)
INRIA Lorraine
Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)
School of Electrical and Computer Engineering [Athens] (School of E.C.E)
National Technical University of Athens [Athens] (NTUA)
LOQUENDO (LOQUENDO)
Istituto Trentino di Cultura (ITC)
Universidad de Granada = University of Granada (UGR)
HIWIRE
University of Granada [Granada]
Source :
SRIV'06 ITRW on Speech Recognition and Intrinsic Variation, May 2006, Toulouse, France
Publication Year :
2006
Publisher :
HAL CCSD, 2006.

Abstract

In this paper, we present algorithms for dealing with variability and mismatch in speech recognition due to environmental conditions and non-native speaker populations. The proposed algorithms cover a broad spectrum of ideas, including robust feature extraction, feature compensation, and speech enhancement. Specifically, the following algorithms are presented and evaluated: beamforming for multi-microphone speech recognition, robust modulation and fractal features, Teager energy cepstrum coefficients, parametric feature equalization, speech enhancement, and acoustic modeling for non-native speech recognition. The problems of feature fusion and voice activity detection are also discussed. Evaluation results on the AURORA databases under the auspices of the HIWIRE project show that significant gains can be achieved under adverse or mismatched conditions using these algorithms. Relative error rate reductions of up to 50% were shown for multi-microphone speech recognition, robust feature combination, and speech enhancement. A 30-40% reduction was shown for parametric feature equalization and non-native acoustic models.
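
The abstract reports its gains as relative error rate reductions. As a reading aid, the short sketch below shows how such a figure is usually computed from a baseline and an improved error rate. The function name and the numbers are hypothetical illustrations and are not taken from the paper; the paper's exact error metric is not stated in this record.

```python
def relative_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Return the relative error rate reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical values for illustration only; these are NOT results from the paper.
baseline_wer = 20.0  # error rate (%) of an assumed baseline recognizer
improved_wer = 10.0  # error rate (%) after an assumed robustness technique

reduction = relative_error_reduction(baseline_wer, improved_wer)
print(f"Relative error rate reduction: {reduction:.0%}")
```

Under this definition, a relative reduction of 50% means the error rate was halved with respect to the baseline, not that it dropped by 50 absolute percentage points.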

Details

Language :
English
Database :
OpenAIRE
Journal :
SRIV'06 ITRW on Speech Recognition and Intrinsic Variation, May 2006, Toulouse, France
Accession number :
edsair.dedup.wf.001..26e5e4f5d4eef3788eb0dcad4d89954b