
Mask Estimation for Missing Data Speech Recognition Based on Statistics of Binaural Interaction.

Authors :
Harding, Sue
Barker, Jon
Brown, Guy J.
Source :
IEEE Transactions on Audio, Speech & Language Processing; Jan2006, Vol. 14 Issue 1, p58-67, 10p, 2 Charts, 9 Graphs
Publication Year :
2006

Abstract

This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers. [ABSTRACT FROM AUTHOR]
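The abstract describes missing data masks built from probability distributions over ITD and ILD estimates. The following is a minimal Python sketch of one way such a mask could be estimated, not the authors' implementation. It makes several assumptions:

- Each time-frequency unit is one frame of one already band-pass filtered channel at each ear.
- ITD is taken as the cross-correlation peak lag and ILD as the left/right energy ratio in dB.
- Training labels ("reliable" or not) come from an a priori mask.

The bin ranges, the 0.5 threshold, and the names `itd_ild`, `train`, and `estimate_mask` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical histogram layout: (low, high, number of bins) for each cue.
ITD_BINS = (-1.0, 1.0, 20)    # interaural time difference, milliseconds
ILD_BINS = (-20.0, 20.0, 20)  # interaural level difference, decibels


def itd_ild(left, right, fs, max_lag):
    """Estimate ITD (ms) and ILD (dB) for one time-frequency unit.

    `left` and `right` are assumed to be the same frame of one (already
    band-pass filtered) frequency channel at each ear. A positive ITD
    means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, ignoring samples that fall off the frame.
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    eps = 1e-12  # avoids a log of zero for silent frames
    e_left = sum(x * x for x in left) + eps
    e_right = sum(x * x for x in right) + eps
    return 1000.0 * best_lag / fs, 10.0 * math.log10(e_left / e_right)


def quantize(value, low, high, n_bins):
    """Map a value to a bin index in [0, n_bins - 1], clipping at the edges."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return int((value - low) / (high - low) * n_bins)


def cue_bin(itd_ms, ild_db):
    """Return the (ITD bin, ILD bin) pair for one unit's cue estimates."""
    return quantize(itd_ms, *ITD_BINS), quantize(ild_db, *ILD_BINS)


def train(samples):
    """Estimate P(reliable | channel, ITD bin, ILD bin) from labelled units.

    `samples` holds (channel, itd_ms, ild_db, is_reliable) tuples; the label is
    assumed to come from an a priori mask (e.g. local SNR above a threshold).
    """
    reliable = defaultdict(int)
    total = defaultdict(int)
    for channel, itd_ms, ild_db, is_reliable in samples:
        key = (channel,) + cue_bin(itd_ms, ild_db)
        total[key] += 1
        reliable[key] += int(is_reliable)
    return {key: reliable[key] / total[key] for key in total}


def estimate_mask(model, features, threshold=0.5, unseen=0.0):
    """Turn cue estimates into a soft mask and a thresholded binary mask.

    `features` maps (frame, channel) -> (itd_ms, ild_db). Bins never seen in
    training get probability `unseen`.
    """
    soft, binary = {}, {}
    for (frame, channel), (itd_ms, ild_db) in features.items():
        p = model.get((channel,) + cue_bin(itd_ms, ild_db), unseen)
        soft[(frame, channel)] = p
        binary[(frame, channel)] = p > threshold
    return soft, binary


# Toy usage: right ear delayed by 3 samples and attenuated by half.
burst = [1.0, -2.0, 3.0, -1.0]
left = [0.0] * 10 + burst + [0.0] * 10
right = [0.0] * 13 + [0.5 * x for x in burst] + [0.0] * 7
itd_ms, ild_db = itd_ild(left, right, fs=16000, max_lag=5)

model = train([
    (0, 0.0, 0.5, True),
    (0, 0.02, 0.0, True),
    (0, 0.0, 0.3, False),
    (0, 0.65, 9.0, False),
])
soft, binary = estimate_mask(model, {(0, 0): (0.01, 0.2), (1, 0): (0.65, 9.0)})
```

This sketch keeps separate statistics for each frequency channel. That is a deliberate choice, because the ITD and ILD values produced by a given source position vary with frequency. The soft mask holds the estimated probability that each unit is reliable. The binary mask is that probability thresholded.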

Details

Language :
English
ISSN :
1558-7916
Volume :
14
Issue :
1
Database :
Complementary Index
Journal :
IEEE Transactions on Audio, Speech & Language Processing
Publication Type :
Academic Journal
Accession number :
23172971
Full Text :
https://doi.org/10.1109/TSA.2005.860354