Monaural multi-talker speech recognition using factorial speech processing models.

Authors :: Khademian, Mahdi
Homayounpour, Mohammad Mehdi
Source :: Speech Communication. Apr. 2018, Vol. 98, pp. 1-16. 16 pp.
Publication Year :: 2018
Abstract: The PASCAL monaural speech separation and recognition challenge was developed to address robust automatic speech recognition in the presence of speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers each utter a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, a team from IBM Research achieved better-than-human performance on this task during the challenge. The IBM system consists of an intermediate speech separation module and two single-talker speech recognition modules. This paper reconsiders the recognition task of the challenge using gain-adapted factorial speech processing models. It develops a joint token-passing algorithm that directly and simultaneously decodes the target and masker speakers from the mixed signal. It uses maximum uncertainty during joint decoding, which is not possible in the two-phase IBM system. The paper provides a detailed derivation of inference in these models based on the general inference procedures of probabilistic graphical models. Additionally, it uses deep neural networks for joint speaker identification and gain estimation, which makes these two steps easier than before while producing competitive results. The proposed method outperforms past super-human results and even the results recently achieved with deep neural networks by Microsoft Research. It achieves a 5.3% absolute improvement in task performance over the first super-human system and a 2.5% absolute improvement over its recent competitor. [ABSTRACT FROM AUTHOR]

Subjects :: *ARTIFICIAL neural networks
*SIGNAL processing
*SPEECH perception
*SPEECH processing systems

Details

Language :: English
ISSN :: 01676393
Volume :: 98
Database :: Academic Search Index
Journal :: Speech Communication
Publication Type :: Academic Journal
Accession number :: 128517417
Full Text :: https://doi.org/10.1016/j.specom.2018.01.007
