Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition
- Source :
- EUSIPCO 2020, 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands. ⟨10.23919/Eusipco47968.2020.9287541⟩
- Publication Year :
- 2019
- Publisher :
- arXiv, 2019.
Abstract
- We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer enhances the signal impinging from the speaker location. In the second stage, a neural network uses this enhanced signal to estimate a time-frequency mask corresponding to the localized speaker. In the third stage, the mask is used to compute second-order statistics, from which an adaptive beamformer is derived. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix dataset and study the performance of the proposed pipeline in terms of word error rate (WER). An average WER of 29.4% was achieved using ground-truth localization information and 42.4% using localization information estimated via GCC-PHAT. The signal-to-interference ratio (SIR) between the speakers has a greater impact on ASR performance than their spatial separation: a 15 dB increase in SIR reduces the WER by 59% relative, whereas increasing the angular separation between the speakers to 50° or more improves the WER by only 23% relative.
- Comment: Submitted to ICASSP 2020
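The first and third stages of the pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a planar far-field array, STFT inputs of shape (mics, frequencies, frames), and a target mask supplied from outside (standing in for the stage-2 neural network, which is not sketched). The abstract does not say which adaptive beamformer is used; the mask-based MVDR in the Souden et al. form below is one common choice, swapped in for illustration. All function names are hypothetical.

```python
import numpy as np


def steering_vector(mic_pos, doa_deg, freqs, c=343.0):
    """Far-field steering vectors, shape (F, M), for a planar array.

    mic_pos: (M, 2) microphone coordinates in metres; doa_deg: azimuth in degrees.
    """
    theta = np.radians(doa_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    advance = mic_pos @ direction / c            # (M,) arrival-time advance per mic, seconds
    return np.exp(2j * np.pi * freqs[:, None] * advance[None, :])


def delay_and_sum(X, steer):
    """Stage 1 (sketch): DS beamformer. X: STFT (M, F, T); steer: (F, M). Returns (F, T)."""
    M = X.shape[0]
    # Undo each mic's phase shift for the given DOA, then average across mics.
    return np.einsum("fm,mft->ft", steer.conj(), X) / M


def mask_mvdr(X, mask, ref_mic=0, eps=1e-6):
    """Stage 3 (sketch): mask-based MVDR in the Souden et al. form (an assumed choice).

    X: (M, F, T) multichannel STFT; mask: (F, T) target-speaker mask in [0, 1].
    """
    M, F, T = X.shape
    Y = np.empty((F, T), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]                          # (M, T) observations in bin f
        m = mask[f]                              # (T,) mask weights in bin f
        # Mask-weighted spatial covariances (second-order statistics) of target and interference.
        phi_s = (m * Xf) @ Xf.conj().T / max(m.sum(), eps)
        phi_n = ((1 - m) * Xf) @ Xf.conj().T / max((1 - m).sum(), eps)
        phi_n = phi_n + eps * np.eye(M)          # diagonal loading for numerical stability
        num = np.linalg.solve(phi_n, phi_s)      # Phi_n^{-1} Phi_s
        w = num[:, ref_mic] / (np.trace(num) + eps)
        Y[f] = w.conj() @ Xf                     # y = w^H x for every frame
    return Y
```

The MVDR weights are distortionless towards the target as seen at `ref_mic`, so with a reasonable mask the target component passes unchanged while the interference covariance is minimized.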
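The abstract reports results with speaker locations estimated via GCC-PHAT. The following is a minimal NumPy sketch of GCC-PHAT time-difference-of-arrival estimation for a single microphone pair, followed by a far-field conversion to a direction of arrival. The function names, the two-microphone geometry, and c = 343 m/s are assumptions for illustration; how the paper combines pairs across its array is not stated in this record.

```python
import numpy as np


def gcc_phat_tdoa(x_ref, x, fs, max_tau=None):
    """Estimate the delay (in seconds) of `x` relative to `x_ref` with GCC-PHAT.

    A positive result means `x` lags `x_ref`. `max_tau` optionally limits the
    search to physically plausible delays (e.g. mic spacing / speed of sound).
    """
    n = len(x_ref) + len(x)                  # zero-pad so the correlation is linear, not circular
    X_ref = np.fft.rfft(x_ref, n=n)
    X = np.fft.rfft(x, n=n)
    cross = X * np.conj(X_ref)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Rearrange so that index i corresponds to a lag of (i - max_shift) samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def tdoa_to_doa(tau, mic_distance, c=343.0):
    """Far-field DOA in degrees for a two-microphone pair, measured from the pair axis."""
    cos_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

For a pair spaced 10 cm apart, passing `max_tau=0.1 / 343.0` restricts the search to delays the geometry can actually produce, which tends to suppress spurious correlation peaks in reverberant conditions.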
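The relative figures in the abstract (59% and 23%) are, under the usual convention, changes expressed as a fraction of the baseline WER; the record itself does not define them, so this is an assumption. With S substitutions, D deletions, I insertions and N reference words:

```latex
\mathrm{WER} = \frac{S + D + I}{N},
\qquad
\Delta_{\mathrm{rel}} = \frac{\mathrm{WER}_{\mathrm{before}} - \mathrm{WER}_{\mathrm{after}}}{\mathrm{WER}_{\mathrm{before}}}

% Applied to the two reported averages (GCC-PHAT estimates vs. ground-truth locations):
\Delta_{\mathrm{rel}} = \frac{42.4 - 29.4}{42.4} = \frac{13.0}{42.4} \approx 0.307
```

Under this convention, replacing GCC-PHAT estimates with ground-truth locations corresponds to a relative WER reduction of about 30.7%.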
- Subjects :
- Multichannel speech separation
WSJ0-2mix reverberated
Signal processing
Noise measurement
Artificial neural network
Computer science
Speech recognition
Word error rate
020206 networking & telecommunications
02 engineering and technology
Speech processing
Signal-to-noise ratio
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Audio and Speech Processing (eess.AS)
[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]
0202 electrical engineering, electronic engineering, information engineering
FOS: Electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Adaptive beamformer
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Database :
- OpenAIRE
- Journal :
- EUSIPCO 2020, 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands. ⟨10.23919/Eusipco47968.2020.9287541⟩
- Accession number :
- edsair.doi.dedup.....4bb4e8839d0cfbc2c42a417127427871
- Full Text :
- https://doi.org/10.48550/arxiv.1910.11114