Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Authors :: Williamson DS; Wang D
Source :: IEEE/ACM transactions on audio, speech, and language processing [IEEE/ACM Trans Audio Speech Lang Process] 2017 Jul; Vol. 25 (7), pp. 1492-1501. Date of Electronic Publication: 2017 Apr 20.
Publication Year :: 2017
Abstract: In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
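The abstract defines the complex ideal ratio mask only in words: applying it to the reverberant and noisy speech yields the direct speech. The minimal sketch below illustrates that definition, assuming the mask is the element-wise complex ratio of the direct-speech STFT to the reverberant-noisy STFT. The function names, the `eps` stabilizer, and the random test spectrograms are hypothetical and are not taken from the record. The deep neural network that estimates the mask is not shown.

```python
import numpy as np

def complex_ideal_ratio_mask(direct_stft, mixture_stft, eps=1e-12):
    """Hypothetical helper: mask M such that M * mixture_stft ~= direct_stft.

    Both inputs are complex arrays of the same shape (frequency x time).
    eps is an assumed stabilizer that avoids division by zero.
    """
    y_r, y_i = mixture_stft.real, mixture_stft.imag
    s_r, s_i = direct_stft.real, direct_stft.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    # Real and imaginary parts of S / Y, written out explicitly.
    m_r = (y_r * s_r + y_i * s_i) / denom
    m_i = (y_r * s_i - y_i * s_r) / denom
    return m_r + 1j * m_i

def apply_mask(mask, mixture_stft):
    """Element-wise complex multiplication, which changes magnitude and phase."""
    return mask * mixture_stft

# Synthetic stand-ins for STFTs (assumed shapes, not real speech data).
rng = np.random.default_rng(0)
shape = (257, 50)
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = direct + interference  # stands in for reverberant, noisy speech

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)
max_error = np.max(np.abs(recovered - direct))
```

Because the mask is complex-valued, a single multiplication corrects both the magnitude and the phase of each time-frequency unit, which is the property the abstract highlights. In practice the ideal mask is unavailable at test time and an estimate would take its place.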
