1. On the Compensation Between Magnitude and Phase in Speech Separation.
- Author
- Wang, Zhong-Qiu; Wichern, Gordon; Le Roux, Jonathan
- Subjects
- AUTOMATIC speech recognition, INTELLIGIBILITY of speech, PHASE separation, SPEECH perception, DEEP learning, SPEECH enhancement
- Abstract
- Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores if the evaluation metrics are objective time-domain metrics, they however produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance, compared with including a loss on magnitude. While this phenomenon has been experimentally observed by many studies, it is often not accurately explained and there lacks a thorough understanding on its fundamental cause. This letter provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view. [ABSTRACT FROM AUTHOR]
- Published
- 2021