1. Speech enhancement using deep complex convolutional neural network (DCCNN) model.
- Authors
- Iqbal, Yasir; Zhang, Tao; Fahad, Muhammad; Rahman, Sadiq ur; Iqbal, Anjum; Geng, Yanzhang; Zhao, Xin
- Abstract
In cases of highly non-stationary noise, single-channel speech enhancement is quite challenging, especially when the noise includes interfering speech. In this setting, deep learning has contributed to speech enhancement by boosting intelligibility and perceptual quality. Existing speech enhancement (SE) work in the time–frequency domain aims only to improve the magnitude spectrum via neural network learning, yet recent research highlights the significance of phase for perceptual speech quality. Motivated by multi-task and deep learning, this paper proposes an effective and novel approach to speech enhancement using an encoder–decoder architecture based on deep complex convolutional neural networks. The proposed model takes as input the spectrograms of noisy speech signals, consisting of real and imaginary components for complex spectral mapping, and simultaneously enhances the magnitude and phase responses of the speech. On unseen non-stationary noise categories that interfere with speech, the proposed model improves speech quality by approximately 0.44 MOS points compared to state-of-the-art single-stage techniques. Moreover, it consistently outperforms all reference techniques and improves intelligibility in low-SNR settings: against the baselines, we find gains of over 3 dB in SNR and 0.2 in STOI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
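The complex spectral mapping described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the `stft` helper, the window and hop sizes, and the elementwise stand-in for a complex convolution layer are all hypothetical choices made for the example.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames -> complex spectrogram (frames x bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def complex_conv(in_r, in_i, w_r, w_i):
    """Complex-valued layer arithmetic as used in DCCNN-style networks:
    (a + ib)(c + id) = (ac - bd) + i(ad + bc).
    Elementwise products stand in here for the actual convolution op."""
    out_r = in_r * w_r - in_i * w_i
    out_i = in_r * w_i + in_i * w_r
    return out_r, out_i

# Toy noisy signal: a 440 Hz tone plus white noise at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(t.size)

spec = stft(noisy)                       # complex spectrogram
inp = np.stack([spec.real, spec.imag])   # 2-channel real/imag network input
```

Stacking the real and imaginary parts as two input channels is what lets a complex-valued network estimate magnitude and phase jointly, rather than enhancing the magnitude alone.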