A Multi-scale Subconvolutional U-Net with Time-Frequency Attention Mechanism for Single Channel Speech Enhancement.
- Source :
- Circuits, Systems & Signal Processing. Sep 2024, Vol. 43 Issue 9, p5682-5710. 29p.
- Publication Year :
- 2024
Abstract
- Recent deep learning-based speech enhancement models have made extensive use of attention mechanisms, which have proven effective in reaching state-of-the-art performance. This paper proposes a novel time-frequency attention (TFA) mechanism for speech enhancement, combined with a multi-scale subconvolutional U-Net (MSCUNet). The TFA extracts valuable channel, frequency, and time information from the feature sets and improves speech intelligibility and quality. Channel attention is performed first in the TFA to learn weights representing the importance of each channel in the input feature set, followed by frequency and time attention, which are performed simultaneously using learned weights. In addition, a U-Net-based multi-scale subconvolutional encoder-decoder model uses different kernel sizes to extract local and contextual features from the noisy speech. The MSCUNet uses a feature calibration block that acts as a gating network to control the information flow among the layers. This enables the scaled features to be weighted so as to retain speech and suppress noise. Central layers are also employed to exploit the interdependency among past, current, and future frames to improve predictions. The experimental results show that the proposed TFAMSCUNet model outperforms several state-of-the-art methods. [ABSTRACT FROM AUTHOR]
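The ordering described in the abstract (channel attention first, then frequency and time attention computed side by side) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `tfa_sketch`, the average-pooling descriptors, the sigmoid gates, the square weight matrices `w_c`, `w_f`, `w_t`, and the outer-product combination of the two gates are all assumptions made for illustration; the abstract does not specify any of these details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tfa_sketch(x, w_c, w_f, w_t):
    """Illustrative attention over a feature map x of shape (C, F, T).

    w_c (C, C), w_f (F, F), w_t (T, T) are hypothetical learned weights.
    A fixed number of time frames T is assumed purely for simplicity.
    """
    # Step 1: channel attention. Pool over frequency and time to get one
    # descriptor per channel, then map it to a gate in (0, 1).
    c_desc = x.mean(axis=(1, 2))              # shape (C,)
    c_gate = sigmoid(w_c @ c_desc)            # shape (C,)
    x_c = x * c_gate[:, None, None]

    # Step 2: frequency and time attention, both computed from the same
    # channel-weighted features (in parallel, not one after the other).
    f_desc = x_c.mean(axis=(0, 2))            # shape (F,)
    t_desc = x_c.mean(axis=(0, 1))            # shape (T,)
    f_gate = sigmoid(w_f @ f_desc)            # shape (F,)
    t_gate = sigmoid(w_t @ t_desc)            # shape (T,)

    # Assumed combination: an outer product gives a 2-D time-frequency mask.
    tf_mask = np.outer(f_gate, t_gate)        # shape (F, T)
    return x_c * tf_mask[None, :, :]

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 6, 5))     # C=4 channels, F=6 bins, T=5 frames
weighted = tfa_sketch(
    features,
    rng.standard_normal((4, 4)),
    rng.standard_normal((6, 6)),
    rng.standard_normal((5, 5)),
)
print(weighted.shape)
```

Because every gate lies strictly between 0 and 1, this sketch can only attenuate feature values, which loosely mirrors the stated goal of retaining speech while suppressing noise.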
Details
- Language :
- English
- ISSN :
- 0278081X
- Volume :
- 43
- Issue :
- 9
- Database :
- Academic Search Index
- Journal :
- Circuits, Systems & Signal Processing
- Publication Type :
- Academic Journal
- Accession number :
- 179041755
- Full Text :
- https://doi.org/10.1007/s00034-024-02721-2