Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model
- Publication Year :
- 2021
- Publisher :
- arXiv, 2021.
Abstract
- We propose a multi-channel speech enhancement approach that combines a novel two-stage feature fusion method with a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and inter-channel convolution difference (ICD) features are computed and integrated by a first 2-D convolutional layer; in the frequency domain, the log-power spectra (LPS) features from both the original channels and the super-directive beamforming outputs are combined by a second 2-D convolutional layer. To fully integrate the rich information in multi-channel speech, i.e., the time-frequency domain features and the array geometry, we apply a third 2-D convolutional layer in the second fusion stage to obtain the final convolutional features. Furthermore, we propose using a fixed clean acoustic model, trained with the end-to-end lattice-free maximum mutual information criterion, to constrain the enhanced output to follow the same distribution as the clean waveform, which alleviates the over-estimation problem of the enhancement task and limits distortion. On the Task1 development dataset of the ConferencingSpeech 2021 challenge, PESQ improvements of 0.24 and 0.19 are attained over the official baseline and a recently proposed multi-channel separation method, respectively.
- Comment: 7 pages, 3 figures, accepted to APSIPA 2021, revised
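As a rough illustration of the frequency-domain branch described in the abstract, the sketch below computes log-power spectra (LPS) for each channel of a multi-channel recording using NumPy only. The abstract does not state the frame length, hop size, window, or channel count, so `n_fft`, `hop`, the Hann window, and the function name `log_power_spectra` are all assumptions made for illustration. This is not the authors' implementation, and the beamforming outputs and the 2-D convolutional layers are not shown.

```python
import numpy as np


def log_power_spectra(x, n_fft=512, hop=256, eps=1e-10):
    """Per-channel log-power spectra of a multi-channel signal.

    x: array of shape (channels, samples).
    Returns an array of shape (channels, frames, n_fft // 2 + 1).
    n_fft, hop and the Hann window are illustrative assumptions,
    not values taken from the paper.
    """
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    n_samples = x.shape[1]
    if n_samples < n_fft:
        raise ValueError("signal is shorter than one analysis frame")

    # Number of full frames that fit without padding.
    n_frames = 1 + (n_samples - n_fft) // hop
    window = np.hanning(n_fft)

    # Index matrix of shape (frames, n_fft): row i selects frame i.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]

    # Slice every channel into windowed frames: (channels, frames, n_fft).
    frames = x[:, idx] * window

    # One-sided FFT per frame, then log of the power spectrum.
    spec = np.fft.rfft(frames, axis=-1)
    return np.log(np.abs(spec) ** 2 + eps)
```

The result is a (channels, frames, frequency bins) array. Under the same assumptions, the LPS of beamformed signals could be stacked along the first axis as additional input planes before a 2-D convolution.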
- Subjects :
- Signal Processing (eess.SP)
FOS: Computer and information sciences
Sound (cs.SD)
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
Electrical Engineering and Systems Science - Signal Processing
Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....df9c7cfd8f358f15e89eec07b4e6fecb
- Full Text :
- https://doi.org/10.48550/arxiv.2107.11222