A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain

Authors :
Jian Pang
Hongcheng Li
Tao Jiang
Hui Wang
Xiangning Liao
Le Luo
Hongqing Liu
Source :
Applied Sciences, Vol 13, Iss 13, p 7698 (2023)
Publication Year :
2023
Publisher :
MDPI AG, 2023.

Abstract

This study investigates the use of complex operations for multichannel speech enhancement in the time domain using a neural network. Previous studies have demonstrated the advantages of incorporating complex operations when designing neural networks; however, they have focused solely on frequency-domain enhancement techniques. In contrast, this study presents an end-to-end approach to speech enhancement in the time domain. We used the Hilbert transform to generate complex time-domain waveforms as inputs to the network, which allowed us to build an end-to-end approach that exploits spatial information. To handle the complex-valued inputs, we developed a complex neural adaptive beamformer (CNAB), which uses complex shared long short-term memory (LSTM), split-LSTM, and complex convolutions to generate the beamforming output. We then developed a complex full convolutional network (CFCN) to enhance the beamforming output, using complex dilated convolutions to model the long-term temporal dependencies of speech. Cascading the CNAB and CFCN yields the final end-to-end time-domain enhancement network, named CNABCFCN. We trained and tested CNABCFCN on the deep noise suppression (DNS) challenge dataset. Our results demonstrate the advantages of using complex operations over the baseline model. Furthermore, the proposed CNABCFCN performed better than other networks in terms of both objective and subjective measures.
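
As an illustration of the input-generation step described in the abstract, the following is a minimal sketch of how a real waveform can be turned into a complex time-domain waveform with the Hilbert transform. It is not the authors' implementation: the function name `analytic_signal`, the 16 kHz sampling rate, the 440 Hz test tone, and the 0.3 rad inter-channel phase offset are all assumptions chosen for the example. The sketch uses the standard FFT-based construction of the analytic signal, whose real part is the original waveform and whose imaginary part is its Hilbert transform.

```python
import numpy as np


def analytic_signal(x):
    """Return the complex analytic signal x + j*H{x} along the last axis.

    Hypothetical helper for illustration; uses the standard FFT method.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)

    # Frequency-domain weights: keep DC (and the Nyquist bin when n is even),
    # double the positive frequencies, and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0

    return np.fft.ifft(spectrum * weights, axis=-1)


# Assumed dual-channel input: two microphones, 1 s of audio at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
mics = np.stack([
    np.cos(2 * np.pi * 440 * t),        # channel 1
    np.cos(2 * np.pi * 440 * t - 0.3),  # channel 2, assumed phase offset
])

# Complex time-domain waveforms, one row per channel.
z = analytic_signal(mics)
```

Here `z.real` reproduces the recorded waveforms and `z.imag` holds their Hilbert transforms, so each channel becomes a complex-valued sequence of the same length that a complex-valued network could take as input.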

Details

Language :
English
ISSN :
2076-3417
Volume :
13
Issue :
13
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.7c4adcdeeb240db90fe5824d9452229
Document Type :
article
Full Text :
https://doi.org/10.3390/app13137698