Long-term speech information based threshold for voice activity detection in massive microphone network

Authors :
Xiaoqiang Zhu
Zhihua Lu
Tao Wang
Mengyao Zhu
Xiukun Wu
Source :
Digital Signal Processing. 94:156-164
Publication Year :
2019
Publisher :
Elsevier BV, 2019.

Abstract

Voice activity detection (VAD) is essential for processing with multiple microphone arrays, where a large number of devices, such as microphones for far-field voice interaction in smart home environments, are activated when sound sources appear. Because sound source activity is sparse, VAD can save substantial computing resources in massive microphone array processing. However, accurate VAD may not be feasible in harsh environments, such as far-field conditions or time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to obtain labeled data for initializing a Gaussian mixture model (GMM), which is fitted to the log-energy distributions of noise and (noisy) speech. Finally, by combining the LTSI with the GMM parameters of the noise and speech distributions, the paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that the proposed VAD method yields a remarkable improvement for a massive microphone network.
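
The GMM-plus-threshold idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give the LTSI formula or the exact threshold rule, so the sketch makes several assumptions. The per-frame log-energies are synthetic, the initial labels come from a crude median split standing in for the LTSI-based labeling, and the threshold is taken as the point between the two component means where the weighted Gaussian densities are equal. All function names (`gaussian_pdf`, `fit_two_gaussians`, `decision_threshold`) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, init_labels, n_iter=50):
    """EM for a 1-D two-component GMM (component 0: noise, 1: speech).

    init_labels plays the role of the labeled data used for initialization;
    here it is an assumption, not the LTSI-based labeling of the paper.
    Returns [[weight, mean, variance], [weight, mean, variance]].
    """
    n = len(values)
    params = []
    for k in (0, 1):
        group = [v for v, lab in zip(values, init_labels) if lab == k]
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / len(group) + 1e-6
        params.append([len(group) / n, mean, var])

    for _ in range(n_iter):
        # E-step: posterior probability that each frame belongs to "speech".
        resp = []
        for v in values:
            p0 = params[0][0] * gaussian_pdf(v, params[0][1], params[0][2])
            p1 = params[1][0] * gaussian_pdf(v, params[1][1], params[1][2])
            resp.append(p1 / (p0 + p1 + 1e-300))
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in (0, 1):
            r = [rv if k == 1 else 1.0 - rv for rv in resp]
            nk = sum(r) + 1e-12
            mean = sum(ri * v for ri, v in zip(r, values)) / nk
            var = sum(ri * (v - mean) ** 2 for ri, v in zip(r, values)) / nk + 1e-6
            params[k] = [nk / n, mean, var]
    return params


def decision_threshold(params, steps=60):
    """Bisection between the two means for the equal-weighted-density point.

    Assumes the noise mean is below the speech mean and that each component
    dominates at its own mean.
    """
    (w0, m0, v0), (w1, m1, v1) = params
    lo, hi = m0, m1
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if w0 * gaussian_pdf(mid, m0, v0) > w1 * gaussian_pdf(mid, m1, v1):
            lo = mid  # noise still more likely: boundary lies higher
        else:
            hi = mid  # speech more likely: boundary lies lower
    return 0.5 * (lo + hi)


# Synthetic log-energies (assumed values, for illustration only).
random.seed(0)
log_energy = [random.gauss(-6.0, 0.5) for _ in range(600)]   # noise frames
log_energy += [random.gauss(-2.0, 1.0) for _ in range(400)]  # speech frames

# Crude initial labels: frames above the median are tentatively "speech".
median = sorted(log_energy)[len(log_energy) // 2]
init_labels = [1 if v > median else 0 for v in log_energy]

params = fit_two_gaussians(log_energy, init_labels)
threshold = decision_threshold(params)
is_speech = [v > threshold for v in log_energy]
print("noise mean:", params[0][1], "speech mean:", params[1][1])
print("threshold:", threshold)
```

Because the threshold is recomputed from the fitted weights, means, and variances, it shifts as the noise level changes, which is the sense in which such a boundary is adaptive. The paper additionally combines the LTSI with the GMM parameters, which this sketch does not attempt.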

Details

ISSN :
10512004
Volume :
94
Database :
OpenAIRE
Journal :
Digital Signal Processing
Accession number :
edsair.doi...........2b93dbf0ba44031323cc4a782664b0dd
Full Text :
https://doi.org/10.1016/j.dsp.2019.05.012