1. Long-term speech information based threshold for voice activity detection in massive microphone network
- Author
- Xiaoqiang Zhu, Zhihua Lu, Tao Wang, Mengyao Zhu, and Xiukun Wu
- Subjects
- Voice activity detection, business.industry, Computer science, Microphone, Applied Mathematics, Speech recognition, Initialization, 020206 networking & telecommunications, 02 engineering and technology, Mixture model, Term (time), Differential entropy, Noise, Computational Theory and Mathematics, Computer Science::Sound, Artificial Intelligence, Home automation, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Electrical and Electronic Engineering, Statistics, Probability and Uncertainty, business
- Abstract
- Voice activity detection (VAD) is essential for processing with multiple microphone arrays, in which a massive number of potential devices, such as microphones for far-field voice-based interaction in smart home environments, are activated when sound sources appear. Because sound source activity is sparse, VAD can save substantial computing resources in massive microphone array processing. However, an accurate VAD may not be feasible in harsh environments, such as far-field conditions or time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to obtain labeled data for initializing a Gaussian mixture model (GMM), which is fitted to the log-energy distribution of noise and (noisy) speech. Finally, by combining the LTSI with the GMM parameters of the noise and speech distributions, the paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that the proposed VAD method yields a remarkable improvement for a massive microphone network.
- Published
- 2019
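
The abstract's second and third steps (a two-component GMM over per-frame log-energy, initialized from labeled frames, followed by a threshold between noise and speech) can be illustrated with a minimal sketch. This is not the paper's method: the function names, the synthetic data, and the crude initial labels are all hypothetical stand-ins (in the paper the labels come from the LTSI), and the equal-posterior boundary used here is a simpler substitute for the adaptive threshold the paper derives from the LTSI and the GMM parameters.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_two_gaussians(log_energy, init_labels, iters=50, var_floor=1e-6):
    """EM for a 2-component 1-D GMM over per-frame log-energy.

    init_labels[i] is 0 (noise) or 1 (speech) and only seeds the first M-step.
    Returns [(weight, mean, var) for noise, (weight, mean, var) for speech].
    """
    if len(set(init_labels)) < 2:
        raise ValueError("initial labels must contain both noise and speech frames")
    n = len(log_energy)
    # Start from hard responsibilities given by the initial labels.
    resp = [[1.0 - lab, float(lab)] for lab in init_labels]
    params = []
    for _ in range(iters):
        # M-step: weighted weight, mean and variance for each component.
        params = []
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, log_energy)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, log_energy)) / nk
            params.append((nk / n, mean, max(var, var_floor)))
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in log_energy:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in params]
            top = max(logs)
            exps = [math.exp(value - top) for value in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
    return params


def equal_posterior_threshold(noise, speech, steps=60):
    """Bisection between the two means for the point where the weighted
    densities are equal. Assumes the noise mean lies below the speech mean
    and that each component dominates at its own mean."""
    def diff(x):
        return (math.log(speech[0]) + log_gauss(x, speech[1], speech[2])
                - math.log(noise[0]) - log_gauss(x, noise[1], noise[2]))

    lo, hi = noise[1], speech[1]
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid  # noise still more likely here, move right
        else:
            hi = mid  # speech more likely here, move left
    return 0.5 * (lo + hi)


# Synthetic log-energy frames (hypothetical values, for illustration only).
random.seed(0)
frames = ([random.gauss(-5.0, 0.5) for _ in range(500)]
          + [random.gauss(0.0, 1.0) for _ in range(500)])
# Crude stand-in for LTSI-derived labels: a fixed cut at -3.
labels = [1 if x > -3.0 else 0 for x in frames]

noise, speech = fit_two_gaussians(frames, labels)
threshold = equal_posterior_threshold(noise, speech)
is_speech = [x > threshold for x in frames]
```

Because the threshold is recomputed from the fitted parameters, it moves with the noise level instead of staying at the fixed cut used for initialization, which is the general motivation for an adaptive threshold.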