Long-term speech information based threshold for voice activity detection in massive microphone network
- Source :
- Digital Signal Processing. 94:156-164
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- Voice activity detection (VAD) is essential for multi-microphone array processing, in which a large number of devices, such as microphones for far-field voice interaction in smart home environments, are activated when sound sources appear. Because sound source activity is sparse, VAD can save substantial computing resources in massive microphone array processing. However, accurate VAD may not be feasible in harsh conditions such as far-field capture and time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to label data for initializing a Gaussian mixture model (GMM), which is fitted to the log-energy distributions of noise and (noisy) speech. Finally, by combining the LTSI with the GMM parameters of the noise and speech distributions, the paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that the proposed VAD method yields a remarkable improvement in a massive microphone network.
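The last two steps of the abstract (a GMM on log-energy initialized from rough labels, then a noise/speech threshold) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, the rough labels are simply assumed to come from some LTSI-like measure that is not computed here, and the threshold rule (the point equally far, in standard deviations, from both component means) is an assumed stand-in for the adaptive threshold the paper derives.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def init_from_labels(log_energy, is_speech):
    """Initial [weight, mean, variance] for noise and speech from rough labels.

    `is_speech` stands in for labels produced by an LTSI-like measure
    (assumption: both classes are non-empty).
    """
    params = []
    for flag in (False, True):  # index 0 = noise, index 1 = speech
        vals = [e for e, s in zip(log_energy, is_speech) if s == flag]
        mean = sum(vals) / len(vals)
        var = max(sum((v - mean) ** 2 for v in vals) / len(vals), 1e-6)
        params.append([len(vals) / len(log_energy), mean, var])
    return params


def fit_gmm(log_energy, params, n_iter=50):
    """Refine a two-component 1-D GMM with plain EM iterations."""
    params = [list(p) for p in params]  # do not mutate the caller's list
    n = len(log_energy)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        resp = []
        for x in log_energy:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in params]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mean = sum(r[k] * x for r, x in zip(resp, log_energy)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, log_energy)) / nk
            params[k] = [nk / n, mean, max(var, 1e-6)]
    return params


def adaptive_threshold(params):
    """Assumed rule: the point equally many standard deviations from both means."""
    (_, mean_n, var_n), (_, mean_s, var_s) = params
    std_n, std_s = math.sqrt(var_n), math.sqrt(var_s)
    return (mean_n * std_s + mean_s * std_n) / (std_n + std_s)
```

A frame would then be marked as speech when its log-energy exceeds `adaptive_threshold(fit_gmm(x, init_from_labels(x, labels)))`. Refitting on a sliding window of recent frames is one way such a threshold could follow time-varying noise, though the abstract does not say how the paper handles this.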
- Subjects :
- Voice activity detection
Computer science
Microphone
Applied Mathematics
Speech recognition
Initialization
020206 networking & telecommunications
02 engineering and technology
Mixture model
Term (time)
Differential entropy
Noise
Computational Theory and Mathematics
Computer Science::Sound
Artificial Intelligence
Home automation
Signal Processing
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
Electrical and Electronic Engineering
Statistics, Probability and Uncertainty
Details
- ISSN :
- 10512004
- Volume :
- 94
- Database :
- OpenAIRE
- Journal :
- Digital Signal Processing
- Accession number :
- edsair.doi...........2b93dbf0ba44031323cc4a782664b0dd
- Full Text :
- https://doi.org/10.1016/j.dsp.2019.05.012