101. A Simple Adaptive Atomic Decomposition Voice Activity Detector Implemented By Matching Pursuit
- Author
- Thomas Bryan, Veton Kepuska, and Ivica Kostanic
- Subjects
- Voice Activity Detection, Computer Science::Sound, Gammatone, Gabor, Atomic Decomposition, Matching Pursuit
- Abstract
- A simple adaptive voice activity detector (VAD) is implemented using Gabor and gammatone atomic decompositions of speech for environments with high levels of Gaussian noise. Matching pursuit performs the atomic decomposition and is shown to achieve optimal speech detection capability at high data compression rates for low signal-to-noise ratios (SNRs). The most active dictionary elements found by matching pursuit are used for signal reconstruction, so the algorithm adapts to the individual speaker's dominant time-frequency characteristics. Speech has a high peak-to-average ratio, which enables the greedy matching pursuit heuristic of selecting the highest inner products to isolate high-energy speech components in high-noise environments. Gabor and gammatone atoms are both investigated with identical logarithmically spaced center frequencies and similar bandwidths. The algorithm performs equally well with both atom types, with no statistically significant difference. It achieves 70% accuracy at 0 dB SNR, 90% accuracy at 5 dB SNR, and 98% accuracy at 20 dB SNR, using 30 dB SNR as the reference for voice activity.
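The greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary layout (8 log-spaced center frequencies, a fixed 5 ms Gaussian width, an 80-sample hop), and the relative energy threshold in `frame_activity` are all assumptions chosen for illustration, and only Gabor atoms are shown.

```python
import numpy as np

def gabor_atom(n, center, freq, width, fs):
    """Unit-norm real Gabor atom: Gaussian envelope times a cosine carrier."""
    t = (np.arange(n) - center) / fs
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def build_dictionary(n, fs, n_freqs=8, f_lo=100.0, f_hi=3000.0, hop=80, width=0.005):
    """Hypothetical dictionary: log-spaced center frequencies, atoms shifted by `hop` samples."""
    freqs = np.geomspace(f_lo, f_hi, n_freqs)
    centers = np.arange(0, n, hop)
    return np.array([gabor_atom(n, c, f, width, fs) for f in freqs for c in centers])

def matching_pursuit(x, D, n_iter):
    """Greedy matching pursuit: pick the atom with the largest |inner product|, subtract, repeat."""
    residual = x.astype(float).copy()
    picks = []
    for _ in range(n_iter):
        corr = D @ residual                 # inner products with every atom
        k = int(np.argmax(np.abs(corr)))    # greedy choice
        residual -= corr[k] * D[k]          # remove that atom's contribution
        picks.append((k, corr[k]))
    return picks, residual

def frame_activity(picks, D, n, frame=80, rel_threshold=0.1):
    """Assumed decision rule: a frame is active if the energy of the signal
    rebuilt from the selected atoms exceeds a fraction of the peak frame energy."""
    recon = np.zeros(n)
    for k, c in picks:
        recon += c * D[k]
    usable = n // frame * frame
    energy = (recon[:usable].reshape(-1, frame) ** 2).sum(axis=1)
    return energy > rel_threshold * energy.max()

# Toy input: one strong atom buried in Gaussian noise (a stand-in for a speech burst).
rng = np.random.default_rng(0)
fs, n = 8000, 800
D = build_dictionary(n, fs)
planted = 37
x = 5.0 * D[planted] + 0.1 * rng.standard_normal(n)

picks, residual = matching_pursuit(x, D, n_iter=5)
active = frame_activity(picks, D, n)
```

Because the atoms are unit-norm, each iteration can only reduce the residual energy, and a high-peak component such as the planted atom dominates the first inner products even when noise is present. That is the property the abstract relies on for speech in noise.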
- Published
- 2015