Clean speech/speech with background music classification using HNGD spectrum.

Authors :
Khonglah, Banriskhem
Prasanna, S.
Source :
International Journal of Speech Technology; Dec 2017, Vol. 20, Issue 4, pp. 1023-1036 (14 pages)
Publication Year :
2017

Abstract

This work explores the characteristics of speech in terms of the spectral characteristics of the vocal tract system, in order to derive features effective for classifying clean speech versus speech with background music. A representation of the spectral characteristics of the vocal tract system in the form of the Hilbert envelope of the numerator of group delay (HNGD) spectrum is explored for the task. This representation complements the existing methods of computing the spectral characteristics in terms of temporal resolution. The HNGD spectrum has an additive and high-resolution property that gives a better representation of the formants, especially the higher ones. A feature known as the spectral contrast across the sub-bands is extracted from the HNGD spectrum; this feature essentially represents the relative spectral characteristics of the vocal tract system. The vocal tract system is also represented approximately in terms of the mel frequency cepstral coefficients (MFCCs), which represent the average spectral characteristics. The MFCCs and the sum of the spectral contrast on the HNGD spectrum can be used as features to represent the average and relative spectral characteristics of the vocal tract system, respectively. These features complement each other and can be combined in a multidimensional framework to provide good discrimination between clean speech and speech with background music segments. The spectral contrast on the HNGD spectrum is compared to the spectral contrast on the discrete Fourier transform (DFT) spectrum, which also represents the relative spectral characteristics of the vocal tract system. It is observed that better performance is achieved with the HNGD spectrum than with the DFT spectrum. The features are classified using classifiers such as Gaussian mixture models and support vector machines. [ABSTRACT FROM AUTHOR]
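
As an illustration of the "sum of the spectral contrast across the sub-bands" idea, the following is a minimal sketch on a plain DFT magnitude spectrum. It is not the authors' implementation and does not compute the HNGD spectrum. The function names, the equal-width sub-band split, the number of bands, and the peak/valley definition (means of the largest and smallest `alpha` fraction of bins in each band) are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N//2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def summed_spectral_contrast(spectrum, n_bands=4, alpha=0.2, eps=1e-10):
    """Sum over sub-bands of (log peak - log valley).

    Assumed definition: peak and valley are the means of the largest and
    smallest `alpha` fraction of bins in each equal-width sub-band.
    """
    band_size = len(spectrum) // n_bands
    total = 0.0
    for b in range(n_bands):
        start = b * band_size
        # The last band absorbs any leftover bins.
        end = len(spectrum) if b == n_bands - 1 else start + band_size
        band = sorted(spectrum[start:end])
        k = max(1, int(alpha * len(band)))
        valley = sum(band[:k]) / k
        peak = sum(band[-k:]) / k
        total += math.log(peak + eps) - math.log(valley + eps)
    return total


# A unit impulse has a flat spectrum; a pure tone has one sharp peak.
impulse = [1.0] + [0.0] * 15
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
flat_contrast = summed_spectral_contrast(magnitude_spectrum(impulse))
tone_contrast = summed_spectral_contrast(magnitude_spectrum(tone))
```

Under these assumptions a flat spectrum yields a contrast near zero, while a spectrum with pronounced peaks yields a larger value. The same function could in principle be applied to any non-negative spectrum, including an HNGD spectrum computed elsewhere.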

Details

Language :
English
ISSN :
1381-2416
Volume :
20
Issue :
4
Database :
Complementary Index
Journal :
International Journal of Speech Technology
Publication Type :
Academic Journal
Accession number :
125968236
Full Text :
https://doi.org/10.1007/s10772-017-9464-7