Long-Term Multi-band Frequency-Domain Mean-Crossing Rate (FDMCR): A Novel Feature Extraction Algorithm for Speech/Music Discrimination.
- Source :
- Circuits, Systems & Signal Processing; Nov 2023, Vol. 42, Issue 11, pp. 6929-6950 (22 pages)
- Publication Year :
- 2023
Abstract
- Multimedia data have increased dramatically today, making the distinction between desirable information and other types of information extremely important. Speech/music discrimination is a field of audio analytics that aims to detect and classify speech and music segments in an audio file. This paper proposes a novel feature extraction method called Long-Term Multi-band Frequency-Domain Mean-Crossing Rate (FDMCR). The proposed feature computes the average frequency-domain mean-crossing rate along the frequency axis for each of the perceptual Mel-scaled frequency bands of the signal power spectrum. In this paper, the class-separation capability of this feature is first measured by well-known divergence criteria such as Maximum Fisher Discriminant Ratio (MFDR), Bhattacharyya divergence, and Jeffreys/Symmetric Kullback–Leibler (SKL) divergence. The proposed feature is then applied to the speech/music discrimination (SMD) process on two well-known speech-music datasets: GTZAN and S&S (Scheirer and Slaney). The results obtained on the two datasets using conventional classifiers, including k-NN, GMM, and SVM, as well as deep learning-based classification methods, including CNN, LSTM, and BiLSTM, show that the proposed feature outperforms other features in speech/music discrimination. [ABSTRACT FROM AUTHOR]
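The abstract describes FDMCR only at a high level. The Python sketch below shows one possible reading of it, under assumptions that are not taken from the paper: Hann-windowed frames of `n_fft` samples with hop `hop`, contiguous (rectangular) bands with HTK-style Mel spacing rather than triangular filters, the mean-crossing rate computed on the raw power values inside each band, and "long-term" taken to mean a plain average over all frames. The names `fdmcr`, `mel_band_edges` and `mean_crossing_rate` are hypothetical and are not the authors' code.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel formula (assumption: the paper may use another Mel variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_band_edges(n_bands, sr, n_fft):
    """FFT-bin indices delimiting n_bands contiguous, Mel-spaced bands (hypothetical helper)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_bands + 1)
    edges = np.floor(mel_to_hz(mels) / sr * n_fft).astype(int)
    edges[-1] = n_fft // 2 + 1  # make the last band include the Nyquist bin
    return edges


def mean_crossing_rate(x):
    """Fraction of adjacent pairs in x that lie on opposite sides of x's mean."""
    if x.size < 2:
        return 0.0
    s = np.sign(x - x.mean())
    s[s == 0] = 1  # count values exactly at the mean as "above" (assumption)
    return np.count_nonzero(np.diff(s)) / (x.size - 1)


def fdmcr(signal, sr, n_fft=1024, hop=512, n_bands=20):
    """Long-term multi-band frequency-domain mean-crossing rate (illustrative sketch).

    Returns an array of shape (n_bands,): for each Mel band, the mean-crossing
    rate of the power spectrum along frequency, averaged over all frames.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        raise ValueError("signal must contain at least n_fft samples")
    window = np.hanning(n_fft)
    edges = mel_band_edges(n_bands, sr, n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop
    rates = np.zeros((n_frames, n_bands))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + n_fft] * window
        power = np.abs(np.fft.rfft(frame)) ** 2  # power spectrum, n_fft // 2 + 1 bins
        for b in range(n_bands):
            rates[t, b] = mean_crossing_rate(power[edges[b] : edges[b + 1]])
    return rates.mean(axis=0)  # "long-term": average over frames
```

With these settings a clip is summarized by one value per band (20 here), a short fixed-length vector of the kind that could be passed to the conventional or deep-learning classifiers named in the abstract. A smooth spectrum, such as the leakage around a pure tone, crosses its band mean rarely, whereas a noise-like spectrum crosses it often.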
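The three class-separation criteria named in the abstract have standard closed forms when each feature dimension is modelled as a univariate Gaussian per class. The sketch below assumes that Gaussian model and takes MFDR to be the largest per-dimension Fisher ratio, which is one common definition; the function names are hypothetical, and the paper's exact estimators may differ.

```python
import numpy as np


def _as_2d(x):
    """Coerce to shape (n_samples, n_features); a 1-D input is treated as one feature."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def _stats(a, b):
    a, b = _as_2d(a), _as_2d(b)
    return a.mean(axis=0), b.mean(axis=0), a.var(axis=0), b.var(axis=0)


def fisher_ratio(a, b):
    """Per-dimension Fisher discriminant ratio (mu_a - mu_b)^2 / (var_a + var_b)."""
    ma, mb, va, vb = _stats(a, b)
    return (ma - mb) ** 2 / (va + vb)


def max_fisher_ratio(a, b):
    """MFDR, taken here as the largest per-dimension Fisher ratio (assumed definition)."""
    return float(np.max(fisher_ratio(a, b)))


def bhattacharyya_gauss(a, b):
    """Bhattacharyya distance between per-dimension Gaussian fits of two classes."""
    ma, mb, va, vb = _stats(a, b)
    return 0.25 * (ma - mb) ** 2 / (va + vb) + 0.5 * np.log(
        (va + vb) / (2.0 * np.sqrt(va * vb))
    )


def symmetric_kl_gauss(a, b):
    """Jeffreys divergence KL(a||b) + KL(b||a) between per-dimension Gaussian fits."""
    ma, mb, va, vb = _stats(a, b)
    d2 = (ma - mb) ** 2
    return 0.5 * (va / vb + vb / va) - 1.0 + 0.5 * d2 * (1.0 / va + 1.0 / vb)
```

Applied to, for example, per-clip feature vectors from the speech class and the music class, larger values of all three criteria indicate better separation. When the two class variances are equal, the Bhattacharyya distance reduces to a quarter of the Fisher ratio and the symmetric KL divergence to twice it, so the criteria differ mainly in how they penalize unequal spreads.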
- Subjects :
- DEEP learning
FEATURE extraction
SPEECH
ALGORITHMS
POWER spectra
Details
- Language :
- English
- ISSN :
- 0278-081X
- Volume :
- 42
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- Circuits, Systems & Signal Processing
- Publication Type :
- Academic Journal
- Accession number :
- 172805522
- Full Text :
- https://doi.org/10.1007/s00034-023-02440-0