Classification of general audio data for content-based retrieval
- Source :
- Pattern Recognition Letters. 22:533-544
- Publication Year :
- 2001
- Publisher :
- Elsevier BV, 2001.
Abstract
- In this paper, we address the problem of classifying continuous general audio data (GAD) for content-based retrieval, and describe a scheme that classifies audio segments into seven categories: silence, single-speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy than temporal and spectral features. To minimize classification errors near the boundaries between audio segments of different types in general audio data, a segmentation–pooling scheme is also proposed in this work. This scheme yields classification results that are consistent with human perception. Our classification system provides over 90% accuracy at a processing speed dozens of times faster than the playing rate.
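The abstract does not spell out how the segmentation–pooling scheme works, so the following is only a minimal sketch of one plausible reading: frame-level labels are pooled within each detected segment by majority vote, which suppresses isolated misclassifications near segment boundaries. The function name `pool_labels`, the `boundaries` input, and the majority-vote rule are all assumptions for illustration, not the authors' published method.

```python
from collections import Counter


def pool_labels(frame_labels, boundaries):
    """Give every frame in a segment that segment's majority label.

    frame_labels: per-frame class labels from some frame-level classifier.
    boundaries:   sorted frame indices at which a new segment starts
                  (hypothetical output of a separate segmentation step;
                  index 0 is implied and should not be listed).
    """
    edges = [0, *boundaries, len(frame_labels)]
    pooled = []
    for start, end in zip(edges, edges[1:]):
        segment = frame_labels[start:end]
        if not segment:
            continue  # skip empty segments from duplicate boundaries
        # Majority vote over the segment (assumed pooling rule).
        majority = Counter(segment).most_common(1)[0][0]
        pooled.extend([majority] * len(segment))
    return pooled


# Two segments split at frame 4; one stray label in each is smoothed out.
labels = ["speech", "speech", "music", "speech",
          "music", "music", "speech", "music"]
smoothed = pool_labels(labels, boundaries=[4])
```

In this sketch the stray "music" frame in the first segment and the stray "speech" frame in the second are both overwritten by their segment's dominant class.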
- Subjects :
- Audio mining
Scheme (programming language)
Computer science
Speech recognition
Speech coding
Linear prediction
Pattern recognition
Silence
Noise
ComputingMethodologies_PATTERNRECOGNITION
Artificial Intelligence
Perception
Signal Processing
Computer Vision and Pattern Recognition
Artificial intelligence
Mel-frequency cepstrum
Environmental noise
Software
Details
- ISSN :
- 0167-8655
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- Pattern Recognition Letters
- Accession number :
- edsair.doi...........3ddecc3dc870e355994dccc25a2a3659
- Full Text :
- https://doi.org/10.1016/s0167-8655(00)00119-7