
Polyphonic Sound Event Detection Using Mel-Pseudo Constant Q-Transform and Deep Neural Network.

Authors :
Spoorthy, V.
Koolagudi, Shashidhar G.
Source :
IETE Journal of Research. May 2024, Vol. 70 Issue 5, pp. 5031-5043. 13p.
Publication Year :
2024

Abstract

Sound Event Detection (SED), also called Acoustic Event Detection (AED), is the task of identifying the sound events that occur in a particular environment. Sound events occur in an unstructured manner and vary widely in both temporal structure and frequency content. They may be non-overlapping (monophonic) or overlapping (polyphonic). In real-time scenarios, polyphonic sound events are more common than monophonic ones. This paper introduces a Mel-Pseudo Constant Q-Transform (MP-CQT) technique for polyphonic SED that effectively learns both monophonic and polyphonic sound events. A pseudo-CQT technique is adapted to extract features from the audio files and their Mel spectrograms. The Mel scale is believed to broadly simulate the human perception system. The classifier is a Convolutional Recurrent Neural Network (CRNN). A performance comparison for the proposed MP-CQT technique combined with the CRNN is presented, and a considerable improvement is observed. The proposed method achieves an average error rate of 0.684 and an average F1 score of 52.3%. Its robustness is also analyzed by adding noise to the audio files at different signal-to-noise ratios (SNRs). The proposed method shows improved performance compared to state-of-the-art SED systems, and the new feature extraction technique shows a promising improvement in the performance of the polyphonic SED system. [ABSTRACT FROM AUTHOR]
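
The abstract reports an error rate and an F1 score but does not say how they are computed. Polyphonic SED results are often reported as segment-based metrics, in which the audio is cut into fixed-length segments and the set of active event labels in each segment is compared with the reference. The sketch below assumes that convention; it is an illustration, not the authors' evaluation code, and the function name and the toy labels are hypothetical.

```python
def segment_based_scores(reference, estimate):
    """Return (error_rate, f1) for two equal-length lists of label sets.

    Each list holds one set of active event labels per time segment.
    Assumes the common segment-based definitions: substitutions,
    deletions and insertions are counted per segment and summed.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0  # total number of active reference events over all segments

    for ref, est in zip(reference, estimate):
        seg_tp = len(ref & est)   # events detected correctly
        seg_fp = len(est - ref)   # events detected but not in the reference
        seg_fn = len(ref - est)   # reference events that were missed

        tp += seg_tp
        fp += seg_fp
        fn += seg_fn

        # A miss paired with a false alarm in the same segment is a substitution.
        subs += min(seg_fp, seg_fn)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(ref)

    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return error_rate, f1


# Toy example (hypothetical labels): four segments with overlapping events.
reference = [{"speech", "car"}, {"speech"}, {"dog"}, set()]
estimate = [{"speech"}, {"speech", "car"}, {"cat"}, set()]
error_rate, f1 = segment_based_scores(reference, estimate)
print(error_rate, f1)  # 0.75 0.5
```

Under these definitions the error rate is a ratio of errors to reference events rather than a percentage, so lower is better and values above 1 are possible, whereas F1 is bounded between 0 and 1.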

Details

Language :
English
ISSN :
0377-2063
Volume :
70
Issue :
5
Database :
Academic Search Index
Journal :
IETE Journal of Research
Publication Type :
Academic Journal
Accession number :
179483061
Full Text :
https://doi.org/10.1080/03772063.2023.2253768