Improving Sound Event Detection Metrics: Insights from DCASE 2020

Authors :
Francesco Tuveri
Sacha Krstulovic
Cagdas Bilen
Romain Serizel
Juan Azcarreta
Giacomo Ferroni
Nicolas Turpault
Audio Analytic
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
UL/INRIA’s work for this article was partly supported by the French National Research Agency (project LEAUDS 'Learning to understand audio scenes', ANR-18-CE23-0020) and by the French region Grand-Est.
Grid'5000
ANR-18-CE23-0020, LEAUDS, Apprentissage statistique pour la compréhension de scènes audio (2018)
Source :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414711⟩
Publication Year :
2020

Abstract

The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point. This paper compares conventional event-based and segment-based criteria against the intersection-based criterion of the Polyphonic Sound Detection Score (PSDS), over a selection of systems from DCASE 2020 Challenge Task 4. It shows that, by relying on collars, the conventional event-based criterion introduces different strictness levels depending on the length of the sound events, and that the segment-based criterion may lack precision and be application dependent. Alternatively, PSDS's intersection-based criterion overcomes the dependency of the evaluation on sound event duration and provides robustness to labelling subjectivity, by allowing valid detections of interrupted events. Furthermore, PSDS enhances the comparison of SED systems by measuring sound event modelling performance independently from the systems' operating points.
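
The contrast between collar-based and intersection-based matching described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation or the PSDS reference code: the function names, the 200 ms onset collar, the offset tolerance of 20% of the event length, and the 0.5 overlap ratio are all assumptions chosen for illustration, and the simplified intersection check compares a single detection with a single reference event, so it does not model interrupted events covered by several detections.

```python
def collar_match(ref, det, collar=0.2, offset_ratio=0.2):
    """Collar-based match (illustrative): the detected onset must fall within
    `collar` seconds of the reference onset, and the detected offset within
    max(collar, offset_ratio * reference duration) of the reference offset."""
    ref_on, ref_off = ref
    det_on, det_off = det
    offset_tol = max(collar, offset_ratio * (ref_off - ref_on))
    return abs(det_on - ref_on) <= collar and abs(det_off - ref_off) <= offset_tol


def intersection_match(ref, det, min_ratio=0.5):
    """Simplified intersection-based match (illustrative): the overlap must
    cover at least `min_ratio` of both the reference and the detection."""
    overlap = max(0.0, min(ref[1], det[1]) - max(ref[0], det[0]))
    ref_len = ref[1] - ref[0]
    det_len = det[1] - det[0]
    return overlap >= min_ratio * ref_len and overlap >= min_ratio * det_len


# Both detections start late by 10% of the event duration; offsets are exact.
short_ref, short_det = (0.0, 1.0), (0.1, 1.0)
long_ref, long_det = (0.0, 10.0), (1.0, 10.0)

for name, ref, det in [("short", short_ref, short_det), ("long", long_ref, long_det)]:
    print(name, collar_match(ref, det), intersection_match(ref, det))
```

Under these assumed settings, the same relative onset error is accepted for the 1 s event but rejected for the 10 s event by the collar check, while the duration-relative intersection check treats both events alike.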

Details

Language :
English
Database :
OpenAIRE
Journal :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414711⟩
Accession number :
edsair.doi.dedup.....985220cd46b912f9d1043fc67c45e474
Full Text :
https://doi.org/10.1109/ICASSP39728.2021.9414711