Improving Sound Event Detection Metrics: Insights from DCASE 2020
- Author
Francesco Tuveri, Sacha Krstulovic, Cagdas Bilen, Romain Serizel, Juan Azcarreta, Giacomo Ferroni, Nicolas Turpault
Affiliations: Audio Analytic; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Grid'5000
Funding: UL/INRIA's work for this article was partly supported by the French National Research Agency (project LEAUDS, "Learning to understand audio scenes", ANR-18-CE23-0020, 2018) and by the French region Grand-Est.
- Subjects
FOS: Computer and information sciences, Sound (cs.SD), Dependency (UML), Computer science, 02 engineering and technology, computer.software_genre, Computer Science - Sound, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Audio and Speech Processing (eess.AS), Robustness (computer science), FOS: Electrical engineering, electronic engineering, information engineering, 0202 electrical engineering, electronic engineering, information engineering, Event (probability theory), Operating point, Signal processing, Intersection (set theory), Sound detection, segment vs event criteria, sound event detection, evaluation metrics, Ranking, [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], 020201 artificial intelligence & image processing, Data mining, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, computer, polyphonic sound detection score, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
The ranking of sound event detection (SED) systems may be biased by assumptions inherent to the evaluation criteria and by the choice of an operating point. This paper compares the conventional event-based and segment-based criteria against the intersection-based criterion of the Polyphonic Sound Detection Score (PSDS), over a selection of systems from DCASE 2020 Challenge Task 4. It shows that, by relying on collars, the conventional event-based criterion introduces different strictness levels depending on the length of the sound events, and that the segment-based criterion may lack precision and be application dependent. In contrast, PSDS's intersection-based criterion overcomes the dependency of the evaluation on sound event duration and provides robustness to labelling subjectivity by allowing valid detections of interrupted events. Furthermore, PSDS enhances the comparison of SED systems by measuring sound event modelling performance independently of the systems' operating points.
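The contrast between the collar-based and the intersection-based criteria can be illustrated with a minimal sketch. It is not the paper's implementation or the PSDS reference code: the function names are hypothetical, the collar values (200 ms onset tolerance, offset tolerance of 200 ms or 20% of the event length, whichever is larger) are assumed from common DCASE Task 4 practice, the 0.5 threshold is an arbitrary illustrative value, and the intersection check is simplified to a single detection against a single reference event.

```python
def collar_match(ref, det, collar=0.2, pct=0.2):
    """Event-based criterion (assumed collar values): the detected onset and
    offset must each fall within a tolerance of the reference boundaries."""
    ref_on, ref_off = ref
    det_on, det_off = det
    # The offset tolerance grows with event length; the onset tolerance does not.
    offset_tol = max(collar, pct * (ref_off - ref_on))
    return abs(det_on - ref_on) <= collar and abs(det_off - ref_off) <= offset_tol


def intersection_ratio(ref, det):
    """Fraction of the reference event's duration covered by the detection."""
    overlap = max(0.0, min(ref[1], det[1]) - max(ref[0], det[0]))
    return overlap / (ref[1] - ref[0])


def intersection_match(ref, det, rho=0.5):
    """Simplified intersection-based criterion; rho is a hypothetical threshold."""
    return intersection_ratio(ref, det) >= rho


# Both detections miss the first 5% of their reference event (times in seconds).
short_ref, short_det = (0.0, 1.0), (0.05, 1.0)
long_ref, long_det = (0.0, 10.0), (0.5, 10.0)

for name, ref, det in [("short", short_ref, short_det), ("long", long_ref, long_det)]:
    print(name, collar_match(ref, det), intersection_match(ref, det))
```

Under these assumptions, the same relative onset error is accepted for the 1 s event but rejected for the 10 s event by the collar-based check, while the intersection-based check treats both events alike.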
- Published
- 2020