HAAC: Hierarchical audio augmentation chain for ACCDOA described sound event localization and detection.
- Source :
- Applied Acoustics. Aug2023, Vol. 211, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2023
Abstract
- • Propose a hierarchical audio augmentation chain (HAAC) for ACCDOA-represented SELD.
- • Generate simulated audio mixtures for SELD and disclose all synthesis details.
- • Conduct SELD experiments with two baseline systems on two benchmark datasets.
- • Both HAAC and additional synthesized audio help improve SELD performance.
- The goal of sound event localization and detection (SELD) is to detect the temporal activity of a known set of sound events and to locate them in space. We argue that a large audio dataset is essential for a deep neural network-based SELD system trained as a supervised task. However, gathering and annotating such datasets is costly and time-intensive. Hence, data augmentation methods have attracted attention as a way to increase sample diversity from limited collections. In this paper, we propose to augment the limited audio samples for deep neural network-based SELD systems in two ways. First, we propose the hierarchical audio augmentation chain (HAAC) for SELD tasks described with the activity-coupled Cartesian direction of arrival (ACCDOA) output representation. It chains three waveform and spectrogram augmentation techniques in order: feature map augmentation, audio channel swapping, and finally sample mixup. Second, we propose to augment the training set by generating more simulated audio samples, and we make the list of selected sound events publicly available to the community. Experiments on the STARSS22 dataset showed that HAAC greatly improved SELD performance, increasing the sound event detection score by 24% and decreasing the localization error by 12.1°. We demonstrate that it is a simple yet effective approach compared to other data augmentation methods. Moreover, training with additional simulated audio samples, generated by convolving the selected sound events with spatial room impulse responses (SRIRs), also greatly improved SELD performance. [ABSTRACT FROM AUTHOR]
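The abstract names three building blocks that can be illustrated generically: the ACCDOA target (a unit direction-of-arrival vector scaled by event activity), sample mixup, and simulation by convolving a dry sound event with an impulse response. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 0.5 activity threshold, and the linear blending of ACCDOA targets in `mixup` are illustrative choices, and feature map augmentation and channel swapping are omitted because their details are not given in this record.

```python
import math


def accdoa_target(active, azimuth_deg, elevation_deg):
    """ACCDOA vector for one class in one frame (illustrative helper).

    The unit DOA vector is scaled by the activity (0 or 1), so the
    vector length encodes detection and its direction encodes location.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    a = 1.0 if active else 0.0
    return (a * math.cos(el) * math.cos(az),
            a * math.cos(el) * math.sin(az),
            a * math.sin(el))


def is_active(vec, threshold=0.5):
    """Detect an event when the ACCDOA vector is longer than a threshold.

    The threshold value is an assumption made for this sketch.
    """
    return math.sqrt(sum(v * v for v in vec)) > threshold


def mixup(x1, y1, x2, y2, lam):
    """Generic mixup: blend two (features, target) pairs with weight lam.

    Blending the ACCDOA targets linearly is an assumption of this sketch.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def convolve(signal, impulse_response):
    """Direct-form linear convolution of a dry event with one IR channel."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out
```

In a multichannel setting, `convolve` would be applied once per SRIR channel and the spatialized events summed into a mixture; a practical pipeline would use an FFT-based convolution rather than this quadratic loop.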
Details
- Language :
- English
- ISSN :
- 0003682X
- Volume :
- 211
- Database :
- Academic Search Index
- Journal :
- Applied Acoustics
- Publication Type :
- Academic Journal
- Accession number :
- 170745387
- Full Text :
- https://doi.org/10.1016/j.apacoust.2023.109541