Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization
- Source :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2024, Vol. 32, Issue 1, pp. 1392-1405 (14 pages)
- Publication Year :
- 2024
Abstract
- This article tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
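The central idea in the abstract, expressing a signal as time-activations of a pre-learnt NMF dictionary, can be illustrated with a small sketch. This is not the authors' interpreter network: it only shows the underlying factorization step, using standard multiplicative updates for the Frobenius-norm objective with the dictionary held fixed. The function name `nmf_activations`, the toy data, and the `relevance` weights are all hypothetical and chosen for illustration; NumPy is assumed to be available.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H such that V ~ W @ H, with W fixed.

    V: (freq_bins, frames) non-negative magnitude spectrogram.
    W: (freq_bins, components) pre-learnt non-negative dictionary.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update; keeps H non-negative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy stand-in for a spectrogram: 64 frequency bins, 50 frames, 4 components.
rng = np.random.default_rng(1)
W = rng.random((64, 4))
V = W @ rng.random((4, 50))

H = nmf_activations(V, W)

# Hypothetical per-component relevance (in the paper this role is played by
# the trained interpreter; here it is simply made up).
relevance = np.array([1.0, 0.0, 0.0, 1.0])

# Keep only the components marked as relevant to form an "interpretation"
# spectrogram, which could then be inverted back to audio.
V_interp = W @ (relevance[:, None] * H)
```

Because each dictionary column corresponds to a spectral pattern, weighting the rows of `H` and multiplying back by `W` yields a spectrogram that emphasizes only the selected components, which is what makes this kind of interpretation listenable.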
Details
- Language :
- English
- ISSN :
- 2329-9290
- Volume :
- 32
- Issue :
- 1
- Database :
- Supplemental Index
- Journal :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Publication Type :
- Periodical
- Accession number :
- ejs65551219
- Full Text :
- https://doi.org/10.1109/TASLP.2024.3358049