MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

Authors :
Yao, Dong
Zhu, Jieming
Xun, Jiahao
Zhang, Shengyu
Zhao, Zhou
Deng, Liqun
Zhang, Wenqiao
Dong, Zhenhua
Jiang, Xin
Publication Year :
2023

Abstract

Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up structure of music, we introduce MART, a hierarchical music representation learning approach that facilitates feature interactions among cropped music clips while considering their part-whole hierarchies. Specifically, we propose a hierarchical part-whole transformer to capture the structural relationships between music clips in a part-whole hierarchy. Furthermore, a hierarchical contrastive learning objective is crafted to align part-whole music representations at adjacent levels, progressively establishing a multi-hierarchy representation space. The effectiveness of our music representation learning from part-whole hierarchies has been empirically validated across multiple downstream tasks, including music classification and cover song identification.

Comment: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1
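
As an illustration of what "aligning part-whole representations at adjacent levels" could look like, the following is a minimal sketch and not the authors' implementation. It assumes that each clip already has an embedding vector, that a "whole" is formed by mean-pooling two adjacent "parts" (a simple stand-in for the part-whole transformer described in the abstract), and that alignment is scored with an InfoNCE-style loss. All function names and the temperature value are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def merge_pairs(level):
    """Build the next level up by mean-pooling adjacent pairs of embeddings.

    Assumption: mean pooling replaces the part-whole transformer here.
    A trailing unpaired embedding is dropped.
    """
    return [
        [(a + b) / 2 for a, b in zip(level[i], level[i + 1])]
        for i in range(0, len(level) - 1, 2)
    ]


def adjacent_level_infonce(parts, wholes, temperature=0.1):
    """InfoNCE-style loss: each part should match its own parent whole.

    The positive for part i is whole i // 2; all other wholes are negatives.
    """
    parts = [normalize(p) for p in parts]
    wholes = [normalize(w) for w in wholes]
    losses = []
    for i, part in enumerate(parts):
        parent = i // 2
        if parent >= len(wholes):  # unpaired trailing part has no parent
            continue
        logits = [dot(part, w) / temperature for w in wholes]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_denominator - logits[parent])
    return sum(losses) / len(losses)


def hierarchical_loss(clips, temperature=0.1):
    """Sum the adjacent-level losses while moving up the hierarchy."""
    total, level = 0.0, clips
    while len(level) >= 4:  # at least two wholes are needed for a negative
        wholes = merge_pairs(level)
        total += adjacent_level_infonce(level, wholes, temperature)
        level = wholes
    return total


# Toy input: four 2-dimensional clip embeddings from one track.
clips = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
loss = hierarchical_loss(clips)
```

In this toy setup the loss is close to zero when sibling clips resemble each other and differ from clips under other parents, and it rises toward log(number of wholes) when the wholes are indistinguishable.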

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2312.06197
Document Type :
Working Paper