
AlignNet: A Unifying Approach to Audio-Visual Alignment

Authors :
Wang, Jianren
Fang, Zhaoyuan
Zhao, Hang
Publication Year :
2020

Abstract

We present AlignNet, a model that synchronizes a video with a reference audio track under non-uniform and irregular misalignments. AlignNet learns, end to end, the dense correspondence between each frame of the video and the audio. The method is built on simple and well-established principles: attention, pyramidal processing, warping, and an affinity function. Together with the model, we release Dance50, a dance dataset for training and evaluation. Qualitative, quantitative, and subjective evaluations on dance-music alignment and speech-lip alignment demonstrate that our method far outperforms state-of-the-art methods. Project video and code are available at https://jianrenw.github.io/AlignNet.

Comment: WACV 2020.

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2002.05070
Document Type :
Working Paper