How Sampling Rate Affects Cross-Domain Transfer Learning for Video Description

Authors :
Shou-De Lin
Yu-Sheng Chou
Pai-Heng Hsiao
Hong-Yuan Mark Liao
Source :
ICASSP
Publication Year :
2018
Publisher :
IEEE, 2018.

Abstract

Translating video into language is very challenging because video content is diverse, originating from many kinds of activities, and because spatio-temporal information must be integrated in complicated ways. Two pressing issues are associated with the video-to-language translation problem. First, how can knowledge learned from a more general dataset be transferred to a dataset from a specific application domain? Second, how can stable video captioning (or description) results be generated under different sampling rates? In this paper, we propose a novel temporal embedding method that better retains the temporal representation under different video sampling rates. We present a transfer learning method that combines a stacked LSTM encoder-decoder structure with a temporal embedding learning with soft-attention (TELSA) mechanism. We evaluate the proposed approach on two public datasets, MSR-VTT and MSVD. The promising experimental results confirm the effectiveness of the proposed approach.
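
As a rough illustration of the two ideas the abstract names, sampling a video at different rates and pooling per-frame features with soft attention, the following is a minimal sketch in plain Python. It is not the authors' TELSA mechanism, whose details this record does not give. It swaps in generic dot-product attention, and the function names, the fixed query vector, and the toy two-dimensional features are all assumptions made for illustration.

```python
import math


def sample_frame_indices(num_frames, step):
    """Return the indices of every `step`-th frame (uniform temporal sampling).

    `step` is a hypothetical parameter: step=1 keeps all frames, step=4 keeps
    one frame in four.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return list(range(0, num_frames, step))


def soft_attention(features, query):
    """Generic dot-product soft attention (an assumption, not TELSA itself).

    Scores each per-frame feature vector against `query`, normalizes the
    scores with a softmax, and returns (weights, context), where `context`
    is the attention-weighted sum of the feature vectors.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy "video": 12 frames, each with a made-up 2-dimensional feature vector.
video = [[math.sin(t / 3.0), math.cos(t / 3.0)] for t in range(12)]
query = [1.0, 0.0]  # hypothetical decoder state used as the attention query

for step in (1, 2, 4):
    indices = sample_frame_indices(len(video), step)
    weights, context = soft_attention([video[i] for i in indices], query)
    print(step, len(indices), [round(c, 3) for c in context])
```

Whatever the sampling step, the attention weights sum to one and the pooled context vector keeps the same dimensionality, so a downstream decoder sees a fixed-size input even though the number of sampled frames changes.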

Details

Database :
OpenAIRE
Journal :
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Accession number :
edsair.doi...........a715de4b57f2514ee274c0974fadbbf1
Full Text :
https://doi.org/10.1109/icassp.2018.8461899