How Sampling Rate Affects Cross-Domain Transfer Learning for Video Description

Authors :: Shou-De Lin
Yu-Sheng Chou
Pai-Heng Hsiao
Hong-Yuan Mark Liao
Source :: ICASSP
Publication Year :: 2018
Publisher :: IEEE, 2018.
Abstract: Translating video to language is very challenging because video content is diverse, arising from multiple activities, and because spatio-temporal information must be integrated in complex ways. Two pressing issues are associated with the video-to-language translation problem. First, how can knowledge learned from a more general dataset be transferred to a specific application-domain dataset? Second, how can stable video captioning (or description) results be generated under different sampling rates? In this paper, we propose a novel temporal embedding method to better retain temporal representation under different video sampling rates. We present a transfer learning method that combines a stacked LSTM encoder-decoder structure with a temporal embedding learning with soft-attention (TELSA) mechanism. We evaluate the proposed approach on two public datasets, MSR-VTT and MSVD. The promising experimental results support the effectiveness of the proposed approach.

Subjects :: Closed captioning
Computer science
Machine learning
Domain (software engineering)
Sampling (signal processing)
Embedding
Artificial intelligence
Transfer of learning
Representation (mathematics)
Decoding methods

Database :: OpenAIRE
Journal :: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Accession number :: edsair.doi...........a715de4b57f2514ee274c0974fadbbf1
Full Text :: https://doi.org/10.1109/icassp.2018.8461899