How Sampling Rate Affects Cross-Domain Transfer Learning for Video Description
- Source :
- ICASSP
- Publication Year :
- 2018
- Publisher :
- IEEE, 2018.
Abstract
- Translating video to language is very challenging because video content is diverse, originates from multiple activities, and requires complicated integration of spatio-temporal information. Two pressing issues are associated with the video-to-language translation problem. First, how can knowledge learned from a more general dataset be transferred to a dataset from a specific application domain? Second, how can stable video captioning (or description) results be generated under different sampling rates? In this paper, we propose a novel temporal embedding method that better retains temporal representation under different video sampling rates. We present a transfer learning method that combines a stacked LSTM encoder-decoder structure with a temporal embedding learning with soft-attention (TELSA) mechanism. We evaluate the proposed approach on two public datasets, MSR-VTT and MSVD. The promising experimental results confirm the effectiveness of the proposed approach.
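To make the sampling-rate issue in the abstract concrete, the following is a minimal sketch and not the paper's TELSA method. It assumes a toy video of 12 frames with 2-dimensional features, a hypothetical fixed query vector, and uniform stride-based sampling. It shows how soft attention (a softmax-weighted average over frame features) yields a fixed-size summary no matter how many frames survive sampling. All function names here are illustrative assumptions.

```python
import math


def sample_indices(num_frames, stride):
    """Indices kept when taking every `stride`-th frame (uniform sampling)."""
    return list(range(0, num_frames, stride))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Soft attention: weight each frame by softmax(dot(frame, query)).

    Returns the attention weights and the weighted-average context vector.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy "video": 12 frames with made-up 2-d features (assumption, not real data).
video = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(12)]
query = [1.0, 0.0]  # hypothetical query; in a real model this would be learned

for stride in (1, 2, 4):
    kept = sample_indices(len(video), stride)
    weights, context = attend([video[i] for i in kept], query)
    print(stride, len(kept), [round(c, 3) for c in context])
```

The context vector always has the feature dimension (2 here), while the number of attended frames changes with the stride. Comparing the printed vectors across strides gives a rough feel for how much a summary drifts as the sampling rate changes, which is the stability question the abstract raises.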
Details
- Database :
- OpenAIRE
- Journal :
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Accession number :
- edsair.doi...........a715de4b57f2514ee274c0974fadbbf1
- Full Text :
- https://doi.org/10.1109/icassp.2018.8461899