1. Stochastic Latent Residual Video Prediction
- Author
-
Franceschi, Jean-Yves, Delasalles, Edouard, Chen, Mickaël, Lamprier, Sylvain, Gallinari, Patrick, Machine Learning and Information Access (MLIA), LIP6, Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Criteo AI Lab, Criteo [Paris], This work was granted access to the HPC resources of IDRIS under the allocation 2020-AD011011360 made by GENCI (Grand Equipement National de Calcul Intensif)., International Machine Learning Society, Hal Daumé III, Aarti Singh, Neil Lawrence, Mark Reid, and ANR-15-CE23-0027,LOCUST,Apprentissage de représentations pour modéliser la dynamique des traces d'interaction complexes(2015)
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Machine Learning (stat.ML) ,Variational Autoencoders ,Machine Learning (cs.LG) ,Stochastic Video Prediction ,State-Space Models ,Machine Learning ,Deep Learning ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Statistics - Machine Learning ,ICML ,Generative Models - Abstract
International audience; Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
- Published
- 2020