Unsupervised framework for depth estimation and camera motion prediction from video.

Authors :
Yang, Delong
Zhong, Xunyu
Gu, Dongbing
Peng, Xiafu
Hu, Huosheng
Source :
Neurocomputing. Apr2020, Vol. 385, p169-185. 17p.
Publication Year :
2020

Abstract

• Unsupervised framework for depth estimation and camera motion prediction.
• Depth CNN and pose CNN are trained jointly and can be used independently.
• The supervision signal, which includes a left-right consistency term, is constructed from spatial and temporal geometry constraints.
• Results outperform previous unsupervised methods and some supervised methods.
• A model trained on the EuRoC dataset is used to test the algorithm's generalization capability.

Depth estimation from monocular video plays a crucial role in scene perception. The significant drawback of supervised learning models is the need for vast amounts of manually labeled data (ground truth) for training. To overcome this limitation, unsupervised learning strategies that do not require ground truth have attracted extensive attention from researchers in the past few years. This paper presents a novel unsupervised framework for jointly estimating single-view depth and predicting camera motion. Stereo image sequences are used to train the model, while only monocular images are required for inference. The presented framework is composed of two CNNs (a depth CNN and a pose CNN), which are trained concurrently and tested independently. The objective function is constructed on the basis of the epipolar geometry constraints between stereo image sequences. To improve the accuracy of the model, a left-right consistency loss is added to the objective function. The use of stereo image sequences enables us to utilize both spatial information between stereo images and temporal photometric warp error from image sequences. Experimental results on the KITTI and Cityscapes datasets show that our model not only outperforms prior unsupervised approaches but also achieves results comparable with several supervised methods. Moreover, we also train our model on the EuRoC dataset, which is captured in an indoor environment. Experiments in indoor and outdoor scenes are conducted to test the generalization capability of the model. [ABSTRACT FROM AUTHOR]
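
The abstract names two stereo-based terms, a photometric warp error and a left-right consistency loss, without giving their form. The following is a minimal NumPy sketch of how such terms are commonly computed for rectified grayscale stereo pairs. It is not the authors' implementation: the function names, the L1 penalty, and the sign convention (a left pixel at column x matches the right pixel at column x - d) are all assumptions made for illustration, and the temporal term that depends on the pose CNN is left out.

```python
import numpy as np


def warp_horizontally(image, disparity):
    """Sample a 2D image at (y, x - disparity[y, x]), interpolating linearly along x.

    Assumed convention: rectified stereo, disparity in pixels,
    sample positions clamped to the image border.
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)          # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)         # right neighbour column
    frac = xs - x0                         # interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * image[rows, x0] + frac * image[rows, x1]


def photometric_loss(target, source, disparity):
    """Mean L1 error between the target view and the source view warped into it."""
    reconstructed = warp_horizontally(source, disparity)
    return float(np.mean(np.abs(target - reconstructed)))


def left_right_consistency_loss(disp_left, disp_right):
    """Mean L1 gap between the left disparity and the right disparity projected into the left view."""
    projected = warp_horizontally(disp_right, disp_left)
    return float(np.mean(np.abs(disp_left - projected)))
```

Under these assumptions, two disparity maps that agree everywhere give a consistency loss of zero, and a left image that is an exact warp of the right image gives a photometric loss of zero. In a training setup the same operations would be written in a differentiable framework so that gradients reach the depth network.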

Details

Language :
English
ISSN :
0925-2312
Volume :
385
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
141940762
Full Text :
https://doi.org/10.1016/j.neucom.2019.12.049