TAMDepth: self-supervised monocular depth estimation with transformer and adapter modulation.
- Source :
- Visual Computer. Oct 2024, Vol. 40, Issue 10, p6797-6808. 12p.
- Publication Year :
- 2024
Abstract
- Self-supervised monocular depth estimation has shown promising results by training on image sequences instead of ground truth that is challenging to source. Most current studies on self-supervised depth estimation build on fully convolutional or transformer architectures, with little discussion of hybrid architectures. In this paper, we propose TAMDepth, a novel framework that effectively captures both the local and global features of image sequences by combining convolutional blocks and transformer blocks. TAMDepth adopts multi-scale feature-fusion convolutional modules to capture local details in shallow layers, while transformer blocks build global dependencies in higher layers. Furthermore, to enhance the representation power of the architecture, we introduce an adapter modulation that injects spatial priors into the transformer blocks through cross-attention, improving the model's ability to represent the scene. Experiments demonstrate that our model achieves state-of-the-art performance on the KITTI dataset and also generalizes well to the Make3D dataset. Source code is available at https://github.com/deansaice/TAMDepth. [ABSTRACT FROM AUTHOR]
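- To make the adapter-modulation idea concrete, the sketch below shows one plausible way to inject a convolutional spatial prior into transformer tokens via cross-attention, as the abstract describes. It is a minimal, hypothetical PyTorch illustration: the class name, shapes, and layer choices are assumptions, not the authors' implementation (which is available at the GitHub link above).

```python
import torch
import torch.nn as nn

class AdapterModulation(nn.Module):
    """Hypothetical sketch: inject a spatial (convolutional) prior into
    transformer tokens via cross-attention. Names and shapes are
    illustrative, not the TAMDepth authors' actual code."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm_tokens = nn.LayerNorm(dim)
        self.norm_prior = nn.LayerNorm(dim)
        # Queries come from the transformer tokens;
        # keys/values come from the CNN feature map (the spatial prior).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, spatial_prior: torch.Tensor) -> torch.Tensor:
        # tokens:        (B, N, C) transformer features
        # spatial_prior: (B, C, H, W) features from the convolutional branch
        b, c, h, w = spatial_prior.shape
        prior = spatial_prior.flatten(2).transpose(1, 2)  # (B, H*W, C)
        q = self.norm_tokens(tokens)
        kv = self.norm_prior(prior)
        attended, _ = self.cross_attn(q, kv, kv)
        # Residual connection preserves the original token pathway.
        return tokens + attended

if __name__ == "__main__":
    adapter = AdapterModulation(dim=256)
    tokens = torch.randn(2, 196, 256)    # e.g. a 14x14 token grid
    prior = torch.randn(2, 256, 14, 14)  # matching CNN feature map
    print(adapter(tokens, prior).shape)  # torch.Size([2, 196, 256])
```

- The residual form is a common design choice for adapters: if the cross-attention output is near zero, the transformer's original behavior is preserved, so the prior modulates rather than replaces the token features.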
- Subjects :
- *TRANSFORMER models
*SOURCE code
*MONOCULARS
*GENERALIZATION
Details
- Language :
- English
- ISSN :
- 0178-2789
- Volume :
- 40
- Issue :
- 10
- Database :
- Academic Search Index
- Journal :
- Visual Computer
- Publication Type :
- Academic Journal
- Accession number :
- 180005944
- Full Text :
- https://doi.org/10.1007/s00371-024-03332-3