Feature‐enhanced representation with transformers for multi‐view stereo.

Authors :
Xiang, Lintao
Yin, Hujun
Source :
IET Image Processing (Wiley-Blackwell); May 2024, Vol. 18 Issue 6, p1530-1539, 10p
Publication Year :
2024

Abstract

Most existing multi‐view stereo (MVS) methods fail to consider global context information in the feature extraction and cost aggregation stages. As transformers have shown remarkable performance on various vision tasks due to their ability to perceive global contextual information, this paper proposes a transformer‐based feature enhancement network (TF‐MVSNet) to facilitate feature representation learning by combining local features (both 2D and 3D) with long‐range contextual information. To reduce the memory consumption of feature matching, the cross‐attention mechanism is leveraged to efficiently construct 3D cost volumes under the epipolar constraint. Additionally, a colour‐guided network is designed to refine depth maps at a coarse stage, hence reducing incorrect depth predictions at a fine stage. Extensive experiments were performed on the DTU dataset and the Tanks and Temples (T&T) benchmark, and results are reported. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
17519659
Volume :
18
Issue :
6
Database :
Complementary Index
Journal :
IET Image Processing (Wiley-Blackwell)
Publication Type :
Academic Journal
Accession number :
176926941
Full Text :
https://doi.org/10.1049/ipr2.13046