
PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation

Authors :
Wang, Liuyi
Liu, Chengju
He, Zongtao
Li, Shu
Yan, Qingqing
Chen, Huiyi
Chen, Qijun
Source :
Engineering Applications of Artificial Intelligence, 2024, 128: 107487
Publication Year :
2023

Abstract

Vision-and-language navigation (VLN) is a crucial but challenging cross-modal navigation task. One powerful technique to enhance generalization performance in VLN is the use of an independent speaker model that provides pseudo instructions for data augmentation. However, current speaker models based on Long Short-Term Memory (LSTM) lack the ability to attend to features relevant at different locations and time steps. To address this, we propose a novel progress-aware spatio-temporal transformer speaker (PASTS) model that uses the transformer as the core of the network. PASTS uses a spatio-temporal encoder to fuse panoramic representations and encode intermediate connections across steps. In addition, to avoid the misalignment problem that could result in incorrect supervision, a speaker progress monitor (SPM) is proposed to enable the model to estimate the progress of instruction generation and facilitate more fine-grained caption results. A multifeature dropout (MFD) strategy is also introduced to alleviate overfitting. The proposed PASTS can be flexibly combined with existing VLN models. The experimental results demonstrate that PASTS outperforms all existing speaker models and successfully improves the performance of previous VLN models, achieving state-of-the-art performance on the standard Room-to-Room (R2R) dataset.
Comment: 15 pages, 11 figures
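
The abstract does not include implementation details; the following is a minimal, illustrative PyTorch sketch of how two of the described components, a multifeature dropout and a progress-estimation head, might be attached to a transformer speaker. All module names, shapes, and hyperparameters are assumptions for exposition, not the authors' released code.

```python
# Illustrative sketch only: names, shapes, and hyperparameters are assumed,
# not taken from the PASTS paper or its released implementation.
import torch
import torch.nn as nn


class MultiFeatureDropout(nn.Module):
    """Independently drops whole feature groups (e.g. visual, action, positional)
    during training to discourage over-reliance on any single input stream."""

    def __init__(self, p: float = 0.3):
        super().__init__()
        self.p = p

    def forward(self, features):
        # features: list of tensors, one per feature group
        if not self.training:
            return features
        out = []
        for feat in features:
            if torch.rand(1).item() < self.p:
                # Drop the entire feature group.
                out.append(torch.zeros_like(feat))
            else:
                # Rescale surviving groups, as in standard dropout.
                out.append(feat / (1.0 - self.p))
        return out


class ProgressHead(nn.Module):
    """Regresses a scalar in [0, 1] per decoding step; a stand-in for the
    speaker progress monitor (SPM) described in the abstract."""

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Linear(d_model // 2, 1),
            nn.Sigmoid(),
        )

    def forward(self, decoder_states: torch.Tensor) -> torch.Tensor:
        # decoder_states: (batch, seq_len, d_model) -> progress: (batch, seq_len)
        return self.mlp(decoder_states).squeeze(-1)
```

In such a setup, the progress head would typically be trained with a regression loss against the normalized position of each generated word in the target instruction, alongside the usual word-prediction loss; that supervision scheme is likewise an assumption here.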

Details

Database :
arXiv
Journal :
Engineering Applications of Artificial Intelligence, 2024, 128: 107487
Publication Type :
Report
Accession number :
edsarx.2305.11918
Document Type :
Working Paper
Full Text :
https://doi.org/10.1016/j.engappai.2023.107487