1. Topological Planning with Transformers for Vision-and-Language Navigation
- Author
Junshen K. Chen, Jo Chuang, Kevin Chen, Marynel Vázquez, and Silvio Savarese
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Backtracking, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Robotics, Plan (drawing), Modular design, Topology, Computer Science - Robotics, Artificial Intelligence (cs.AI), Control theory, Topological map, Artificial intelligence, Robotics (cs.RO), Computation and Language (cs.CL), Natural language, Transformer (machine learning model)
- Abstract
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
- Published
2021