
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks.

Authors :
Lee, Michelle A.
Zhu, Yuke
Zachares, Peter
Tan, Matthew
Srinivasan, Krishnan
Savarese, Silvio
Fei-Fei, Li
Garg, Animesh
Bohg, Jeannette
Source :
IEEE Transactions on Robotics. Jun 2020, Vol. 36, Issue 3, p582-596. 15p.
Publication Year :
2020

Abstract

Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot. [ABSTRACT FROM AUTHOR]
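
To make the approach concrete, below is a minimal PyTorch sketch of the general idea described in the abstract: separate encoders for camera images and force/torque readings are fused into one compact latent vector, trained with a self-supervised signal (here, a hypothetical contact-prediction head) before being handed to a policy. The layer sizes, input shapes, and the contact objective are illustrative assumptions, not the architecture or training objectives from the paper.

import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Sketch of a fusion encoder: a CNN over an RGB frame and an MLP over
    a window of force/torque readings, concatenated into one compact latent.
    All dimensions and the contact-prediction head are illustrative assumptions."""

    def __init__(self, latent_dim=128):
        super().__init__()
        # Vision branch: small CNN over an assumed 3x64x64 camera image.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 128), nn.ReLU(),
        )
        # Haptic branch: MLP over a flattened window of 32 six-axis F/T readings.
        self.haptic = nn.Sequential(
            nn.Linear(32 * 6, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Fusion into the compact multimodal latent a policy could consume.
        self.fuse = nn.Linear(128 + 64, latent_dim)
        # Example self-supervised head: predict whether the robot is in contact.
        self.contact_head = nn.Linear(latent_dim, 1)

    def forward(self, image, wrench):
        z = self.fuse(torch.cat([self.vision(image), self.haptic(wrench)], dim=-1))
        return z, self.contact_head(z)


# Usage: one training step on the (hypothetical) contact-prediction objective.
enc = MultimodalEncoder()
image = torch.randn(8, 3, 64, 64)            # batch of camera frames
wrench = torch.randn(8, 32 * 6)              # batch of flattened F/T windows
contact_label = torch.randint(0, 2, (8, 1)).float()
z, contact_logit = enc(image, wrench)
loss = nn.functional.binary_cross_entropy_with_logits(contact_logit, contact_label)
loss.backward()

In this sketch the self-supervised loss shapes the shared latent z; a reinforcement-learning policy would then take z as its (low-dimensional) observation, which is the sample-efficiency mechanism the abstract describes.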

Details

Language :
English
ISSN :
1552-3098
Volume :
36
Issue :
3
Database :
Academic Search Index
Journal :
IEEE Transactions on Robotics
Publication Type :
Academic Journal
Accession number :
143721336
Full Text :
https://doi.org/10.1109/TRO.2019.2959445