Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks.
- Source :
- IEEE Transactions on Robotics. Jun 2020, Vol. 36, Issue 3, p582-596. 15p.
- Publication Year :
- 2020
Abstract
- Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.
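
As a rough illustration of the idea summarized in the abstract (fusing camera images and force/torque readings into one compact latent vector that a downstream policy can consume, trained with self-supervised prediction heads), the sketch below shows one possible PyTorch encoder. The class name `MultimodalEncoder`, the layer sizes, and the contact-prediction head are illustrative assumptions, not the architecture published in the paper.

```python
import torch
import torch.nn as nn


class MultimodalEncoder(nn.Module):
    """Illustrative vision + haptics fusion encoder (assumed design,
    not the paper's published architecture)."""

    def __init__(self, latent_dim=128):
        super().__init__()
        # Vision branch: small CNN over a 128x128 RGB frame.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, latent_dim),
        )
        # Haptic branch: MLP over a flattened window of 32 six-axis F/T readings.
        self.haptics = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Fusion: concatenate both embeddings and project to a shared latent.
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)
        # Example self-supervised head: predict contact vs. no contact.
        self.contact_head = nn.Linear(latent_dim, 1)

    def forward(self, image, forces):
        z = torch.cat([self.vision(image), self.haptics(forces)], dim=-1)
        z = torch.relu(self.fuse(z))
        return z, torch.sigmoid(self.contact_head(z))


if __name__ == "__main__":
    enc = MultimodalEncoder()
    img = torch.randn(8, 3, 128, 128)   # batch of RGB frames
    ft = torch.randn(8, 32, 6)          # batch of force/torque windows
    z, contact_prob = enc(img, ft)
    print(z.shape, contact_prob.shape)  # [8, 128], [8, 1]
```

The shared latent `z` would stand in for the compact multimodal representation the abstract describes, and a reinforcement learning policy would take it as input instead of raw pixels and force signals.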
- Subjects :
- *REINFORCEMENT learning
- *ROBOT design & construction
- *DEEP learning
- *VISION
- *TASKS
Details
- Language :
- English
- ISSN :
- 1552-3098
- Volume :
- 36
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Robotics
- Publication Type :
- Academic Journal
- Accession number :
- 143721336
- Full Text :
- https://doi.org/10.1109/TRO.2019.2959445