1. GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
- Author
-
Hatch, Kyle B., Balakrishna, Ashwin, Mees, Oier, Nair, Suraj, Park, Seohong, Wulfe, Blake, Itkina, Masha, Eysenbach, Benjamin, Levine, Sergey, Kollar, Thomas, and Burchfiel, Benjamin
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Image and video generative models that are pre-trained on Internet-scale data can greatly increase the generalization capacity of robot learning systems. These models can function as high-level planners, generating intermediate subgoals for low-level goal-conditioned policies to reach. However, the performance of these systems can be greatly bottlenecked by the interface between generative models and low-level controllers. For example, generative models may predict photorealistic yet physically infeasible frames that confuse low-level policies. Low-level policies may also be sensitive to subtle visual artifacts in generated goal images. This paper addresses these two facets of generalization, providing an interface to effectively "glue together" language-conditioned image or video prediction models with low-level goal-conditioned policies. Our method, Generative Hierarchical Imitation Learning-Glue (GHIL-Glue), filters out subgoals that do not lead to task progress and improves the robustness of goal-conditioned policies to generated subgoals with harmful visual artifacts. We find in extensive experiments in both simulated and real environments that GHIL-Glue achieves a 25% improvement across several hierarchical models that leverage generative subgoals, achieving a new state-of-the-art on the CALVIN simulation benchmark for policies using observations from a single RGB camera. GHIL-Glue also outperforms other generalist robot policies across 3/4 language-conditioned manipulation tasks testing zero-shot generalization in physical experiments., Comment: Code, model checkpoints and videos can be found at https://ghil-glue.github.io
- Published
- 2024