1. Vision-Language Models as a Source of Rewards
- Authors
Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Chan, Harris, Comanici, Gheorghe, Flennerhag, Sebastian, Gazeau, Maxime, Holsheimer, Kristian, Horgan, Dan, Laskin, Michael, Lyle, Clare, Masoom, Hussain, McKinney, Kay, Mnih, Volodymyr, Neitz, Alexander, Nikulin, Dmitry, Pardo, Fabio, Parker-Holder, Jack, Quan, John, Rocktäschel, Tim, Sahni, Himanshu, Schaul, Tom, Schroecker, Yannick, Spencer, Stephen, Steigerwald, Richie, Wang, Luyu, and Zhang, Lei
- Subjects
Computer Science - Machine Learning
- Abstract
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
- Comment
10 pages, 5 figures
- Published
2023
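
The abstract describes deriving goal-achievement rewards from CLIP-family models. As an illustration only, the sketch below shows one plausible way to turn a CLIP image-text similarity into a binary reward for an RL agent; the model checkpoint (`openai/clip-vit-base-patch32`), the `clip_reward` helper, and the similarity threshold are assumptions made for this example, not the paper's exact configuration.

```python
# Minimal sketch of a CLIP-style reward: the reward is 1 when the cosine
# similarity between the current observation and the language goal exceeds
# a threshold, and 0 otherwise. The checkpoint and threshold below are
# illustrative assumptions, not values taken from the paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed checkpoint for the sketch
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
model.eval()


def clip_reward(frame: Image.Image, goal: str, threshold: float = 0.3) -> float:
    """Return a binary reward for whether `frame` visually achieves `goal`."""
    inputs = processor(text=[goal], images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    # Cosine similarity between L2-normalised image and text embeddings.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    similarity = (image_emb * text_emb).sum(dim=-1).item()
    return 1.0 if similarity > threshold else 0.0
```

In use, such a function would be called on each environment frame together with the current language goal, and the returned value fed to the RL algorithm in place of a hand-designed reward.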