Author: "Baumli, Kate" / Topic: fos: computer and information sciences - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Baumli, Kate"' returned 6 results.



1. Human-Timescale Adaptation in an Open-Ended Task Space

Author: Adaptive Agent Team, Bauer, Jakob, Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Bhoopchand, Avishkar, Bradley-Schmieg, Nathalie, Chang, Michael, Clay, Natalie, Collister, Adrian, Dasagi, Vibhavari, Gonzalez, Lucy, Gregor, Karol, Hughes, Edward, Kashem, Sheleem, Loks-Thompson, Maria, Openshaw, Hannah, Parker-Holder, Jack, Pathak, Shreya, Perez-Nieves, Nicolas, Rakicevic, Nemanja, Rocktäschel, Tim, Schroecker, Yannick, Sygnowski, Jakub, Tuyls, Karl, York, Sarah, Zacherl, Alexander, and Zhang, Lei
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
Published: 2023
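The abstract names an automated curriculum that "prioritises tasks at the frontier of an agent's capabilities" but does not give its selection rule. The sketch below is a generic, hypothetical illustration of one common way to express that idea: keep only tasks whose estimated success rate is neither near 0 nor near 1. The `TaskStats` class, the `low`/`high` thresholds, and the 0.5 default for unseen tasks are assumptions, not AdA's actual curriculum.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical running success statistics for one task."""
    successes: int = 0
    attempts: int = 0

    def success_rate(self) -> float:
        # Assumption: unseen tasks default to 0.5 so they count as "on the frontier".
        return self.successes / self.attempts if self.attempts else 0.5


def frontier_tasks(stats: dict, low: float = 0.1, high: float = 0.9) -> list:
    """Return task names whose estimated success rate lies strictly between low and high."""
    return [name for name, s in stats.items() if low < s.success_rate() < high]


def sample_task(stats: dict, rng: random.Random, low: float = 0.1, high: float = 0.9) -> str:
    """Sample uniformly from frontier tasks, falling back to all tasks if none qualify."""
    pool = frontier_tasks(stats, low, high) or list(stats)
    return rng.choice(pool)


if __name__ == "__main__":
    stats = {
        "easy": TaskStats(10, 10),   # always solved: too easy
        "hard": TaskStats(0, 10),    # never solved: too hard
        "medium": TaskStats(4, 10),  # 40% success: on the frontier
        "new": TaskStats(),          # unseen: treated as frontier
    }
    print(frontier_tasks(stats))  # ['medium', 'new']
    print(sample_task(stats, random.Random(0)))
```

A success-rate band is only one proxy for "frontier"; regret- or learning-progress-based scores are other options the abstract does not rule out.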

2. Relative Variational Intrinsic Control

Author: Baumli, Kate, Warde-Farley, David, Hansen, Steven, and Mnih, Volodymyr
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), ComputingMilieux_THECOMPUTINGPROFESSION, Computer Science - Artificial Intelligence, ComputingMilieux_COMPUTERSANDEDUCATION, General Medicine, Machine Learning (cs.LG)
Abstract: In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment. Existing skill learning methods use mutual information objectives to incentivize each skill to be diverse and distinguishable from the rest. However, if care is not taken to constrain the ways in which the skills are diverse, trivially diverse skill sets can arise. To ensure useful skill diversity, we propose a novel skill learning objective, Relative Variational Intrinsic Control (RVIC), which incentivizes learning skills that are distinguishable in how they change the agent's relationship to its environment. The resulting set of skills tiles the space of affordances available to the agent. We qualitatively analyze skill behaviors on multiple environments and show how RVIC skills are more useful than skills discovered by existing methods when used in hierarchical reinforcement learning. Comment: Accepted by AAAI 2021.
Published: 2021

3. Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

Author: Zahavy, Tom, Schroecker, Yannick, Behbahani, Feryal, Baumli, Kate, Flennerhag, Sebastian, Hou, Shaobo, and Singh, Satinder
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose DOMiNO, a method for Diversity Optimization Maintaining Near Optimality. We formalize the problem as a Constrained Markov Decision Process where the objective is to find diverse policies, measured by the distance between the state occupancies of the policies in the set, while remaining near-optimal with respect to the extrinsic reward. We demonstrate that the method can discover diverse and meaningful behaviors in various domains, such as different locomotion patterns in the DeepMind Control Suite. We perform extensive analysis of our approach, compare it with other multi-objective baselines, demonstrate that we can control both the quality and the diversity of the set via interpretable hyperparameters, and show that the discovered set is robust to perturbations.
Published: 2022
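The abstract frames DOMiNO as a constrained MDP: maximise diversity, measured by distances between state occupancies, subject to staying near-optimal on the extrinsic reward. The sketch below is a simplified, hypothetical illustration of that framing in a tabular setting, not DOMiNO's published update. Assumptions: occupancies are given as plain vectors, diversity is the L2 distance to the nearest other policy, near-optimality means `v_extrinsic >= alpha * v_optimal`, and the constraint is handled with a standard projected Lagrange-multiplier step.

```python
import math


def l2(a, b):
    """Euclidean distance between two state-occupancy vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity_scores(occupancies):
    """For each policy, distance from its occupancy vector to the nearest other policy's.

    Assumes at least two policies; a larger score means a more distinct policy.
    """
    scores = []
    for i, d_i in enumerate(occupancies):
        scores.append(min(l2(d_i, d_j) for j, d_j in enumerate(occupancies) if j != i))
    return scores


def update_multiplier(lam, v_extrinsic, v_optimal, alpha=0.9, lr=0.1):
    """Projected step on the multiplier for the constraint v_extrinsic >= alpha * v_optimal.

    lam grows while the constraint is violated and decays (never below 0) once it holds.
    """
    violation = alpha * v_optimal - v_extrinsic
    return max(0.0, lam + lr * violation)


def combined_reward(r_diversity, r_extrinsic, lam):
    """Per-step Lagrangian reward: diversity plus multiplier-weighted extrinsic reward."""
    return r_diversity + lam * r_extrinsic


if __name__ == "__main__":
    occ = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    print(diversity_scores(occ))  # policies 0 and 2 are close; policy 1 is the most distinct
    lam = update_multiplier(0.0, v_extrinsic=5.0, v_optimal=10.0)  # constraint violated
    print(lam)  # 0.4
```

Because the multiplier rises only while the policy falls short of `alpha * v_optimal`, the extrinsic term dominates when quality slips and the diversity term dominates otherwise; `alpha` plays the role of the interpretable quality knob the abstract mentions.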

4. Self-Consistent Models and Values

Author: Farquhar, Gregory, Baumli, Kate, Marinho, Zita, Filos, Angelos, Hessel, Matteo, van Hasselt, Hado, and Silver, David
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a learned model and value function to be jointly self-consistent. Our approach differs from classic planning methods such as Dyna, which only update values to be consistent with the model. We propose multiple self-consistency updates, evaluate these in both tabular and function approximation settings, and find that, with appropriate choices, self-consistency helps both policy evaluation and control. Comment: NeurIPS 2021.
Published: 2021

5. Learning more skills through optimistic exploration

Author: Strouse, DJ, Baumli, Kate, Warde-Farley, David, Mnih, Vlad, and Hansen, Steven
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Unsupervised skill learning objectives (Gregor et al., 2016; Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN. Comment: Accepted at ICLR 2022 (spotlight).
Published: 2021
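The abstract describes an information-gain bonus computed from an ensemble of discriminators and paid out when they disagree. A minimal sketch, assuming the bonus takes the standard ensemble information-gain form (entropy of the averaged skill prediction minus the average entropy of each member's prediction), is shown below. The function names and the list-of-lists input format are illustrative choices, not the paper's code.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def disdain_bonus(ensemble_probs):
    """Ensemble disagreement bonus for one state.

    ensemble_probs: N categorical distributions over skills q_i(z | s), one per
    discriminator. Returns H[mean_i q_i] - mean_i H[q_i], which is >= 0 and is
    zero when every member makes the same prediction.
    """
    n = len(ensemble_probs)
    k = len(ensemble_probs[0])
    mean = [sum(p[j] for p in ensemble_probs) / n for j in range(k)]
    return entropy(mean) - sum(entropy(p) for p in ensemble_probs) / n


if __name__ == "__main__":
    # Members agree (both uncertain): no epistemic disagreement, so no bonus.
    print(disdain_bonus([[0.5, 0.5], [0.5, 0.5]]))  # 0.0
    # Members are each confident but contradict each other: maximal bonus for 2 skills.
    print(disdain_bonus([[1.0, 0.0], [0.0, 1.0]]))  # ln 2, about 0.693
```

The subtraction is what separates epistemic from aleatoric uncertainty: a state where every discriminator is equally unsure earns nothing, while a rarely visited state where the ensemble members have not yet converged earns a positive bonus.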

6. Acme: A Research Framework for Distributed Reinforcement Learning

Author: Hoffman, Matthew W., Shahriari, Bobak, Aslanides, John, Barth-Maron, Gabriel, Momchev, Nikola, Sinopalnikov, Danila, Stańczyk, Piotr, Ramos, Sabela, Raichuk, Anton, Vincent, Damien, Hussenot, Léonard, Dadashi, Robert, Dulac-Arnold, Gabriel, Orsini, Manu, Jacq, Alexis, Ferret, Johan, Vieillard, Nino, Ghasemipour, Seyed Kamyar Seyed, Girgin, Sertan, Pietquin, Olivier, Behbahani, Feryal, Norman, Tamara, Abdolmaleki, Abbas, Cassirer, Albin, Yang, Fan, Baumli, Kate, Henderson, Sarah, Friesen, Abe, Haroun, Ruba, Novikov, Alex, Colmenarejo, Sergio Gómez, Cabi, Serkan, Gulcehre, Caglar, Paine, Tom Le, Srinivasan, Srivatsan, Cowie, Andrew, Wang, Ziyu, Piot, Bilal, and de Freitas, Nando
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. 
Comment: This work presents a second version of the paper, which coincides with an increase in modularity, additional emphasis on offline, imitation and learning-from-demonstrations algorithms, and various new agents implemented as part of Acme.
Published: 2020
