
Towards data-efficient deployment of reinforcement learning systems

Authors :
Schulze, Sebastian
Osborne, Michael
Whiteson, Shimon
Publication Year :
2021
Publisher :
University of Oxford, 2021.

Abstract

A fundamental concern in the deployment of artificial agents in real life is their capacity to adapt quickly to their surroundings. Traditional reinforcement learning (RL) struggles with this requirement in two ways. Firstly, iterative exploration of unconstrained environment dynamics yields numerous uninformative updates and consequently slow adaptation. Secondly, final policies have no capacity to adapt to future observations and must either keep learning slowly indefinitely or be retrained entirely as new observations occur. This thesis explores two formulations aimed at addressing these issues. By considering entire task distributions, meta-RL yields policies that adapt quickly to specific task instances on their own. By forcing agents to explicitly request feedback, Active RL enforces selective observations and updates. Both formulations reduce to a Bayes-Adaptive setting in which a probabilistic belief over possible environments is maintained. Many existing solutions provide only asymptotic guarantees, which are of limited use in practical contexts. We develop a variational approach to approximate belief management and support its validity empirically through a broad range of ablations. We then consider recently successful planning approaches but uncover and discuss obstacles to their application in the settings discussed. An important factor influencing the data requirements and stability of RL systems is the choice of appropriate hyperparameters. We develop a Bayesian optimisation approach that exploits the iterative structure of training processes and whose empirical performance exceeds that of existing baselines. A final contribution of this thesis concerns increasing the scalability and expressiveness of Gaussian Processes (GPs). While we make no direct use of the presented framework, GPs have been used to model probabilistic beliefs in closely related settings.

Subjects

Subjects :
Machine learning

Details

Language :
English
Database :
British Library EThOS
Publication Type :
Dissertation/Thesis
Accession number :
edsble.864733
Document Type :
Electronic Thesis or Dissertation