Reinforcement learning by incremental patching

Authors :
Sammut, Claude, Computer Science & Engineering, Faculty of Engineering, UNSW
Uther, William, National Information & Communication Technology Australia
Kim, Min Sub, Computer Science & Engineering, Faculty of Engineering, UNSW
Publication Year :
2007

Abstract

This thesis investigates how an autonomous reinforcement learning agent can improve on an approximate solution by augmenting it with a small patch, which overrides the approximate solution at certain states of the problem. In reinforcement learning, many approximate solutions are smaller and easier to produce than “flat” solutions that maintain distinct parameters for each fully enumerated state, but the best solution within the constraints of the approximation may fall well short of global optimality. This thesis proposes that the remaining gap to global optimality can be efficiently minimised by learning a small patch over the approximate solution.

In order to improve the agent’s behaviour, algorithms are presented for learning the overriding patch. The patch is grown around particular regions of the problem where the approximate solution is found to be deficient. Two heuristic strategies are proposed for concentrating resources to those areas where inaccuracies in the approximate solution are most costly, drawing a compromise between solution quality and storage requirements. Patching also handles problems with continuous state variables, by two alternative methods: Kuhn triangulation over a fixed discretisation and nearest neighbour interpolation with a variable discretisation.

As well as improving the agent’s behaviour, patching is also applied to the agent’s model of the environment. Inaccuracies in the agent’s model of the world are detected by statistical testing, using a selective sampling strategy to limit storage requirements for collecting data.

The patching algorithms are demonstrated in several problem domains, illustrating the effectiveness of patching under a wide range of conditions. A scenario drawn from a real-time strategy game demonstrates the ability of patching to handle large complex tasks.

These contributions combine to form a general framework for patching over approximate solutions in reinforcement learning. Compl

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1183379387
Document Type :
Electronic Resource