
Hierarchical Reinforcement Learning with Model-Based Planning for Finding Sparse Rewards

Authors :
Bartley, Travis D.
Shoukry, Yasser
Kurdahi, Fadi
Publication Year :
2023

Abstract

Reinforcement learning (RL) has proven useful for a wide variety of important applications, including robotics, autonomous vehicles, healthcare, finance, gaming, recommendation systems, and advertising, among many others. In general, RL involves training an agent to make decisions based on a reward signal. One of the major challenges in the field is the sparse reward problem, which occurs when the agent receives rewards only occasionally during training. This makes conventional RL algorithms difficult to train, since the agent does not receive enough feedback to learn the optimal policy. Model-based planning is one potential solution to the sparse reward problem, since it enables an agent to simulate its actions and predict outcomes far into the future. However, planning can become computationally expensive or even intractable when too many time steps must be simulated internally, due to combinatorial explosion.

To address these challenges, this thesis presents a new RL algorithm that uses a hierarchy of model-based (manager) and model-free (worker) policies to exploit the complementary strengths of both. The worker takes guidance from the manager in the form of a goal or selected policy; it is computationally efficient and can respond to changes or uncertainty in the environment while carrying out its task. From the manager's perspective, this abstracts away the trivially small state transitions, reducing the depth needed for tree search and greatly improving the efficiency of planning.

Two different applications were used to evaluate the hierarchical agent. The first is a maze navigation environment with continuous-state dynamics and unique episodes, which makes it extremely challenging for both model-based and model-free algorithms. The agent's performance on the random maze task was evaluated on multiple platforms, including DeepMind Lab. For the second demonstration, the proposed algorithm was compared aga
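
The manager/worker division of labor described in the abstract can be illustrated with a minimal sketch. The following Python toy is not the thesis implementation: the grid environment, the hand-coded worker, and the manager's one-shot subgoal selection (a stand-in for the model-based tree search described above) are all assumptions chosen only to make the idea concrete and runnable.

# Minimal sketch of a manager/worker hierarchy on a toy grid with one sparse reward.
# Illustrative only; names (GridEnv, Manager, Worker) are hypothetical.

class GridEnv:
    """Deterministic grid world; the only reward is at the goal cell."""
    def __init__(self, size=8, goal=(7, 7)):
        self.size, self.goal = size, goal
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        # Actions: 0=up, 1=down, 2=left, 3=right
        dx, dy = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        reward = 1.0 if self.pos == self.goal else 0.0   # sparse reward
        return self.pos, reward, self.pos == self.goal

class Worker:
    """Model-free low-level policy: here a hand-coded greedy move toward the
    current subgoal, standing in for a learned policy."""
    def act(self, state, subgoal):
        if state[0] != subgoal[0]:
            return 0 if subgoal[0] < state[0] else 1
        return 2 if subgoal[1] < state[1] else 3

class Manager:
    """Model-based high-level policy: reasons over a coarse lattice of subgoals
    instead of primitive actions, so its search stays shallow."""
    def __init__(self, env, stride=4):
        self.env, self.stride = env, stride

    def propose_subgoal(self, state):
        candidates = [(x, y) for x in range(0, self.env.size, self.stride)
                             for y in range(0, self.env.size, self.stride)]
        candidates.append(self.env.goal)

        def cost(g):
            travel = abs(g[0] - state[0]) + abs(g[1] - state[1])
            remaining = abs(self.env.goal[0] - g[0]) + abs(self.env.goal[1] - g[1])
            return (travel + remaining, remaining)   # tie-break toward the goal

        return min(candidates, key=cost)

def run_episode(env, manager, worker, max_manager_steps=20, max_worker_steps=50):
    state, total = env.reset(), 0.0
    for _ in range(max_manager_steps):
        subgoal = manager.propose_subgoal(state)     # high-level decision
        for _ in range(max_worker_steps):            # worker pursues the subgoal
            if state == subgoal:
                break
            state, reward, done = env.step(worker.act(state, subgoal))
            total += reward
            if done:
                return total
    return total

env = GridEnv()
print("episode return:", run_episode(env, Manager(env), Worker()))

Because the manager only chooses among a handful of subgoals while the worker handles the many primitive steps in between, the high-level decision space is far smaller than the raw action space, which is the efficiency argument the abstract makes for the hierarchical design.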

Details

Database :
OAIster
Notes :
application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1377973408
Document Type :
Electronic Resource