Deep Reinforcement Learning in complex environments
- Publication Year :
- 2022
Abstract
- Deep Reinforcement Learning (DRL) is becoming a popular and mature framework for learning to solve sequential decision-making problems. The application of Deep Neural Networks, flexible and powerful function approximators, to policy learning has enabled RL to solve problems once thought too difficult: from beating professional human players at hard games such as Go, to becoming the foundation for flexible embodied control. We explore what happens when one attempts to learn policies in environments with complex dynamics and hard, structured tasks. Because these environments pose challenges at the forefront of what state-of-the-art Reinforcement Learning methods can tackle, they give a general view of existing weaknesses while also offering opportunities to improve both the general framework and particular algorithms. Firstly, we study and develop methods for Deep Multi-Agent Reinforcement Learning, a setting in which multiple agents interact with an (often complex) environment and with each other. The presence of multiple agents breaks key assumptions that give standard learning methods their stability, notably that the environment appears stationary from each agent's perspective, creating unique and interesting problems. We test these methods by formulating a multi-agent version of the StarCraft micromanagement problem, an extremely complex real-time control and planning task set in one of the hardest environments currently available in the literature. Secondly, in a single-agent version of the same problem, we investigate how DRL can be used to develop a set of parameter-efficient differentiable planning modules that solve path-planning tasks with complex environment dynamics and variable map sizes. We show that these modules can learn to plan even when the environment includes stochastic elements, providing a cost-efficient way to build low-level, size-invariant planners for a variety of hard, interactive navigation problems. Thirdly, and lastly, we present the NetHack Learning Environment (NLE), a novel RL benchmark based on NetHack, one of the oldest and most complex video games ever developed. NLE provides an environment that is scalable, rich, and challenging for state-of-the-art RL, while maintaining familiarity with standard grid-worlds and dramatically decreasing the computational requirements compared to existing environments of similar complexity and scope. We believe that this particular intersection of properties will enable the community to employ a single environment both as a debugging tool for increasingly complicated RL agents and as a target for the next decade of RL research.
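For readers unfamiliar with the multi-agent setting, the loop below sketches what decentralised control in a StarCraft micromanagement episode looks like. It is a minimal sketch using the API of the open-source SMAC package, one published realisation of this kind of benchmark; the map name "3m" and the uniformly random policy are illustrative assumptions, not the thesis's experimental setup.

```python
# Minimal random-policy episode over the SMAC StarCraft micromanagement API.
# Requires a local StarCraft II installation; see the SMAC project for setup.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="3m")  # "3m": 3 marines vs 3 marines (illustrative choice)
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        # Each agent acts on its own partial view; the availability mask
        # restricts it to legal actions at this step.
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    # All agents act jointly; the environment returns one shared team reward,
    # which is part of what makes multi-agent credit assignment hard.
    reward, terminated, _ = env.step(actions)
env.close()
```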
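The differentiable planning modules of the second contribution are, broadly, convolutional networks that unroll an approximate value-iteration computation over the map; because convolutions are agnostic to input size, the resulting planner is size-invariant. The PyTorch sketch below follows the general recipe of Value Iteration Networks (Tamar et al., 2016) to illustrate the mechanism; the class name ConvPlanner and all hyperparameters are our assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class ConvPlanner(nn.Module):
    """Value-iteration-style planner: K convolutional sweeps over a reward map.

    Fully convolutional, so the same weights apply to any H x W map
    (size-invariance), and the whole computation is differentiable.
    """
    def __init__(self, n_actions: int = 8, k: int = 20):
        super().__init__()
        self.k = k
        # Maps the [reward, value] pair at each cell to per-action Q-values;
        # the learned 3x3 kernel plays the role of a local transition model.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward_map: torch.Tensor) -> torch.Tensor:
        # reward_map: (batch, 1, H, W)
        value = torch.zeros_like(reward_map)          # V_0 = 0
        for _ in range(self.k):                       # K value-iteration sweeps
            q = self.q_conv(torch.cat([reward_map, value], dim=1))
            value, _ = q.max(dim=1, keepdim=True)     # V_{t+1} = max_a Q_t
        return value

planner = ConvPlanner()
v = planner(torch.randn(1, 1, 16, 16))  # the same module accepts any map size
```

Each sweep propagates value between neighbouring cells, so K sweeps let information travel roughly K cells across the map; choosing K relative to the map diameter is one of the knobs such planners expose.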
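NLE itself exposes a standard Gym interface, which is what makes it immediately familiar to users of grid-world environments. The snippet below mirrors the random-agent example from the NLE README, assuming the pre-gymnasium gym API of the original release:

```python
import gym
import nle  # registers the NetHack environments with gym

env = gym.make("NetHackScore-v0")
obs = env.reset()  # dict of glyphs, player stats, in-game message, etc.
done = False
while not done:
    # A uniformly random agent; NetHack will end the episode quickly.
    obs, reward, done, info = env.step(env.action_space.sample())
env.render()
```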
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.od......1064..c3d06e8a4a24e5238ef24207d96a8a12