1. Predicting before acting: improving policy quality by taking a vision of consequence.
- Author
- Zhou, Xiaoshu; Zhu, Fei; Zhao, Peiyao
- Subjects
- Reinforcement learning; Low vision; Deep learning; Learning strategies
- Abstract
Deep reinforcement learning has achieved great success in many fields. However, an agent may become trapped during exploration, lingering around unproductive states that pull it away from optimal policies, so it is worth studying how learning strategies can be improved by forecasting future states. To address this problem, an algorithm called Thinking about Action Consequence (TAC) is proposed, which predicts the consequences of behaviour before an action is taken, based on the combination of the current state and the policy. Deep reinforcement learning approaches equipped with TAC use its intrinsic prediction framework, together with past bad-consequence experiences, to decide whether to continue exploring the current region or to evade the potential undesired state identified by TAC and explore other areas instead. Because TAC revises its plan as the policy is executed and the environment changes, considering consequences enables the agent to apply a suitable exploration mechanism and achieve better performance. Compared with traditional Deep Q-Learning (DQN), Double DQN, Dueling DQN, Prioritised Experience Replay DQN, and Intrinsic Reward (IR) methods with uncertainty-based exploration stimuli, the TAC method performs better in Atari and Box2D environments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
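The abstract describes TAC only at a high level: a learned model predicts the consequence of a candidate action, and the agent avoids actions whose predicted outcome resembles earlier bad experiences. The Python sketch below illustrates that general idea under stated assumptions; the predictor architecture, the distance-based comparison against a memory of bad states, the threshold, and the `choose_action` routine are all hypothetical and are not taken from the paper.

```python
# Hypothetical illustration of the "predict before acting" idea described in the
# abstract. The predictor architecture, the Euclidean-distance check against a
# memory of bad states, and the threshold value are assumptions made for this
# sketch; they are not taken from the paper.
import random
import torch
import torch.nn as nn

class ConsequencePredictor(nn.Module):
    """Predicts the next state from the current state and a candidate action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        one_hot = torch.zeros(self.n_actions)
        one_hot[action] = 1.0
        return self.net(torch.cat([state, one_hot]))

@torch.no_grad()
def choose_action(q_values, state, predictor, bad_states, threshold=0.5):
    """Return the highest-Q action whose predicted consequence does not resemble
    a previously recorded bad state; fall back to a random action if every
    candidate is predicted to lead somewhere undesirable."""
    for action in torch.argsort(q_values, descending=True).tolist():
        predicted_next = predictor(state, action)
        looks_bad = any(torch.dist(predicted_next, bad) < threshold for bad in bad_states)
        if not looks_bad:
            return action
    return random.randrange(len(q_values))

# Toy usage with made-up dimensions, a randomly initialised predictor, and a few
# random "bad" states standing in for recorded bad-consequence experiences.
state_dim, n_actions = 8, 4
predictor = ConsequencePredictor(state_dim, n_actions)
q_values = torch.randn(n_actions)
state = torch.randn(state_dim)
bad_states = [torch.randn(state_dim) for _ in range(3)]
print(choose_action(q_values, state, predictor, bad_states))
```

The distance-based check above is only one simple way the "evade the predicted undesired state or keep exploring" decision mentioned in the abstract could be realised; the paper's own mechanism may differ.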