A learning search algorithm with propagational reinforcement learning.

Authors :
Zhang, Wei
Source :
Applied Intelligence; Nov 2021, Vol. 51 Issue 11, p7990-8009, 20p
Publication Year :
2021

Abstract

When reinforcement learning with a deep neural network is applied to heuristic search, the search becomes a learning search. In a learning search system, there are two key components: (1) a deep neural network with sufficient expressive power as a heuristic function approximator that estimates the distance from any state to a goal; (2) a strategy to guide the interaction of an agent with its environment to obtain more efficient simulated experience to update the Q-value or V-value function of reinforcement learning. To date, neither component has been sufficiently discussed. This study theoretically discusses the size of a deep neural network for approximating a product function of p piecewise multivariate linear functions. The existence of such a deep neural network with O(n + p) layers and O(dn + dnp + dp) neurons is proven, where d is the number of variables of the multivariate function being approximated, 𝜖 is the approximation error, and n = O(p + log₂(pd/𝜖)). For the second component, this study proposes a general propagational reinforcement-learning-based learning search method that improves the estimate h(·) according to newly observed distance information about the goals, propagates the improvement bidirectionally in the search tree, and consequently obtains a sequence of more accurate V-values for a sequence of states. Experiments on maze problems show that our method increases the convergence rate of reinforcement learning by a factor of 2.06 and reduces the number of learning episodes to 1/4 of that required by nonpropagating methods. [ABSTRACT FROM AUTHOR]
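The bidirectional propagation described in the abstract can be pictured with a small sketch. The Python snippet below is a minimal, hypothetical illustration, not the authors' implementation: the SearchNode class, the propagate function, and the assumption of reversible moves (as in a maze) are introduced here for clarity. After a node receives new distance-to-goal information, the tightened estimate is pushed backward toward the root and forward toward the leaves, yielding more accurate value estimates along the path.

```python
# Illustrative sketch of bidirectional propagation of an improved distance
# estimate in a search tree. Names and structure are assumptions, not taken
# from the paper.

class SearchNode:
    def __init__(self, state, parent=None, step_cost=1.0):
        self.state = state
        self.parent = parent
        self.children = []
        self.step_cost = step_cost   # cost of the edge from parent to this node
        self.h = float("inf")        # current estimate of the distance to a goal
        if parent is not None:
            parent.children.append(self)


def propagate(node, observed_h):
    """Tighten node.h with newly observed distance-to-goal information and
    push the improvement both toward the root and toward the leaves."""
    if observed_h >= node.h:
        return                       # no improvement to propagate
    node.h = observed_h

    # Backward pass: an ancestor can always reach the goal through this node,
    # so h(parent) <= step_cost(child) + h(child).
    child, parent = node, node.parent
    while parent is not None:
        bound = child.h + child.step_cost
        if bound >= parent.h:
            break                    # no further improvement upstream
        parent.h = bound
        child, parent = parent, parent.parent

    # Forward pass: with reversible moves (as in a maze), a descendant is at
    # most one edge cost farther from the goal than its parent's estimate.
    stack = list(node.children)
    while stack:
        c = stack.pop()
        bound = c.parent.h + c.step_cost
        if bound < c.h:
            c.h = bound
            stack.extend(c.children)


# Example: a chain root -> a -> b; observing that b is 1 step from the goal
# also tightens the estimates of a and the root.
root = SearchNode("s0")
a = SearchNode("s1", parent=root)
b = SearchNode("s2", parent=a)
propagate(b, observed_h=1.0)
print(root.h, a.h, b.h)              # 3.0 2.0 1.0
```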

Details

Language :
English
ISSN :
0924-669X
Volume :
51
Issue :
11
Database :
Complementary Index
Journal :
Applied Intelligence
Publication Type :
Academic Journal
Accession Number :
152853731
Full Text :
https://doi.org/10.1007/s10489-020-02117-0