Back to Search
Start Over
A learning search algorithm with propagational reinforcement learning.
- Source :
- Applied Intelligence; Nov2021, Vol. 51 Issue 11, p7990-8009, 20p
- Publication Year :
- 2021
-
Abstract
- When reinforcement learning with a deep neural network is applied to heuristic search, the search becomes a learning search. In a learning search system, there are two key components: (1) a deep neural network with sufficient expression ability as a heuristic function approximator that estimates the distance from any state to a goal; (2) a strategy to guide the interaction of an agent with its environment to obtain more efficient simulated experience to update the Q-value or V-value function of reinforcement learning. To date, neither component has been sufficiently discussed. This study theoretically discusses the size of a deep neural network for approximating a product function of p piecewise multivariate linear functions. The existence of such a deep neural network with O(n + p) layers and O(dn + dnp + dp) neurons has been proven, where d is the number of variables of the multivariate function being approximated, 𝜖 is the approximation error, and n = O(p + log<subscript>2</subscript>(pd/𝜖)). For the second component, this study proposes a general propagational reinforcement-learning-based learning search method that improves the estimate h(.) according to the newly observed distance information about the goals, propagates the improvement bidirectionally in the search tree, and consequently obtains a sequence of more accurate V-values for a sequence of states. Experiments on the maze problems show that our method increases the convergence rate of reinforcement learning by a factor of 2.06 and reduces the number of learning episodes to 1/4 that of other nonpropagating methods. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0924669X
- Volume :
- 51
- Issue :
- 11
- Database :
- Complementary Index
- Journal :
- Applied Intelligence
- Publication Type :
- Academic Journal
- Accession number :
- 152853731
- Full Text :
- https://doi.org/10.1007/s10489-020-02117-0