Statistical Mechanics of the Delayed Reward-Based Learning with Node Perturbation

Authors :: Saito, Hiroshi
Katahira, Kentaro
Okanoya, Kazuo
Okada, Masato
Source :: Journal of the Physical Society of Japan; June 2010, Vol. 79 Issue: 6 p064003-064006, 4p
Publication Year :: 2010
Abstract: In reward-based learning, the reward is typically given with some delay after the behavior that causes it. In the machine learning literature, the eligibility trace has been used as one way to handle delayed reward in reinforcement learning. Recent studies suggest that the eligibility trace is important for a difficult neuroscience problem known as the “distal reward problem”. Node perturbation is a stochastic gradient method, one of many implementations of reinforcement learning, and it estimates an approximate gradient by introducing perturbations into a network. Since the stochastic gradient method does not require differentiating an objective function, it is expected to be able to account for the learning mechanism of a complex system such as the brain. We study node perturbation with the eligibility trace as a specific example of delayed reward-based learning, and analyze it using a statistical mechanics approach. As a result, we show the optimal time constant of the eligibility trace with respect to the reward delay, and the existence of unlearnable parameter configurations.
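The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the model analyzed in the paper: the linear teacher-student setup, the squared-error objective, the Gaussian inputs, and every name and parameter value below (`simulate`, `delay`, `decay`, `eta`, `sigma`) are assumptions chosen for illustration. A single output node is perturbed by noise, the resulting change in error is delivered `delay` steps late, and an exponentially decaying eligibility trace keeps a memory of past perturbations so that the late signal can still be credited to them.

```python
import random
from collections import deque


def simulate(n=10, steps=5000, delay=2, decay=0.5, eta=0.1, sigma=0.1, seed=0):
    """Node perturbation with an eligibility trace and a delayed error signal.

    Hypothetical setup: a linear student learns a fixed linear teacher.
    Returns the squared weight distance before and after learning.
    """
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n)]
    student = [0.0] * n
    trace = [0.0] * n   # eligibility trace, one entry per weight
    pending = deque()   # error differences not yet delivered

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def sq_dist():
        return sum((s - t) ** 2 for s, t in zip(student, teacher))

    initial = sq_dist()
    for _ in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        xi = rng.gauss(0, sigma)  # perturbation added to the output node
        target = dot(teacher, x)
        y = dot(student, x)

        # Change in squared error caused by the perturbation.
        base_err = 0.5 * (y - target) ** 2
        pert_err = 0.5 * (y + xi - target) ** 2
        pending.append(pert_err - base_err)

        # Decay the trace and add the current perturbation-input product.
        trace = [decay * e + xi * xj for e, xj in zip(trace, x)]

        # The error difference from `delay` steps ago arrives now and is
        # combined with the current trace to update the weights.
        if len(pending) > delay:
            r = pending.popleft()
            scale = eta / (n * sigma ** 2)
            student = [w - scale * r * e for w, e in zip(student, trace)]

    return initial, sq_dist()


if __name__ == "__main__":
    before, after = simulate()
    print("squared weight distance:", before, "->", after)
```

In this sketch the part of the trace that matches the delayed signal is weighted by `decay ** delay`, while the remaining terms in the trace contribute only noise. That trade-off between retained signal and accumulated noise is the kind of dependence on the trace time constant that the abstract refers to. Sufficiently large values of `eta` can make the update diverge in this toy setting.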
