TD-learning with exploration.

Authors :
Meyn, Sean P.
Surana, Amit
Source :
IEEE Conference on Decision & Control & European Control Conference; 1/1/2011, p148-155, 8p
Publication Year :
2011

Abstract

We introduce exploration in the TD-learning algorithm to approximate the value function for a given policy. In this way we can modify the norm used for approximation, “zooming in” to a region of interest in the state space. We also provide extensions to SARSA to eliminate the need for numerical integration in policy improvement. Construction of the algorithm and its analysis build on recent general results concerning the spectral theory of Markov chains and positive operators. [ABSTRACT FROM PUBLISHER]

Details

Language :
English
ISBNs :
9781612848006
Database :
Complementary Index
Journal :
IEEE Conference on Decision & Control & European Control Conference
Publication Type :
Conference
Accession number :
86615125
Full Text :
https://doi.org/10.1109/CDC.2011.6160851