Discounted UCB1-tuned for Q-learning

Authors :: Katsuhiro Honda
Koki Saito
Akira Notsu
Source :: SCIS&ISIS
Publication Year :: 2014
Publisher :: IEEE, 2014.
Abstract: Discounted UCB1-tuned was proposed as a method for choosing actions in the multi-armed bandit problem. The algorithm is a selection method optimized for balancing exploration and exploitation, using a weighted value and a weighted variance. In this paper, we propose a method for applying Discounted UCB1-tuned to Q-learning and experimentally evaluate its performance on a shortest path problem with a continuous state space.
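
The selection rule named in the abstract can be illustrated for the plain multi-armed bandit case. The following is a minimal sketch, not the authors' implementation: this record does not give the paper's formula, so the index below is an assumption that combines exponentially discounted counts, means, and variances with the standard UCB1-tuned exploration bonus. The names `DiscountedUCB1Tuned`, `run_bandit`, and `gamma` are hypothetical, and rewards are assumed to lie in [0, 1].

```python
import math
import random


class DiscountedUCB1Tuned:
    """Bandit sketch: discounted statistics with a UCB1-tuned style bonus.

    Assumption: rewards lie in [0, 1]. The index formula is illustrative
    and is not taken from the paper.
    """

    def __init__(self, n_arms, gamma=0.99):
        self.gamma = gamma              # discount factor for old observations
        self.count = [0.0] * n_arms     # discounted pull counts
        self.sum_r = [0.0] * n_arms     # discounted reward sums
        self.sum_r2 = [0.0] * n_arms    # discounted squared-reward sums

    def index(self, arm, log_total):
        n = self.count[arm]
        mean = self.sum_r[arm] / n                           # weighted value
        var = max(self.sum_r2[arm] / n - mean * mean, 0.0)   # weighted variance
        v = var + math.sqrt(2.0 * log_total / n)
        # UCB1-tuned caps the variance term at 1/4 (the maximum for [0, 1] rewards).
        return mean + math.sqrt(log_total / n * min(0.25, v))

    def select(self):
        # Pull any arm that has no (remaining) discounted count first.
        for arm, n in enumerate(self.count):
            if n == 0.0:
                return arm
        log_total = max(math.log(sum(self.count)), 0.0)
        scores = [self.index(a, log_total) for a in range(len(self.count))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Decay every arm's statistics, then add the new observation.
        for a in range(len(self.count)):
            self.count[a] *= self.gamma
            self.sum_r[a] *= self.gamma
            self.sum_r2[a] *= self.gamma
        self.count[arm] += 1.0
        self.sum_r[arm] += reward
        self.sum_r2[arm] += reward * reward


def run_bandit(probs, steps, gamma=0.99, seed=0):
    """Run the sketch on Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    agent = DiscountedUCB1Tuned(len(probs), gamma)
    pulls = [0] * len(probs)
    for _ in range(steps):
        arm = agent.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling `run_bandit([0.2, 0.8], 2000)` returns the number of pulls per arm. How the paper attaches these statistics to Q-learning (for example, per state or per region of the continuous state space) is not described in this record, so the sketch stops at the bandit setting.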

Subjects :: Computer Science::Machine Learning
Mathematical optimization
Shortest path problem
Q-learning
Value (computer science)
Selection method
State (functional analysis)
Variance (accounting)
Canadian traveller problem
Action (physics)
Mathematics

Database :: OpenAIRE
Journal :: 2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS)
Accession number :: edsair.doi...........9f35c413cd139bc6a25c62259760f02e
