A Thompson Sampling Algorithm With Logarithmic Regret for Unimodal Gaussian Bandit.

Authors :: Yang L
Li Z
Hu Z
Ruan S
Pan G
Source :: IEEE transactions on neural networks and learning systems [IEEE Trans Neural Netw Learn Syst] 2023 Sep; Vol. 34 (9), pp. 5332-5341. Date of Electronic Publication: 2023 Sep 01.
Publication Year :: 2023
Abstract: In this article, we propose a Thompson sampling algorithm with Gaussian prior for unimodal bandit under Gaussian reward setting, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, the proposed algorithm makes decisions according to posterior distribution only in the arm's neighborhood with the highest empirical mean estimate. We theoretically prove that the asymptotic regret of our algorithm reaches O(logT) , i.e., it shares the same regret order with asymptotic optimal algorithms, which is comparable to extensive existing state-of-the-art unimodal multiarm bandit (U-MAB) algorithms. Finally, we use extensive experiments to demonstrate the effectiveness of the proposed algorithm on both synthetic datasets and real-world applications.

Full Text Access

Tools