Back to Search
Start Over
UniRank: Unimodal Bandit Algorithm for Online Ranking
- Source :
- ICML 2022-39th International Conference on Machine Learning, ICML 2022-39th International Conference on Machine Learning, Jul 2022, Baltimore, United States. pp.1-31, HAL
- Publication Year :
- 2022
- Publisher :
- HAL CCSD, 2022.
-
Abstract
- International audience; We tackle, in the multiple-play bandit setting, the online ranking problem of assigning $L$ items to $K$ predefined positions on a web page in order to maximize the number of user clicks. We propose a generic algorithm, UniRank, that tackles state-of-the-art click models. The regret bound of this algorithm is a direct consequence of the unimodality-like property of the bandit setting with respect toa graph where nodes are ordered sets of indistinguishable items.The main contribution of UniRank is its $O\left(L/\Delta \log T\right)$ regret for $T$ consecutive assignments, where $\Delta$ relates to the reward-gap between two items.This regret bound is based on the usually implicit condition that two items may not have the same attractiveness.Experiments against state-of-the-art learning algorithms specialized or not for different click models, show that our method has better regret performance than other generic algorithms on real life and synthetic datasets.
- Subjects :
- FOS: Computer and information sciences
Multi-Armed Bandit
Computer Science - Machine Learning
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Multiple-Play Bandit
Computer Science - Data Structures and Algorithms
Data Structures and Algorithms (cs.DS)
Online Learning to Rank
Unimodal Bandit
Machine Learning (cs.LG)
Subjects
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ICML 2022-39th International Conference on Machine Learning, ICML 2022-39th International Conference on Machine Learning, Jul 2022, Baltimore, United States. pp.1-31, HAL
- Accession number :
- edsair.doi.dedup.....c3042ce94d22c04407d90d23a114a1f9