SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets
- Source :
- IJCAI
- Publication Year :
- 2019
- Publisher :
- International Joint Conferences on Artificial Intelligence Organization, 2019.
Abstract
- Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items, which may have interacting effects on user choice, methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
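As an illustration of the decomposition the abstract describes, the following minimal Python sketch computes a slate's LTV as the choice-probability-weighted sum of its items' LTVs. It is a sketch under stated assumptions, not the authors' implementation: the conditional-logit choice model, the zero-value no-click option, and the names `choice_probs`, `slate_ltv`, `scores`, `item_ltvs`, and `null_score` are all hypothetical and do not appear in this record.

```python
import math


def choice_probs(scores, null_score=0.0):
    """Probability that the user picks each slate item.

    Assumption: a conditional-logit choice model with an extra
    "no click" option whose score is null_score.
    """
    weights = [math.exp(s) for s in scores]
    total = sum(weights) + math.exp(null_score)
    return [w / total for w in weights]


def slate_ltv(scores, item_ltvs, null_score=0.0):
    """Slate LTV as the sum over items of P(item chosen) * item-wise LTV.

    Assumption: the no-click option contributes zero value here.
    """
    probs = choice_probs(scores, null_score)
    return sum(p * q for p, q in zip(probs, item_ltvs))


# Usage: two equally attractive items with item-wise LTVs 3.0 and 6.0.
value = slate_ltv(scores=[0.0, 0.0], item_ltvs=[3.0, 6.0])
```

Because the slate value depends only on per-item quantities, learning can target item-wise LTVs rather than one value per possible slate, which is the tractability point the abstract makes.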
Details
- Database :
- OpenAIRE
- Journal :
- Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
- Accession number :
- edsair.doi...........9848abe5d124b8332994d556a9dba8ec
- Full Text :
- https://doi.org/10.24963/ijcai.2019/360