SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets

Authors :: Ritesh Agarwal
Craig Boutilier
Sanmit Narvekar
Eugene Ie
Jing Wang
Rui Wu
Vihan Jain
Tushar Deepak Chandra
Heng-Tze Cheng
Source :: IJCAI
Publication Year :: 2019
Publisher :: International Joint Conferences on Artificial Intelligence Organization, 2019.
Abstract :: Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items (which may have interacting effects on user choice), methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
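
Illustrative note: the decomposition mentioned in the abstract can be read as a choice-weighted sum, where the LTV of a slate is the sum over its items of the probability that the user selects the item times that item's LTV. The sketch below is a minimal illustration of that reading, not the authors' implementation. It assumes the user selects at most one item per slate and that choice follows a simple score-proportional model with a no-click option; the function names, the `no_click_score` parameter, and the numbers are all hypothetical.

```python
def choice_probabilities(scores, no_click_score=1.0):
    """Probability of selecting each slate item (assumed choice model).

    Each item is chosen with probability proportional to its score;
    no_click_score is the weight of the "select nothing" option.
    """
    total = no_click_score + sum(scores)
    return [score / total for score in scores]


def slate_ltv(item_ltvs, scores, no_click_score=1.0):
    """Slate LTV as the choice-weighted sum of item-wise LTVs.

    item_ltvs[i] is the long-term value of the user consuming item i;
    scores[i] is the (hypothetical) unnormalized appeal of item i.
    """
    probs = choice_probabilities(scores, no_click_score)
    return sum(p * q for p, q in zip(probs, item_ltvs))


# Two-item slate: item-wise LTVs 10.0 and 4.0, appeal scores 1.0 and 2.0.
value = slate_ltv([10.0, 4.0], [1.0, 2.0], no_click_score=1.0)
print(value)
```

Because the slate value is built from per-item quantities, only item-wise LTVs need to be learned, rather than one value for every possible combination of items.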

Subjects :: Computer science
Decomposition (computer science)
Reinforcement learning
Artificial intelligence

Database :: OpenAIRE
Journal :: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Accession number :: edsair.doi...........9848abe5d124b8332994d556a9dba8ec
Full Text :: https://doi.org/10.24963/ijcai.2019/360
