Start Over

Dynamic Assortment Personalization in High Dimensions.

Authors :: Kallus, Nathan
Udell, Madeleine
Source :: Operations Research; Jul/Aug2020, Vol. 68 Issue 4, p1020-1037, 18p
Publication Year :: 2020
Abstract: Learning and Earning in High Dimensions: Assortment Personalization Using Low-Rank Preferences The proliferation of products and users in modern e-commerce presents a substantial challenge. How can a retailer learn each consumer's preference for so many products—and, more importantly, maximize revenue—from sparse transactional data? In "Dynamic Assortment Personalization in High Dimensions," Kallus and Udell show how to use the low rank structure of the matrix of preferences to dynamically choose individualized assortments to offer each consumer. This approach can increase revenue by orders of magnitude by learning across users to avoid excessive exploration. Their new method relies on convex optimization to recover hidden low-dimensional structure from sparse, high-dimensional data. We study the problem of dynamic assortment personalization with large, heterogeneous populations and wide arrays of products, and demonstrate the importance of structural priors for effective, efficient large-scale personalization. Assortment personalization is the problem of choosing, for each individual (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce and online advertising where both items and types can number in the millions. We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with m contexts (types) and exponentially many arms (assortments of the n items). We assume that each type's preferences follow a simple parametric model with n parameters. In all, there are mn parameters, and existing literature suggests that order optimal regret scales as mn. However, the data required to estimate so many parameters is orders of magnitude larger than the data available in most revenue management applications; and the optimal regret under these models is unacceptably high. In this paper, we impose a natural structure on the problem – a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically. [ABSTRACT FROM AUTHOR]