Earning while learning: using Thompson Sampling to maximize rewards from online sales.

Authors :
Ellina, Andria.
Source :
University of Southampton
Publication Year :
2020

Abstract

The problem of finding the best option amongst a range of candidates, the rest of which are suboptimal, in an uncertain environment is a challenging task in a number of domains, ranging from clinical trials to advertising, website optimization and dynamic pricing. Initially, very little is known about the performance of the different options, and the decision maker needs to simultaneously learn about their performance and earn some reward from the decisions made. This introduces a trade-off between "exploration", the phase in which new information is acquired, and "exploitation", where the goal is maximizing rewards or, equivalently, minimizing total regret. Regret is defined as the difference between the reward of an oracle strategy that selects the best option at each time step and the reward of the option we choose. In this thesis, we develop new algorithms based on Thompson Sampling that improve the overall performance and minimize total regret. Numerical experiments are performed on simulated datasets in order to examine the effect of the algorithms' hyperparameters, assess the robustness of the algorithms presented and compare the performance of our new algorithms with current algorithms. We use benchmarking experiments for a fair comparison of the different algorithms on the simulated datasets. An additional complication, especially common in the area of revenue management, is seasonal change, which affects the performance of the different options and consequently our decisions. To tackle the challenge of non-stationarity, we deploy contextual Thompson Sampling to account for seasonality and develop a new algorithm that combines contextual Thompson Sampling with a standard statistical model selection method to solve the problem of unknown seasonality in the reward distribution of the candidate options. Finally, we focus on an application of dynamic pricing in which we develop an algorithm that l
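To make the exploration/exploitation trade-off and the regret definition above concrete, here is a minimal sketch of standard Beta-Bernoulli Thompson Sampling. It assumes each option yields a binary reward (for example, sale or no sale) with a fixed, unknown success probability. This is the textbook baseline, not the new algorithms developed in the thesis, and the function name `thompson_sampling` and the example probabilities are hypothetical. Regret is tracked in expectation (pseudo-regret): the oracle's expected reward minus the expected reward of the chosen option.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Hypothetical helper: Beta-Bernoulli Thompson Sampling on a stationary problem.

    Returns the cumulative expected regret after each step, plus the final
    Beta posterior parameters (alpha, beta) for every option.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) uniform prior: 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    best = max(true_probs)  # expected reward of the oracle's option
    regret = 0.0
    history = []
    for _ in range(horizon):
        # Draw one plausible success rate per option from its posterior;
        # uncertain options sometimes draw high values, which drives exploration.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen option.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen option only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        # Expected regret of this choice relative to the oracle.
        regret += best - true_probs[arm]
        history.append(regret)
    return history, alpha, beta


if __name__ == "__main__":
    history, alpha, beta = thompson_sampling([0.1, 0.3, 0.6], horizon=2000, seed=1)
    pulls = [a + b - 2 for a, b in zip(alpha, beta)]
    print("total regret:", round(history[-1], 2), "pulls per option:", pulls)
```

As the posteriors concentrate, the best option is sampled most often and cumulative regret grows more slowly. The seasonal (contextual) and dynamic-pricing extensions described in the abstract are not modelled in this sketch.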

Details

Database :
OAIster
Journal :
University of Southampton
Publication Type :
Electronic Resource
Accession number :
edsoai.on1359210980
Document Type :
Electronic Resource