Author: "Koppe, Benjamin" - Searchworks@Jio Institute Digital Library Search Results

Author: Ryu, J. Jon, Kwon, Jeongyeol, Koppe, Benjamin, and Jun, Kwang-Sung
Subjects: Computer Science - Machine Learning, Computer Science - Information Theory, Statistics - Machine Learning
Abstract: We consider off-policy selection and learning in contextual bandits, where the learner aims to select or train a reward-maximizing policy using data collected by a fixed behavior policy. Our contribution is two-fold. First, we propose a novel off-policy selection method that leverages a new betting-based confidence bound applied to an inverse propensity weight sequence. Our theoretical analysis reveals that our method achieves a significantly better, variance-adaptive guarantee than prior art. Second, we propose a novel and generic condition on the optimization objective for off-policy learning that strikes a different balance between bias and variance. One special case, which we call freezing, tends to induce small variance, which is preferred in small-data regimes. Our analysis shows that they match the best existing guarantee. In our empirical study, our selection method outperforms existing methods, and freezing exhibits improved performance in small-sample regimes.
Comment: 36 pages, 8 figures
Published: 2025

Author: Sherbatov, Alissa, primary, Hsiang, Evan, additional, Kilburn, Cassie, additional, Ortiz, Joseph, additional, and Koppe, Benjamin, additional
Published: 2021
