644 results for "multi-armed bandits"
Search Results
2. Online Automated Imbalanced Learning via Adaptive Thompson Sampling
- Author
-
Wang, Zhaoyang, Wang, Shuo, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Posterior Tracking Algorithm for Multi-objective Classification Bandits
- Author
-
Suzuki, Riku, Nakamura, Atsuyoshi, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Gong, Mingming, editor, Song, Yiliao, editor, Koh, Yun Sing, editor, Xiang, Wei, editor, and Wang, Derui, editor
- Published
- 2025
- Full Text
- View/download PDF
4. Introduction to the Bandit Problems
- Author
-
Lin, Baihan, Celebi, Emre, Series Editor, Chen, Jingdong, Series Editor, Gopi, E. S., Series Editor, Neustein, Amy, Series Editor, Liotta, Antonio, Series Editor, Di Mauro, Mario, Series Editor, and Lin, Baihan
- Published
- 2025
- Full Text
- View/download PDF
5. Reinforcement learning for sequential decision making in population research.
- Author
-
Deliu, Nina
- Subjects
REINFORCEMENT learning, SEQUENTIAL learning, HEALTH policy, POPULATION policy, DECISION making
- Abstract
Reinforcement learning (RL) algorithms have been long recognized as powerful tools for optimal sequential decision making. The framework is concerned with a decision maker, the agent, that learns how to behave in an unknown environment by making decisions and seeing their associated outcome. The goal of the RL agent is to infer, through repeated experience, an optimal decision-making policy, i.e., a sequence of action rules that would lead to the highest, typically long-term, expected utility. Today, a wide range of domains, from economics to education and healthcare, have embraced the use of RL to address specific problems. To illustrate, we used an RL-based algorithm to design a text-messaging system that delivers personalized real-time behavioural recommendations to promote physical activity and manage depression. Motivated by the recent call of the UNECE for government-wide actions to adapt to population ageing, in this work, we argue that the RL framework may provide a set of compelling strategies for supporting population research and informing population policies. After introducing the RL framework, we discuss its potential in three population-study applications: international migration, public health, and fertility. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback.
- Author
-
Puthiya Parambath, Shameem A., Anagnostopoulos, Christos, and Murray-Smith, Roderick
- Subjects
LANGUAGE models, CAUSAL models, POPULAR literature, PREDICTION models, DATA analysis
- Abstract
We study the problem of predicting the next query to be recommended in interactive data exploratory analysis to guide users to the correct content. Current query prediction approaches are based on sequence-to-sequence learning, exploiting past interaction data. However, due to the resource-hungry training process, such approaches fail to adapt to immediate user feedback. Immediate feedback is essential and considered a signal of the user's intent. We contribute a novel query prediction ensemble mechanism, which adapts to immediate feedback relying on the multi-armed bandits framework. Our mechanism, an extension to the popular Exp3 algorithm, augments Transformer-based language models for query predictions by combining predictions from experts, thus dynamically building a candidate set during exploration. Immediate feedback is leveraged to choose the appropriate prediction in a probabilistic fashion. We provide a comprehensive large-scale experimental and comparative assessment using a popular online literature discovery service, which showcases that our mechanism (i) improves the per-round regret substantially against state-of-the-art Transformer-based models and (ii) shows the superiority of causal language modelling over masked language modelling for query recommendations. [ABSTRACT FROM AUTHOR]
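For orientation, the Exp3 update that this ensemble mechanism extends can be sketched in a few lines. This is a generic illustration, not the authors' code; the `get_reward` feedback function is an assumed placeholder for immediate user feedback.

```python
import numpy as np

def exp3(n_experts, n_rounds, get_reward, gamma=0.1, rng=None):
    """Plain Exp3 over a set of experts: sample one expert per round from a
    gamma-smoothed exponential-weights distribution and update its weight
    with an importance-weighted reward estimate.
    get_reward(t, i) must return a reward in [0, 1], e.g. 1 if the user
    accepted expert i's suggested query at round t."""
    rng = rng or np.random.default_rng(0)
    weights = np.ones(n_experts)
    for t in range(n_rounds):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_experts
        i = rng.choice(n_experts, p=probs)
        reward = get_reward(t, i)            # immediate user feedback
        estimate = reward / probs[i]         # unbiased estimate of expert i's reward
        weights[i] *= np.exp(gamma * estimate / n_experts)
    return weights / weights.sum()
```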
- Published
- 2024
- Full Text
- View/download PDF
7. Learning Equilibria in Matching Markets with Bandit Feedback.
- Author
-
JAGADEESAN, MEENA, WEI, ALEXANDER, YIXIN WANG, JORDAN, MICHAEL I., and STEINHARDT, JACOB
- Subjects
MARKET equilibrium, STOCHASTIC learning models, MACHINE learning
- Abstract
Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. Classical notions of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) are, unfortunately, of limited value in the learning setting, given that preferences are inherently uncertain and destabilizing while they are being learned. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that “optimism in the face of uncertainty,” the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
8. Multi‐armed bandit based online model selection for concept‐drift adaptation.
- Author
-
Wilson, Jobin, Chaudhury, Santanu, and Lall, Brejesh
- Subjects
ROBBERS, TRAINING needs, ALGORITHMS, FORECASTING
- Abstract
Ensemble methods are among the most effective concept‐drift adaptation techniques due to their high learning performance and flexibility. However, they are computationally expensive and pose a challenge in applications involving high‐speed data streams. In this paper, we present a computationally efficient heterogeneous classifier ensemble entitled OMS‐MAB which uses online model selection for concept‐drift adaptation by posing it as a non‐stationary multi‐armed bandit (MAB) problem. We use a MAB to select a single adaptive learner within the ensemble for learning and prediction while systematically exploring promising alternatives. Each ensemble member is made drift resistant using explicit drift detection and is represented as an arm of the MAB. An exploration factor ϵ controls the trade‐off between predictive performance and computational resource requirements, eliminating the need to continuously train and evaluate all the ensemble members. A rigorous evaluation on 20 benchmark datasets and 9 algorithms indicates that the accuracy of OMS‐MAB is statistically at par with state‐of‐the‐art (SOTA) ensembles. Moreover, it offers a significant reduction in execution time and model size in comparison to several SOTA ensemble methods, making it a promising ensemble for resource constrained stream‐mining problems. [ABSTRACT FROM AUTHOR]
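The arm-per-learner idea can be illustrated with a plain ε-greedy selector over stream learners. This is a simplified sketch of the general pattern, not the OMS-MAB algorithm itself; the reward signal (e.g., 1 for a correct prequential prediction) is an assumed placeholder.

```python
import numpy as np

class EpsilonGreedyModelSelector:
    """Treat each stream learner as an arm; train/predict with one per round."""
    def __init__(self, n_learners, epsilon=0.1, rng=None):
        self.eps = epsilon
        self.counts = np.zeros(n_learners)
        self.values = np.zeros(n_learners)   # running mean reward per learner
        self.rng = rng or np.random.default_rng(0)

    def select(self):
        if self.rng.random() < self.eps:     # explore an alternative learner
            return int(self.rng.integers(len(self.values)))
        return int(np.argmax(self.values))   # exploit the current best learner

    def update(self, arm, reward):           # reward: e.g. 1 if the prediction was correct
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```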
- Published
- 2024
- Full Text
- View/download PDF
9. Online Learning in Budget-Constrained Dynamic Colonel Blotto Games.
- Author
-
Leon, Vincent and Etesami, S. Rasoul
- Abstract
In this paper, we study the strategic allocation of limited resources using a Colonel Blotto game (CBG) under a dynamic setting and analyze the problem using an online learning approach. In this model, one of the players is a learner who has limited troops to allocate over a finite time horizon, and the other player is an adversary. In each round, the learner plays a one-shot Colonel Blotto game with the adversary and strategically determines the allocation of troops among battlefields based on past observations. The adversary chooses its allocation action randomly from some fixed distribution that is unknown to the learner. The learner's objective is to minimize its regret, which is the difference between the cumulative reward of the best mixed strategy and the realized cumulative reward by following a learning algorithm while not violating the budget constraint. The learning in dynamic CBG is analyzed under the framework of combinatorial bandits and bandits with knapsacks. We first convert the budget-constrained dynamic CBG to a path planning problem on a directed graph. We then devise an efficient algorithm that combines a special combinatorial bandit algorithm for the path planning problem and a bandits-with-knapsacks algorithm to cope with the budget constraint. The theoretical analysis shows that the learner's regret is bounded by a term sublinear in the time horizon and polynomial in other parameters. Finally, we justify our theoretical results by carrying out simulations for various scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Reinforcement Learning in Modern Biostatistics: Constructing Optimal Adaptive Interventions.
- Author
-
Deliu, Nina, Williams, Joseph Jay, and Chakraborty, Bibhas
- Subjects
MOBILE health, BIOMETRY, ARTIFICIAL intelligence, RESEARCH personnel, MACHINE learning
- Abstract
Summary In recent years, reinforcement learning (RL) has acquired a prominent position in health‐related sequential decision‐making problems, gaining traction as a valuable tool for delivering adaptive interventions (AIs). However, in part due to a poor synergy between the methodological and the applied communities, its real‐life application is still limited and its potential is still to be realised. To address this gap, our work provides the first unified technical survey on RL methods, complemented with case studies, for constructing various types of AIs in healthcare. In particular, using the common methodological umbrella of RL, we bridge two seemingly different AI domains, dynamic treatment regimes and just‐in‐time adaptive interventions in mobile health, highlighting similarities and differences between them and discussing the implications of using RL. Open problems and considerations for future research directions are outlined. Finally, we leverage our experience in designing case studies in both areas to showcase the significant collaborative opportunities between statistical, RL and healthcare researchers in advancing AIs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Fast Weakness Identification for Adaptive Feedback
- Author
-
Morland, Raymond, Wang, Lawrence, Lin, Fuhua, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sifaleras, Angelo, editor, and Lin, Fuhua, editor
- Published
- 2024
- Full Text
- View/download PDF
12. QuizMaster: An Adaptive Formative Assessment System
- Author
-
Lin, Fuhua, Morland, Raymond, Yan, Hongxin, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sifaleras, Angelo, editor, and Lin, Fuhua, editor
- Published
- 2024
- Full Text
- View/download PDF
13. Multi-armed Bandits with Generalized Temporally-Partitioned Rewards
- Author
-
Broek, Ronald C. van den, Litjens, Rik, Sagis, Tobias, Verbeeke, Nina, Gajane, Pratik, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Miliou, Ioanna, editor, Piatkowski, Nico, editor, and Papapetrou, Panagiotis, editor
- Published
- 2024
- Full Text
- View/download PDF
14. Learning Action Embeddings for Off-Policy Evaluation
- Author
-
Cief, Matej, Golebiowski, Jacek, Schmidt, Philipp, Abedjan, Ziawasch, Bekasov, Artur, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Goharian, Nazli, editor, Tonellotto, Nicola, editor, He, Yulan, editor, Lipani, Aldo, editor, McDonald, Graham, editor, Macdonald, Craig, editor, and Ounis, Iadh, editor
- Published
- 2024
- Full Text
- View/download PDF
15. Performance Fuzzing with Reinforcement-Learning and Well-Defined Constraints for the B Method
- Author
-
Dunkelau, Jannik, Leuschel, Michael, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Herber, Paula, editor, and Wijs, Anton, editor
- Published
- 2024
- Full Text
- View/download PDF
16. Starlet: Network defense resource allocation with multi-armed bandits for cloud-edge crowd sensing in IoT
- Author
-
Hui Xia, Ning Huang, Xuecai Feng, Rui Zhang, and Chao Liu
- Subjects
Internet of things, Defense resource sharing, Multi-armed bandits, Defense resource allocation, Information technology, T58.5-58.64
- Abstract
The cloud platform has limited defense resources to fully protect the edge servers used to process crowd sensing data in the Internet of Things. To guarantee the network's overall security, we present a network defense resource allocation method based on multi-armed bandits to maximize the network's overall benefit. Firstly, we propose a method for dynamically setting node defense resource thresholds to obtain the defender's (attacker's) benefit function and distribution for the edge servers (nodes). Secondly, we design a defense resource sharing mechanism for neighboring nodes to obtain the defense capability of nodes. Subsequently, we use the decomposability and Lipschitz continuity of the defender's total expected utility to reduce the difference between the utility's discrete and continuous arms and analyze the difference theoretically. Finally, experimental results show that the method maximizes the defender's total expected utility and reduces the difference between the discrete and continuous arms of the utility.
- Published
- 2024
- Full Text
- View/download PDF
17. Multi-class boosting for the analysis of multiple incomplete views on microbiome data
- Author
-
Andrea Simeon, Miloš Radovanović, Tatjana Lončar-Turukalo, Michelangelo Ceci, Sanja Brdar, and Gianvito Pio
- Subjects
Multi-view learning, Incomplete views, Multi-armed bandits, Boosting, Microbiome analysis, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
- Abstract
Background: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or learn predictive models. However, data to be fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each different choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, that classical (single-view) ML approaches may fail to simultaneously consider. Moreover, some views may be incomplete, i.e., some individuals may be missing in some views, possibly due to the absence of some measurements or to the fact that some features are not available/applicable for all the individuals. Multi-view learning methods can represent a possible solution to consider multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views. Results: We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a learning process based on boosting. In our experiments, performed on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, reaching an improvement of 11.8% of the F1-score in the prediction of the Autism Spectrum disorder and of 114% in the prediction of the Colorectal Cancer disease. Conclusions: The proposed method irBoost.SH exhibited outstanding performances in our experiments, also compared to competitor approaches. The obtained results confirm that irBoost.SH can fruitfully be adopted for the analysis of microbiome data, due to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines.
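The per-iteration loop described above, one bandit-chosen view feeding a shared boosting weight update, can be sketched roughly as follows. This is an illustrative approximation assuming NumPy weight arrays and sklearn-style `fit`/`predict` learners, not the authors' irBoost.SH implementation.

```python
import numpy as np

def bandit_boosting_round(view_learners, views_X, y, inst_w, view_w, gamma, rng):
    """One illustrative round of bandit-driven multi-view boosting: an
    Exp3-style draw picks a 'winning' view, a weak learner is fit on that view
    under the shared instance weights, and misclassified instances are
    up-weighted for the next round. inst_w and view_w are NumPy arrays."""
    probs = (1 - gamma) * view_w / view_w.sum() + gamma / len(view_w)
    v = rng.choice(len(view_w), p=probs)
    model = view_learners[v].fit(views_X[v], y, sample_weight=inst_w)
    wrong = model.predict(views_X[v]) != y
    err = inst_w[wrong].sum() / inst_w.sum()
    view_w[v] *= np.exp(gamma * ((1.0 - err) / probs[v]) / len(view_w))  # reward the view
    inst_w[wrong] *= np.exp(1.0 - err)       # boosting-style instance re-weighting
    return model, v
```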
- Published
- 2024
- Full Text
- View/download PDF
18. Adversarial Bandits with Knapsacks.
- Author
-
IMMORLICA, NICOLE, SANKARARAMAN, KARTHIK, SCHAPIRE, ROBERT, and SLIVKINS, ALEKSANDRS
- Subjects
ROBBERS, PACKING problem (Mathematics), KNAPSACK problems, BACKPACKS, TIME-based pricing, TIME perspective
- Abstract
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the “classic” adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm’s reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter. Our algorithm is the first “black-box reduction” from bandits to BwK: it takes an arbitrary bandit algorithm and uses it as a subroutine. We use this reduction to derive several extensions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Multi-class boosting for the analysis of multiple incomplete views on microbiome data.
- Author
-
Simeon, Andrea, Radovanović, Miloš, Lončar-Turukalo, Tatjana, Ceci, Michelangelo, Brdar, Sanja, and Pio, Gianvito
- Subjects
BOOSTING algorithms, AUTISM spectrum disorders, HEBBIAN memory, COLORECTAL cancer
- Abstract
Background: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or learn predictive models. However, data to be fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each different choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, that classical (single-view) ML approaches may fail to simultaneously consider. Moreover, some views may be incomplete, i.e., some individuals may be missing in some views, possibly due to the absence of some measurements or to the fact that some features are not available/applicable for all the individuals. Multi-view learning methods can represent a possible solution to consider multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views. Results: We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a learning process based on boosting. In our experiments, performed on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, reaching an improvement of 11.8% of the F1-score in the prediction of the Autism Spectrum disorder and of 114% in the prediction of the Colorectal Cancer disease. Conclusions: The proposed method irBoost.SH exhibited outstanding performances in our experiments, also compared to competitor approaches. The obtained results confirm that irBoost.SH can fruitfully be adopted for the analysis of microbiome data, due to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Dynamic Grouping within Minimax Optimal Strategy for Stochastic Multi-Armed Bandits in Reinforcement Learning Recommendation.
- Author
-
Feng, Jiamei, Zhu, Junlong, Zhao, Xuhui, and Ji, Zhihang
- Subjects
REINFORCEMENT learning, MACHINE learning, PSYCHOLOGICAL feedback
- Abstract
The multi-armed bandit (MAB) problem is a typical problem of exploration and exploitation. As a classical MAB problem, the stochastic multi-armed bandit (SMAB) is the basis of reinforcement learning recommendation. However, most existing SMAB and MAB algorithms have two limitations: (1) they do not make full use of feedback from the environment or agent, such as the number of arms and rewards contained in user feedback; (2) they overlook the utilization of different action selections, which can affect the exploration and exploitation of the algorithm. These limitations motivate us to propose a novel dynamic grouping within the minimax optimal strategy in the stochastic case (DG-MOSS) algorithm for reinforcement learning recommendation for small and medium-sized data scenarios. DG-MOSS does not require additional contextual data and can be used for recommendation of various types of data. Specifically, we designed a new exploration calculation method based on dynamic grouping which uses the feedback information automatically in the selection process and adopts different action selections. During the thorough training of the algorithm, we designed an adaptive episode length to effectively improve the training efficiency. We also analyzed and proved the upper bound of DG-MOSS's regret. Our experimental results for different scales, densities, and field datasets show that DG-MOSS can yield greater rewards than nine baselines with sufficiently trained recommendation and demonstrate that it has better robustness. [ABSTRACT FROM AUTHOR]
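For reference, the MOSS index that the acronym points to (Audibert and Bubeck's Minimax Optimal Strategy in the Stochastic case) is easy to state. The sketch below shows plain MOSS only; it omits the paper's dynamic grouping and adaptive episode length.

```python
import numpy as np

def moss_select(means, counts, horizon):
    """Index rule of plain MOSS: play each arm once, then pick the arm
    maximising mean + sqrt(max(log(T / (K * n_i)), 0) / n_i)."""
    k = len(means)
    if np.any(counts == 0):                  # initial round-robin over unplayed arms
        return int(np.argmin(counts))
    bonus = np.sqrt(np.maximum(np.log(horizon / (k * counts)), 0.0) / counts)
    return int(np.argmax(means + bonus))
```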
- Published
- 2024
- Full Text
- View/download PDF
21. Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty.
- Author
-
Deliu, Nina
- Subjects
ROBBERS, ALGORITHMS
- Abstract
Bandit algorithms such as Thompson sampling (TS) have been put forth for decades as useful tools for conducting adaptively-randomised experiments. By skewing the allocation toward superior arms, they can substantially improve particular outcomes of interest for both participants and investigators. For example, they may use participants' ratings for continuously optimising their experience with a program. However, most of the bandit and TS variants are based on either binary or continuous outcome models, leading to suboptimal performances in rating scale data. Guided by behavioural experiments we conducted online, we address this problem by introducing Multinomial-TS for rating scales. After assessing its improved empirical performance in unique optimal arm scenarios, we explore potential considerations (including prior's role) for calibrating uncertainty and balancing arm allocation in scenarios with no unique optimal arms. [ABSTRACT FROM AUTHOR]
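A minimal sketch of the Dirichlet-multinomial form of Thompson sampling for a rating scale is shown below. It illustrates the general idea only and assumes a utility score per rating level; it is not the paper's exact prior specification.

```python
import numpy as np

def multinomial_ts_allocate(alpha, level_utilities, rng):
    """One allocation of a Dirichlet-multinomial Thompson sampler.
    alpha: (n_arms, n_levels) array of Dirichlet parameters;
    level_utilities: utility per rating level, e.g. np.array([1, 2, 3, 4, 5])."""
    sampled_means = np.array([rng.dirichlet(a) @ level_utilities for a in alpha])
    return int(np.argmax(sampled_means))     # arm for the next participant

def multinomial_ts_update(alpha, arm, observed_level):
    """Conjugate posterior update: count the rating category that was observed."""
    alpha[arm, observed_level] += 1.0
```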
- Published
- 2024
- Full Text
- View/download PDF
22. Smart Testing with Vaccination: A Bandit Algorithm for Active Sampling for Managing COVID-19.
- Author
-
Wang, Yingfei, Yahav, Inbal, and Padmanabhan, Balaji
- Subjects
COVID-19 pandemic, COVID-19, CONTACT tracing, INFECTIOUS disease transmission, VACCINATION
- Abstract
This paper presents methods to proactively choose individuals to test for infection during a pandemic such as COVID-19, characterized by high contagion and presence of asymptomatic carriers. We show that by a smart integration of exploration/exploitation balancing, contact tracing, and location-based sampling, one can effectively mitigate the disease spread and significantly reduce the infection rates and death rates. Under different vaccination policies and under different compliance levels to quarantine order and/or testing requests, our smart testing algorithm can bring down the death rate significantly by 20% to 30%, as well as the percentage of infected drops by approximately 30%. The load on hospitals at peak times, a crucial aspect of managing COVID-19, drops, by 50% when implementing smart testing. We also show how procedural fairness can be incorporated into our method and present results that show that this can be done without hurting the effectiveness of the mitigation that can be achieved. This paper presents methods to choose individuals to test for infection during a pandemic such as COVID-19, characterized by high contagion and presence of asymptomatic carriers. The smart-testing ideas presented here are motivated by active learning and multi-armed bandit techniques in machine learning. Our active sampling method works in conjunction with quarantine policies, can handle different objectives, and is dynamic and adaptive in the sense that it continually adapts to changes in real-time data. The bandit algorithm uses contact tracing, location-based sampling and random sampling in order to select specific individuals to test. Using a data-driven agent-based model simulating New York City we show that the algorithm samples individuals to test in a manner that rapidly traces infected individuals. Experiments also suggest that smart-testing can significantly reduce the death rates as compared with current methods, with or without vaccination. While smart testing strategies can help mitigate disease spread, there could be unintended consequences with fairness or bias when deployed in real-world settings. To this end we show how procedural fairness can be incorporated into our method and present results that show that this can be done without hurting the effectiveness of the mitigation that can be achieved. History: Ahmed Abbasi, Senior Editor; Maytal Saar-Tsechansky, Associate Editor. Funding: W. Yahav is supported by the Jeremy Coller Foundation and the Henry Crown Institute of Business Research in Israel. Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1215. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Fast Model Selection and Hyperparameter Tuning for Generative Models.
- Author
-
Chen, Luming and Ghosh, Sujit K.
- Subjects
OPTIMAL stopping (Mathematical statistics), BUDGET, GENERATIVE adversarial networks, RESOURCE allocation, PROBABILISTIC generative models
- Abstract
Generative models have gained significant attention in recent years. They are increasingly used to estimate the underlying structure of high-dimensional data and artificially generate various kinds of data similar to those from the real world. The performance of generative models depends critically on a good set of hyperparameters. Yet, finding the right hyperparameter configuration can be an extremely time-consuming task. In this paper, we focus on speeding up the hyperparameter search through adaptive resource allocation, early stopping underperforming candidates quickly and allocating more computational resources to promising ones by comparing their intermediate performance. The hyperparameter search is formulated as a non-stochastic best-arm identification problem where resources like iterations or training time constrained by some predetermined budget are allocated to different hyperparameter configurations. A procedure which uses hypothesis testing coupled with Successive Halving is proposed to make the resource allocation and early stopping decisions and compares the intermediate performance of generative models by their exponentially weighted Maximum Mean Discrepancy (MMD). The experimental results show that the proposed method selects hyperparameter configurations that lead to a significant improvement in the model performance compared to Successive Halving for a wide range of budgets across several real-world applications. [ABSTRACT FROM AUTHOR]
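The resource-allocation backbone here is Successive Halving. A plain sketch, without the paper's hypothesis-testing layer or exponentially weighted MMD scoring, might look like the following; `evaluate` is an assumed placeholder returning a lower-is-better score such as an MMD estimate.

```python
import math

def successive_halving(configs, evaluate, total_budget, eta=2):
    """Plain Successive Halving: in each round, give every surviving
    configuration an equal slice of budget, keep the best 1/eta fraction.
    evaluate(config, budget) -> loss (lower is better), e.g. an MMD estimate
    between generated and held-out real samples."""
    rounds = max(1, int(math.log(len(configs), eta)))
    survivors = list(configs)
    for _ in range(rounds):
        per_config = max(1, total_budget // (rounds * len(survivors)))
        scores = [evaluate(cfg, per_config) for cfg in survivors]
        keep = max(1, len(survivors) // eta)
        ranked = sorted(range(len(survivors)), key=lambda i: scores[i])
        survivors = [survivors[i] for i in ranked[:keep]]
    return survivors[0]
```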
- Published
- 2024
- Full Text
- View/download PDF
24. Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits.
- Author
-
Dorn, Yuriy, Kornilov, Nikita, Kutuzov, Nikolay, Nazin, Alexander, Gorbunov, Eduard, and Gasnikov, Alexander
- Subjects
FUTUROLOGISTS
- Abstract
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm fails to fully exploit the available data. In this paper, we propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Image adaptive sampling using reinforcement learning.
- Author
-
Gong, Wenyong and Fan, Xu-Qian
- Abstract
Adaptive sampling and mesh representation of images play an important role in image compression and vectorization. In this paper, a multi-points stochastic gradient multi-armed bandits algorithm, a generalization of the gradient bandit algorithm, is presented to adaptively sample points in images. By modeling the adaptive image sampling as a multi-arm selection decision-making problem, we first propose an efficient action selection strategy based on a parameterized probability distribution, and then define an adaptive reward function according to the restored image of Delaunay triangulation and a feature map function, and the reward function can overcome the sparse reward issue effectively. As a result, the proposed multi-points stochastic gradient multi-armed bandits algorithm is used to evaluate the reward of each action. At last, a prescribed number of sampling points are selected using a simple and effective strategy according to the average reward of each pixel. The quality of reconstructed images based on sampled points is estimated, and experimental results demonstrate the proposed algorithm achieves a better reconstruction accuracy than that of existing methods. [ABSTRACT FROM AUTHOR]
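For context, the standard single-point gradient bandit update that the multi-points algorithm generalises is the softmax-preference rule below. This is the textbook form, not the paper's extension.

```python
import numpy as np

def gradient_bandit_update(preferences, chosen, reward, baseline, alpha=0.1):
    """One step of the standard gradient bandit algorithm: softmax action
    preferences are moved towards actions whose reward exceeds the running
    baseline. Returns the updated preference vector."""
    probs = np.exp(preferences - preferences.max())
    probs /= probs.sum()
    one_hot = np.zeros_like(preferences)
    one_hot[chosen] = 1.0
    return preferences + alpha * (reward - baseline) * (one_hot - probs)
```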
- Published
- 2024
- Full Text
- View/download PDF
26. Multi-armed bandits with dependent arms.
- Author
-
Singh, Rahul, Liu, Fang, Sun, Yin, and Shroff, Ness
- Subjects
MACHINE learning
- Abstract
We study a variant of the multi-armed bandit problem (MABP) which we call MABs with dependent arms. Multiple arms are grouped together to form a cluster, and the reward distributions of arms in the same cluster are known functions of an unknown parameter that is a characteristic of the cluster. Thus, pulling an arm i not only reveals information about its own reward distribution, but also about all arms belonging to the same cluster. This "correlation" among the arms complicates the exploration–exploitation trade-off that is encountered in the MABP because the observation dependencies allow us to test simultaneously multiple hypotheses regarding the optimality of an arm. We develop learning algorithms based on the principle of optimism in the face of uncertainty (Lattimore and Szepesvári in Bandit algorithms, Cambridge University Press, 2020), which know the clusters, and hence utilize these additional side observations appropriately while performing the exploration–exploitation trade-off. We show that the regret of our algorithms grows as O(K log T), where K is the number of clusters. In contrast, for an algorithm such as the vanilla UCB that does not utilize these dependencies, the regret scales as O(M log T), where M is the number of arms. When K ≪ M, i.e., there are many dependencies among arms, our proposed algorithm drastically reduces the dependence of regret on the number of arms. [ABSTRACT FROM AUTHOR]
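The vanilla UCB baseline referred to in the regret comparison can be sketched as follows. It ignores cluster structure entirely, which is why its regret scales with the number of arms M rather than the number of clusters K; this is the standard UCB1 rule, not the paper's cluster-aware algorithm.

```python
import numpy as np

def ucb1_select(means, counts, t):
    """Standard UCB1: pull each arm once, then pick the arm with the highest
    empirical mean plus exploration bonus sqrt(2 log t / n_i)."""
    if np.any(counts == 0):                  # pull each arm once first
        return int(np.argmin(counts))
    return int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
```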
- Published
- 2024
- Full Text
- View/download PDF
27. Learning to Give Useful Hints: Assistance Action Evaluation and Policy Improvements
- Author
-
Schmucker, Robin, Pachapurkar, Nimish, Bala, Shanmuga, Shah, Miral, Mitchell, Tom, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Viberg, Olga, editor, Jivet, Ioana, editor, Muñoz-Merino, Pedro J., editor, Perifanou, Maria, editor, and Papathoma, Tina, editor
- Published
- 2023
- Full Text
- View/download PDF
28. Case-Based Sample Generation Using Multi-Armed Bandits
- Author
-
Korger, Andreas, Baumeister, Joachim, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Massie, Stewart, editor, and Chakraborti, Sutanu, editor
- Published
- 2023
- Full Text
- View/download PDF
29. MockSAS: Facilitating the Evaluation of Bandit Algorithms in Self-adaptive Systems
- Author
-
Alberts, Elvin, Gerostathopoulos, Ilias, Bures, Tomas, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Batista, Thais, editor, Bureš, Tomáš, editor, Raibulet, Claudia, editor, and Muccini, Henry, editor
- Published
- 2023
- Full Text
- View/download PDF
30. Reinforcement Learning in Education: A Multi-armed Bandit Approach
- Author
-
Combrink, Herkulaas MvE, Marivate, Vukosi, Rosman, Benjamin, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Masinde, Muthoni, editor, and Bagula, Antoine, editor
- Published
- 2023
- Full Text
- View/download PDF
31. The future of AI is the market
- Author
-
Darrow, Ross M. and Vinod, Ben, editor
- Published
- 2023
- Full Text
- View/download PDF
32. Reinforcement Learning for Protocol Synthesis in Resource-Constrained Wireless Sensor and IoT Networks
- Author
-
Dutta, Hrishikesh, Bhuyan, Amit Kumar, Biswas, Subir, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sabir, Essaid, editor, Elbiaze, Halima, editor, Falcone, Francisco, editor, Ajib, Wessam, editor, and Sadik, Mohamed, editor
- Published
- 2023
- Full Text
- View/download PDF
33. Multi-domain Active Learning for Semi-supervised Anomaly Detection
- Author
-
Vercruyssen, Vincent, Perini, Lorenzo, Meert, Wannes, Davis, Jesse, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
- Full Text
- View/download PDF
34. On the Complexity of All ε-Best Arms Identification
- Author
-
al Marjani, Aymen, Kocak, Tomas, Garivier, Aurélien, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
- Full Text
- View/download PDF
35. Hypothesis Transfer in Bandits by Weighted Models
- Author
-
Bilaj, Steven, Dhouib, Sofien, Maghsudi, Setareh, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Amini, Massih-Reza, editor, Canu, Stéphane, editor, Fischer, Asja, editor, Guns, Tias, editor, Kralj Novak, Petra, editor, and Tsoumakas, Grigorios, editor
- Published
- 2023
- Full Text
- View/download PDF
36. Adaptive designs for best treatment identification with top‐two Thompson sampling and acceleration.
- Author
-
Wang, Jixian and Tiwari, Ram
- Subjects
MACHINE learning, DRUG development, ERROR rates, SAMPLE size (Statistics), CLINICAL trials
- Abstract
We consider outcome adaptive phase II or phase II/III trials to identify the best treatment for further development. Different from many other multi‐arm multi‐stage designs, we borrow best‐arm identification approaches from the multi‐armed bandit (MAB) literature developed for machine learning and adapt them for clinical trial purposes. Best arm identification in MABs focuses on the error rate of identification at the end of the trial, but we are also interested in the cumulative benefit of trial patients, for example, the frequency of patients treated with the best treatment. In particular, we consider Top‐Two Thompson Sampling (TTTS) and propose an acceleration approach for better performance in drug development scenarios in which the sample size is much smaller than that considered in machine learning applications. We also propose a variant of TTTS (TTTS2) which is simpler, easier to implement, and has comparable performance in small sample settings. An extensive simulation study was conducted to evaluate the performance of the proposed approach in multiple typical scenarios in drug development. [ABSTRACT FROM AUTHOR]
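A generic sketch of one TTTS allocation step with Beta-Bernoulli posteriors is given below for illustration. It is the standard top-two rule for a binary endpoint, not the accelerated variant or TTTS2 proposed in the paper.

```python
import numpy as np

def ttts_allocate(successes, failures, beta=0.5, rng=None, max_resamples=1000):
    """One allocation of Top-Two Thompson Sampling.
    successes, failures: arrays of per-arm counts. With probability beta the
    posterior leader is played; otherwise the posterior is resampled until a
    different arm leads, and that challenger is played."""
    rng = rng or np.random.default_rng(0)
    leader = int(np.argmax(rng.beta(successes + 1, failures + 1)))
    if rng.random() < beta:
        return leader
    for _ in range(max_resamples):           # resample to find a challenger
        challenger = int(np.argmax(rng.beta(successes + 1, failures + 1)))
        if challenger != leader:
            return challenger
    return leader                            # fallback for near-degenerate posteriors
```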
- Published
- 2023
- Full Text
- View/download PDF
37. Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning.
- Author
-
Cannelli, Loris, Nuti, Giuseppe, Sala, Marzio, and Szehr, Oleg
- Subjects
REINFORCEMENT learning, RECURRENT neural networks, TRANSACTION costs, FINANCIAL engineering, DATA analysis
- Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but is also undesirable due to high transaction costs. A variety of methods have been proposed to balance between effective replication and losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, where particular attention is given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, the run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, which is motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits to the Profit and Loss formulation of hedging, providing for a more accurate and sample efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
38. Personalizing Activity Selection in Assistive Social Robots from Explicit and Implicit User Feedback
- Author
-
Maroto-Gómez, Marcos, Malfaz, María, Castillo, José Carlos, Castro-González, Álvaro, and Salichs, Miguel Ángel
- Published
- 2024
- Full Text
- View/download PDF
39. Information Asymmetries in Data-Driven and Sustainable Operations: Stochastic Models and Adaptive Algorithms for Strategic Agents
- Author
-
Dogan, Ilgin
- Subjects
Operations research, Computer science, Sustainability, data-driven incentives, data-driven operations, information asymmetries, multi-armed bandits, repeated principal-agent games, sustainable operations
- Abstract
The modern landscape of operations management (OM) has undergone a profound paradigm shift driven by two surging forces: 1) the integration of expansive real-time data inflow, and 2) the recognition of ambiguity in navigating operational disruptions due to climate crisis. In this transition to data-driven and sustainable operations, a fundamental challenge lies in isolating the lack of transparency in collaboration willingness and misaligned economic motives of strategic agents (i.e., stakeholders) in socio-technical systems. Motivated by contributing to this breakthrough, this dissertation establishes a foundational theory that leverages data-driven decision-making to proactively mitigate intricate uncertainties, arising from imperfect model insights and information asymmetries, hindering sustainable OM. The dissertation begins by exploring nonlinear and non-stationary control systems under imperfect knowledge of the reward function and system dynamics—a nontrivial scenario common in applications like balancing occupant comfort and energy efficiency in buildings. Expanding on this rigorous control-theoretic learning analysis, the majority of the dissertation is devoted to devising novel, data-driven, and adaptive incentive frameworks to tackle unexplored information disparities in the context of repeated principal-agent games. Inspired by several real-world applications, such as forest conservation incentives in Payment for Ecosystem Services and renewable energy aggregator contracts for utility grids, this dissertation introduces the “hidden agent rewards” model within a multi-armed bandit framework, where: a principal learns to proactively lead an agent's choices by sequentially offering menus of incentives which contribute to the agent's hidden rewards for a finite set of arms. Designing policies in this setting is challenging, because it entails analyzing dynamic externalities imposed by two separate learning algorithms trained in parallel by strategic parties. To the best of our knowledge, this dissertation presents i) the first generic stochastic sequential model for this widely applicable information imbalance context, and ii) the first methodological framework that contends with the principal’s trade-off between consistently learning the agent’s rewards and maximizing their own rewards through adaptive incentives. We examine two scenarios: one where the agent has perfect knowledge of their reward model and another where the agent learns their model over time, potentially leading to misleading choices for the principal. In both cases, solid statistical consistency and regret guarantees are proven to persist without restricting the agent’s algorithm or reward distributions. Throughout the dissertation, these theoretical results, along with versatile practical insights, outline a prosperous future research landscape to enhance various incentive practices in OM confronting the hidden objectives of incentivized agents.
- Published
- 2024
40. Applications of Bayesian recommender systems in online environments
- Author
-
Mckenzie, Jack and House, Thomas
- Subjects
Matrix factorisation, Collaborative topic models, Content based filtering, Bayesian graphical models, Bayesian Inference, News recommendation, Collaborative filtering, Sequential Learning, Online learning, Variational Bayes, Multi-armed bandits, Variational Inference, Bayesian, Recommender systems
- Abstract
Recommender systems play a vital role in how any well functioning website presents content to users. With the amount of content growing day by day, the ability to accurately hone in on the specific interests of any given user has taken on an ever increasing significance. In this thesis we investigate the application of a handful of techniques that are aimed at improving how content is presented to users in online environments. Two major real-world use cases are discussed: the recommendation of adverts on AutoTrader and the recommendation of news articles on the BBC website. Throughout the thesis emphasis is placed on the use and development of algorithms which scale to these real world use cases. The structure is as follows: Chapters 2 and 3 set out the groundwork for the major advances in this area and ensure that the knowledge required for topics in later chapters is available to the reader; these are predominantly focused on the multi-armed bandit problem and the variational inference algorithm. Chapter 4 describes the development of a new hybrid algorithm, FAB-COST, which seeks to leverage two moment matching variational inference algorithms, namely Expectation Propagation and Assumed Density Filtering, to build a procedure which is both accurate and able to handle streaming datasets. Its superior performance is demonstrated by benchmarking it against another state-of-the-art contextual bandit algorithm using real user click data from the Auto Trader website. Chapters 5 and 6 demonstrate how recommendations can be improved at BBC News via Collaborative Topic Models. The fact that users have a strong preference towards recently published news, as well as the data pipeline the BBC have in place, means that cold-start presents a serious problem when making news recommendations. By incorporating contextual information present in the articles into the recommender system, articles that have no user histories associated with them can be recommended to users -- a task that traditional collaborative filtering algorithms are unable to fulfil. We give an in-depth mathematical and empirical analysis of the algorithms, as well as a detailed account of how they have been and can be used in a real world setting. We provide code which demonstrates that the Collaborative Topic Model provides superior recommendations qualitatively and quantitatively, and importantly, works with the current infrastructure the BBC have in place.
- Published
- 2021
41. Adversarial thresholding semi-bandits
- Author
-
Bower, Craig, Anjum, Ashiq, Bagdasar, Ovidiu, and Xue, Yong
- Subjects
Multi-armed bandits, Adversarial bandits, Thresholding bandits, Multi-agent bandits, Online machine learning, High-energy physics detector control
- Abstract
The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore). We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high-energy physics experiments on the Large Hadron Collider. We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T-Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M) are introduced that satisfy theoretical and computational research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T-Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, compared with exponentially weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments. In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped, to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously.
In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiment's ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.
- Published
- 2020
42. Dynamic Grouping within Minimax Optimal Strategy for Stochastic Multi-Armed Bandits in Reinforcement Learning Recommendation
- Author
-
Jiamei Feng, Junlong Zhu, Xuhui Zhao, and Zhihang Ji
- Subjects
dynamic grouping, multi-armed bandits, exploration and exploitation, reinforcement learning, recommendation, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
The multi-armed bandit (MAB) problem is a typical problem of exploration and exploitation. As a classical MAB problem, the stochastic multi-armed bandit (SMAB) is the basis of reinforcement learning recommendation. However, most existing SMAB and MAB algorithms have two limitations: (1) they do not make full use of feedback from the environment or agent, such as the number of arms and rewards contained in user feedback; (2) they overlook the utilization of different action selections, which can affect the exploration and exploitation of the algorithm. These limitations motivate us to propose a novel dynamic grouping within the minimax optimal strategy in the stochastic case (DG-MOSS) algorithm for reinforcement learning recommendation for small and medium-sized data scenarios. DG-MOSS does not require additional contextual data and can be used for recommendation of various types of data. Specifically, we designed a new exploration calculation method based on dynamic grouping which uses the feedback information automatically in the selection process and adopts different action selections. During the thorough training of the algorithm, we designed an adaptive episode length to effectively improve the training efficiency. We also analyzed and proved the upper bound of DG-MOSS’s regret. Our experimental results for different scales, densities, and field datasets show that DG-MOSS can yield greater rewards than nine baselines with sufficiently trained recommendation and demonstrate that it has better robustness.
- Published
- 2024
- Full Text
- View/download PDF
43. Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning
- Author
-
Loris Cannelli, Giuseppe Nuti, Marzio Sala, and Oleg Szehr
- Subjects
Hedging, Reinforcement Learning, Q-Learning, Multi-Armed Bandits, Electronic computers. Computer science, QA75.5-76.95, Finance, HG1-9999
- Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but is also undesirable due to high transaction costs. A variety of methods have been proposed to balance between effective replication and losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, where particular attention is given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, the run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, which is motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits to the Profit and Loss formulation of hedging, providing for a more accurate and sample efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
- Published
- 2023
- Full Text
- View/download PDF
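As a loose illustration of the bandit view of hedging described above (not the authors' implementation), the sketch below treats a hypothetical grid of hedge ratios as arms and scores each arm by a risk-averse statistic of its observed P&L. The action grid, the risk-aversion parameter, and the epsilon-greedy policy are all assumptions made for this example.

```python
import numpy as np

HEDGE_RATIOS = np.linspace(0.0, 1.0, 11)   # hypothetical discrete hedge actions (arms)
RISK_AVERSION = 1.0                        # hypothetical risk-aversion weight

def risk_adjusted_value(pnl_samples, lam=RISK_AVERSION):
    # Risk-averse score of an arm: mean P&L penalised by its dispersion.
    pnl = np.asarray(pnl_samples, dtype=float)
    return pnl.mean() - lam * pnl.std()

def choose_hedge(pnl_history, eps=0.1, rng=np.random.default_rng()):
    # pnl_history[i] holds the realised P&L values observed after playing arm i.
    if rng.random() < eps:                                   # explore occasionally
        return int(rng.integers(len(HEDGE_RATIOS)))
    scores = [risk_adjusted_value(h) if h else np.inf        # unplayed arms are tried first
              for h in pnl_history]
    return int(np.argmax(scores))                            # exploit the best risk-adjusted arm
```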
44. Cutting to the chase with warm-start contextual bandits.
- Author
-
Oetomo, Bastian, Perera, R. Malinga, Borovica-Gajic, Renata, and Rubinstein, Benjamin I. P.
- Subjects
ROBBERS ,CONTEXTUAL learning ,SUPERVISED learning ,DATABASES ,KNOWLEDGE transfer ,PRIOR learning - Abstract
Multi-armed bandits achieve excellent long-term performance in practice and sublinear cumulative regret in theory. However, a real-world limitation of bandit learning is poor performance in early rounds due to the need for exploration, a phenomenon known as the cold-start problem. While this limitation may be necessary in the general classical stochastic setting, in practice, where "pre-training" data or knowledge is available, it is natural to attempt to "warm-start" bandit learners. This paper provides a theoretical treatment of warm-start contextual bandit learning, adopting Linear Thompson Sampling as a principled framework for flexibly transferring domain knowledge as might be captured by bandit learning in a prior related task, a supervised pre-trained Bayesian posterior, or domain expert knowledge. Under standard conditions, we prove a general regret bound. We then apply our warm-start technique to other common bandit learners: the ε-greedy and upper-confidence-bound contextual learners. An upper regret bound is then provided for LinUCB. Our suite of warm-start learners is evaluated in experiments with both artificial and real-world datasets, including a motivating task of tuning a commercial database. A comprehensive range of experimental results is presented, highlighting the effect of different hyperparameters and quantities of pre-training data. [ABSTRACT FROM AUTHOR] (A minimal warm-start Thompson Sampling sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
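A minimal sketch of the warm-start idea for Linear Thompson Sampling, assuming a Gaussian prior whose mean and precision come from pre-training or a related task. The class name, the unit-noise-variance simplification, and the interface are assumptions for illustration, not taken from the paper.

```python
import numpy as np

class WarmStartLinTS:
    """Linear Thompson Sampling initialised from an informative prior (sketch)."""

    def __init__(self, dim, prior_mean=None, prior_precision=None):
        # With no pre-training information, this reduces to the usual uninformative start.
        self.B = prior_precision if prior_precision is not None else np.eye(dim)
        mu0 = prior_mean if prior_mean is not None else np.zeros(dim)
        self.f = self.B @ mu0                       # precision-weighted prior mean

    def choose(self, arm_features, rng=np.random.default_rng()):
        cov = np.linalg.inv(self.B)                 # posterior covariance (unit noise variance)
        mu = cov @ self.f                           # posterior mean
        theta = rng.multivariate_normal(mu, cov)    # sample a plausible parameter vector
        return int(np.argmax(arm_features @ theta)) # play the best arm under that sample

    def update(self, x, reward):
        # Conjugate Bayesian linear-regression update after observing a reward.
        self.B += np.outer(x, x)
        self.f += reward * x
```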
45. Block pruning residual networks using Multi-Armed Bandits.
- Author
-
Benatia, Mohamed Akrem, Amara, Yacine, Boulahia, Said Yacine, and Hocini, Abdelouahab
- Abstract
Recently, deep neural networks have been adopted in many application domains, showing impressive performance that sometimes even exceeds that of humans. However, the considerable size of deep learning models and their computational cost limit their use in mobile and embedded systems. Pruning is an important technique used in the literature to compress a trained model by removing elements (weights, neurons, filters, etc.) that do not notably contribute to its performance. In this paper, we propose a new block pruning method for residual networks that removes whole layers of the network based on the Multi-Armed Bandits (MAB) framework. The proposed method takes advantage of the existence of multiple trainable paths in residual models to identify and prune the less important residual blocks for inference. To evaluate our approach, we conducted extensive experiments using three different MAB algorithms, namely ε-greedy, Win Stay Lose Shift (WSLS), and the Upper Confidence Bound (UCB). The results obtained on different datasets show that our block pruning method can considerably reduce model size and FLOPs with little to no impact on overall accuracy. A comparative analysis against three existing techniques demonstrates that our approach is highly competitive in terms of accuracy while significantly reducing both model size and FLOPs. Moreover, our experiments show that combining our block pruning method with a filter-level pruning technique leads to even greater reductions in model size and FLOPs while maintaining relatively high accuracy. [ABSTRACT FROM AUTHOR] (A hypothetical UCB1 block-selection sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
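To make the block-as-arm idea concrete, here is a hypothetical UCB1 selection step in which each residual block is an arm and the reward is the validation accuracy observed when that block is skipped. UCB is one of the three MAB policies the paper evaluates; the ε-greedy and WSLS variants are not shown, and the reward definition here is an assumption for illustration.

```python
import math

def ucb1_pick_block(rewards, counts, t):
    # rewards[i]: running mean validation accuracy when residual block i is skipped
    # counts[i]:  number of times block i has been skipped so far
    # t:          total number of pruning trials performed
    for i, n in enumerate(counts):
        if n == 0:
            return i                              # try skipping every block once
    return max(range(len(counts)),                # otherwise pick the most promising block
               key=lambda i: rewards[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
```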
46. $\mathsf{HyHooVer}$: Verification and Parameter Synthesis in Stochastic Systems With Hybrid State Space Using Optimistic Optimization
- Author
-
Negin Musavi, Dawei Sun, Sayan Mitra, Geir E. Dullerud, and Sanjay Shakkottai
- Subjects
Autonomous systems ,black-box optimization ,monte-carlo tree search ,multi-armed bandits ,safety verification ,Control engineering systems. Automatic machinery (General) ,TJ212-225 ,Technology - Abstract
This article presents a new method for model-free verification of a general class of control systems with unknown nonlinear dynamics, where the state space has both a continuum-based and a discrete component. Specifically, we focus on finding the choices of initial states or parameters from such a hybrid state space that maximize a given probabilistic objective function, without exact knowledge of the system dynamics. We introduce the notion of set-initialized Markov chains to represent such systems. Our method uses generalized techniques from multi-armed bandit theory on the continuum to make efficient use of the available sampling budget. We introduce a new algorithm, Hybrid Hierarchical Optimistic Optimization (HyHOO), designed to address this problem. The algorithm combines elements of the existing Hierarchical Optimistic Optimization (HOO) bandit algorithm with carefully chosen parameters to offer a fresh perspective on the problem. By viewing the problem as a multi-armed bandit problem and making smoothness assumptions about the underlying system, we provide theoretical regret bounds on the sample efficiency of our tool, $\mathsf{HyHooVer}$. Experiments on formal verification and parameter synthesis across a variety of scenarios indicate that the proposed method is effective and efficient on realistic-sized problems and performs well compared with other methods, specifically PlasmaLab, BoTorch, and the baseline HOO algorithm. In particular, it is more efficient than the other tools on models with large state spaces and on objective functions with sharp slopes. (A sketch of the HOO-style optimistic node score follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
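As background for the HOO-based approach above, the sketch below shows the optimistic score that HOO-style algorithms assign to a tree node covering a region of the search space; the smoothness parameters nu and rho are illustrative defaults, and the full tree-expansion and back-up steps of HyHOO are omitted.

```python
import math

def hoo_upper_bound(mean_reward, visits, total_samples, depth, nu=1.0, rho=0.5):
    # Optimistic score for a node covering a region of the (hybrid) state space:
    # empirical mean + count-based confidence width + a smoothness-based bound
    # (nu * rho**depth) on how much the objective can vary inside the region.
    if visits == 0:
        return float("inf")                     # unvisited regions are maximally optimistic
    confidence = math.sqrt(2.0 * math.log(total_samples) / visits)
    region_size = nu * rho ** depth
    return mean_reward + confidence + region_size
```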
47. Time is Budget: A Heuristic for Reducing the Risk of Ruin in Multi-armed Gambler Bandits
- Author
-
Perotto, Filipo Studzinski, Pucel, Xavier, Farges, Jean-Loup, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bramer, Max, editor, and Stahl, Frederic, editor
- Published
- 2022
- Full Text
- View/download PDF
48. Foundations of Ranking & Selection for Simulation Optimization
- Author
-
Nelson, Barry L., Botev, Zdravko, editor, Keller, Alexander, editor, Lemieux, Christiane, editor, and Tuffin, Bruno, editor
- Published
- 2022
- Full Text
- View/download PDF
49. Network Defense Resource Allocation Scheme with Multi-armed Bandits
- Author
-
Huang, Ning, Feng, Xue-cai, Zhang, Rui, Yang, Xiu-gui, Xia, Hui, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wang, Lei, editor, Segal, Michael, editor, Chen, Jenhui, editor, and Qiu, Tie, editor
- Published
- 2022
- Full Text
- View/download PDF
50. Encounters with Martingales in Statistics and Stochastic Optimization
- Author
-
Lai, Tze Leung, Mazliak, Laurent, editor, and Shafer, Glenn, editor
- Published
- 2022
- Full Text
- View/download PDF