Author: "Vakili, Sattar" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Vakili, Sattar"' showing total 115 results

Start Over Author "Vakili, Sattar"

115 results on '"Vakili, Sattar"'

1. Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm

Author: Vakili, Sattar and Olkhovskaya, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Reinforcement learning utilizing kernel ridge regression to predict the expected value function represents a powerful method with great representational capacity. This setting is a highly versatile framework amenable to analytical results. We consider kernel-based function approximation for RL in the infinite horizon average reward setting, also referred to as the undiscounted setting. We propose an optimistic algorithm, similar to acquisition function based algorithms in the special case of bandits. We establish novel no-regret performance guarantees for our algorithm, under kernel-based modelling assumptions. Additionally, we derive a novel confidence interval for the kernel-based prediction of the expected value function, applicable across various RL problems., Comment: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Published: 2024

2. Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning

Author: Vakili, Sattar
Subjects: Computer Science - Machine Learning
Abstract: Reinforcement Learning (RL) has shown great empirical success in various application domains. The theoretical aspects of the problem have been extensively studied over past decades, particularly under tabular and linear Markov Decision Process structures. Recently, non-linear function approximation using kernel-based prediction has gained traction. This approach is particularly interesting as it naturally extends the linear structure, and helps explain the behavior of neural-network-based models at their infinite width limit. The analytical results however do not adequately address the performance guarantees for this case. We will highlight this open problem, overview existing partial results, and discuss related challenges., Comment: Open problem track. Conference on Learning Theory (COLT 2024)
Published: 2024

3. Optimal Regret Bounds for Collaborative Learning in Bandits

Author: Shidani, Amitis and Vakili, Sattar
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed., Comment: Algorithmic Learning Theory (ALT) 2024
Published: 2023

4. Robust Best-arm Identification in Linear Bandits

Author: Wang, Wei, Vakili, Sattar, and Bogunovic, Ilija
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We study the robust best-arm identification problem (RBAI) in the case of linear rewards. The primary objective is to identify a near-optimal robust arm, which involves selecting arms at every round and assessing their robustness by exploring potential adversarial actions. This approach is particularly relevant when utilizing a simulator and seeking to identify a robust solution for real-world transfer. To this end, we present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards. Furthermore, we propose both static and adaptive bandit algorithms that achieve sample complexity that matches the lower bound. In synthetic experiments, our algorithms effectively identify the best robust arm and perform similarly to the oracle strategy. As an application, we examine diabetes care and the process of learning insulin dose recommendations that are robust with respect to inaccuracies in standard calculators. Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.
Published: 2023

5. Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

Author: Salgia, Sudeep, Vakili, Sattar, and Zhao, Qing
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established in this work, which may be of independent interest. We further develop an algorithm based on random exploration with domain shrinking and establish its order-optimal regret guarantees under both noise-free and noisy settings. In the noise-free setting, our analysis closes the existing gap in regret performance and thereby resolves a COLT open problem. The proposed algorithm also enjoys a computational advantage over prevailing methods due to the random exploration that obviates the expensive optimization of a non-convex acquisition function for choosing the query points at each iteration.
Published: 2023

6. Adversarial Contextual Bandits Go Kernelized

Author: Neu, Gergely, Olkhovskaya, Julia, and Vakili, Sattar
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achieves near-optimal regret guarantees under a variety of eigenvalue decay assumptions made on the underlying kernel. Specifically, under the assumption of polynomial eigendecay with exponent $c>1$, the regret is $\widetilde{O}(KT^{\frac{1}{2}(1+\frac{1}{c})})$, where $T$ denotes the number of rounds and $K$ the number of actions. Furthermore, when the eigendecay follows an exponential pattern, we achieve an even tighter regret bound of $\widetilde{O}(\sqrt{T})$. These rates match the lower bounds in all special cases where lower bounds are known at all, and match the best known upper bounds available for the more well-studied stochastic counterpart of our problem.
Published: 2023

7. Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling

Author: Sengupta, Ushnish, Jao, Chinkuo, Bernacchia, Alberto, Vakili, Sattar, and Shiu, Da-shan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science, Computer Science - Networking and Internet Architecture, Statistics - Machine Learning
Abstract: Channel modelling is essential to designing modern wireless communication systems. The increasing complexity of channel modelling and the cost of collecting high-quality wireless channel data have become major challenges. In this paper, we propose a diffusion model based channel sampling approach for rapidly synthesizing channel realizations from limited data. We use a diffusion model with a U Net based architecture operating in the frequency space domain. To evaluate how well the proposed model reproduces the true distribution of channels in the training dataset, two evaluation metrics are used: $i)$ the approximate $2$-Wasserstein distance between real and generated distributions of the normalized power spectrum in the antenna and frequency domains and $ii)$ precision and recall metric for distributions. We show that, compared to existing GAN based approaches which suffer from mode collapse and unstable training, our diffusion based approach trains stably and generates diverse and high-fidelity samples from the true channel distribution. We also show that we can pretrain the model on a simulated urban macro-cellular channel dataset and fine-tune it on a smaller, out-of-distribution urban micro-cellular dataset, therefore showing that it is feasible to model real world channels using limited data with this approach., Comment: 2023 IEEE Global Communications Conference
Published: 2023

8. Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Author: Vakili, Sattar and Olkhovskaya, Julia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant polynomial in the number of episodes improvement over the state of the art. In particular, with highly non-smooth kernels (such as Neural Tangent kernel or some Mat\'ern kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Mat\'ern kernels where a lower bound on regret is known., Comment: Advances in Neural Information Processing Systems (NeurIPS), 2023. In the previous version, we utilized Lemma C.1 from Yang et al., 2020a to bound the RKHS norm of the kernel ridge predictor. In the current version, this is proven in Lemma 5
Published: 2023

9. Image generation with shortest path diffusion

Author: Das, Ayan, Fotiadis, Stathi, Batra, Anil, Nabiei, Farhang, Liao, FengTing, Vakili, Sattar, Shiu, Da-shan, and Bernacchia, Alberto
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The field of image generation has made significant progress thanks to the introduction of Diffusion Models, which learn to progressively reverse a given image corruption. Recently, a few studies introduced alternative ways of corrupting images in Diffusion Models, with an emphasis on blurring. However, these studies are purely empirical and it remains unclear what is the optimal procedure for corrupting an image. In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. We propose the Fisher metric for the path length, measured in the space of probability distributions. We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening, rather than blurring, and noise deblurring. While the corruption was chosen arbitrarily in previous work, our Shortest Path Diffusion (SPD) determines uniquely the entire spatiotemporal structure of the corruption. We show that SPD improves on strong baselines without any hyperparameter tuning, and outperforms all previous Diffusion Models based on image blurring. Furthermore, any small deviation from the shortest path leads to worse performance, suggesting that SPD provides the optimal procedure to corrupt images. Our work sheds new light on observations made in recent works and provides a new approach to improve diffusion models on images and other types of data., Comment: AD and SF contributed equally
Published: 2023

10. Trieste: Efficiently Exploring The Depths of Black-box Functions with TensorFlow

Author: Picheny, Victor, Berkeley, Joel, Moss, Henry B., Stojic, Hrvoje, Granta, Uri, Ober, Sebastian W., Artemev, Artem, Ghani, Khurram, Goodall, Alexander, Paleyes, Andrei, Vakili, Sattar, Pascual-Diaz, Sergio, Markou, Stratis, Qing, Jixiang, Loka, Nasrulloh R. B. S, and Couckuyt, Ivo
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We present Trieste, an open-source Python package for Bayesian optimization and active learning benefiting from the scalability and efficiency of TensorFlow. Our library enables the plug-and-play of popular TensorFlow-based models within sequential decision-making loops, e.g. Gaussian processes from GPflow or GPflux, or neural networks from Keras. This modular mindset is central to the package and extends to our acquisition functions and the internal dynamics of the decision-making loop, both of which can be tailored and extended by researchers or engineers when tackling custom use cases. Trieste is a research-friendly and production-ready toolkit backed by a comprehensive test suite, extensive documentation, and available at https://github.com/secondmind-labs/trieste.
Published: 2023

11. Delayed Feedback in Kernel Bandits

Author: Vakili, Sattar, Ahmed, Danyal, Bernacchia, Alberto, and Pike-Burke, Ciara
Subjects: Statistics - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Black box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel based bandit problem (also known as Bayesian optimisation), where a learner aims at optimising a kernelized function through sequential noisy observations. The existing work predominantly assumes feedback is immediately available; an assumption which fails in many real world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau])$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$ reported in Verma et al. (2022). In particular, for very non-smooth kernels, the information gain grows almost linearly in time, trivializing the existing results. We also validate our theoretical results with simulations.
Published: 2023

12. Sample Complexity of Kernel-Based Q-Learning

Author: Yeh, Sing-Yuan, Chang, Fu-Chieh, Yueh, Chang-Wei, Wu, Pei-Yuan, Bernacchia, Alberto, and Vakili, Sattar
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q-functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel based Q-learning when a generative model exists. We propose a nonparametric Q-learning algorithm which finds an $\epsilon$-optimal policy in an arbitrarily large scale discounted MDP. The sample complexity of the proposed algorithm is order optimal with respect to $\epsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
Published: 2023

13. Collaborative Learning in Kernel-based Bandits for Distributed Users

Author: Salgia, Sudeep, Vakili, Sattar, and Zhao, Qing
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We study collaborative learning among distributed clients facilitated by a central server. Each client is interested in maximizing a personalized objective function that is a weighted sum of its local objective and a global objective. Each client has direct access to random bandit feedback on its local objective, but only has a partial view of the global objective and relies on information exchange with other clients for collaborative learning. We adopt the kernel-based bandit framework where the objective functions belong to a reproducing kernel Hilbert space. We propose an algorithm based on surrogate Gaussian process (GP) models and establish its order-optimal regret performance (up to polylogarithmic factors). We also show that the sparse approximations of the GP models can be employed to reduce the communication overhead across clients.
Published: 2022

14. Near-Optimal Collaborative Learning in Bandits

Author: Réda, Clémence, Vakili, Sattar, and Kaufmann, Emilie
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper introduces a general multi-agent bandit model in which each agent is facing a finite set of arms and may communicate with other agents through a central controller in order to identify, in pure exploration, or play, in regret minimization, its optimal arm. The twist is that the optimal arm for each agent is the arm with largest expected mixed reward, where the mixed reward of an arm is a weighted sum of the rewards of this arm for all agents. This makes communication between agents often necessary. This general setting allows to recover and extend several recent models for collaborative bandit learning, including the recently proposed federated learning with personalization (Shi et al., 2021). In this paper, we provide new lower bounds on the sample complexity of pure exploration and on the regret. We then propose a near-optimal algorithm for pure exploration. This algorithm is based on phased elimination with two novel ingredients: a data-dependent sampling scheme within each phase, aimed at matching a relaxation of the lower bound.
Published: 2022

15. Provably and Practically Efficient Neural Contextual Bandits

Author: Salgia, Sudeep, Vakili, Sattar, and Zhao, Qing
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider the neural contextual bandit problem. In contrast to the existing work which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, (ii) we propose an algorithm with a provably sublinear regret bound that is also efficient in the finite regime as demonstrated by empirical studies. The non-asymptotic error bounds may be of broader interest as a tool to establish the relation between the smoothness of the activation functions in neural contextual bandits and the smoothness of the kernels in kernel bandits.
Published: 2022

16. Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Author: Vakili, Sattar, Scarlett, Jonathan, Shiu, Da-shan, and Bernacchia, Alberto
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the computational cost, effectively reducing the actual cost down to as low as $\mathcal{O}(n)$ in certain cases. Despite this remarkable empirical success, significant gaps remain in the existing results for the analytical bounds on the error due to approximation. In this work, we provide novel confidence intervals for the Nystr\"om method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. Our confidence intervals lead to improved performance bounds in both regression and optimization problems., Comment: International Conference on Machine Learning (ICML) 2022
Published: 2022

17. Open Problem: Tight Online Confidence Intervals for RKHS Elements

Author: Vakili, Sattar, Scarlett, Jonathan, and Javidi, Tara
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory
Abstract: Confidence intervals are a crucial building block in the analysis of various online learning problems. The analysis of kernel based bandit and reinforcement learning problems utilize confidence intervals applicable to the elements of a reproducing kernel Hilbert space (RKHS). However, the existing confidence bounds do not appear to be tight, resulting in suboptimal regret bounds. In fact, the existing regret bounds for several kernelized bandit algorithms (e.g., GP-UCB, GP-TS, and their variants) may fail to even be sublinear. It is unclear whether the suboptimal regret bound is a fundamental shortcoming of these algorithms or an artifact of the proof, and the main challenge seems to stem from the online (sequential) nature of the observation points. We formalize the question of online confidence intervals in the RKHS setting and overview the existing results.
Published: 2021

18. Uniform Generalization Bounds for Overparameterized Neural Networks

Author: Vakili, Sattar, Bromberg, Michael, Garcia, Jezabel, Shiu, Da-shan, and Bernacchia, Alberto
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: An interesting observation in artificial neural networks is their favorable generalization error despite typically being extremely overparameterized. It is well known that the classical statistical learning methods often result in vacuous generalization errors in the case of overparameterized neural networks. Adopting the recently developed Neural Tangent (NT) kernel theory, we prove uniform generalization bounds for overparameterized neural networks in kernel regimes, when the true data generating model belongs to the reproducing kernel Hilbert space (RKHS) corresponding to the NT kernel. Importantly, our bounds capture the exact error rates depending on the differentiability of the activation functions. In order to establish these bounds, we propose the information gain of the NT kernel as a measure of complexity of the learning problem. Our analysis uses a Mercer decomposition of the NT kernel in the basis of spherical harmonics and the decay rate of the corresponding eigenvalues. As a byproduct of our results, we show the equivalence between the RKHS corresponding to the NT kernel and its counterpart corresponding to the Mat\'ern family of kernels, showing the NT kernels induce a very general class of models. We further discuss the implications of our analysis for some recent results on the regret bounds for reinforcement learning and bandit algorithms, which use overparameterized neural networks.
Published: 2021

19. Optimal Order Simple Regret for Gaussian Process Bandits

Author: Vakili, Sattar, Bouziani, Nacime, Jalali, Sepehr, Bernacchia, Alberto, and Shiu, Da-shan
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function $f$. The problem can be cast as a Gaussian Process (GP) bandit where $f$ lives in a reproducing kernel Hilbert space (RKHS). The state of the art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When $N$ is the number of exploration trials and $\gamma_N$ is the maximal information gain, we prove an $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest.
Published: 2021

20. A Domain-Shrinking based Bayesian Optimization Algorithm with Order-Optimal Regret Performance

Author: Salgia, Sudeep, Vakili, Sattar, and Zhao, Qing
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider sequential optimization of an unknown function in a reproducing kernel Hilbert space. We propose a Gaussian process-based algorithm and establish its order-optimal regret performance (up to a poly-logarithmic factor). This is the first GP-based algorithm with an order-optimal regret guarantee. The proposed algorithm is rooted in the methodology of domain shrinking realized through a sequence of tree-based region pruning and refining to concentrate queries in increasingly smaller high-performing regions of the function domain. The search for high-performing regions is localized and guided by an iterative estimation of the optimal function value to ensure both learning efficiency and computational efficiency. Compared with the prevailing GP-UCB family of algorithms, the proposed algorithm reduces computational complexity by a factor of $O(T^{2d-1})$ (where $T$ is the time horizon and $d$ the dimension of the function domain)., Comment: Accepted to NeurIPS 2021
Published: 2020

21. On Information Gain and Regret Bounds in Gaussian Process Bandits

Author: Vakili, Sattar, Khezeli, Kia, and Picheny, Victor
Subjects: Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning
Abstract: Consider the sequential optimization of an expensive to evaluate and possibly non-convex objective function $f$ from noisy feedback, that can be considered as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian (when $f$ is a sample from a Gaussian process (GP)) and a frequentist (when $f$ lives in a reproducing kernel Hilbert space) setting. The regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels, improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$ under numerous settings. For the Mat\'ern family of kernels, where the lower bounds on $\gamma_T$, and regret under the frequentist setting, are known, our results close a huge polynomial in $T$ gap between the upper and lower bounds (up to logarithmic in $T$ factors).
Published: 2020

22. Scalable Thompson Sampling using Sparse Gaussian Process Models

Author: Vakili, Sattar, Moss, Henry, Artemev, Artem, Dutordoir, Vincent, and Picheny, Victor
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Thompson Sampling (TS) from Gaussian Process (GP) models is a powerful tool for the optimization of black-box functions. Although TS enjoys strong theoretical guarantees and convincing empirical performance, it incurs a large computational overhead that scales polynomially with the optimization budget. Recently, scalable TS methods based on sparse GP models have been proposed to increase the scope of TS, enabling its application to problems that are sufficiently multi-modal, noisy or combinatorial to require more than a few hundred evaluations to be solved. However, the approximation error introduced by sparse GPs invalidates all existing regret bounds. In this work, we perform a theoretical and empirical analysis of scalable TS. We provide theoretical guarantees and show that the drastic reduction in computational complexity of scalable TS can be enjoyed without loss in the regret performance over the standard TS. These conceptual claims are validated for practical implementations of scalable TS on synthetic benchmarks and as part of a real-world high-throughput molecular design task.
Published: 2020

23. Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization

Author: Salgia, Sudeep, Zhao, Qing, and Vakili, Sattar
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: A framework based on iterative coordinate minimization (CM) is developed for stochastic convex optimization. Given that exact coordinate minimization is impossible due to the unknown stochastic nature of the objective function, the crux of the proposed optimization algorithm is an optimal control of the minimization precision in each iteration. We establish the optimal precision control and the resulting order-optimal regret performance for strongly convex and separably nonsmooth functions. An interesting finding is that the optimal progression of precision across iterations is independent of the low-dimensional CM routine employed, suggesting a general framework for extending low-dimensional optimization routines to high-dimensional problems. The proposed algorithm is amenable to online implementation and inherits the scalability and parallelizability properties of CM for large-scale optimization. Requiring only a sublinear order of message exchanges, it also lends itself well to distributed computing as compared with the alternative approach of coordinate gradient descent.
Published: 2020

24. Amortized variance reduction for doubly stochastic objectives

Author: Boustati, Ayman, Vakili, Sattar, Hensman, James, and John, ST
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Methodology
Abstract: Approximate inference in complex probabilistic models such as deep Gaussian processes requires the optimisation of doubly stochastic objective functions. These objectives incorporate randomness both from mini-batch subsampling of the data and from Monte Carlo estimation of expectations. If the gradient variance is high, the stochastic optimisation problem becomes difficult with a slow rate of convergence. Control variates can be used to reduce the variance, but past approaches do not take into account how mini-batch stochasticity affects sampling stochasticity, resulting in sub-optimal variance reduction. We propose a new approach in which we use a recognition network to cheaply approximate the optimal control variate for each mini-batch, with no additional model gradient computations. We illustrate the properties of this proposal and test its performance on logistic regression and deep Gaussian processes.
Published: 2020

25. Regret Bounds for Noise-Free Kernel-Based Bandits

Author: Vakili, Sattar
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Kernel-based bandit is an extensively studied black-box optimization problem, in which the objective function is assumed to live in a known reproducing kernel Hilbert space. While nearly optimal regret bounds (up to logarithmic factors) are established in the noisy setting, surprisingly, less is known about the noise-free setting (when the exact values of the underlying function is accessible without observation noise). We discuss several upper bounds on regret; none of which seem order optimal, and provide a conjecture on the order optimal regret bound., Comment: Conference on Learning Theory (COLT) 2022
Published: 2020

26. Ordinal Bayesian Optimisation

Author: Picheny, Victor, Vakili, Sattar, and Artemev, Artem
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Bayesian optimisation is a powerful tool to solve expensive black-box problems, but fails when the stationary assumption made on the objective function is strongly violated, which is the case in particular for ill-conditioned or discontinuous objectives. We tackle this problem by proposing a new Bayesian optimisation framework that only considers the ordering of variables, both in the input and output spaces, to fit a Gaussian process in a latent space. By doing so, our approach is agnostic to the original metrics on the original spaces. We propose two algorithms, respectively based on an optimistic strategy and on Thompson sampling. For the optimistic strategy we prove an optimal performance under the measure of regret in the latent space. We illustrate the capability of our framework on several challenging toy problems.
Published: 2019

27. Adaptive Sensor Placement for Continuous Spaces

Author: Grant, James A, Boukouvalas, Alexis, Griffiths, Ryan-Rhys, Leslie, David S, Vakili, Sattar, and de Cote, Enrique Munoz
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We consider the problem of adaptively placing sensors along an interval to detect stochastically-generated events. We present a new formulation of the problem as a continuum-armed bandit problem with feedback in the form of partial observations of realisations of an inhomogeneous Poisson process. We design a solution method by combining Thompson sampling with nonparametric inference via increasingly granular Bayesian histograms and derive an $\tilde{O}(T^{2/3})$ bound on the Bayesian regret in $T$ rounds. This is coupled with the design of an efficent optimisation approach to select actions in polynomial time. In simulations we demonstrate our approach to have substantially lower and less variable regret than competitor algorithms., Comment: 13 pages, accepted to ICML 2019
Published: 2019

28. Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization

Author: Vakili, Sattar, Salgia, Sudeep, and Zhao, Qing
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: Online minimization of an unknown convex function over the interval $[0,1]$ is considered under first-order stochastic bandit feedback, which returns a random realization of the gradient of the function at each query point. Without knowing the distribution of the random gradients, a learning algorithm sequentially chooses query points with the objective of minimizing regret defined as the expected cumulative loss of the function values at the query points in excess to the minimum value of the function. An approach based on devising a biased random walk on an infinite-depth binary tree constructed through successive partitioning of the domain of the function is developed. Each move of the random walk is guided by a sequential test based on confidence bounds on the empirical mean constructed using the law of the iterated logarithm. With no tuning parameters, this learning algorithm is robust to heavy-tailed noise with infinite variance and adaptive to unknown function characteristics (specifically, convex, strongly convex, and nonsmooth). It achieves the corresponding optimal regret orders (up to a $\sqrt{\log T}$ or a $\log\log T$ factor) in each class of functions and offers better or matching regret orders than the classical stochastic gradient descent approach which requires the knowledge of the function characteristics for tuning the sequence of step-sizes.
Published: 2019

29. Decision Variance in Online Learning

Author: Vakili, Sattar, Boukouvalas, Alexis, and Zhao, Qing
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Online learning has traditionally focused on the expected rewards. In this paper, a risk-averse online learning problem under the performance measure of the mean-variance of the rewards is studied. Both the bandit and full information settings are considered. The performance of several existing policies is analyzed, and new fundamental limitations on risk-averse learning is established. In particular, it is shown that although a logarithmic distribution-dependent regret in time $T$ is achievable (similar to the risk-neutral problem), the worst-case (i.e. minimax) regret is lower bounded by $\Omega(T)$ (in contrast to the $\Omega(\sqrt{T})$ lower bound in the risk-neutral problem). This sharp difference from the risk-neutral counterpart is caused by the the variance in the player's decisions, which, while absent in the regret under the expected reward criterion, contributes to excess mean-variance due to the non-linearity of this risk measure. The role of the decision variance in regret performance reflects a risk-averse player's desire for robust decisions and outcomes.
Published: 2018

30. Multi-Armed Bandits on Partially Revealed Unit Interval Graphs

Author: Xu, Xiao, Vakili, Sattar, Zhao, Qing, and Swami, Ananthram
Subjects: Computer Science - Machine Learning
Abstract: A stochastic multi-armed bandit problem with side information on the similarity and dissimilarity across different arms is considered. The action space of the problem can be represented by a unit interval graph (UIG) where each node represents an arm and the presence (absence) of an edge between two nodes indicates similarity (dissimilarity) between their mean rewards. Two settings of complete and partial side information based on whether the UIG is fully revealed are studied and a general two-step learning structure consisting of an offline reduction of the action space and online aggregation of reward observations from similar arms is proposed to fully exploit the topological structure of the side information. In both cases, the computation efficiency and the order optimality of the proposed learning policies in terms of both the size of the action space and the time length are established., Comment: Parts of the work have been presented at the 36th IEEE Military Communication Conference (MILCOM), October, 2017 and the 52nd Asilomar Conference on Signals, Systems and Computers, October, 2018
Published: 2018

31. Anomaly Detection in Hierarchical Data Streams under Unknown Models

Author: Vakili, Sattar, Zhao, Qing, Liu, Chang, and Chuah, Chen-Nee
Subjects: Computer Science - Learning
Abstract: We consider the problem of detecting a few targets among a large number of hierarchical data streams. The data streams are modeled as random processes with unknown and potentially heavy-tailed distributions. The objective is an active inference strategy that determines, sequentially, which data stream to collect samples from in order to minimize the sample complexity under a reliability constraint. We propose an active inference strategy that induces a biased random walk on the tree-structured hierarchy based on confidence bounds of sample statistics. We then establish its order optimality in terms of both the size of the search space (i.e., the number of data streams) and the reliability requirement. The results find applications in hierarchical heavy hitter detection, noisy group testing, and adaptive sampling for active learning, classification, and stochastic root finding.
Published: 2017

32. Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

Author: Vakili, Sattar and Zhao, Qing
Subjects: Computer Science - Learning
Abstract: The multi-armed bandit problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in multi-armed bandit problems and develop parallel results under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. We show that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by $\Omega(\log T)$ and $\Omega(T^{2/3})$, respectively. We then show that variations of the UCB policy and the DSEE policy developed for the classic risk-neutral MAB achieve these lower bounds.
Published: 2016
Full Text: View/download PDF

33. Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

Author: Vakili, Sattar, Liu, Keqin, and Zhao, Qing
Subjects: Mathematics - Optimization and Control, Computer Science - Learning, Computer Science - Systems and Control, Mathematics - Probability, Mathematics - Statistics Theory, 62L05, 93E35 (Primary), 60G40 (Secondary)
Abstract: In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^1/p) regret when the moments of the reward distributions exist up to the pth order for 12. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning, and the dominating set problems under unknown random weights., Comment: 22 pages, 2 figures
Published: 2011

34. Information Gain and Uniform Generalization Bounds for Neural Kernel Models

Author: Vakili, Sattar, primary, Bromberg, Michael, additional, Garcia, Jezabel, additional, Shiu, Da-Shan, additional, and Bernacchia, Alberto, additional
Published: 2023
Full Text: View/download PDF

35. Collaborative Learning in Kernel-Based Bandits for Distributed Users

Author: Salgia, Sudeep, primary, Vakili, Sattar, additional, and Zhao, Qing, additional
Published: 2023
Full Text: View/download PDF

36. Gambler Bandits and the Regret of Being Ruined (accepted paper)

Author: Studzinski Perotto, Filipo, Vakili, Sattar, Gajane, Pratik, Faghan, Yaser, Bourgais, Mathieu, Systèmes Multi-Agents Coopératifs (IRIT-SMAC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Princeton University, Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Ecole Centrale de Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Ecole Centrale de Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Université de Lisbonne, Centre National de la Recherche Scientifique (CNRS), and International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Subjects: [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]
Abstract: à paraître; International audience
Published: 2021

37. Multi-Armed Bandits on Partially Revealed Unit Interval Graphs

Author: Xu, Xiao, primary, Vakili, Sattar, additional, Zhao, Qing, additional, and Swami, Ananthram, additional
Published: 2020
Full Text: View/download PDF

38. Adaptive sensor placement for continuous spaces

Author: Grant, James, Boukouvalas, Alexis, Griffiths, Ryan-Rhys, Leslie, David, Vakili, Sattar, Munoz de Cote, Enrique, Grant, James, Boukouvalas, Alexis, Griffiths, Ryan-Rhys, Leslie, David, Vakili, Sattar, and Munoz de Cote, Enrique
Abstract: We consider the problem of adaptively placing sensors along an interval to detect stochasticallygenerated events. We present a new formulation of the problem as a continuum-armed bandit problem with feedback in the form of partial observations of realisations of an inhomogeneous Poisson process. We design a solution method by combining Thompson sampling with nonparametric inference via increasingly granular Bayesian histograms and derive an O˜(T2/3) bound on the Bayesian regret in T rounds. This is coupled with the design of an efficent optimisation approach to select actions in polynomial time. In simulations we demonstrate our approach to have substantially lower and less variable regret than competitor algorithms.
Published: 2019

39. Decision Variance in Risk-Averse Online Learning

Author: Vakili, Sattar, primary, Boukouvalas, Alexis, additional, and Zhao, Qing, additional
Published: 2019
Full Text: View/download PDF

40. Stochastic Gradient Descent on a Tree: an Adaptive and Robust Approach to Stochastic Convex Optimization

Author: Vakili, Sattar, primary, Salgia, Sudeep, additional, and Zhao, Qing, additional
Published: 2019
Full Text: View/download PDF

41. A Random Walk Approach to First-Order Stochastic Convex Optimization

Author: Vakili, Sattar, primary and Zhao, Qing, additional
Published: 2019
Full Text: View/download PDF

42. SEQUENTIAL METHODS FOR LEARNING AND INFERENCE UNDER UNKNOWN MODELS

Author: Vakili, Sattar
Published: 2017
Full Text: View/download PDF

43. Hierarchical Heavy Hitter Detection Under Unknown Models

Author: Vakili, Sattar, primary, Zhao, Qing, additional, Liu, Chang, additional, and Chuah, Chen-Nee, additional
Published: 2018
Full Text: View/download PDF

44. Online learning with side information

Author: Xu, Xiao, primary, Vakili, Sattar, additional, Zhao, Qing, additional, and Swami, Ananthram, additional
Published: 2017
Full Text: View/download PDF