Author: "Meyn, Sean P." / Topic: convergence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Meyn, Sean P."' showing total 6 results

Start Over Author "Meyn, Sean P." Topic convergence

6 results on '"Meyn, Sean P."'

1. Q-Learning With Uniformly Bounded Variance.

Author: Devraj, Adithya M. and Meyn, Sean P.
Subjects: *FUNCTIONS of bounded variation, *STOCHASTIC control theory
Abstract: Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted cost, infinite horizon setting, all of the known bounds can be arbitrarily large, as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms that have sample complexity uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min–max lower bound. The explanation is that these prior bounds concern value function approximation and not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that goes to infinity, as discount factor approaches 1; an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

2. Differential Temporal Difference Learning.

Author: Devraj, Adithya M., Kontoyiannis, Ioannis, and Meyn, Sean P.
Subjects: MACHINE learning, MARKOV processes, STOCHASTIC control theory, CENTRAL limit theorem, KEY performance indicators (Management), REINFORCEMENT learning
Abstract: Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning. Computation of the solution to the associated Bellman equations is challenging in most practical cases of interest. A popular class of approximation techniques, known as temporal difference (TD) learning algorithms, are an important subclass of general reinforcement learning methods. The algorithms introduced in this article are intended to resolve two well-known issues with TD-learning algorithms. Their slow convergence due to very high central limit theorem variance, and the fact that, for the problem of computing the relative value function, consistent algorithms exist only in special cases. First we show that the gradients of these value functions admit a representation that lends itself to algorithm design. Based on this result, a new class of differential TD-learning algorithms is introduced. For Markovian models on Euclidean space with smooth dynamics, the algorithms are shown to be consistent under general conditions. Numerical results show dramatic variance reduction in comparison to standard methods. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

3. Distributed Computation of Equilibria in Misspecified Convex Stochastic Nash Games.

Author: Jiang, Hao, Shanbhag, Uday V., and Meyn, Sean P.
Subjects: NASH equilibrium, STOCHASTIC models, CONVEX functions, GAME theory, PARAMETER estimation
Abstract: The distributed computation of Nash equilibria is assuming growing relevance in engineering where such problems emerge in the context of distributed control. Accordingly, we present schemes for computing equilibria of two classes of static stochastic convex games complicated by a parametric misspecification, a natural concern in the control of large-scale networked engineered system. In both schemes, players learn the equilibrium strategy while resolving the misspecification: 1) Monotone stochastic Nash games: We present a set of coupled stochastic approximation schemes distributed across agents in which the first scheme updates each agent's strategy via a projected (stochastic) gradient step, whereas the second scheme updates every agent's belief regarding its misspecified parameter using an independently specified learning problem. We proceed to show that the produced sequences converge in an almost sure sense to the true equilibrium strategy and the true parameter, respectively. Surprisingly, convergence in the equilibrium strategy achieves the optimal rate of convergence in a mean-squared sense with a quantifiable degradation in the rate constant; 2) Stochastic Nash–Cournot games with unobservable aggregate output: We refine 1) to a Cournot setting where we assume that the tuple of strategies is unobservable while payoff functions and strategy sets are public knowledge through a common knowledge assumption. By utilizing observations of noise-corrupted prices, iterative fixed-point schemes are developed, allowing for simultaneously learning the equilibrium strategies and the misspecified parameter in an almost sure sense. [ABSTRACT FROM PUBLISHER]
Published: 2018
Full Text: View/download PDF

4. Learning equilibria in constrained Nash-Cournot games with misspecified demand functions.

Author: Jiang, Hao, Shanbhag, Uday V., and Meyn, Sean P.
Abstract: We consider a constrained Nash-Cournot oligopoly where the demand function is linear. While cost functions and capacities are public information, firms only have partial information regarding the demand function. Specifically, firms either know the intercept or the slope of the demand function and cannot observe aggregate output. We consider a learning process in which firms update their profit-maximizing quantities and their beliefs regarding the unknown demand function parameters, based on disparities between observed and estimated prices. A characterization of the mappings, corresponding to the fixed point of the learning process, is provided. This result paves the way for developing a Tikhonov regularization scheme that is shown to learn the correct equilibrium, in spite of the multiplicity of equilibria. Despite the absence of monotonicity of the gradient maps, we prove the convergence of constant and diminishing steplength distributed gradient schemes under a suitable caveat on the starting points. Notably, precise rate of convergence estimates are provided for the constant steplength schemes. [ABSTRACT FROM PUBLISHER]
Published: 2011
Full Text: View/download PDF

5. A recursive learning algorithm for model reduction of Hidden Markov Models.

Author: Deng, Kun, Mehta, Prashant G., Meyn, Sean P., and Vidyasagar, Mathukumalli
Abstract: This paper is concerned with a recursive learning algorithm for model reduction of Hidden Markov Models (HMMs) with finite state space and finite observation space. The state space is aggregated/partitioned to reduce the complexity of the HMM. The optimal aggregation is obtained by minimizing the Kullback-Leibler divergence rate between the laws of the observation process. The optimal aggregated HMM is given as a function of the partition function of the state space. The optimal partition is obtained by using a recursive stochastic approximation learning algorithm, which can be implemented through a single sample path of the HMM. Convergence of the algorithm is established using ergodicity of the filtering process and standard stochastic approximation arguments. [ABSTRACT FROM PUBLISHER]
Published: 2011
Full Text: View/download PDF

6. Universal and Composite Hypothesis Testing via Mismatched Divergence.

Author: Unnikrishnan, Jayakrishnan, Huang, Dayu, Meyn, Sean P., Surana, Amit, and Veeravalli, Venugopal V.
Subjects: STATISTICAL hypothesis testing, APPROXIMATION theory, ENTROPY (Information theory), STOCHASTIC convergence, SOURCE code, ROBUST control, MATHEMATICAL sequences, MATHEMATICAL functions
Abstract: For the universal hypothesis testing problem, where the goal is to decide between the known null hypothesis distribution and some other unknown distribution, Hoeffding proposed a universal test in the nineteen sixties. Hoeffding's universal test statistic can be written in terms of Kullback–Leibler (K-L) divergence between the empirical distribution of the observations and the null hypothesis distribution. In this paper a modification of Hoeffding's test is considered based on a relaxation of the K-L divergence, referred to as the mismatched divergence. The resulting mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for the case where the alternate distribution lies in a parametric family of distributions characterized by a finite-dimensional parameter, i.e., it is a solution to the corresponding composite hypothesis testing problem. For certain choices of the alternate distribution, it is shown that both the Hoeffding test and the mismatched test have the same asymptotic performance in terms of error exponents. A consequence of this result is that the GLRT is optimal in differentiating a particular distribution from others in an exponential family. It is also shown that the mismatched test has a significant advantage over the Hoeffding test in terms of finite sample size performance for applications involving large alphabet distributions. This advantage is due to the difference in the asymptotic variances of the two test statistics under the null hypothesis. [ABSTRACT FROM PUBLISHER]
Published: 2011
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

6 results on '"Meyn, Sean P."'

1. Q-Learning With Uniformly Bounded Variance.

2. Differential Temporal Difference Learning.

3. Distributed Computation of Equilibria in Misspecified Convex Stochastic Nash Games.

4. Learning equilibria in constrained Nash-Cournot games with misspecified demand functions.

5. A recursive learning algorithm for model reduction of Hidden Markov Models.

6. Universal and Composite Hypothesis Testing via Mismatched Divergence.

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

6 results on '"Meyn, Sean P."'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources