
Solving Two-Person Zero-Sum Stochastic Games With Incomplete Information Using Learning Automata With Artificial Barriers

Authors :
Daniel Silvestre
B. John Oommen
Anis Yazidi
Source :
IEEE Transactions on Neural Networks and Learning Systems
Publication Year :
2023
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2023.

Abstract

Learning automata (LA) with artificially absorbing barriers opened a completely new horizon of research in the 1980s (Oommen, 1986). These new machines yielded properties that were previously unknown. More recently, absorbing barriers have been introduced in continuous estimator algorithms so that the proofs could invoke a martingale property rather than monotonicity (Zhang et al., 2014), (Zhang et al., 2015). However, applications of LA with artificial barriers are almost nonexistent. In that regard, this article is pioneering in that it provides effective and accurate solutions to an extremely complex application domain, namely that of solving two-person zero-sum stochastic games with incomplete information. LA have previously been used (Sastry et al., 1994) to design algorithms capable of converging to the game's Nash equilibrium under limited information. Those algorithms have focused on the case where the saddle point of the game exists in a pure strategy. However, the majority of LA algorithms used for games are absorbing in the probability simplex, and thus, they converge to an exclusive choice of a single action. These LA are therefore unable to converge to mixed Nash equilibria when the game possesses no saddle point in pure strategies. The pioneering contribution of this article is that we propose an LA solution that is able to converge to an optimal mixed Nash equilibrium even though no saddle point may exist in pure strategies. The scheme, being of the linear reward-inaction ($L_{R-I}$) paradigm, is, in and of itself, absorbing. However, by incorporating artificial barriers, we prevent it from getting "stuck" or absorbed in pure strategies. Unlike the linear reward-ε-penalty ($L_{R-\varepsilon P}$) scheme proposed by Lakshmivarahan and Narendra almost four decades ago, our new scheme achieves the same goal with much less parameter tuning and in a more elegant manner. This article includes the nontrivial proofs of the theoretical results characterizing our scheme and also contains experimental verification that confirms our theoretical findings.
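To make the core idea concrete, the sketch below shows a minimal, hypothetical Python simulation of an $L_{R-I}$ automaton whose absorbing behavior is removed by an artificial barrier: a rewarded action is moved toward a capped target vector (with maximum probability p_max < 1) rather than toward a corner of the simplex, so the strategy vector can settle on a mixed strategy. The game matrix, the barrier value p_max, and the learning rate theta are illustrative assumptions; this is not the authors' exact algorithm or its convergence proof.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's exact scheme): two L_{R-I}
# automata with an artificial barrier play a stochastic zero-sum game with
# incomplete information (each player only observes its own reward/penalty).

rng = np.random.default_rng(0)

# Zero-sum game with no saddle point in pure strategies (matching-pennies-like).
# G[i, j] is the probability that the row player is rewarded when row plays i
# and column plays j; the column player is rewarded otherwise.
G = np.array([[0.8, 0.3],
              [0.2, 0.7]])

theta = 0.01   # L_{R-I} learning rate (illustrative value)
p_max = 0.9    # artificial barrier: no action probability is driven above p_max

def lri_barrier_update(p, action, rewarded):
    """Reward-inaction update toward a capped target inside the simplex."""
    if not rewarded:
        return p                                  # inaction on penalty
    r = len(p)
    target = np.full(r, (1.0 - p_max) / (r - 1))  # residual mass on other actions
    target[action] = p_max                        # barrier keeps target interior
    return p + theta * (target - p)               # convex step preserves sum = 1

p_row = np.array([0.5, 0.5])   # row player's mixed strategy
p_col = np.array([0.5, 0.5])   # column player's mixed strategy

for _ in range(200_000):
    i = rng.choice(2, p=p_row)
    j = rng.choice(2, p=p_col)
    row_rewarded = rng.random() < G[i, j]         # stochastic environment feedback
    p_row = lri_barrier_update(p_row, i, row_rewarded)
    p_col = lri_barrier_update(p_col, j, not row_rewarded)

print("row strategy:", p_row, "column strategy:", p_col)
```

Because the target vector never reaches a vertex of the simplex, the pure strategies are no longer absorbing states, which is the role the artificial barrier plays in the proposed scheme; the specific capped-target form used here is only one plausible way to realize that idea.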

Details

ISSN :
2162-2388 and 2162-237X
Volume :
34
Database :
OpenAIRE
Journal :
IEEE Transactions on Neural Networks and Learning Systems
Accession number :
edsair.doi.dedup.....5b85ed61d852a1d41826d74b2effc9d9