Author: "Piliouras, Georgios" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search for '"Piliouras, Georgios"' returned 121 results.

1. On the Stability of Learning in Network Games with Many Players

Author: Hussain, Aamal, Leonte, Dan, Belardinelli, Francesco, and Piliouras, Georgios
Abstract: Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players. To make progress towards addressing this challenge, we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games. Next, we compare the learned QRE to the underlying Nash Equilibrium (NE) of the game by showing that any QRE is an $\epsilon$-approximate Nash Equilibrium. We provide tight bounds on $\epsilon$ and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents.
Comment: AAMAS 2024. arXiv admin note: text overlap with arXiv:2307.13922
Published: 2024
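The QRE and $\epsilon$-approximation ideas above can be illustrated with a small numerical sketch. It assumes a two-player normal-form game with logit (softmax) choice at an exploration temperature `temperature`, and finds a QRE with a damped smooth best-response iteration. That iteration shares its rest points with the Q-Learning dynamics but is not the paper's continuous-time dynamics. It then measures the $\epsilon$ for which the result is an $\epsilon$-approximate Nash Equilibrium. All function names and the example game are hypothetical.

```python
import numpy as np


def softmax(values, temperature):
    """Logit (Boltzmann) choice probabilities at exploration temperature `temperature`."""
    z = values / temperature
    z = z - z.max()  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()


def logit_qre(A, B, temperature, steps=5000, step_size=0.1):
    """Damped smooth best-response iteration toward a two-player logit QRE.

    A[i, j] / B[i, j]: row / column player's payoff when row plays i and column plays j.
    """
    m, n = A.shape
    x = np.full(m, 1.0 / m)
    y = np.full(n, 1.0 / n)
    for _ in range(steps):
        # Both players move part of the way toward their logit response, simultaneously.
        x_new = (1 - step_size) * x + step_size * softmax(A @ y, temperature)
        y_new = (1 - step_size) * y + step_size * softmax(B.T @ x, temperature)
        x, y = x_new, y_new
    return x, y


def nash_gap(A, B, x, y):
    """Smallest epsilon for which (x, y) is an epsilon-approximate Nash equilibrium."""
    row_gain = (A @ y).max() - x @ A @ y
    col_gain = (x @ B).max() - x @ B @ y
    return max(row_gain, col_gain)


# Hypothetical asymmetric coordination game, played with identical payoffs.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
x, y = logit_qre(A, A, temperature=2.0)
print(x, y, nash_gap(A, A, x, y))
```

For a logit QRE, each player's deviation gain is at most `temperature` times the entropy of their own strategy, hence at most `temperature * ln(number of actions)`. Lowering the temperature tightens this bound but weakens the contraction that makes the simple iteration converge.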

2. Visualizing 2x2 Normal-Form Games: twoxtwogame LaTeX Package

Author: Marris, Luke, Gemp, Ian, Liu, Siqi, Leibo, Joel Z., and Piliouras, Georgios
Abstract: Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, tools for describing and visualizing such games are lacking. This work introduces a LaTeX package for visualizing 2x2 games. It has two goals: first, to provide high-quality tools and vector graphic visualizations suitable for scientific publications; and second, to help promote standardization of names and representations of 2x2 games. The LaTeX package, twoxtwogame, is maintained on GitHub and mirrored on CTAN, and is available under a permissive Apache 2 license.
Published: 2024

3. On the Redistribution of Maximal Extractable Value: A Dynamic Mechanism

Author: Braga, Pedro, Chionas, Georgios, Leonardos, Stefanos, Krysta, Piotr, Piliouras, Georgios, and Ventre, Carmine
Abstract: Maximal Extractable Value (MEV) has emerged as a new frontier in the design of blockchain systems. The marriage between decentralization and finance gives block producers (a.k.a. miners) the power not only to select and add transactions to the blockchain but, crucially, also to order them so as to extract as much financial gain as possible for themselves. Whilst this price may be unavoidable for the service provided by block producers, users of the chain may in the long run prefer to use less predatory systems. In this paper, we propose to make the MEV extraction rate part of the protocol design space. Our aim is to leverage this parameter to maintain a healthy balance between miners (who need to be compensated) and users (who need to feel encouraged to transact). Inspired by the principles introduced by EIP-1559 for transaction fees, we design a dynamic mechanism which updates the MEV extraction rate with the goal of stabilizing it at a target value. We analyse the evolution of this dynamic mechanism under various market conditions and provide formal guarantees about its long-term performance. Our results show that even when the system behavior is provably chaotic, the dynamics guarantee long-term liveness (survival) and robustness of the system. The main takeaway from our work is that the proposed system exhibits desirable behavior (near-optimal performance) even when it operates in out-of-equilibrium conditions that are often met in practice. Our work establishes, to our knowledge, the first dynamic framework for the integral problem of MEV sharing between extractors and users.
Comment: Extended abstract in the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024)
Published: 2024
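EIP-1559, which the abstract cites as inspiration, adjusts the base fee multiplicatively toward a target level of block usage. The sketch below shows an update rule of that shape and applies it, purely hypothetically, to an extraction rate in [0, 1]. The observed signal, the target, the clamping, and the function names are illustrative assumptions; the paper's actual mechanism may update on different quantities.

```python
def eip1559_style_update(value, observed, target, max_change_denominator=8):
    """Move `value` up when `observed` exceeds `target` and down when it falls short.

    Mirrors the shape of the EIP-1559 base-fee rule: the relative change is
    (observed - target) / target / max_change_denominator, so with the default
    of 8 the value moves by at most 12.5% per step when observed is in [0, 2 * target].
    """
    return value * (1 + (observed - target) / target / max_change_denominator)


def update_extraction_rate(rate, observed_signal, target_signal):
    """Hypothetical: apply the same step to an MEV extraction rate, clamped to [0, 1].

    Which signal the mechanism observes is an assumption made for illustration.
    """
    return min(1.0, max(0.0, eip1559_style_update(rate, observed_signal, target_signal)))


# Usage: one step with the signal above target, then one step below target.
rate = 0.5
rate = update_extraction_rate(rate, observed_signal=30.0, target_signal=15.0)
rate = update_extraction_rate(rate, observed_signal=0.0, target_signal=15.0)
print(rate)
```

The multiplicative form keeps the rate positive and makes each step proportional to the current level, which is one reason such controllers can oscillate or behave chaotically when the observed signal itself reacts to the rate.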

4. NfgTransformer: Equivariant Representation Learning for Normal-form Games

Author: Liu, Siqi, Marris, Luke, Piliouras, Georgios, Gemp, Ian, and Heess, Nicolas
Abstract: Normal-form games (NFGs) are the fundamental model of strategic interaction. We study their representation using neural networks. We describe the inherent equivariance of NFGs (any permutation of strategies describes an equivalent game) as well as the challenges this poses for representation learning. We then propose the NfgTransformer architecture that leverages this equivariance, leading to state-of-the-art performance in a range of game-theoretic tasks including equilibrium-solving, deviation gain estimation and ranking, with a common approach to NFG representation. We show that the resulting model is interpretable and versatile, paving the way towards deep learning systems capable of game-theoretic reasoning when interacting with humans and with each other.
Comment: Published at ICLR 2024. Open-sourced at https://github.com/google-deepmind/nfg_transformer
Published: 2024
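The equivariance described above can be checked numerically: relabelling one player's strategies, and permuting that player's mixed strategy accordingly, leaves every equilibrium quantity unchanged. The sketch below, with hypothetical names and random payoffs, computes the Nash gap before and after permuting the row player's strategies. It illustrates the symmetry only and is not the NfgTransformer architecture.

```python
import numpy as np


def nash_gap(A, B, x, y):
    """Largest gain either player can get by deviating unilaterally from (x, y)."""
    row_gain = (A @ y).max() - x @ A @ y
    col_gain = (x @ B).max() - x @ B @ y
    return max(row_gain, col_gain)


def permute_row_strategies(A, B, x, perm):
    """Relabel the row player's strategies: new strategy k is old strategy perm[k]."""
    perm = np.asarray(perm)
    return A[perm, :], B[perm, :], x[perm]


rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))  # 3 row strategies, 2 column strategies
x, y = np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.4])
A_p, B_p, x_p = permute_row_strategies(A, B, x, [2, 0, 1])
print(nash_gap(A, B, x, y), nash_gap(A_p, B_p, x_p, y))
```

A network that is equivariant to such permutations will, by construction, give consistent answers for both labellings, rather than having to learn this symmetry from data.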

5. Approximating the Core via Iterative Coalition Sampling

Author: Gemp, Ian, Lanctot, Marc, Marris, Luke, Mao, Yiran, Duéñez-Guzmán, Edgar, Perrin, Sarah, Gyorgy, Andras, Elie, Romuald, Piliouras, Georgios, Kaisers, Michael, Hennes, Daniel, Bullard, Kalesha, Larson, Kate, and Bachrach, Yoram
Abstract: The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has an incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches, namely, solving large linear programs. As such, they scale better to very large problems, as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
Comment: Published in AAMAS 2024
Published: 2024
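To make the core constraint concrete, the sketch below checks an allocation against every coalition of a small weighted voting game, and also estimates the worst violation from randomly sampled coalitions. An efficient allocation (one that distributes exactly the grand coalition's value) is in the core when no coalition's violation is positive. The sampling part only shows why drawing coalitions can avoid enumerating all $2^n - 1$ constraints; it is not the paper's iterative algorithm, and the game, weights, and function names are hypothetical.

```python
import itertools
import random


def weighted_voting_value(coalition, weights, quota):
    """v(S) = 1 if the coalition's total weight reaches the quota, else 0."""
    return 1.0 if sum(weights[i] for i in coalition) >= quota else 0.0


def violation(coalition, x, weights, quota):
    """How much coalition S would gain by breaking away: v(S) minus the sum of x_i over S."""
    return weighted_voting_value(coalition, weights, quota) - sum(x[i] for i in coalition)


def max_violation_exact(x, weights, quota):
    """Check every non-empty coalition (2^n - 1 of them)."""
    n = len(weights)
    return max(violation(S, x, weights, quota)
               for r in range(1, n + 1)
               for S in itertools.combinations(range(n), r))


def max_violation_sampled(x, weights, quota, num_samples=200, seed=0):
    """Estimate the worst violation from uniformly random non-empty coalitions."""
    rng = random.Random(seed)
    n = len(weights)
    worst = float("-inf")
    for _ in range(num_samples):
        S = [i for i in range(n) if rng.random() < 0.5]
        if S:
            worst = max(worst, violation(S, x, weights, quota))
    return worst


# Player 0 alone meets the quota, so only allocations giving player 0 everything are stable.
weights, quota = [4, 1, 1], 4
print(max_violation_exact([1.0, 0.0, 0.0], weights, quota))
print(max_violation_exact([1 / 3, 1 / 3, 1 / 3], weights, quota))
print(max_violation_sampled([1 / 3, 1 / 3, 1 / 3], weights, quota))
```

A sampled estimate can only under-report the worst violation, so it is a cheap filter rather than a certificate of core membership.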

6. States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers

Author: Gemp, Ian, Bachrach, Yoram, Lanctot, Marc, Patel, Roma, Dasagi, Vibhavari, Marris, Luke, Piliouras, Georgios, Liu, Siqi, and Tuyls, Karl
Abstract: Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real-world applications. In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium finding algorithms to this setting. In addition, by exploiting LLMs' generation capabilities along with our proposed binding, we can synthesize a large repository of formally-defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs.
Comment: 32 pages, 8 figures, code available at https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/games/chat_game.py
Published: 2024

7. Neural Population Learning beyond Symmetric Zero-sum Games

Author: Liu, Siqi, Marris, Luke, Lanctot, Marc, Piliouras, Georgios, Leibo, Joel Z., and Heess, Nicolas
Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
Published: 2024

8. Steering game dynamics towards desired outcomes

Author: Canyakmaz, Ilayda, Sakos, Iosif, Lin, Wayne, Varvitsiotis, Antonios, and Piliouras, Georgios
Abstract: The dynamic behavior of agents in games, which captures how their strategies evolve over time based on past interactions, can lead to a spectrum of undesirable behaviors, ranging from non-convergence to Nash equilibria to the emergence of limit cycles and chaos. To mitigate the effects of selfish behavior, central planners can use dynamic payments to guide strategic multi-agent systems toward stability and socially optimal outcomes. However, the effectiveness of such interventions critically relies on accurately predicting agents' responses to incentives and dynamically adjusting payments so that the system is guided towards the desired outcomes. These challenges are further amplified in real-time applications where the dynamics are unknown and only scarce data is available. To tackle this challenge, in this work we introduce the SIAR-MPC method, combining the recently introduced Side Information Assisted Regression (SIAR) method for system identification with Model Predictive Control (MPC). SIAR utilizes side-information constraints inherent to game theoretic applications to model agent responses to payments from scarce data, while MPC uses this model to facilitate dynamic payment adjustments. Our experiments demonstrate the efficiency of SIAR-MPC in guiding the system towards socially optimal equilibria, stabilizing chaotic behaviors, and avoiding specified regions of the state space. Comparative analyses in data-scarce settings show SIAR-MPC's superior performance over pairing MPC with Physics Informed Neural Networks (PINNs), a powerful system identification method that finds models satisfying specific constraints.
Published: 2024

9. Non-chaotic limit sets in multi-agent learning

Author: Czechowski, A.T., and Piliouras, Georgios
Abstract: Non-convergence is an inherent aspect of adaptive multi-agent systems, and even basic learning models, such as the replicator dynamics, are not guaranteed to equilibrate. Limit cycles, and even more complicated chaotic sets, are in fact possible even in rather simple games, including variants of the Rock-Paper-Scissors game. A key challenge of multi-agent learning theory lies in the characterization of these limit sets based on qualitative features of the underlying game. Although chaotic behavior in learning dynamics can be precluded by the celebrated Poincaré–Bendixson theorem, the theorem is only directly applicable to low-dimensional settings. In this work, we attempt to find other characteristics of a game that can force regularity in the limit sets of learning. We show that behavior consistent with the Poincaré–Bendixson theorem (limit cycles, but no chaotic attractor) follows purely from the topological structure of interactions, even for high-dimensional settings with an arbitrary number of players and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized-leader (FoReL) dynamics, which generalize replicator dynamics, for binary games characterized by interaction graphs where the payoffs of each player are only affected by one other player (i.e., interaction graphs of indegree one). Moreover, for cyclic games we provide further insight into the planar structure of limit sets, and in particular limit cycles.
We propose simple conditions under which learning comes with efficiency guarantees, implying that FoReL learning achieves a time-averaged sum of payoffs at least as good as that of a Nash equilibrium, thereby connecting the topology of the dynamics to social-welfare analysis.
Published: 2023
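The cycling the abstract refers to can be seen in the single-population replicator dynamics on standard Rock-Paper-Scissors. Because the payoff matrix is antisymmetric with zero column sums, $\frac{d}{dt}\sum_i \ln x_i = 0$, so the product $x_R x_P x_S$ is conserved and trajectories orbit the uniform equilibrium instead of converging to it. The sketch below integrates the dynamics with a fourth-order Runge-Kutta step; the step size, horizon, and starting point are arbitrary choices. It illustrates non-convergence only, not the paper's result on interaction graphs.

```python
import numpy as np

# Entry [i, j]: payoff of playing i against j, in the order Rock, Paper, Scissors.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])


def replicator_field(x, A):
    """dx_i/dt = x_i * ((A x)_i - x^T A x)."""
    fitness = A @ x
    return x * (fitness - x @ fitness)


def simulate(x0, A, dt=0.005, steps=4000):
    """Integrate the replicator dynamics with classical RK4; returns the whole trajectory."""
    x = np.asarray(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        k1 = replicator_field(x, A)
        k2 = replicator_field(x + 0.5 * dt * k1, A)
        k3 = replicator_field(x + 0.5 * dt * k2, A)
        k4 = replicator_field(x + dt * k3, A)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        trajectory.append(x.copy())
    return np.array(trajectory)


traj = simulate([0.5, 0.3, 0.2], RPS)
print(traj[0].prod(), traj[-1].prod(), traj[-1])
```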

10. Generative Adversarial Equilibrium Solvers

Author: Goktas, Denizalp, Parkes, David C., Gemp, Ian, Marris, Luke, Piliouras, Georgios, Elie, Romuald, Lever, Guy, and Tacchetti, Andrea
Abstract: We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other players but also their feasible action spaces. Although the computation of GNE and CE is intractable in the worst case (i.e., PPAD-hard), in practice many applications only require solutions with high accuracy in expectation over a distribution of problem instances. We introduce Generative Adversarial Equilibrium Solvers (GAES): a family of generative adversarial neural networks that can learn GNE and CE from only a sample of problem instances. We provide computational and sample complexity bounds, and apply the framework to finding Nash equilibria in normal-form games, CE in Arrow-Debreu competitive economies, and GNE in an environmental economic model of the Kyoto mechanism.
Comment: 41 pages, 13 figures
Published: 2023

11. Quantum Potential Games, Replicator Dynamics, and the Separability Problem

Author: Lin, Wayne, Piliouras, Georgios, Sim, Ryann, and Varvitsiotis, Antonios
Abstract: Gamification is an emerging trend in the field of machine learning that presents a novel approach to solving optimization problems by transforming them into game-like scenarios. This paradigm shift allows for the development of robust, easily implementable, and parallelizable algorithms for hard optimization problems. In our work, we use gamification to tackle the Best Separable State (BSS) problem, a fundamental problem in quantum information theory that involves linear optimization over the set of separable quantum states. To achieve this we introduce and study quantum analogues of common-interest games (CIGs) and potential games where players have density matrices as strategies and their interests are perfectly aligned. We bridge the gap between optimization and game theory by establishing the equivalence between KKT (first-order stationary) points of a BSS instance and the Nash equilibria of its corresponding quantum CIG. Taking the perspective of learning in games, we introduce non-commutative extensions of the continuous-time replicator dynamics and the discrete-time Baum-Eagon/linear multiplicative weights update for learning in quantum CIGs, which also serve as decentralized algorithms for the BSS problem. We show that the common utility/objective value of a BSS instance is strictly increasing along trajectories of our algorithms, and finally corroborate our theoretical findings through extensive experiments.
Published: 2023

12. Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Author: Hussain, Aamal Abbas, Belardinelli, Francesco, and Piliouras, Georgios
Abstract: Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted potential games and weighted zero-sum polymatrix games. Finally, we examine the performance of the Q-Learning dynamics as measured by the Time Averaged Social Welfare, and compare this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamics will outperform the equilibrium even if the dynamics do not converge.
Comment: Accepted in AAMAS 2023
Published: 2023

13. Heterogeneous Beliefs and Multi-Population Learning in Network Games

Author: Hu, Shuyue, Soh, Harold, and Piliouras, Georgios
Abstract: The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from well understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP). We show that the system state (a probability distribution over beliefs) evolves according to a system of partial differential equations akin to the continuity equations that commonly describe transport phenomena in physical systems. We establish the convergence of SFP to Quantal Response Equilibria in different classes of games capturing both network competition and network coordination. We also prove that the beliefs will eventually homogenize in all network games. Although the initial belief heterogeneity disappears in the limit, we show that it plays a crucial role in equilibrium selection in the case of coordination games, as it helps select highly desirable equilibria. In contrast, in the case of network competition, the resulting limit behavior is independent of the initialization of beliefs, even when the underlying game has many distinct Nash equilibria.
Published: 2023

14. Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps

Author: Cevher, Volkan, Piliouras, Georgios, Sim, Ryann, and Skoulakis, Stratis
Abstract: In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal-form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accuracy $\epsilon$ with only $O(\log(1/\epsilon))$ additional gradient calls through the iterations of a contraction map. Combining the analysis of the PP method with an error-propagation analysis, we establish that the resulting first-order method, called Clairvoyant Extra Gradient, admits near-optimal time-average convergence for general domains and last-iterate convergence in the unconstrained case.
Comment: To appear in SOSA23
Published: 2023
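The contraction-map idea can be seen on the unconstrained bilinear problem $\min_x \max_y xy$, whose operator is $F(x, y) = (y, -x)$. The proximal point update $z_{k+1} = z_k - \eta F(z_{k+1})$ is implicit, but iterating $w \mapsto z_k - \eta F(w)$ approximates it and is a contraction whenever $\eta L < 1$ (here $L = 1$). With one inner iteration this reduces to gradient descent-ascent, and with two to extra-gradient. The sketch below is a hedged illustration in the spirit of the paper, not its Clairvoyant Extra Gradient method; the names and parameters are hypothetical.

```python
import numpy as np


def operator(z):
    """F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    x, y = z
    return np.array([y, -x])


def approx_proximal_step(z, eta, inner_iters):
    """Approximate the implicit update z+ = z - eta * F(z+) by fixed-point iteration.

    Each inner iteration applies w -> z - eta * F(w), a contraction when eta * L < 1,
    so the error to the exact proximal point shrinks geometrically in inner_iters.
    """
    w = z.copy()
    for _ in range(inner_iters):
        w = z - eta * operator(w)
    return w


def run(z0, eta, inner_iters, steps):
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = approx_proximal_step(z, eta, inner_iters)
    return z


print(np.linalg.norm(run([1.0, 1.0], eta=0.5, inner_iters=1, steps=100)))  # GDA
print(np.linalg.norm(run([1.0, 1.0], eta=0.5, inner_iters=5, steps=100)))  # ~ proximal point
```

On this problem, one inner iteration (gradient descent-ascent) spirals outward, while five inner iterations spiral inward toward the saddle point (0, 0), matching the behaviour of the exact proximal point method.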

15. Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Author: Vlatakis-Gkaragkounis, Emmanouil-Vasileios, Flokas, Lampros, and Piliouras, Georgios
Abstract: Multi-agent learning is intrinsically harder, more unstable, and less predictable than single-agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard, as recent work has established that learning with fixed rates will become chaotic given large enough populations. In this work, we show that chaos persists in large population congestion games despite the use of adaptive learning rates, even for the ubiquitous Multiplicative Weight Updates algorithm and even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional Li-Yorke period-three techniques by studying fundamental properties of the dynamics, including invariant sets, volume expansion, and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors.
Comment: 30 pages, 6 figures
Published: 2023
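The population effect can be seen in a two-route congestion game with fixed-rate Multiplicative Weights Updates. Assuming hypothetical linear route costs $x$ and $1 - x$, the fraction $x$ of the population on route 1 evolves by a one-dimensional map whose only parameter $s$ is the learning rate times the population size. In log-odds $u = \ln(x / (1 - x))$ the map is $u \mapsto u - s \tanh(u/2)$, so the equilibrium $x^* = 1/2$ is stable for $0 < s < 4$ and unstable beyond. This sketch uses a fixed rate, so it illustrates only the baseline instability, not the paper's adaptive-rate result.

```python
import math


def mwu_two_routes(x0, s, steps):
    """Fixed-rate multiplicative weights on a two-route congestion game.

    x is the fraction of the population on route 1; route costs are x and 1 - x
    (hypothetical linear costs), and s is learning rate times population size.
    Returns the full history of x.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        w1 = x * math.exp(-s * x)              # weight on route 1 after its cost
        w2 = (1 - x) * math.exp(-s * (1 - x))  # weight on route 2 after its cost
        x = w1 / (w1 + w2)
        history.append(x)
    return history


print(mwu_two_routes(0.3, s=1.0, steps=200)[-3:])   # settles at the equilibrium 0.5
print(mwu_two_routes(0.3, s=6.0, steps=1000)[-3:])  # keeps swinging between routes
```

With s = 6 the iterates settle into an oscillation between the two routes instead of the equilibrium; the abstract notes that, with large enough populations, fixed-rate learning becomes chaotic.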

16. Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games

Author: Marris, Luke, Gemp, Ian, and Piliouras, Georgios
Abstract: Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has an incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two-variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game-theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
Comment: 42 pages
Published: 2023
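Two standard payoff transforms that leave the equilibrium sets of a normal-form game unchanged are multiplying a player's payoffs by a positive constant and adding to them any constant that depends only on the opponent's action. The sketch below checks numerically that these transforms only rescale the row player's deviation gains, which is why the set of Nash equilibria is preserved. Whether these are exactly the transforms the paper uses is an assumption; the names and numbers are hypothetical.

```python
import numpy as np


def deviation_gains(A, x, y):
    """Row player's gain from switching from mixed strategy x to each pure strategy."""
    return A @ y - x @ A @ y


rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))
x, y = np.array([0.3, 0.7]), np.array([0.6, 0.4])

scale = 3.0                   # any positive scale
offsets = rng.normal(size=2)  # one constant per opponent action (column of A)
A_transformed = scale * A + offsets[np.newaxis, :]

print(deviation_gains(A, x, y), deviation_gains(A_transformed, x, y))
```

The column offsets add the same amount, the offset averaged under y, to every pure strategy's payoff and to the mixed payoff, so they cancel in each gain; the positive scale multiplies every gain without changing its sign.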

17. Exploiting hidden structures in non-convex games for convergence to Nash equilibrium

Author: Sakos, Iosif, Vlatakis-Gkaragkounis, Emmanouil-Vasileios, Mertikopoulos, Panayotis, and Piliouras, Georgios
Abstract: A wide array of modern machine learning applications, from adversarial models to multi-agent reinforcement learning, can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method, which we call preconditioned hidden gradient descent (PHGD), hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments.
Comment: 32 pages, 18 figures
Published: 2023

18. Scalable AI Safety via Doubly-Efficient Debate

Author: Brown-Cohen, Jonah, Irving, Geoffrey, and Piliouras, Georgios
Abstract: The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.
Published: 2023

19. A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games

Author: Vasconcelos, Francisca, Vlatakis-Gkaragkounis, Emmanouil-Vasileios, Mertikopoulos, Panayotis, Piliouras, Georgios, and Jordan, Michael I.
Abstract: Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/\epsilon^2)$ iterations to $\epsilon$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
Comment: 53 pages, 7 figures, QTML 2023 (accepted, long talk)
Published: 2023

20. No-Regret Learning and Equilibrium Computation in Quantum Games

Author: Lin, Wayne, Piliouras, Georgios, Sim, Ryann, and Varvitsiotis, Antonios
Abstract: As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts of strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited information. This paper delves into the dynamics of quantum-enabled agents within decentralized systems that employ no-regret algorithms to update their behaviors over time. Specifically, we investigate two-player quantum zero-sum games and polymatrix quantum zero-sum games, showing that no-regret algorithms converge to separable quantum Nash equilibria in time-average. In the case of general multi-player quantum games, our work leads to a novel solution concept, (separable) quantum coarse correlated equilibria (QCCE), as the convergent outcome of the time-averaged behavior of no-regret algorithms, offering a natural solution concept for decentralized quantum systems. Finally, we show that computing QCCEs can be formulated as a semidefinite program and establish the existence of entangled (i.e., non-separable) QCCEs, which cannot be approached via the current paradigm of no-regret learning.
Published: 2023

21. Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

Author: Gemp, Ian, Marris, Luke, and Piliouras, Georgios
Abstract: We propose the first loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that stochastic gradient descent can outperform previous state-of-the-art approaches., Comment: Published at ICLR 2024
Published: 2023

22. Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Author: Hussain, Aamal, Leonte, Dan, Belardinelli, Francesco, and Piliouras, Georgios
Abstract: The behaviour of multi-agent learning in many-player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents., Comment: Presented at the Workshop on New Frontiers in Learning, Control, and Dynamical Systems at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023
Published: 2023
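Illustrative sketch (not from the paper): one standard form of the Q-Learning dynamics adds an entropy term, weighted by an exploration rate T, to the replicator equation, and its interior rest points are Quantal Response Equilibria, where each player's mix is the logit (softmax) response to the others. The sketch below integrates these dynamics with a forward-Euler step for a hypothetical two-player zero-sum game rather than a general network game; the step size, horizon, starting point and payoffs are all assumptions.

```python
import math


def softmax(v, temp):
    """Logit (Boltzmann) response to payoffs v with exploration rate temp."""
    m = max(v)
    e = [math.exp((x - m) / temp) for x in v]
    s = sum(e)
    return [x / s for x in e]


def payoffs_row(A, y):
    return [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A))]


def payoffs_col(B, x):
    return [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(B[0]))]


def q_step(x, r, temp, dt):
    """Euler step of dx_k/dt = x_k[(r_k - <x,r>) - temp*(ln x_k - sum_l x_l ln x_l)]."""
    avg = sum(xi * ri for xi, ri in zip(x, r))
    neg_ent = sum(xi * math.log(xi) for xi in x)
    new = [xi + dt * xi * ((ri - avg) - temp * (math.log(xi) - neg_ent))
           for xi, ri in zip(x, r)]
    s = sum(new)  # equals 1 up to rounding; renormalise to stay on the simplex
    return [v / s for v in new]


def q_learning(A, B, x, y, temp=1.0, dt=0.01, steps=5000):
    """Simultaneous Euler integration for both players."""
    for _ in range(steps):
        rx, ry = payoffs_row(A, y), payoffs_col(B, x)
        x, y = q_step(x, rx, temp, dt), q_step(y, ry, temp, dt)
    return x, y


A = [[3.0, -1.0], [-2.0, 1.0]]
B = [[-a for a in row] for row in A]  # zero-sum: the column player receives -A
x, y = q_learning(A, B, [0.9, 0.1], [0.2, 0.8])
# At a QRE each player's mix equals the logit response to the other's mix.
print(x, softmax(payoffs_row(A, y), 1.0))
```

The final check compares each mix with the logit response it should equal at a QRE; the two printed vectors should agree to many decimal places.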

23. Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Author: Hussain, Aamal, Belardinelli, Francesco, and Piliouras, Georgios
Abstract: The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are 'close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the 'distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the 'nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium or remain non-convergent., Comment: Presented at IJCAI 2023
Published: 2023

24. Discovering How Agents Learn Using Few Data

Author: Sakos, Iosif, Varvitsiotis, Antonios, and Piliouras, Georgios
Abstract: Decentralized learning algorithms are an essential tool for designing multi-agent systems, as they enable agents to autonomously learn from their experience and past interactions. In this work, we propose a theoretical and algorithmic framework for real-time identification of the learning dynamics that govern agent behavior using a short burst of a single system trajectory. Our method identifies agent dynamics through polynomial regression, where we compensate for limited data by incorporating side-information constraints that capture fundamental assumptions or expectations about agent behavior. These constraints are enforced computationally using sum-of-squares optimization, leading to a hierarchy of increasingly better approximations of the true agent dynamics. Extensive experiments demonstrate that our approach, using only 5 samples from a short run of a single trajectory, accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times. These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
Published: 2023

25. Multiplicative Updates for Online Convex Optimization over Symmetric Cones

Author: Canyakmaz, Ilayda, Lin, Wayne, Piliouras, Georgios, and Varvitsiotis, Antonios
Abstract: We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices., Comment: 27 pages, 7 figures, 2 tables
Published: 2023
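Illustrative sketch (not from the paper): in the simplest symmetric cone, the nonnegative orthant, the trace-one slice is the probability simplex and the update becomes ordinary multiplicative weights. The sketch below checks the classical version of the stated equivalence numerically: running the multiplicative recursion from the uniform distribution gives the same iterate as Follow-the-Regularized-Leader with the negative-entropy regularizer, whose maximizer is a softmax of the cumulative gains. Function names, gains and step size are hypothetical.

```python
import math


def mwu_recursive(gains, eta):
    """Multiplicative form: x_{t+1}(i) proportional to x_t(i) * exp(eta * g_t(i)), from uniform."""
    n = len(gains[0])
    x = [1.0 / n] * n
    for g in gains:
        w = [xi * math.exp(eta * gi) for xi, gi in zip(x, g)]
        s = sum(w)
        x = [wi / s for wi in w]
    return x


def ftrl_entropy(gains, eta):
    """FTRL closed form: argmax_x <x, G> - (1/eta) * sum_i x_i ln x_i = softmax(eta * G)."""
    n = len(gains[0])
    G = [sum(g[i] for g in gains) for i in range(n)]  # cumulative gains
    m = max(G)
    w = [math.exp(eta * (c - m)) for c in G]
    s = sum(w)
    return [wi / s for wi in w]


gains = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.0, 1.0, 0.3]]
print(mwu_recursive(gains, 0.7))
print(ftrl_entropy(gains, 0.7))  # identical up to floating-point rounding
```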

26. Poincaré-Bendixson Limit Sets in Multi-Agent Learning

Author: Czechowski, A.T., and Piliouras, Georgios
Abstract: A key challenge of evolutionary game theory and multi-agent learning is to characterize the limit behavior of game dynamics. Whereas convergence is often a property of learning algorithms in games satisfying a particular reward structure (e.g., zero-sum games), even basic learning models, such as the replicator dynamics, are not guaranteed to converge for general payoffs. Worse yet, chaotic behavior is possible even in rather simple games, such as variants of the Rock-Paper-Scissors game. Although the celebrated Poincaré-Bendixson theorem can preclude chaotic behavior in learning dynamics, it is only applicable to low-dimensional settings. Are there other characteristics of a game that can force regularity in the limit sets of learning? We show that behavior consistent with the Poincaré-Bendixson theorem (limit cycles, but no chaotic attractor) can follow purely from the topological structure of the interaction graph, even for high-dimensional settings with an arbitrary number of players and arbitrary payoff matrices. We prove our result for a wide class of follow-the-regularized-leader (FoReL) dynamics, which generalize replicator dynamics, for binary games characterized by interaction graphs where the payoffs of each player are only affected by one other player (i.e., interaction graphs of indegree one). Since chaos occurs already in games with only two players and three strategies, this class of non-chaotic games may be considered maximal. 
Moreover, we provide simple conditions under which such behavior translates into efficiency guarantees, implying that FoReL learning achieves time-averaged sum of payoffs at least as good as that of a Nash equilibrium, thereby connecting the topology of the dynamics to social-welfare analysis.
Published: 2022

27. Optimality Despite Chaos in Fee Markets

Author: Leonardos, Stefanos, Reijsbergen, Daniël, Monnot, Barnabé, and Piliouras, Georgios
Abstract: Transaction fee markets are essential components of blockchain economies, as they resolve the inherent scarcity in the number of transactions that can be added to each block. In early blockchain protocols, this scarcity was resolved through a first-price auction in which users were forced to guess appropriate bids from recent blockchain data. Ethereum's EIP-1559 fee market reform streamlines this process through the use of a base fee that is increased (or decreased) whenever a block exceeds (or fails to meet) a specified target block size. Previous work has found that the EIP-1559 mechanism may lead to a base fee process that is inherently chaotic, in which case the base fee does not converge to a fixed point even under ideal conditions. However, the impact of this chaotic behavior on the fee market's main design goal -- blocks whose long-term average size equals the target -- has not previously been explored. As our main contribution, we derive near-optimal upper and lower bounds for the time-average block size in the EIP-1559 mechanism despite its possibly chaotic evolution. Our lower bound is equal to the target utilization level whereas our upper bound is approximately 6% higher than optimal. Empirical evidence is in close agreement with these theoretical predictions. Specifically, the historical average was approximately 2.9% larger than the target rate under Proof-of-Work and decreased to approximately 2.0% after Ethereum's transition to Proof-of-Stake. We also find that an approximate version of EIP-1559 achieves optimality even in the absence of convergence.
Published: 2022
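Illustrative sketch (not from the paper): the base-fee rule described above fits in a few lines. The version below follows our reading of the EIP-1559 specification (integer arithmetic, a maximum change of 1/8 = 12.5% per block, and a minimum increase of 1 wei when the block is above target); the function name and the example target are assumptions, and the sketch ignores every other part of block validation.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # per the EIP-1559 specification


def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_target: int) -> int:
    """Integer base-fee update, following our reading of the EIP-1559 spec."""
    if parent_gas_used == parent_gas_target:
        return parent_base_fee
    if parent_gas_used > parent_gas_target:
        # Block above target: raise the fee by up to 12.5%, and by at least 1 wei.
        delta = parent_base_fee * (parent_gas_used - parent_gas_target)
        delta = delta // parent_gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
        return parent_base_fee + max(delta, 1)
    # Block below target: lower the fee by up to 12.5%.
    delta = parent_base_fee * (parent_gas_target - parent_gas_used)
    delta = delta // parent_gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return parent_base_fee - delta


TARGET = 15_000_000  # example target; a block may use up to twice this much gas
print(next_base_fee(1000, 2 * TARGET, TARGET))  # full block  -> 1125
print(next_base_fee(1000, 0, TARGET))           # empty block -> 875
```

Because the fee moves multiplicatively in the gap between block size and target, a run of full and empty blocks can make the fee oscillate rather than settle, which is the behavior whose time average the abstract bounds.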

28. Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence

Author: Jain, Rahul, Piliouras, Georgios, and Sim, Ryann
Abstract: Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update method) and its continuous analogue, Quantum Replicator Dynamics. When each player selects their state according to quantum replicator dynamics, we show that the system exhibits conservation laws in a quantum-information theoretic sense. Moreover, we show that the system exhibits Poincaré recurrence, meaning that almost all orbits return arbitrarily close to their initial conditions infinitely often. Our analysis generalizes previous results in the case of classical games., Comment: NeurIPS 2022
Published: 2022

29. Learning Correlated Equilibria in Mean-Field Games

Author: Muller, Paul, Elie, Romuald, Rowland, Mark, Lauriere, Mathieu, Perolat, Julien, Perrin, Sarah, Geist, Matthieu, Piliouras, Georgios, Pietquin, Olivier, and Tuyls, Karl
Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in all games, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
Published: 2022

30. Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games

Author: Piliouras, Georgios, Ratliff, Lillian, Sim, Ryann, and Skoulakis, Stratis
Abstract: The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum property of agent payoffs, the efficient representation of graphical games as well as the expressive power of EFGs. We examine the convergence properties of Optimistic Gradient Ascent (OGA) in these games. We prove that the time-average behavior of such online learning dynamics exhibits $O(1/T)$ rate convergence to the set of Nash Equilibria. Moreover, we show that the day-to-day behavior also converges to Nash with rate $O(c^{-t})$ for some game-dependent constant $c>0$., Comment: To appear in SAGT 2022
Published: 2022

31. Nash, Conley, and Computation: Impossibility and Incompleteness in Game Dynamics

Author: Milionis, Jason, Papadimitriou, Christos, Piliouras, Georgios, and Spendlove, Kelly
Abstract: Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attractors, and Conley index, to prove a general impossibility result: there exist games for which any dynamics is bound to have starting points that do not end up at a Nash equilibrium. We also prove a stronger result for $\epsilon$-approximate Nash equilibria: there are games such that no game dynamics can converge (in an appropriate sense) to $\epsilon$-Nash equilibria, and in fact the set of such games has positive measure. Further numerical results demonstrate that this holds for any $\epsilon$ between zero and $0.09$. Our results establish that, although the notion of Nash equilibrium (and its computation-inspired approximations) is universally applicable in all games, it is also fundamentally incomplete as a predictor of long-term behavior, regardless of the choice of dynamics., Comment: 25 pages
Published: 2022

32. Scalable Deep Reinforcement Learning Algorithms for Mean Field Games

Author: Laurière, Mathieu, Perrin, Sarah, Girgin, Sertan, Muller, Paul, Jain, Ayush, Cabannes, Theophile, Piliouras, Georgios, Pérolat, Julien, Élie, Romuald, Pietquin, Olivier, and Geist, Matthieu
Abstract: Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from trivial in the case of non-linear function approximation that enjoys good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform state-of-the-art baselines from the literature.
Published: 2022

33. No-Regret Learning in Games is Turing Complete

Author: Andrade, Gabriel P., Frongillo, Rafael, and Piliouras, Georgios
Abstract: Games are natural models for multi-agent machine learning settings, such as generative adversarial networks (GANs). The desirable outcomes from algorithmic interactions in these games are encoded as game theoretic equilibrium concepts, e.g. Nash and coarse correlated equilibria. As directly computing an equilibrium is typically impractical, one often aims to design learning algorithms that iteratively converge to equilibria. A growing body of negative results casts doubt on this goal, from non-convergence to chaotic and even arbitrary behaviour. In this paper we add a strong negative result to this list: learning in games is Turing complete. Specifically, we prove Turing completeness of the replicator dynamic on matrix games, one of the simplest possible settings. Our results imply the undecidability of reachability problems for learning algorithms in games, a special case of which is determining equilibrium convergence., Comment: 18 pages, 1 figure
Published: 2022

34. Unpredictable dynamics in congestion games: memory loss can prevent chaos

Author: Bielawski, Jakub, Chotibut, Thiparat, Falniowski, Fryderyk, Misiurewicz, Michal, and Piliouras, Georgios
Abstract: We study the dynamics of simple congestion games with two resources where a continuum of agents behaves according to a version of the Experience-Weighted Attraction (EWA) algorithm. The dynamics is characterized by two parameters: the (population) intensity of choice $a>0$ capturing the economic rationality of the total population of agents and a discount factor $\sigma\in [0,1]$ capturing a type of memory loss where past outcomes matter exponentially less than the recent ones. Finally, our system adds a third parameter $b \in (0,1)$, which captures the asymmetry of the cost functions of the two resources. It is the proportion of the agents using the first resource at Nash equilibrium, with $b=1/2$ capturing a symmetric network. Within this simple framework, we show a plethora of bifurcation phenomena where behavioral dynamics destabilize from global convergence to equilibrium, to limit cycles or even (formally proven) chaos as a function of the parameters $a$, $b$ and $\sigma$. Specifically, we show that for any discount factor $\sigma$ the system will be destabilized for a sufficiently large intensity of choice $a$. Although for discount factor $\sigma=0$ almost always (i.e., $b \neq 1/2$) the system will become chaotic, as $\sigma$ increases the chaotic regime will give way to the attracting periodic orbit of period 2. Therefore, memory loss can simplify game dynamics and make the system predictable. We complement our theoretical analysis with simulations and several bifurcation diagrams that showcase the unyielding complexity of the population dynamics (e.g., attracting periodic orbits of different lengths) even in the simplest possible potential games., Comment: 30 pages, 4 figures
Published: 2022

35. Multi-agent Performative Prediction: From Global Stability and Optimality to Chaos

Author: Piliouras, Georgios, and Yu, Fang-Yi
Abstract: The recent framework of performative prediction is aimed at capturing settings where predictions influence the target/outcome they want to predict. In this paper, we introduce a natural multi-agent version of this framework, where multiple decision makers try to predict the same outcome. We showcase that such competition can result in interesting phenomena by proving the possibility of phase transitions from stability to instability and eventually chaos. Specifically, we present settings of multi-agent performative prediction where under sufficient conditions their dynamics lead to global stability and optimality. In the opposite direction, when the agents are not sufficiently cautious in their learning/update rates, we show that instability and in fact formal chaos is possible. We complement our theoretical predictions with simulations showcasing the predictive power of our results.
Published: 2022

36. Alternating Mirror Descent for Constrained Min-Max Games

Author: Wibisono, Andre, Tao, Molei, and Piliouras, Georgios
Abstract: In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. Such constraints arise naturally when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which the players take turns updating their strategies via the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and a modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior than the simultaneous version of the mirror descent algorithm, which is known to diverge and yields only an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of the alternating gradient descent algorithm for zero-sum games studied in (Bailey et al., COLT 2020).
Published: 2022
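Illustrative sketch (not from the paper): on the probability simplex, mirror descent with the negative-entropy mirror map is multiplicative weights, so alternating mirror descent can be sketched as the players taking turns with MWU steps, the second player reacting to the first player's already-updated strategy. Step size, horizon and payoff matrix below are hypothetical; the sketch only shows the update order and the averaged strategies, not the paper's regret analysis.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def vecmat(x, A):
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def alternating_mwu(A, eta=0.05, iters=4000):
    """Alternating entropic mirror descent for max_x min_y x^T A y."""
    n, m = len(A), len(A[0])
    zx, zy = [0.0] * n, [0.0] * m  # dual variables (scaled cumulative gradients)
    x, y = softmax(zx), softmax(zy)
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Row player moves first, responding to the current y ...
        zx = [z + eta * g for z, g in zip(zx, matvec(A, y))]
        x = softmax(zx)
        # ... then the column player responds to the *updated* x.
        zy = [z - eta * g for z, g in zip(zy, vecmat(x, A))]
        y = softmax(zy)
        avg_x = [a + xi / iters for a, xi in zip(avg_x, x)]
        avg_y = [a + yi / iters for a, yi in zip(avg_y, y)]
    return avg_x, avg_y


A = [[3.0, -1.0], [-2.0, 1.0]]
x_bar, y_bar = alternating_mwu(A)
gap = max(matvec(A, y_bar)) - min(vecmat(x_bar, A))
print(x_bar, y_bar, gap)
```

Swapping the second update to use the old `x` instead of the freshly computed one turns this into the simultaneous version that the abstract contrasts against.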

37. On the Approximability of Multistage Min-Sum Set Cover

Author: Fotakis, Dimitris, Kostopanagiotis, Panagiotis, Nakos, Vasileios, Piliouras, Georgios, and Skoulakis, Stratis
Published: 2021

39. Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update

Author: Piliouras, Georgios, Sim, Ryann, and Skoulakis, Stratis
Abstract: In this paper, we provide a novel and simple algorithm, Clairvoyant Multiplicative Weights Updates (CMWU), for regret minimization in general games. CMWU effectively corresponds to the standard MWU algorithm but where all agents, when updating their mixed strategies, use the payoff profiles based on tomorrow's behavior, i.e. the agents are clairvoyant. CMWU achieves constant regret of $\ln(m)/\eta$ in all normal-form games with $m$ actions and fixed step-size $\eta$. Although CMWU encodes in its definition a fixed point computation, which in principle could result in dynamics that are neither computationally efficient nor uncoupled, we show that both of these issues can be largely circumvented. Specifically, as long as the step-size $\eta$ is upper bounded by $\frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0,V]$ is the payoff range, the CMWU updates can be computed linearly fast via a contraction map. This implementation results in an uncoupled online learning dynamic that admits an $O(\log T)$-sparse sub-sequence where each agent experiences at most $O(nV\log m)$ regret. This implies that the CMWU dynamics converge with rate $O(nV \log m \log T / T)$ to a Coarse Correlated Equilibrium. The latter improves on the current state-of-the-art convergence rate of uncoupled online learning dynamics \cite{daskalakis2021near,anagnostides2021near}., Comment: Expanded on the uncoupled online nature of the dynamics
Published: 2021
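Illustrative sketch (not from the paper): for two players ($n = 2$) with payoffs in $[0, 1]$, the condition $\eta \le 1/((n-1)V)$ becomes $\eta \le 1$, and the clairvoyant update can be computed by repeatedly feeding each player's tentative next strategy into the other player's MWU step. The sketch below does this with a fixed number of inner iterations and reports each player's external regret, which in exact arithmetic should not exceed $\ln(m)/\eta$. The game, $\eta = 0.5$, the number of rounds and the inner iteration count are assumptions.

```python
import math


def mwu_step(x, payoff, eta):
    """One multiplicative-weights update of mix x given a payoff vector."""
    w = [xi * math.exp(eta * p) for xi, p in zip(x, payoff)]
    s = sum(w)
    return [wi / s for wi in w]


def row_payoff(A, y):
    return [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A))]


def col_payoff(B, x):
    return [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(B[0]))]


def cmwu_next(A, B, x, y, eta, inner=100):
    """Solve x' = MWU(x, A y'), y' = MWU(y, B^T x') by fixed-point iteration."""
    nx, ny = x, y
    for _ in range(inner):
        nx, ny = mwu_step(x, row_payoff(A, ny), eta), mwu_step(y, col_payoff(B, nx), eta)
    return nx, ny


def run_cmwu(A, B, eta=0.5, rounds=300):
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    cum_x, cum_y = [0.0] * n, [0.0] * m  # cumulative payoff vectors
    got_x = got_y = 0.0                  # realised expected payoffs
    for _ in range(rounds):
        x, y = cmwu_next(A, B, x, y, eta)
        ux, uy = row_payoff(A, y), col_payoff(B, x)
        cum_x = [c + u for c, u in zip(cum_x, ux)]
        cum_y = [c + u for c, u in zip(cum_y, uy)]
        got_x += sum(a * b for a, b in zip(x, ux))
        got_y += sum(a * b for a, b in zip(y, uy))
    return max(cum_x) - got_x, max(cum_y) - got_y  # external regrets


A = [[1.0, 0.0], [0.0, 1.0]]  # row player wants to match
B = [[0.0, 1.0], [1.0, 0.0]]  # column player wants to mismatch
print(run_cmwu(A, B), math.log(2) / 0.5)
```

Since each player's strategy already reflects the payoff it is about to receive, the iterates coincide with "be-the-regularized-leader", which is where the constant regret bound comes from.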

40. A Blockchain-Based Approach for Collaborative Formalization of Mathematics and Programs

Author: Lim, Jin Xing, Monnot, Barnabé, Lin, Shaowei, and Piliouras, Georgios
Abstract: Formalization of mathematics is the process of digitizing mathematical knowledge, which allows for formal proof verification as well as efficient semantic searches. Given the large and ever-increasing gap between the set of formalized and unformalized mathematical knowledge, there is a clear need to encourage more computer scientists and mathematicians to solve and formalize mathematical problems together. With blockchain technology, we are able to decentralize this process, provide time-stamped verification of authorship and encourage collaboration through implementation of incentive mechanisms via smart contracts. Currently, the formalization of mathematics is done through the use of proof assistants, which can be used to verify programs and protocols as well. Furthermore, with the advancement in artificial intelligence (AI), particularly machine learning, we can apply automated AI reasoning tools in these proof assistants and (at least partially) automate the process of synthesizing proofs. In our paper, we demonstrate a blockchain-based system for collaborative formalization of mathematics and programs incorporating both human labour as well as automated AI tools. We explain how Token-Curated Registries (TCR) and smart contracts are used to ensure appropriate documents are recorded and encourage collaboration through implementation of incentive mechanisms respectively. Using an illustrative example, we show how formalized proofs of different sorting algorithms can be produced collaboratively in our proposed blockchain system., Comment: This is an extended version of our accepted paper at The 4th IEEE International Conference on Blockchain (IEEE Blockchain-2021)
Published: 2021

41. Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Author: Muller, Paul, Rowland, Mark, Elie, Romuald, Piliouras, Georgios, Perolat, Julien, Lauriere, Mathieu, Marinier, Raphael, Pietquin, Olivier, and Tuyls, Karl
Abstract: Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria., Comment: AAMAS
Published: 2021

42. Online Learning in Periodic Zero-Sum Games

Author: Fiez, Tanner, Sim, Ryann, Skoulakis, Stratis, Piliouras, Georgios, and Ratliff, Lillian
Abstract: A seminal result in game theory is von Neumann's minmax theorem, which states that zero-sum games admit an essentially unique equilibrium solution. Classical learning results build on this theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games. In the past several years, a key research direction has focused on characterizing the day-to-day behavior of such dynamics. General results in this direction show that broad classes of online learning dynamics are cyclic, and formally Poincaré recurrent, in zero-sum games. We analyze the robustness of these online learning behaviors in the case of periodic zero-sum games with a time-invariant equilibrium. This model generalizes the usual repeated game formulation while also being a realistic and natural model of a repeated competition between players that depends on exogenous environmental variations such as time-of-day effects, week-to-week trends, and seasonality. Interestingly, time-average convergence may fail even in the simplest such settings, in spite of the equilibrium being fixed. In contrast, using novel analysis methods, we show that Poincaré recurrence provably generalizes despite the complex, non-autonomous nature of these dynamical systems., Comment: To appear at NeurIPS 2021
Published: 2021

43. Transaction Fees on a Honeymoon: Ethereum's EIP-1559 One Month Later

Author: Reijsbergen, Daniël, Sridhar, Shyam, Monnot, Barnabé, Leonardos, Stefanos, Skoulakis, Stratis, and Piliouras, Georgios
Abstract: Ethereum Improvement Proposal (EIP) 1559 was recently implemented to transform Ethereum's transaction fee market. EIP-1559 utilizes an algorithmic update rule with a constant learning rate to estimate a base fee. The base fee reflects prevailing network conditions and hence provides a more reliable oracle for current gas prices. Using on-chain data from the period after its launch, we evaluate the impact of EIP-1559 on the user experience and market performance. Our empirical findings suggest that although EIP-1559 achieves its goals on average, short-term behavior is marked by intense, chaotic oscillations in block sizes (as predicted by our recent theoretical dynamical system analysis [1]) and slow adjustments during periods of demand bursts (e.g., NFT drops). Both phenomena lead to unwanted inter-block variability in mining rewards. To address this issue, we propose an alternative base fee adjustment rule in which the learning rate varies according to an additive increase, multiplicative decrease (AIMD) update scheme. Our simulations show that the latter robustly outperforms the EIP-1559 protocol under various demand scenarios. These results provide evidence that variable learning rate mechanisms may constitute a promising alternative to the default EIP-1559-based format and contribute to the ongoing discussion on the design of more efficient transaction fee markets., Comment: IEEE Blockchain-2021, The 4th IEEE International Conference on Blockchain, Melbourne, Australia | 06-08 December 2021
Published: 2021
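
The fixed-learning-rate base fee rule described above can be sketched as follows. This is a simplified floating-point sketch assuming the EIP-1559 specification's parameters (gas target equal to half the gas limit, maximum per-block change of 1/8); the specification itself uses integer arithmetic. The AIMD schedule and its constants (`threshold`, `add_step`, `mult_factor`, `lr_min`, `lr_max`) are hypothetical illustrations of a variable learning rate, not the rule proposed in the paper.

```python
# Simplified, floating-point sketch of the EIP-1559 base fee update.
# Assumption: gas target = gas limit / 2 and a maximum change of 1/8 per block.
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8


def eip1559_next_base_fee(base_fee: float, gas_used: float, gas_target: float,
                          learning_rate: float = 1 / BASE_FEE_MAX_CHANGE_DENOMINATOR) -> float:
    """Move the base fee in proportion to the block's deviation from target."""
    deviation = (gas_used - gas_target) / gas_target  # in [-1, 1] when limit = 2 * target
    return base_fee * (1 + learning_rate * deviation)


def aimd_learning_rate(lr: float, gas_used: float, gas_target: float,
                       threshold: float = 0.5, add_step: float = 0.01,
                       mult_factor: float = 0.9, lr_min: float = 0.01,
                       lr_max: float = 0.5) -> float:
    """Hypothetical AIMD schedule for the learning rate (illustrative constants).

    Large deviations from target raise the rate additively; otherwise it
    decays multiplicatively. This is not the exact rule from the paper.
    """
    if abs(gas_used - gas_target) / gas_target > threshold:
        return min(lr + add_step, lr_max)
    return max(lr * mult_factor, lr_min)


if __name__ == "__main__":
    target = 15_000_000  # assumed gas target (half of a 30M gas limit)
    print(eip1559_next_base_fee(100.0, 30_000_000, target))  # full block: 112.5
    print(eip1559_next_base_fee(100.0, 0, target))           # empty block: 87.5
```

With a constant rate, consecutive full and empty blocks push the fee up and down by a fixed 12.5%; a variable-rate scheme instead lets the step size itself react to sustained deviations.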

44. Stochastic Multiplicative Weights Updates in Zero-Sum Games

Author: Bailey, James P., Nagarajan, Sai Ganesh, and Piliouras, Georgios
Abstract: We study agents competing against each other in a repeated network zero-sum game while applying the multiplicative weights update (MWU) algorithm with fixed learning rates. In our implementation, agents select their strategies probabilistically in each iteration and update their weights/strategies using the realized vector payoff of all strategies, i.e., stochastic MWU with full information. We show that the system results in an irreducible Markov chain where agent strategies diverge from the set of Nash equilibria. Further, we show that agents will play pure strategies with probability 1 in the limit.
Published: 2021
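
A minimal sketch of stochastic MWU with full information, simplified to a single two-player zero-sum game rather than a network (polymatrix) game; the function names and the use of Python's `random` module for sampling are illustrative assumptions. Each round both agents sample a pure action, then every action's weight is updated with the payoff it would have earned against the opponent's realized action.

```python
import math
import random


def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]


def stochastic_mwu(A, eta=0.1, steps=1000, seed=0):
    """Stochastic MWU with full information in a two-player zero-sum game.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    The learning rate eta is fixed, as in the setting described above.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    wx, wy = [1.0] * n, [1.0] * m
    for _ in range(steps):
        x, y = normalize(wx), normalize(wy)
        i = rng.choices(range(n), weights=x)[0]  # realized row action
        j = rng.choices(range(m), weights=y)[0]  # realized column action
        # Full-information vector payoffs against the opponent's sampled action.
        wx = [wx[k] * math.exp(eta * A[k][j]) for k in range(n)]
        wy = [wy[k] * math.exp(-eta * A[i][k]) for k in range(m)]
        # Renormalizing avoids overflow and leaves the mixed strategies unchanged.
        wx, wy = normalize(wx), normalize(wy)
    return normalize(wx), normalize(wy)
```

Tracking `max(x)` across rounds in matching pennies (`A = [[1, -1], [-1, 1]]`) is one way to observe strategies drifting away from the uniform equilibrium; the sketch only illustrates the update rule and does not by itself establish the limit behavior stated in the abstract.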

45. Constants of Motion: The Antidote to Chaos in Optimization and Game Dynamics

Author: Piliouras, Georgios, and Wang, Xiao
Abstract: Several recent works in online optimization and game dynamics have established strong negative complexity results, including the formal emergence of instability and chaos even in small settings, e.g., $2\times 2$ games. These results motivate the following question: Which methodological tools can guarantee the regularity of such dynamics, and how can we apply them in standard settings of interest such as discrete-time first-order optimization dynamics? We show how proving the existence of invariant functions, i.e., constants of motion, is a fundamental contribution in this direction and establish a plethora of such positive results (e.g., gradient descent, multiplicative weights update, alternating gradient descent and manifold gradient descent) in both optimization and game settings. At a technical level, for some conservation laws we provide an explicit and concise closed form, whereas for others we present non-constructive proofs using tools from dynamical systems.
Published: 2021
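
As a concrete instance of a constant of motion (a standard bilinear example, not necessarily one of the closed forms derived in the paper): alternating gradient descent-ascent on $f(x, y) = xy$ with step size $\eta$, i.e. $x_{t+1} = x_t - \eta y_t$ and $y_{t+1} = y_t + \eta x_{t+1}$, preserves $H(x, y) = x^2 + y^2 - \eta xy$ exactly. The sketch below iterates the map and checks that $H$ stays constant up to floating-point round-off.

```python
def alternating_gda(x, y, eta):
    """One step of alternating GDA on f(x, y) = x * y (x minimizes, y maximizes)."""
    x_next = x - eta * y       # descent step for x, using grad_x f = y
    y_next = y + eta * x_next  # ascent step for y, using the *updated* x
    return x_next, y_next


def invariant(x, y, eta):
    """Quadratic energy conserved by the alternating update above."""
    return x * x + y * y - eta * x * y


if __name__ == "__main__":
    eta, x, y = 0.3, 1.0, 0.5
    h0 = invariant(x, y, eta)
    max_drift = 0.0
    for _ in range(10_000):
        x, y = alternating_gda(x, y, eta)
        max_drift = max(max_drift, abs(invariant(x, y, eta) - h0))
    print(h0, max_drift)  # drift stays at round-off level
```

Substituting the update into $H$ and expanding shows every $\eta$-dependent cross term cancels, which is why the orbit stays on a fixed ellipse instead of spiraling.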

46. On the Approximability of Multistage Min-Sum Set Cover

Author: Fotakis, Dimitris, Kostopanagiotis, Panagiotis, Nakos, Vasileios, Piliouras, Georgios, and Skoulakis, Stratis
Abstract: We investigate the polynomial-time approximability of the multistage version of Min-Sum Set Cover ($\mathrm{DSSC}$), a natural and intriguing generalization of the classical List Update problem. In $\mathrm{DSSC}$, we maintain a sequence of permutations $(\pi^0, \pi^1, \ldots, \pi^T)$ on $n$ elements, based on a sequence of requests $(R^1, \ldots, R^T)$. We aim to minimize the total cost of updating $\pi^{t-1}$ to $\pi^{t}$, quantified by the Kendall tau distance $\mathrm{D}_{\mathrm{KT}}(\pi^{t-1}, \pi^t)$, plus the total cost of covering each request $R^t$ with the current permutation $\pi^t$, quantified by the position of the first element of $R^t$ in $\pi^t$. Using a reduction from Set Cover, we show that $\mathrm{DSSC}$ does not admit an $O(1)$-approximation, unless $\mathrm{P} = \mathrm{NP}$, and that any $o(\log n)$ (resp. $o(r)$) approximation to $\mathrm{DSSC}$ implies a sublogarithmic (resp. $o(r)$) approximation to Set Cover (resp. where each element appears at most $r$ times). Our main technical contribution is to show that $\mathrm{DSSC}$ can be approximated in polynomial-time within a factor of $O(\log^2 n)$ in general instances, by randomized rounding, and within a factor of $O(r^2)$, if all requests have cardinality at most $r$, by deterministic rounding.
Published: 2021
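
The DSSC objective defined above can be evaluated directly for a given schedule of permutations. This sketch only computes the cost (it is not an approximation algorithm) and assumes 1-indexed positions for the covering cost; the function names are illustrative.

```python
from itertools import combinations


def kendall_tau_distance(p, q):
    """Number of element pairs ordered differently by permutations p and q."""
    pos_q = {e: k for k, e in enumerate(q)}
    # combinations(p, 2) yields (a, b) with a before b in p.
    return sum(1 for a, b in combinations(p, 2) if pos_q[a] > pos_q[b])


def cover_cost(perm, request):
    """1-indexed position of the first element of `request` in `perm`."""
    for k, e in enumerate(perm, start=1):
        if e in request:
            return k
    raise ValueError("request contains no element of the permutation")


def dssc_cost(perms, requests):
    """Total cost of perms = [pi^0, ..., pi^T] serving requests = [R^1, ..., R^T]."""
    assert len(perms) == len(requests) + 1
    total = 0
    for t, request in enumerate(requests, start=1):
        total += kendall_tau_distance(perms[t - 1], perms[t])  # update cost
        total += cover_cost(perms[t], request)                 # covering cost
    return total
```

For example, swapping the first two of three elements to serve request {0} costs 1 for the swap plus 2 for covering, whereas staying put covers {0} at position 1 for a total of 1, which illustrates the trade-off between moving and covering.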

47. Evolutionary Dynamics and $\Phi$-Regret Minimization in Games

Author: Piliouras, Georgios, Rowland, Mark, Omidshafiei, Shayegan, Elie, Romuald, Hennes, Daniel, Connor, Jerome, and Tuyls, Karl
Abstract: Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance and that of a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established $\Phi$-regret framework, which provides a continuum of stronger regret measures. Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of $\Phi$-regret minimization by RD in some larger games, hinting at further opportunity for $\Phi$-regret based study of such algorithms from both a theoretical and empirical perspective.
Published: 2021
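
For readers unfamiliar with replicator dynamics (RD), a forward-Euler sketch of two-population RD in a bimatrix game follows. The step size, horizon, and function names are illustrative assumptions, and the sketch does not compute $\Phi$-regret.

```python
def replicator_step(x, y, A, B, dt):
    """One forward-Euler step of two-population replicator dynamics.

    A[i][j] and B[i][j] are the row and column players' payoffs when the row
    player plays action i and the column player plays action j.
    """
    n, m = len(x), len(y)
    u_row = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # payoff of each row action
    u_col = [sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]  # payoff of each column action
    avg_row = sum(x[i] * u_row[i] for i in range(n))  # row population's mean payoff
    avg_col = sum(y[j] * u_col[j] for j in range(m))  # column population's mean payoff
    # Actions that beat their population's average grow in share; others shrink.
    x_new = [x[i] + dt * x[i] * (u_row[i] - avg_row) for i in range(n)]
    y_new = [y[j] + dt * y[j] * (u_col[j] - avg_col) for j in range(m)]
    return x_new, y_new


def simulate(x, y, A, B, dt=0.01, steps=2000):
    """Integrate the dynamics for steps * dt units of time."""
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
    return x, y


if __name__ == "__main__":
    # Prisoner's dilemma, actions ordered (cooperate, defect).
    A = [[3, 0], [5, 1]]
    B = [[3, 5], [0, 1]]
    print(simulate([0.9, 0.1], [0.9, 0.1], A, B))
```

In the prisoner's dilemma example both populations move toward defection, the strictly dominant action; other $2 \times 2$ payoff matrices produce the cycling or mixed behaviors that make this family of games a useful test suite.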

48. Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Author: Leonardos, Stefanos, Piliouras, Georgios, and Spendlove, Kelly
Abstract: The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
Published: 2021
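
Smooth Q-learning dynamics are commonly written as replicator dynamics plus an entropy-driven exploration term with temperature $T$. The forward-Euler sketch below assumes that continuous-time form, a single exploration rate shared by both agents, and a two-player zero-sum game (a special case of the weighted zero-sum polymatrix games in the abstract); the step size, horizon, and names are illustrative choices.

```python
import math


def q_learning_field(x, y, A, T):
    """Vector field of smooth Q-learning in a two-player zero-sum game.

    The row player maximizes x^T A y, the column player maximizes -x^T A y,
    and T > 0 is the exploration rate.
    """
    n, m = len(x), len(y)
    u_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    u_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    avg_x = sum(xi * ui for xi, ui in zip(x, u_x))
    avg_y = sum(yj * uj for yj, uj in zip(y, u_y))
    neg_entropy_x = sum(xi * math.log(xi) for xi in x)
    neg_entropy_y = sum(yj * math.log(yj) for yj in y)
    # Replicator term (exploitation) minus entropy term (exploration).
    dx = [x[i] * (u_x[i] - avg_x) - T * x[i] * (math.log(x[i]) - neg_entropy_x)
          for i in range(n)]
    dy = [y[j] * (u_y[j] - avg_y) - T * y[j] * (math.log(y[j]) - neg_entropy_y)
          for j in range(m)]
    return dx, dy


def run(x, y, A, T=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the dynamics for steps * dt time units."""
    for _ in range(steps):
        dx, dy = q_learning_field(x, y, A, T)
        x = [xi + dt * d for xi, d in zip(x, dx)]
        y = [yj + dt * d for yj, d in zip(y, dy)]
    return x, y


if __name__ == "__main__":
    A = [[1, -1], [-1, 1]]  # matching pennies
    print(run([0.9, 0.1], [0.2, 0.8], A))
```

In matching pennies the unique QRE is the uniform strategy profile for any $T > 0$, so both strategies should approach (0.5, 0.5) from this asymmetric start; without the entropy term ($T = 0$) the same game would cycle instead.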

49. From Griefing to Stability in Blockchain Mining Economies

Author: Cheung, Yun Kuen, Leonardos, Stefanos, Piliouras, Georgios, and Sridhar, Shyam
Abstract: We study a game-theoretic model of blockchain mining economies and show that griefing, a practice in which participants harm other participants at some lesser cost to themselves, is a prevalent threat at its Nash equilibria. The proof relies on a generalization of evolutionary stability to non-homogeneous populations via griefing factors (ratios that measure network losses relative to the deviator's own losses), which leads to a formal theoretical argument for the dissipation of resources, consolidation of power and high entry barriers that are currently observed in practice. A critical assumption in this type of analysis is that miners' decisions have significant influence on aggregate network outcomes (such as network hashrate). However, as networks grow larger, the miners' interaction more closely resembles a distributed production economy or Fisher market, and its stability properties change. In this case, we derive a proportional response (PR) update protocol which converges to market equilibria at which griefing is irrelevant. Convergence holds for a wide range of miners' risk profiles and various degrees of resource mobility between blockchains with different mining technologies. Our empirical findings in a case study with four mineable cryptocurrencies suggest that risk diversification, restricted mobility of resources (as enforced by different mining technologies) and network growth are all contributing factors to the stability of the inherently volatile blockchain ecosystem.
Published: 2021
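
The miner-specific PR protocol is not spelled out in the abstract, so the sketch below shows only the textbook proportional response dynamics for a linear Fisher market (buyers with budgets, divisible goods with unit supply), the model the abstract names as the large-network limit. Treat it as an analogy, not the paper's protocol; budgets, utilities, and function names are illustrative assumptions.

```python
def proportional_response(budgets, utils, steps=200):
    """Proportional response dynamics in a linear Fisher market.

    budgets[i]: buyer i's budget; utils[i][j]: buyer i's utility per unit of
    good j (each good has unit supply). Returns final bids and prices.
    """
    n, m = len(budgets), len(utils[0])
    # Start by splitting each budget evenly across goods.
    bids = [[budgets[i] / m] * m for i in range(n)]
    for _ in range(steps):
        prices = [sum(bids[i][j] for i in range(n)) for j in range(m)]
        # Each buyer receives a share of good j proportional to its bid.
        alloc = [[bids[i][j] / prices[j] if prices[j] > 0 else 0.0 for j in range(m)]
                 for i in range(n)]
        new_bids = []
        for i in range(n):
            gains = [utils[i][j] * alloc[i][j] for j in range(m)]
            total = sum(gains)
            # Re-split the budget in proportion to the utility each good delivered.
            new_bids.append([budgets[i] * g / total for g in gains])
        bids = new_bids
    prices = [sum(bids[i][j] for i in range(n)) for j in range(m)]
    return bids, prices


if __name__ == "__main__":
    bids, prices = proportional_response([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
    print(prices, bids)
```

Each update only uses a buyer's own budget and realized utilities, which is what makes PR a natural decentralized protocol; total spending always equals total budget, so prices remain normalized.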

50. Efficient Online Learning for Dynamic k-Clustering

Author: Fotakis, Dimitris, Piliouras, Georgios, and Skoulakis, Stratis
Abstract: We study dynamic clustering problems from the perspective of online learning. We consider an online learning problem, called Dynamic $k$-Clustering, in which $k$ centers are maintained in a metric space over time (centers may change positions) such that a dynamically changing set of $r$ clients is served in the best possible way. The connection cost at round $t$ is given by the $p$-norm of the vector consisting of the distance of each client to its closest center at round $t$, for some $p\geq 1$ or $p = \infty$. We present a $\Theta\left( \min(k,r) \right)$-regret polynomial-time online learning algorithm and show that, under some well-established computational complexity conjectures, constant regret cannot be achieved in polynomial time. In addition to the efficient solution of Dynamic $k$-Clustering, our work contributes to the long line of research on combinatorial online learning.
Published: 2021
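
The per-round connection cost defined above can be computed as follows. This sketch assumes the Euclidean plane as the metric space and covers only the cost objective, not the online learning algorithm; the function names are illustrative.

```python
import math


def connection_cost(centers, clients, p):
    """p-norm of the client-to-nearest-center distance vector (p >= 1 or math.inf)."""
    dists = [min(math.dist(c, z) for z in centers) for c in clients]
    if p == math.inf:
        return max(dists)  # the infinity-norm is the worst-served client
    return sum(d ** p for d in dists) ** (1 / p)


def total_cost(center_seq, client_seq, p):
    """Sum of per-round connection costs for centers chosen round by round."""
    return sum(connection_cost(F, R, p) for F, R in zip(center_seq, client_seq))
```

With centers at (0, 0) and (10, 0) and clients at (1, 0) and (10, 3), the distance vector is (1, 3), so the cost is 4 for $p = 1$, $\sqrt{10}$ for $p = 2$, and 3 for $p = \infty$. Varying $p$ interpolates between k-median-style ($p = 1$) and k-center-style ($p = \infty$) objectives.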
