Author: "Tuyls, Karl" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Tuyls, Karl"' showing total 754 results

Start Over Author "Tuyls, Karl"

754 results on '"Tuyls, Karl"'

1. Using deep reinforcement learning to promote sustainable human behaviour on a common pool resource problem

Author: Koster, Raphael, Pîslar, Miruna, Tacchetti, Andrea, Balaguer, Jan, Liu, Leqi, Elie, Romuald, Hauser, Oliver P., Tuyls, Karl, Botvinick, Matt, and Summerfield, Christopher
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Computer Science and Game Theory
Abstract: A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism that endogenously promotes sustainable contributions from human participants to a common pool resource. We first trained neural networks to behave like human players, creating a stimulated economy that allowed us to study how different mechanisms influenced the dynamics of receipt and reciprocation. We then used RL to train a social planner to maximise aggregate return to players. The social planner discovered a redistributive policy that led to a large surplus and an inclusive economy, in which players made roughly equal gains. The RL agent increased human surplus over baseline mechanisms based on unrestricted welfare or conditional cooperation, by conditioning its generosity on available resources and temporarily sanctioning defectors by allocating fewer resources to them. Examining the AI policy allowed us to develop an explainable mechanism that performed similarly and was more popular among players. Deep reinforcement learning can be used to discover mechanisms that promote sustainable human behaviour.
Published: 2024

2. Empirical Game-Theoretic Analysis: A Survey

Author: Wellman, Michael P., Tuyls, Karl, and Greenwald, Amy
Subjects: Computer Science - Computer Science and Game Theory
Abstract: In the empirical approach to game-theoretic analysis (EGTA), the model of the game comes not from declarative representation, but is derived by interrogation of a procedural description of the game environment. The motivation for developing this approach was to enable game-theoretic reasoning about strategic situations too complex for analytic specification and solution. Since its introduction over twenty years ago, EGTA has been applied to a wide range of multiagent domains, from auctions and markets to recreational games to cyber-security. We survey the extensive methodology developed for EGTA over the years, organized by the elemental subproblems comprising the EGTA process. We describe key EGTA concepts and techniques, and the questions at the frontier of EGTA research. Recent advances in machine learning have accelerated progress in EGTA, and promise to significantly expand our capacities for reasoning about complex game situations., Comment: 72 pages, 17 figures
Published: 2024

3. States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers

Author: Gemp, Ian, Bachrach, Yoram, Lanctot, Marc, Patel, Roma, Dasagi, Vibhavari, Marris, Luke, Piliouras, Georgios, Liu, Siqi, and Tuyls, Karl
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory
Abstract: Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real world applications. In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium finding algorithms to this setting. In addition, by exploiting LLMs generation capabilities along with our proposed binding, we can synthesize a large repository of formally-defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs., Comment: 32 pages, 8 figures, code available @ https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/games/chat_game.py
Published: 2024

4. Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Author: Zhao, Yunfan, Behari, Nikhil, Hughes, Edward, Zhang, Edwin, Nagaraj, Dheeraj, Tuyls, Karl, Taneja, Aparna, and Tambe, Milind
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
Published: 2023

5. TacticAI: an AI assistant for football tactics

Author: Wang, Zhe, Veličković, Petar, Hennes, Daniel, Tomašev, Nenad, Prince, Laurel, Kaisers, Michael, Bachrach, Yoram, Elie, Romuald, Wenliang, Li Kevin, Piccinini, Federico, Spearman, William, Graham, Ian, Connor, Jerome, Yang, Yi, Recasens, Adrià, Khan, Mina, Beauguerlange, Nathalie, Sprechmann, Pablo, Moreno, Pol, Heess, Nicolas, Bowling, Michael, Hassabis, Demis, and Tuyls, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning., Comment: 32 pages, 10 figures
Published: 2023

6. TacticAI: an AI assistant for football tactics

Author: Wang, Zhe, Veličković, Petar, Hennes, Daniel, Tomašev, Nenad, Prince, Laurel, Kaisers, Michael, Bachrach, Yoram, Elie, Romuald, Wenliang, Li Kevin, Piccinini, Federico, Spearman, William, Graham, Ian, Connor, Jerome, Yang, Yi, Recasens, Adrià, Khan, Mina, Beauguerlange, Nathalie, Sprechmann, Pablo, Moreno, Pol, Heess, Nicolas, Bowling, Michael, Hassabis, Demis, and Tuyls, Karl
Published: 2024
Full Text: View/download PDF

7. Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas

Author: Madhushani, Udari, McKee, Kevin R., Agapiou, John P., Leibo, Joel Z., Everett, Richard, Anthony, Thomas, Hughes, Edward, Tuyls, Karl, and Duéñez-Guzmán, Edgar A.
Subjects: Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse policies in settings that resemble the incentive structure of Prisoner's dilemma. Our work extends this body of results and demonstrates that (1) heterogeneous SVO leads to meaningfully diverse policies across a range of incentive structures in sequential social dilemmas, as measured by task-specific diversity metrics; and (2) learning a best response to such policy diversity leads to better zero-shot generalization in some situations. We show that these best-response agents learn policies that are conditioned on their co-players, which we posit is the reason for improved zero-shot generalization results.
Published: 2023

8. Human-Timescale Adaptation in an Open-Ended Task Space

Author: Adaptive Agent Team, Bauer, Jakob, Baumli, Kate, Baveja, Satinder, Behbahani, Feryal, Bhoopchand, Avishkar, Bradley-Schmieg, Nathalie, Chang, Michael, Clay, Natalie, Collister, Adrian, Dasagi, Vibhavari, Gonzalez, Lucy, Gregor, Karol, Hughes, Edward, Kashem, Sheleem, Loks-Thompson, Maria, Openshaw, Hannah, Parker-Holder, Jack, Pathak, Shreya, Perez-Nieves, Nicolas, Rakicevic, Nemanja, Rocktäschel, Tim, Schroecker, Yannick, Sygnowski, Jakub, Tuyls, Karl, York, Sarah, Zacherl, Alexander, and Zhang, Lei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing
Abstract: Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
Published: 2023

9. An Analysis of Quantile Temporal-Difference Learning

Author: Rowland, Mark, Munos, Rémi, Azar, Mohammad Gheshlaghi, Tang, Yunhao, Ostrovski, Georg, Harutyunyan, Anna, Tuyls, Karl, Bellemare, Marc G., and Dabney, Will
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis., Comment: Accepted to JMLR
Published: 2023

10. Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers

Author: Marris, Luke, Gemp, Ian, Anthony, Thomas, Tacchetti, Andrea, Liu, Siqi, and Tuyls, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms. Unfortunately, solving a normal-form game could take prohibitive or non-deterministic time to converge, and could fail. We introduce the Neural Equilibrium Solver which utilizes a special equivariant neural network architecture to approximately solve the space of all games of fixed shape, buying speed and determinism. We define a flexible equilibrium selection framework, that is capable of uniquely selecting an equilibrium that minimizes relative entropy, or maximizes welfare. The network is trained without needing to generate any supervised training data. We show remarkable zero-shot generalization to larger games. We argue that such a network is a powerful component for many possible multiagent algorithms., Comment: NeurIPS 2022
Published: 2022

11. Game Theoretic Rating in N-player general-sum games with Equilibria

Author: Marris, Luke, Lanctot, Marc, Gemp, Ian, Omidshafiei, Shayegan, McAleer, Stephen, Connor, Jerome, Tuyls, Karl, and Graepel, Thore
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting. Traditionally, only transitive dependencies between strategies have been used to rate strategies (e.g. Elo), however recent work has expanded ratings to utilize game theoretic solutions to better rate strategies in non-transitive games. This work generalizes these ideas and proposes novel algorithms suitable for N-player, general-sum rating of strategies in normal-form games according to the payoff rating system. This enables well-established solution concepts, such as equilibria, to be leveraged to efficiently rate strategies in games with complex strategic interactions, which arise in multiagent training and real-world interactions between many agents. We empirically validate our methods on real world normal-form data (Premier League) and multiagent reinforcement learning agent evaluation.
Published: 2022

12. Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

Author: Gemp, Ian, Anthony, Thomas, Bachrach, Yoram, Bhoopchand, Avishkar, Bullard, Kalesha, Connor, Jerome, Dasagi, Vibhavari, De Vylder, Bart, Duenez-Guzman, Edgar, Elie, Romuald, Everett, Richard, Hennes, Daniel, Hughes, Edward, Khan, Mina, Lanctot, Marc, Larson, Kate, Lever, Guy, Liu, Siqi, Marris, Luke, McKee, Kevin R., Muller, Paul, Perolat, Julien, Strub, Florian, Tacchetti, Andrea, Tarassov, Eugene, Wang, Zhe, and Tuyls, Karl
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research., Comment: Published in AI Communications 2022
Published: 2022

13. Learning Correlated Equilibria in Mean-Field Games

Author: Muller, Paul, Elie, Romuald, Rowland, Mark, Lauriere, Mathieu, Perolat, Julien, Perrin, Sarah, Geist, Matthieu, Piliouras, Georgios, Pietquin, Olivier, and Tuyls, Karl
Subjects: Computer Science - Computer Science and Game Theory, Statistics - Machine Learning
Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
Published: 2022

14. Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

Author: Perolat, Julien, de Vylder, Bart, Hennes, Daniel, Tarassov, Eugene, Strub, Florian, de Boer, Vincent, Muller, Paul, Connor, Jerome T., Burch, Neil, Anthony, Thomas, McAleer, Stephen, Elie, Romuald, Cen, Sarah H., Wang, Zhe, Gruslys, Audrunas, Malysheva, Aleksandra, Khan, Mina, Ozair, Sherjil, Timbers, Finbarr, Pohlen, Toby, Eccles, Tom, Rowland, Mark, Lanctot, Marc, Lespiau, Jean-Baptiste, Piot, Bilal, Omidshafiei, Shayegan, Lockhart, Edward, Sifre, Laurent, Beauguerlange, Nathalie, Munos, Remi, Silver, David, Singh, Satinder, Hassabis, Demis, and Tuyls, Karl
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego can not easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.
Published: 2022
Full Text: View/download PDF

15. Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO

Author: Muller, Paul, Rowland, Mark, Elie, Romuald, Piliouras, Georgios, Perolat, Julien, Lauriere, Mathieu, Marinier, Raphael, Pietquin, Olivier, and Tuyls, Karl
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: Recent advances in multiagent learning have seen the introduction ofa family of algorithms that revolve around the population-based trainingmethod PSRO, showing convergence to Nash, correlated and coarse corre-lated equilibria. Notably, when the number of agents increases, learningbest-responses becomes exponentially more difficult, and as such ham-pers PSRO training methods. The paradigm of mean-field games pro-vides an asymptotic solution to this problem when the considered gamesare anonymous-symmetric. Unfortunately, the mean-field approximationintroduces non-linearities which prevent a straightforward adaptation ofPSRO. Building upon optimization and adversarial regret minimization,this paper sidesteps this issue and introduces mean-field PSRO, an adap-tation of PSRO which learns Nash, coarse correlated and correlated equi-libria in mean-field games. The key is to replace the exact distributioncomputation step by newly-defined mean-field no-adversarial-regret learn-ers, or by black-box optimization. We compare the asymptotic complexityof the approach to standard PSRO, greatly improve empirical bandit con-vergence speed by compressing temporal mixture weights, and ensure itis theoretically robust to payoff noise. Finally, we illustrate the speed andaccuracy of mean-field PSRO on several mean-field games, demonstratingconvergence to strong and weak equilibria., Comment: AAMAS
Published: 2021

16. Statistical discrimination in learning agents

Author: Duéñez-Guzmán, Edgar A., McKee, Kevin R., Mao, Yiran, Coppin, Ben, Chiappa, Silvia, Vezhnevets, Alexander Sasha, Bakker, Michiel A., Bachrach, Yoram, Sadedin, Suzanne, Isaac, William, Tuyls, Karl, and Leibo, Joel Z.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, 68T07 (Primary) 91A26, 91-10, 93A16 (Secondary), I.2.11, I.2.0
Abstract: Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics. One primary example is \textit{statistical discrimination} -- selecting social partners based not on their underlying attributes, but on readily perceptible characteristics that covary with their suitability for the task at hand. We present a theoretical model to examine how information processing influences statistical discrimination and test its predictions using multi-agent reinforcement learning with various agent architectures in a partner choice-based social dilemma. As predicted, statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture. All agents showed substantial statistical discrimination, defaulting to using the readily available correlates instead of the outcome relevant features. We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias. However, all agent algorithms we tried still exhibited substantial bias after learning in biased training populations., Comment: 29 pages, 10 figures
Published: 2021

17. Evolutionary Dynamics and $\Phi$-Regret Minimization in Games

Author: Piliouras, Georgios, Rowland, Mark, Omidshafiei, Shayegan, Elie, Romuald, Hennes, Daniel, Connor, Jerome, and Tuyls, Karl
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Computer Science - Multiagent Systems
Abstract: Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner's performance against a baseline in hindsight. It is well-known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full \emph{mixed} strategy space (i.e., probability distributions over pure strategies), under the lens of the previously-established $\Phi$-regret framework, which provides a continuum of stronger regret measures. Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of $\Phi$-regret minimization by RD in some larger games, hinting at further opportunity for $\Phi$-regret based study of such algorithms from both a theoretical and empirical perspective.
Published: 2021

18. Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Author: Marris, Luke, Muller, Paul, Lanctot, Marc, Tuyls, Karl, and Graepel, Thore
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning
Abstract: Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games., Comment: ICML 2021, 9 pages, coded implementation available in https://github.com/deepmind/open_spiel/ (jpsro.py in examples)
Published: 2021

19. Time-series Imputation of Temporally-occluded Multiagent Trajectories

Author: Omidshafiei, Shayegan, Hennes, Daniel, Garnelo, Marta, Tarassov, Eugene, Wang, Zhe, Elie, Romuald, Connor, Jerome T., Muller, Paul, Graham, Ian, Spearman, William, and Tuyls, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment. These interactions, combined with the potential stochasticity of the agents' decision-making processes, make such systems complex and interesting to study from a dynamical perspective. Significant research has been conducted on learning models for forward-direction estimation of agent behaviors, for example, pedestrian predictions used for collision-avoidance in self-driving cars. However, in many settings, only sporadic observations of agents may be available in a given trajectory sequence. For instance, in football, subsets of players may come in and out of view of broadcast video footage, while unobserved players continue to interact off-screen. In this paper, we study the problem of multiagent time-series imputation, where available past and future observations of subsets of agents are used to estimate missing observations for other agents. Our approach, called the Graph Imputer, uses forward- and backward-information in combination with graph networks and variational autoencoders to enable learning of a distribution of imputed trajectories. We evaluate our approach on a dataset of football matches, using a projective camera module to train and evaluate our model for the off-screen player state estimation setting. We illustrate that our method outperforms several state-of-the-art approaches, including those hand-crafted for football.
Published: 2021

20. From Motor Control to Team Play in Simulated Humanoid Football

Author: Liu, Siqi, Lever, Guy, Wang, Zhe, Merel, Josh, Eslami, S. M. Ali, Hennes, Daniel, Czarnecki, Wojciech M., Tassa, Yuval, Omidshafiei, Shayegan, Abdolmaleki, Abbas, Siegel, Noah Y., Hasenclever, Leonard, Marris, Luke, Tunyasuvunakool, Saran, Song, H. Francis, Wulfmeier, Markus, Muller, Paul, Haarnoja, Tuomas, Tracey, Brendan D., Tuyls, Karl, Graepel, Thore, and Heess, Nicolas
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems, Computer Science - Neural and Evolutionary Computing, Computer Science - Robotics
Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.
Published: 2021

21. Scaling up Mean Field Games with Online Mirror Descent

Author: Perolat, Julien, Perrin, Sarah, Elie, Romuald, Laurière, Mathieu, Piliouras, Georgios, Geist, Matthieu, Tuyls, Karl, and Pietquin, Olivier
Subjects: Computer Science - Artificial Intelligence
Abstract: We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.
Published: 2021

22. Game Plan: What AI can do for Football, and What Football can do for AI

Author: Tuyls, Karl, Omidshafiei, Shayegan, Muller, Paul, Wang, Zhe, Connor, Jerome, Hennes, Daniel, Graham, Ian, Spearman, William, Waskett, Tim, Steele, Dafydd, Luc, Pauline, Recasens, Adria, Galashov, Alexandre, Thornton, Gregory, Elie, Romuald, Sprechmann, Pablo, Moreno, Pol, Cao, Kris, Garnelo, Marta, Dutta, Praneet, Valko, Michal, Heess, Nicolas, Bridgland, Alex, Perolat, Julien, De Vylder, Bart, Eslami, Ali, Rowland, Mark, Jaegle, Andrew, Munos, Remi, Back, Trevor, Ahamed, Razia, Bouton, Simon, Beauguerlange, Nathalie, Broshear, Jackson, Graepel, Thore, and Hassabis, Demis
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, in terms of not only changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state-of-the-art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
Published: 2020

23. The Advantage Regret-Matching Actor-Critic

Author: Gruslys, Audrūnas, Lanctot, Marc, Munos, Rémi, Timbers, Finbarr, Schmid, Martin, Perolat, Julien, Morrill, Dustin, Zambaldi, Vinicius, Lespiau, Jean-Baptiste, Schultz, John, Azar, Mohammad Gheshlaghi, Bowling, Michael, and Tuyls, Karl
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the AdvantageRegret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC saves a buffer of past policies, replaying through them to reconstruct hindsight assessments of past behavior. These retrospective value estimates are used to predict conditional advantages which, combined with regret matching, produces a new policy. In particular, ARMAC learns from sampled trajectories in a centralized training setting, without requiring the application of importance sampling commonly used in Monte Carlo counterfactual regret (CFR) minimization; hence, it does not suffer from excessive variance in large environments. In the single-agent setting, ARMAC shows an interesting form of exploration by keeping past policies intact. In the multiagent setting, ARMAC in self-play approaches Nash equilibria on some partially-observable zero-sum benchmarks. We provide exploitability estimates in the significantly larger game of betting-abstracted no-limit Texas Hold'em.
Published: 2020

24. Navigating the Landscape of Multiplayer Games

Author: Omidshafiei, Shayegan, Tuyls, Karl, Czarnecki, Wojciech M., Santos, Francisco C., Rowland, Mark, Connor, Jerome, Hennes, Daniel, Muller, Paul, Perolat, Julien, De Vylder, Bart, Gruslys, Audrunas, and Munos, Remi
Subjects: Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused on using well-known games to build strong agents. This progress, however, can be better informed by characterizing games and their topological landscape. Tackling this latter question can facilitate understanding of agents and help determine what game an agent should target next as part of its training. Here, we show how network measures applied to response graphs of large-scale games enable the creation of a landscape of games, quantifying relationships between games of varying sizes and characteristics. We illustrate our findings in domains ranging from canonical games to complex empirical games capturing the performance of trained agents pitted against one another. Our results culminate in a demonstration leveraging this information to generate new and interesting games, including mixtures of empirical games synthesized from real world games.
Published: 2020
Full Text: View/download PDF

25. Real World Games Look Like Spinning Tops

Author: Czarnecki, Wojciech Marian, Gidel, Gauthier, Tracey, Brendan, Tuyls, Karl, Omidshafiei, Shayegan, Balduzzi, David, and Jaderberg, Max
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper investigates the geometrical properties of real world games (e.g. Tic-Tac-Toe, Go, StarCraft II). We hypothesise that their geometrical structure resemble a spinning top, with the upright axis representing transitive strength, and the radial axis, which corresponds to the number of cycles that exist at a particular transitive strength, representing the non-transitive dimension. We prove the existence of this geometry for a wide class of real world games, exposing their temporal nature. Additionally, we show that this unique structure also has consequences for learning - it clarifies why populations of strategies are necessary for training of agents, and how population size relates to the structure of the game. Finally, we empirically validate these claims by using a selection of nine real world two-player zero-sum symmetric games, showing 1) the spinning top structure is revealed and can be easily re-constructed by using a new method of Nash clustering to measure the interaction between transitive and cyclical strategy behaviour, and 2) the effect that population size has on the convergence in these games.
Published: 2020

26. The Automated Inspection of Opaque Liquid Vaccines

Author: Palmer, Gregory, Schnieders, Benjamin, Savani, Rahul, Tuyls, Karl, Fossel, Joscha-David, and Flore, Harry
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In the pharmaceutical industry the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial's surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our data set by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results we get when we train only on the unaugmented dataset., Comment: 8 pages, 5 Figures, 3 Tables, ECAI 2020 Conference Proceedings
Published: 2020

27. From Poincar\'e Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

Author: Perolat, Julien, Munos, Remi, Lespiau, Jean-Baptiste, Omidshafiei, Shayegan, Rowland, Mark, Ortega, Pedro, Burch, Neil, Anthony, Thomas, Balduzzi, David, De Vylder, Bart, Piliouras, Georgios, Lanctot, Marc, and Tuyls, Karl
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincar\'e recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player Imperfect Information Games (IIG)., Comment: 43 pages
Published: 2020

28. A Generalized Training Approach for Multiagent Learning

Author: Muller, Paul, Omidshafiei, Shayegan, Rowland, Mark, Tuyls, Karl, Perolat, Julien, Liu, Siqi, Hennes, Daniel, Marris, Luke, Lanctot, Marc, Hughes, Edward, Wang, Zhe, Lever, Guy, Heess, Nicolas, Graepel, Thore, and Munos, Remi
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several games classes, and identify links between Nash equilibria and $\alpha$-Rank. We demonstrate the competitive performance of $\alpha$-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where $\alpha$-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
Published: 2019

29. Multiagent Evaluation under Incomplete Information

Author: Rowland, Mark, Omidshafiei, Shayegan, Tuyls, Karl, Perolat, Julien, Valko, Michal, Piliouras, Georgios, and Munos, Remi
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called {\alpha}-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or {\alpha}-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.
Published: 2019

30. OpenSpiel: A Framework for Reinforcement Learning in Games

Author: Lanctot, Marc, Lockhart, Edward, Lespiau, Jean-Baptiste, Zambaldi, Vinicius, Upadhyay, Satyaki, Pérolat, Julien, Srinivasan, Sriram, Timbers, Finbarr, Tuyls, Karl, Omidshafiei, Shayegan, Hennes, Daniel, Morrill, Dustin, Muller, Paul, Ewalds, Timo, Faulkner, Ryan, Kramár, János, De Vylder, Bart, Saeta, Brennan, Bradbury, James, Ding, David, Borgeaud, Sebastian, Lai, Matthew, Schrittwieser, Julian, Anthony, Thomas, Hughes, Edward, Danihelka, Ivo, and Ryan-Davis, Jonah
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully- observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.
Published: 2019

31. Neural Replicator Dynamics

Author: Hennes, Daniel, Morrill, Dustin, Omidshafiei, Shayegan, Munos, Remi, Perolat, Julien, Lanctot, Marc, Gruslys, Audrunas, Lespiau, Jean-Baptiste, Parmas, Paavo, Duenez-Guzman, Edgar, and Tuyls, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstationarity. By contrast, it is known that the replicator dynamics, a well-studied model from evolutionary game theory, eliminates dominated strategies and exhibits convergence of the time-averaged trajectories to interior Nash equilibria in zero-sum games. Thus, using the replicator dynamics as a foundation, we derive an elegant one-line change to policy gradient methods that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD). NeuRD reduces to the exponential weights/Hedge algorithm in the single-state all-actions case. Additionally, NeuRD has formal equivalence to softmax counterfactual regret minimization, which guarantees convergence in the sequential tabular case. Importantly, our algorithm provides a straightforward way of extending the replicator dynamics to the function approximation setting. Empirical results show that NeuRD quickly adapts to nonstationarities, outperforming policy gradient significantly in both tabular and function approximation settings, when evaluated on the standard imperfect information benchmarks of Kuhn Poker, Leduc Poker, and Goofspiel.
Published: 2019

32. Differentiable Game Mechanics

Author: Letcher, Alistair, Balduzzi, David, Racaniere, Sebastien, Martens, James, Foerster, Jakob, Tuyls, Karl, and Graepel, Thore
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood -- and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component, is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component, relates to Hamiltonian games, a new class of games that obey a conservation law akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs -- while at the same time being applicable to, and having guarantees in, much more general cases., Comment: JMLR 2019, journal version of arXiv:1802.05642
Published: 2019

33. Evolving Indoor Navigational Strategies Using Gated Recurrent Units In NEAT

Author: Butterworth, James, Savani, Rahul, and Tuyls, Karl
Subjects: Computer Science - Neural and Evolutionary Computing
Abstract: Simultaneous Localisation and Mapping (SLAM) algorithms are expensive to run on smaller robotic platforms such as Micro-Aerial Vehicles. Bug algorithms are an alternative that use relatively little processing power, and avoid high memory consumption by not building an explicit map of the environment. Bug Algorithms achieve relatively good performance in simulated and robotic maze solving domains. However, because they are hand-designed, a natural question is whether they are globally optimal control policies. In this work we explore the performance of Neuroevolution - specifically NEAT - at evolving control policies for simulated differential drive robots carrying out generalised maze navigation. We extend NEAT to include Gated Recurrent Units (GRUs) to help deal with long term dependencies. We show that both NEAT and our NEAT-GRU can repeatably generate controllers that outperform I-Bug (an algorithm particularly well-suited for use in real robots) on a test set of 209 indoor maze like environments. We show that NEAT-GRU is superior to NEAT in this task but also that out of the 2 systems, only NEAT-GRU can continuously evolve successful controllers for a much harder task in which no bearing information about the target is provided to the agent.
Published: 2019

34. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

Author: Lockhart, Edward, Lanctot, Marc, Pérolat, Julien, Lespiau, Jean-Baptiste, Morrill, Dustin, Timbers, Finbarr, and Tuyls, Karl
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning
Abstract: In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this optimization, the joint policies converge to a Nash equilibrium. Unlike fictitious play (XFP) and counterfactual regret minimization (CFR), our convergence result pertains to the policies being optimized rather than the average policies. Our experiments demonstrate convergence rates comparable to XFP and CFR in four benchmark games in the tabular case. Using function approximation, we find that our algorithm outperforms the tabular version in two of the games, which, to the best of our knowledge, is the first such result in imperfect information games among this class of algorithms., Comment: IJCAI 2019, 11 pages, 1 figure
Published: 2019

35. $\alpha$-Rank: Multi-Agent Evaluation by Evolution

Author: Omidshafiei, Shayegan, Papadimitriou, Christos, Piliouras, Georgios, Tuyls, Karl, Rowland, Mark, Lespiau, Jean-Baptiste, Czarnecki, Wojciech M., Lanctot, Marc, Perolat, Julien, and Munos, Remi
Subjects: Computer Science - Multiagent Systems, Computer Science - Computer Science and Game Theory
Abstract: We introduce $\alpha$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous- and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, the type of interactions, and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). $\alpha$-Rank provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics. This is a consequence of the links we establish to the MCC solution concept when the underlying evolutionary model's ranking-intensity parameter, $\alpha$, is chosen to be large, which exactly forms the basis of $\alpha$-Rank. In contrast to the Nash equilibrium, which is a static concept based on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley's Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. $\alpha$-Rank runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce proofs that not only provide a unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the $\alpha$-Rank methodology. We empirically validate the method in several domains including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
Published: 2019

36. Fully Convolutional One-Shot Object Segmentation for Industrial Robotics

Author: Schnieders, Benjamin, Luo, Shan, Palmer, Gregory, and Tuyls, Karl
Subjects: Computer Science - Robotics
Abstract: The ability to identify and localize new objects robustly and effectively is vital for robotic grasping and manipulation in warehouses or smart factories. Deep convolutional neural networks (DCNNs) have achieved the state-of-the-art performance on established image datasets for object detection and segmentation. However, applying DCNNs in dynamic industrial scenarios, e.g., warehouses and autonomous production, remains a challenging problem. DCNNs quickly become ineffective when tasked with detecting objects that they have not been trained on. Given that re-training using the latest data is time consuming, DCNNs cannot meet the requirement of the Factory of the Future (FoF) regarding rapid development and production cycles. To address this problem, we propose a novel one-shot object segmentation framework, using a fully convolutional Siamese network architecture, to detect previously unknown objects based on a single prototype image. We turn to multi-task learning to reduce training time and improve classification accuracy. Furthermore, we introduce a novel approach to automatically cluster the learnt feature space representation in a weakly supervised manner. We test the proposed framework on the RoboCup@Work dataset, simulating requirements for the FoF. Results show that the trained network on average identifies 73% of previously unseen objects correctly from a single example image. Correctly identified objects are estimated to have a 87.53% successful pick-up rate. Finally, multi-task learning lowers the convergence time by up to 33%, and increases accuracy by 2.99%., Comment: International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), 9 pages
Published: 2019

37. Robust Temporal Difference Learning for Critical Domains

Author: Klima, Richard, Bloembergen, Daan, Kaisers, Michael, and Tuyls, Karl
Subjects: Computer Science - Machine Learning, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: We present a new Q-function operator for temporal difference (TD) learning methods that explicitly encodes robustness against significant rare events (SRE) in critical domains. The operator, which we call the $\kappa$-operator, allows to learn a robust policy in a model-based fashion without actually observing the SRE. We introduce single- and multi-agent robust TD methods using the operator $\kappa$. We prove convergence of the operator to the optimal robust Q-function with respect to the model using the theory of Generalized Markov Decision Processes. In addition we prove convergence to the optimal Q-function of the original MDP given that the probability of SREs vanishes. Empirical evaluations demonstrate the superior performance of $\kappa$-based TD methods both in the early learning phase as well as in the final converged stage. In addition we show robustness of the proposed method to small model errors, as well as its applicability in a multi-agent context., Comment: AAMAS 2019
Published: 2019

38. Generative Models over Neural Controllers for Transfer Learning

Author: Butterworth, James, Savani, Rahul, Tuyls, Karl, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Rudolph, Günter, editor, Kononova, Anna V., editor, Aguirre, Hernán, editor, Kerschke, Pascal, editor, Ochoa, Gabriela, editor, and Tušar, Tea, editor
Published: 2022
Full Text: View/download PDF

39. Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Author: Srinivasan, Sriram, Lanctot, Marc, Zambaldi, Vinicius, Perolat, Julien, Tuyls, Karl, Munos, Remi, and Bowling, Michael
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Statistics - Machine Learning
Abstract: Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero sum games, without any domain-specific state space reductions., Comment: NeurIPS 2018
Published: 2018

40. SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions

Author: Zhang, Chengwei, Li, Xiaohong, Hao, Jianye, Chen, Siqi, Tuyls, Karl, Feng, Zhiyong, Xue, Wanli, and Chen, Rong
Subjects: Computer Science - Artificial Intelligence
Abstract: Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively few investigations. In this paper, we propose an independent learner hierarchical method, named Sample Continuous Coordination with recursive Frequency Maximum Q-Value (SCC-rFMQ), which divides the cooperative problem with continuous actions into two layers. The first layer samples a finite set of actions from the continuous action spaces by a re-sampling mechanism with variable exploratory rates, and the second layer evaluates the actions in the sampled action set and updates the policy using a reinforcement learning cooperative method. By constructing cooperative mechanisms at both levels, SCC-rFMQ can handle cooperative problems in continuous action cooperative Markov games effectively. The effectiveness of SCC-rFMQ is experimentally demonstrated on two well-designed games, i.e., a continuous version of the climbing game and a cooperative version of the boat problem. Experimental results show that SCC-rFMQ outperforms other reinforcement learning algorithms.
Published: 2018

41. Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

Author: Palmer, Gregory, Savani, Rahul, and Tuyls, Karl
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable towards others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable towards misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominately been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extend version of the Climb Game, within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in our environment, overcoming the outlined pathologies., Comment: 11 Pages, 6 Figures, AAMAS2019 Conference Proceedings
Published: 2018

42. A Comparative Study of Bug Algorithms for Robot Navigation

Author: McGuire, Kimberly, de Croon, Guido, and Tuyls, Karl
Subjects: Computer Science - Robotics
Abstract: This paper presents a literature survey and a comparative study of Bug Algorithms, with the goal of investigating their potential for robotic navigation. At first sight, these methods seem to provide an efficient navigation paradigm, ideal for implementations on tiny robots with limited resources. Closer inspection, however, shows that many of these Bug Algorithms assume perfect global position estimate of the robot which in GPS-denied environments implies considerable expenses of computation and memory -- relying on accurate Simultaneous Localization And Mapping (SLAM) or Visual Odometry (VO) methods. We compare a selection of Bug Algorithms in a simulated robot and environment where they endure different types noise and failure-cases of their on-board sensors. From the simulation results, we conclude that the implemented Bug Algorithms' performances are sensitive to many types of sensor-noise, which was most noticeable for odometry-drift. This raises the question if Bug Algorithms are suitable for real-world, on-board, robotic navigation as is. Variations that use multiple sensors to keep track of their progress towards the goal, were more adept in completing their task in the presence of sensor-failures. This shows that Bug Algorithms must spread their risk, by relying on the readings of multiple sensors, to be suitable for real-world deployment., Comment: 22 pages, 20 figures, Submitted to Robotics and Autonomous Systems
Published: 2018

43. Fast Convergence for Object Detection by Learning how to Combine Error Functions

Author: Schnieders, Benjamin and Tuyls, Karl
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we introduce an innovative method to improve the convergence speed and accuracy of object detection neural networks. Our approach, CONVERGE-FAST-AUXNET, is based on employing multiple, dependent loss metrics and weighting them optimally using an on-line trained auxiliary network. Experiments are performed in the well-known RoboCup@Work challenge environment. A fully convolutional segmentation network is trained on detecting objects' pickup points. We empirically obtain an approximate measure for the rate of success of a robotic pickup operation based on the accuracy of the object detection network. Our experiments show that adding an optimally weighted Euclidean distance loss to a network trained on the commonly used Intersection over Union (IoU) metric reduces the convergence time by 42.48%. The estimated pickup rate is improved by 39.90%. Compared to state-of-the-art task weighting methods, the improvement is 24.5% in convergence, and 15.8% on the estimated pickup rate., Comment: Accepted for publication at IROS 2018
Published: 2018

44. Re-evaluating Evaluation

Author: Balduzzi, David, Tuyls, Karl, Perolat, Julien, and Graepel, Thore
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Statistics - Machine Learning
Abstract: Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation -- since there is no harm (computational cost aside) from including all available tasks and agents., Comment: NIPS 2018, final version
Published: 2018

45. Relational Deep Reinforcement Learning

Author: Zambaldi, Vinicius, Raposo, David, Santoro, Adam, Bapst, Victor, Li, Yujia, Babuschkin, Igor, Tuyls, Karl, Reichert, David, Lillicrap, Timothy, Lockhart, Edward, Shanahan, Murray, Langston, Victoria, Pascanu, Razvan, Botvinick, Matthew, Vinyals, Oriol, and Battaglia, Peter
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
Published: 2018

46. Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input

Author: Lazaridou, Angeliki, Hermann, Karl Moritz, Tuyls, Karl, and Clark, Stephen
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Learning, Computer Science - Multiagent Systems
Abstract: The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks. Here we scale up this research by using contemporary deep learning methods and by training reinforcement-learning neural network agents on referential communication games. We extend previous work, in which agents were trained in symbolic environments, by developing agents which are able to learn from raw pixel data, a more challenging and realistic input representation. We find that the degree of structure found in the input data affects the nature of the emerged protocols, and thereby corroborate the hypothesis that structured compositional language is most likely to emerge when agents perceive the world as being structured., Comment: To appear at ICLR 2018
Published: 2018

47. Emergent Communication through Negotiation

Author: Cao, Kris, Lazaridou, Angeliki, Lanctot, Marc, Leibo, Joel Z, Tuyls, Karl, and Clark, Stephen
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Learning, Computer Science - Multiagent Systems
Abstract: Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is \textit{a priori} ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation., Comment: Published as a conference paper at ICLR 2018
Published: 2018

48. Inequity aversion improves cooperation in intertemporal social dilemmas

Author: Hughes, Edward, Leibo, Joel Z., Phillips, Matthew G., Tuyls, Karl, Duéñez-Guzmán, Edgar A., Castañeda, Antonio García, Dunning, Iain, Zhu, Tina, McKee, Kevin R., Koster, Raphael, Roff, Heather, and Graepel, Thore
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Quantitative Biology - Populations and Evolution
Abstract: Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist., Comment: 15 pages, 8 figures
Published: 2018

49. A Generalised Method for Empirical Game Theoretic Analysis

Author: Tuyls, Karl, Perolat, Julien, Lanctot, Marc, Leibo, Joel Z, and Graepel, Thore
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems
Abstract: This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights in the empirical meta game showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the meta-game analysis methodology to asymmetric games. The state-of-the-art has only considered empirical games in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the PSRO multi-agent learning algorithm., Comment: will appear at AAMAS'18
Published: 2018

50. SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

Author: Zhang, Chengwei, Li, Xiaohong, Hao, Jianye, Chen, Siqi, Tuyls, Karl, and Xue, Wanli
Subjects: Computer Science - Artificial Intelligence
Abstract: In multiagent environments, the capability of learning is important for an agent to behave appropriately in face of unknown opponents and dynamic environment. From the system designer's perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient ascent based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner's dilemma game and the coordination game) are analyzed in details. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on Q-learning update rule. Simulation results show that SA-PGA agent can achieve higher social welfare than previous social-optimality oriented Conditional Joint Action Learner (CJAL) and also is robust against individually rational opponents by reaching Nash equilibrium solutions.
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Database

Publisher

754 results on '"Tuyls, Karl"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources