Author: "Éric Piette" / Publisher: hal ccsd - Searchworks@Jio Institute Digital Library Search Results

Your search for "Éric Piette" returned 2 results

1. Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration

Author: Matthew Stephenson, Cameron Browne, Dennis J. N. J. Soemers, Éric Piette, Piette, Eric, Dept. of Advanced Computing Sciences, RS: FSE DACS, and RS: FSE DACS Mathematics Centre Maastricht
Subjects: FOS: Computer and information sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Computer Science - Machine Learning, Computer science, Computer Science - Artificial Intelligence, Machine Learning (stat.ML), 02 engineering and technology, 010501 environmental sciences, [INFO] Computer Science [cs], Machine learning, computer.software_genre, 01 natural sciences, Machine Learning (cs.LG), Statistics - Machine Learning, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, Training performance, 0105 earth and related environmental sciences, business.industry, Sampling (statistics), Tree (data structure), Tree traversal, Artificial Intelligence (cs.AI), 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play. ExIt involves training a policy to mimic the search behaviour of a tree search algorithm - such as Monte-Carlo tree search - and using the trained policy to guide it. The policy and the tree search can then iteratively improve each other, through experience gathered in self-play between instances of the guided tree search algorithm. This paper outlines three different approaches for manipulating the distribution of data collected from self-play, and the procedure that samples batches for learning updates from the collected data. Firstly, samples in batches are weighted based on the durations of the episodes in which they were originally experienced. Secondly, Prioritized Experience Replay is applied within the ExIt framework, to prioritise sampling experience from which we expect to obtain valuable training signals. Thirdly, a trained exploratory policy is used to diversify the trajectories experienced in self-play. This paper summarises the effects of these manipulations on training performance evaluated in fourteen different board games. We find major improvements in early training performance in some games, and minor improvements averaged over fourteen games.
Comment: Accepted at the IEEE Conference on Games (CoG) 2020
Published: 2020
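
The first manipulation in the abstract above (weighting samples by the duration of the episode they came from) can be illustrated with a minimal sketch. The abstract does not state the weighting rule, so the inverse-duration weights used here are an assumption, chosen so that every episode contributes equal total weight regardless of its length. The names `build_buffer`, `duration_weights`, and `sample_batch` are hypothetical and are not taken from the paper.

```python
import random


def build_buffer(episode_lengths):
    """Hypothetical replay buffer: one (episode_id, episode_length) entry per position."""
    return [(ep, n) for ep, n in enumerate(episode_lengths) for _ in range(n)]


def duration_weights(buffer):
    """Assumed rule: weight each sample by 1 / (length of its source episode).

    Weights are normalised to sum to 1, so each episode receives the same
    total probability mass no matter how many samples it produced.
    """
    raw = [1.0 / length for _, length in buffer]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(buffer, batch_size, rng):
    """Draw a batch (with replacement) using the duration-based weights."""
    return rng.choices(buffer, weights=duration_weights(buffer), k=batch_size)


# Usage: a short episode (2 positions) and a long one (8 positions).
buffer = build_buffer([2, 8])
weights = duration_weights(buffer)
batch = sample_batch(buffer, batch_size=4, rng=random.Random(0))
```

Under uniform sampling the long episode would supply 80% of the samples in expectation; with the assumed inverse-duration weights each episode supplies half.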

2. General Game Playing with Stochastic CSP

Author: Sébastien Tabary, Éric Piette, Frédéric Koriche, Sylvain Lagrue, DELORME, Fabien, Centre de Recherche en Informatique de Lens (CRIL), and Université d'Artois (UA)-Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Mathematical optimization, Non-cooperative game, Sequential game, Computer science, Normal-form game, Symmetric game, ComputingMilieux_PERSONALCOMPUTING, Combinatorial game theory, 0102 computer and information sciences, 02 engineering and technology, computer.software_genre, 01 natural sciences, General game playing, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Computational Theory and Mathematics, 010201 computation theory & mathematics, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Repeated game, Discrete Mathematics and Combinatorics, 020201 artificial intelligence & image processing, Game tree, computer, Software
Abstract: The challenge of General Game Playing (GGP) is to devise game playing programs that take as input the rules of any strategic game, described in the Game Description Language (GDL), and that effectively play without human intervention. The aim of this paper is to address the GGP challenge by casting GDL games (potentially with chance events) into the Stochastic Constraint Satisfaction Problem (SCSP). The stochastic constraint network of a game is decomposed into a sequence of µSCSPs (also known as one-stage SCSPs), each associated with a game round. Winning strategies are searched by coupling the MAC (Maintaining Arc Consistency) algorithm, used to solve each µSCSP in turn, together with the UCB (Upper Confidence Bound) policy for approximating the values of those strategies obtained by the last µSCSP in the sequence. Extensive experiments conducted on various GDL games with different deliberation times per round demonstrate that the MAC-UCB algorithm significantly outperforms the state-of-the-art UCT (Upper Confidence bounds for Trees) algorithm.
Published: 2016
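
The abstract above names the UCB (Upper Confidence Bound) policy but does not describe how MAC-UCB applies it. As a rough illustration only, the sketch below shows the standard UCB1 bandit rule, which picks the option maximising mean reward plus an exploration bonus of c * sqrt(ln N / n). The function name `ucb1_select`, its arguments, and the default constant c = sqrt(2) are assumptions for this example and do not reproduce the authors' implementation.

```python
import math


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the index of the arm chosen by the standard UCB1 rule.

    counts[a] is how often arm a has been tried; totals[a] is the sum of
    rewards observed for it. Both lists are hypothetical bookkeeping.
    """
    # Try every arm once before comparing confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    total_pulls = sum(counts)

    def score(arm):
        mean = totals[arm] / counts[arm]
        bonus = c * math.sqrt(math.log(total_pulls) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=score)


# Usage: arm 1 has a lower mean but has barely been tried,
# so its exploration bonus dominates and it is selected.
choice = ucb1_select(counts=[100, 1], totals=[60.0, 0.5])
```

The bonus term shrinks as an option is tried more often, so the rule gradually shifts from exploring rarely tried options to exploiting the one with the best observed mean.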
