Author: "Matignon, Laëtitia" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Matignon, Laëtitia"' showing total 44 results

1. Task-conditioned adaptation of visual features in multi-task policy learning

Author: Marza, Pierre, Matignon, Laetitia, Simonin, Olivier, and Wolf, Christian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Successfully addressing a wide variety of tasks is a core ability of autonomous agents, requiring flexibly adapting the underlying decision-making strategies and, as we argue in this work, also adapting the perception modules. An analogical argument would be the human visual system, which uses top-down signals to focus attention determined by the current task. Similarly, we adapt pre-trained large vision models conditioned on specific downstream tasks in the context of multi-task policy learning. We introduce task-conditioned adapters that do not require finetuning any pre-trained weights, combined with a single policy trained with behavior cloning and capable of addressing multiple tasks. We condition the visual adapters on task embeddings, which can be selected at inference if the task is known, or alternatively inferred from a set of example demonstrations. To this end, we propose a new optimization-based estimator. We evaluate the method on a wide variety of tasks from the CortexBench benchmark and show that, compared to existing work, it can be addressed with a single policy. In particular, we demonstrate that adapting visual features is a key design choice and that the method generalizes to unseen tasks given a few demonstrations.
Published: 2024

2. Attention Graph for Multi-Robot Social Navigation with Deep Reinforcement Learning

Author: Escudie, Erwan, Matignon, Laetitia, and Saraydaryan, Jacques
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Learning robot navigation strategies among pedestrians is crucial for domain-based applications. Combining perception, planning and prediction allows us to model the interactions between robots and pedestrians, resulting in impressive outcomes, especially with recent approaches based on deep reinforcement learning (RL). However, these works do not consider multi-robot scenarios. In this paper, we present MultiSoc, a new method for learning multi-agent socially aware navigation strategies using RL. Inspired by recent works on multi-agent deep RL, our method leverages a graph-based representation of agent interactions, combining the positions and fields of view of entities (pedestrians and agents). Each agent uses a model based on two Graph Neural Networks combined with attention mechanisms. First, an edge selector produces a sparse graph; then a crowd coordinator applies node attention to produce a graph representing the influence of each entity on the others. This is incorporated into a model-free RL framework to learn multi-agent policies. We evaluate our approach in simulation and provide a series of experiments under various conditions (number of agents / pedestrians). Empirical results show that our method learns faster than social navigation deep RL mono-agent techniques, and enables efficient multi-agent implicit coordination in challenging crowd navigation with multiple heterogeneous humans. Furthermore, by incorporating customizable meta-parameters, we can adjust the neighborhood density taken into account in our navigation strategy.
Published: 2024

3. AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

Author: Marza, Pierre, Matignon, Laetitia, Simonin, Olivier, Batra, Dhruv, Wolf, Christian, and Chaplot, Devendra Singh
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies including handcrafted frontier-based exploration, end-to-end and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models outperform other classical and end-to-end baselines. Finally, we show that AutoNeRF can reconstruct large-scale scenes, and is thus a useful tool to perform scene-specific adaptation as the produced 3D environment models can be loaded into a simulator to fine-tune a policy of interest.
Published: 2023

4. Multi-Object Navigation with dynamically learned neural implicit representations

Author: Marza, Pierre, Matignon, Laetitia, Simonin, Olivier, and Wolf, Christian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Understanding and mapping a new environment are core abilities of any autonomously navigating agent. While classical robotics usually estimates maps in a stand-alone manner with SLAM variants, which maintain a topological or metric representation, end-to-end learning of navigation keeps some form of memory in a neural network. Networks are typically imbued with inductive biases, which can range from vectorial representations to birds-eye metric tensors or topological structures. In this work, we propose to structure neural networks with two neural implicit representations, which are learned dynamically during each episode and map the content of the scene: (i) the Semantic Finder predicts the position of a previously seen queried object; (ii) the Occupancy and Exploration Implicit Representation encapsulates information about explored area and obstacles, and is queried with a novel global read mechanism which directly maps from function space to a usable embedding space. Both representations are leveraged by an agent trained with Reinforcement Learning (RL) and learned online during each episode. We evaluate the agent on Multi-Object Navigation and show the high impact of using neural implicit representations as a memory source.
Published: 2022

5. An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey

Author: Aubret, Arthur, Matignon, Laetitia, and Hassas, Salima
Subjects: Computer Science - Machine Learning
Abstract: The reinforcement learning (RL) research area is very active, with an important number of new contributions; especially considering the emergent field of deep RL (DRL). However a number of scientific and technical challenges still need to be resolved, amongst which we can mention the ability to abstract actions or the difficulty to explore the environment in sparse-reward settings which can be addressed by intrinsic motivation (IM). We propose to survey these research works through a new taxonomy based on information theory: we computationally revisit the notions of surprise, novelty and skill learning. This allows us to identify advantages and disadvantages of methods and exhibit current outlooks of research. Our analysis suggests that novelty and surprise can assist the building of a hierarchy of transferable skills that further abstracts the environment and makes the exploration process more robust.
Published: 2022
Full Text: View/download PDF

6. Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation

Author: Marza, Pierre, Matignon, Laetitia, Simonin, Olivier, and Wolf, Christian
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial reasoning, where an agent is able to perceive spatial relationships and regularities, and discover object characteristics. Recent work introduces learnable policies parametrized by deep neural networks and trained with Reinforcement Learning (RL). In classical RL setups, the capacity to map and reason spatially is learned end-to-end, from reward alone. In this setting, we introduce supplementary supervision in the form of auxiliary tasks designed to favor the emergence of spatial perception capabilities in agents trained for a goal-reaching downstream objective. We show that learning to estimate metrics quantifying the spatial relationships between an agent at a given location and a goal to reach has a high positive impact in Multi-Object Navigation settings. Our method significantly improves the performance of different baseline agents, that either build an explicit or implicit representation of the environment, even matching the performance of incomparable oracle agents taking ground-truth maps as input. A learning-based agent from the literature trained with the proposed auxiliary losses was the winning entry to the Multi-Object Navigation Challenge, part of the CVPR 2021 Embodied AI Workshop.
Published: 2021

7. DisTop: Discovering a Topological representation to learn diverse and rewarding skills

Author: Aubret, Arthur, Matignon, Laetitia, and Hassas, Salima
Subjects: Computer Science - Machine Learning
Abstract: The optimal way for a deep reinforcement learning (DRL) agent to explore is to learn a set of skills that achieves a uniform distribution of states. Following this, we introduce DisTop, a new model that simultaneously learns diverse skills and focuses on improving rewarding skills. DisTop progressively builds a discrete topology of the environment using an unsupervised contrastive loss, a growing network and a goal-conditioned policy. Using this topology, a state-independent hierarchical policy can select where the agent has to keep discovering skills in the state space. In turn, the newly visited states allow an improved learnt representation and the learning loop continues. Our experiments emphasize that DisTop is agnostic to the ground state representation and that the agent can discover the topology of its environment whether the states are high-dimensional binary data, images, or proprioceptive inputs. We demonstrate that this paradigm is competitive on MuJoCo benchmarks with state-of-the-art algorithms on both single-task dense rewards and diverse skill discovery. By combining these two aspects, we show that DisTop achieves state-of-the-art performance in comparison with hierarchical reinforcement learning (HRL) when rewards are sparse. We believe DisTop opens new perspectives by showing that bottom-up skill discovery combined with representation learning can unlock the exploration challenge in DRL.
Published: 2021

8. ELSIM: End-to-end learning of reusable skills through intrinsic motivation

Author: Aubret, Arthur, Matignon, Laetitia, and Hassas, Salima
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Taking inspiration from developmental learning, we present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way. With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up. This bottom-up approach makes it possible to learn skills that (1) are transferable across tasks and (2) improve exploration when rewards are sparse. To do so, we combine a previously defined mutual information objective with a novel curriculum learning algorithm, creating an unlimited and explorable tree of skills. We test our agent on simple gridworld environments to understand and visualize how the agent distinguishes between its skills. Then we show that our approach can scale to more difficult MuJoCo environments, in which our agent is able to build a representation of skills that improves both transfer learning and exploration over a baseline when rewards are sparse. Comment: Accepted at ECML 2020
Published: 2020

9. A survey on intrinsic motivation in reinforcement learning

Author: Aubret, Arthur, Matignon, Laetitia, and Hassas, Salima
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The reinforcement learning (RL) research area is very active, with an important number of new contributions, especially considering the emergent field of deep RL (DRL). However, a number of scientific and technical challenges still need to be addressed, amongst which we can mention the ability to abstract actions or the difficulty to explore the environment, which can be addressed by intrinsic motivation (IM). In this article, we provide a survey on the role of intrinsic motivation in DRL. We categorize the different kinds of intrinsic motivations and detail, for each category, its advantages and limitations with respect to the mentioned challenges. Additionally, we conduct an in-depth investigation of substantial current research questions that are currently under study or not addressed at all in the considered research area of DRL. We choose to survey these research works from the perspective of learning how to achieve tasks. We then suggest that solving current challenges could lead to a larger developmental architecture which may tackle most of the tasks. We describe this developmental architecture on the basis of several building blocks composed of an RL algorithm and an IM module compressing information.
Published: 2019

10. TSRuleGrowth : Extraction de règles de prédiction semi-ordonnées à partir d'une série temporelle d'éléments discrets, application dans un contexte d'intelligence ambiante

Author: Vuillemin, Benoit, Delphin-Poulat, Lionel, Nicol, Rozenn, Matignon, Laëtitia, and Hassas, Salima
Subjects: Computer Science - Artificial Intelligence
Abstract: This paper presents a new algorithm, TSRuleGrowth, which looks for partially-ordered rules over a time series. This algorithm takes principles from the state of the art of rule mining and applies them to time series via a new notion of support. We apply this algorithm to real data from a connected environment to extract user habits through different connected objects. Comment: in French. Conférence Nationale sur les Applications Pratiques de l'Intelligence Artificielle (APIA), Jul 2019, Toulouse, France
Published: 2019

11. ELSIM: End-to-End Learning of Reusable Skills Through Intrinsic Motivation

Author: Aubret, Arthur, Matignon, Laetitia, Hassas, Salima, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hutter, Frank, editor, Kersting, Kristian, editor, Lijffijt, Jefrey, editor, and Valera, Isabel, editor
Published: 2021
Full Text: View/download PDF

12. TSRuleGrowth: Mining Partially-Ordered Prediction Rules From a Time Series of Discrete Elements, Application to a Context of Ambient Intelligence

Author: Vuillemin, Benoit, Delphin-Poulat, Lionel, Nicol, Rozenn, Matignon, Laetitia, Hassas, Salima, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Li, Jianxin, editor, Wang, Sen, editor, Qin, Shaowen, editor, Li, Xue, editor, and Wang, Shuliang, editor
Published: 2019
Full Text: View/download PDF

13. Context Aware Robot Architecture, Application to the RoboCup@Home Challenge

Author: Jumel, Fabrice, Saraydaryan, Jacques, Leber, Raphael, Matignon, Laetitia, Lombardi, Eric, Wolf, Christian, Simonin, Olivier, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Holz, Dirk, editor, Genter, Katie, editor, Saad, Maarouf, editor, and von Stryk, Oskar, editor
Published: 2019
Full Text: View/download PDF

14. Cooperative Multi-agent Policy Gradient

Author: Bono, Guillaume, Dibangoye, Jilles Steeve, Matignon, Laëtitia, Pereyron, Florian, Simonin, Olivier, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Berlingerio, Michele, editor, Bonchi, Francesco, editor, Gärtner, Thomas, editor, Hurley, Neil, editor, and Ifrim, Georgiana, editor
Published: 2019
Full Text: View/download PDF

15. Cooperative Multi-agent Policy Gradient

Author: Bono, Guillaume, primary, Dibangoye, Jilles Steeve, additional, Matignon, Laëtitia, additional, Pereyron, Florian, additional, and Simonin, Olivier, additional
Published: 2019
Full Text: View/download PDF

16. Apprentissage pour la navigation robotique parmi les humains : vers des chemins s'adaptant aux dynamiques de l'environnement

Author: Michelland, Yohan, Matignon, Laëtitia, Saraydaryan, Jacques, Simonin, Olivier, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon, Department of Computer Science [Lyon] (CPE), École Supérieure de Chimie Physique Électronique de Lyon (CPE)-Université de Lyon, CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Robots coopératifs et adaptés à la présence humaine en environnements (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Inria Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), Lyon 1, LIRIS, and CITI - CITI Centre of Innovation in Telecommunications and Integration of services
Subjects: [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Robot navigation in crowded environments has been a very active research field for many years. The rise of deep reinforcement learning opens new perspectives. Two types of approaches can be found in the literature. On the one hand, there is an approach that is realistic from a robotics standpoint, in which the robot learns a control policy from the noisy data provided by its sensors. On the other hand, there is a symbolic approach, in which the robot has complete perception of the positions and velocities of the agents in the environment. The problem of vision and of representing the physical world is set aside in order to focus on the best way to convert perception into action. The DS-RNN model is a good example of the latter approach. However, its performance degrades under more realistic conditions. A limited field of view and the presence of static obstacles in the environment greatly affect its success rate. The lack of distinction between humans and obstacles appears to be the main cause, so we proposed several architectural modifications to address this problem.
Published: 2022

17. Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning

Author: Matignon, Laëtitia, Laurent, Guillaume J., Le Fort-Piat, Nadine, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Kollias, Stefanos D., editor, Stafylopatis, Andreas, editor, Duch, Włodzisław, editor, and Oja, Erkki, editor
Published: 2006
Full Text: View/download PDF

18. Compressed information is all you need: unifying intrinsic motivations and representation learning

Author: Aubret, Arthur, Lefort, Mathieu, Triesch, Jochen, Matignon, Laëtitia, Hassas, Salima, Teulière, Céline, Institut Pascal (IP), Centre National de la Recherche Scientifique (CNRS)-Université Clermont Auvergne (UCA)-Institut national polytechnique Clermont Auvergne (INP Clermont Auvergne), Université Clermont Auvergne (UCA)-Université Clermont Auvergne (UCA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), and Frankfurt Institute for Advanced Studies (FIAS)
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Published: 2022

19. Apprentissage séquentiel de compétences via la motivation intrinsèque et l'apprentissage par renforcement

Author: Bonnavaud, Hedwin, Aubret, Arthur, Matignon, Laëtitia, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), and Université Lyon 1
Subjects: reinforcement learning, catastrophic forgetting, skills learning, intrinsic motivation, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Skill learning allows agents to learn pairwise-distinct behaviors, one for each skill. This lets them improve their exploration and maximize their rewards more quickly. However, this learning is done uniformly. Yet an agent does not always have access to the data that such learning requires, and this learning also raises a complexity problem. To address this, we developed a method for learning skills sequentially, via reinforcement learning and intrinsic motivation. Through simulations in a simple environment, we were able to show that the skills learned this way were more distinguishable than those learned uniformly.
Published: 2021

20. Designing Decentralized Controllers for Distributed-Air-Jet MEMS-Based Micromanipulators by Reinforcement Learning

Author: Matignon, Laëtitia, Laurent, Guillaume J., Le Fort-Piat, Nadine, and Chapuis, Yves-André
Published: 2010
Full Text: View/download PDF

21. Stochastic Games

Author: Burkov, Andriy, primary, Matignon, Laëtitia, additional, and Chaib-Draa, Brahim, additional
Published: 2013
Full Text: View/download PDF

22. Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning

Author: Matignon, Laëtitia, primary, Laurent, Guillaume J., additional, and Le Fort-Piat, Nadine, additional
Published: 2006
Full Text: View/download PDF

23. TSRuleGrowth : Extraction de règles de prédiction semi-ordonnées à partir d'une série temporelle d'éléments discrets, application dans un contexte d'intelligence ambiante

Author: Vuillemin, Benoit, Delphin-Poulat, Lionel, Nicol, Rozenn, Matignon, Laëtitia, and Hassas, Salima
Subjects: Computer Science - Artificial Intelligence
Abstract: This paper presents a new algorithm, TSRuleGrowth, which looks for partially-ordered rules over a time series. This algorithm takes principles from the state of the art of rule mining and applies them to time series via a new notion of support. We apply this algorithm to real data from a connected environment to extract user habits through different connected objects. Comment: in French. Conférence Nationale sur les Applications Pratiques de l'Intelligence Artificielle (APIA), Jul 2019, Toulouse, France
Published: 2019

24. Étude de la motivation intrinsèque en apprentissage par renforcement

Author: Aubret, Arthur, Matignon, Laëtitia, Hassas, Salima, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), and Aubret, Arthur
Subjects: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], curiosity, meta-reward, options, knowledge acquisition, empowerment, reinforcement learning, intrinsic motivation, generation of objectives
Abstract: Despite many existing works in reinforcement learning (RL) and the recent successes obtained by combining it with deep learning, RL still faces many challenges. Some of them, like the ability to abstract actions or the difficulty of designing a reward function without expert knowledge, can be addressed by the use of intrinsic motivation. In this article, we provide a survey on the role of intrinsic motivation in RL and its different usages, detailing the benefits and limits of existing approaches. Our analysis suggests that mutual information is central to most of the work using intrinsic motivation in RL. The combination of deep RL and intrinsic motivation makes it possible to learn more complex and more generalisable behaviours than standard RL allows.
Published: 2019

25. SULFR: Simulation of Urban Logistic For Reinforcement

Author: Bono, Guillaume, Dibangoye, Jilles, Matignon, Laëtitia, Pereyron, Florian, Simonin, Olivier, Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Volvo Group, Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), and Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: International audience; In urban logistics, various sources of uncertainty can invalidate pre-planned routes. In this context, a routing strategy that uses available information from the environment could help improve the overall performance of the routing process by dynamically choosing the next client at the online execution time. While static and deterministic testbeds for vehicle routing exist, their stochastic and dynamic counterparts are still missing. This paper proposes an interface to the micro-traffic simulation package SUMO that implements a generative model of stochastic and dynamic vehicle routing problems. We formalize the latter using a reinforcement learning framework for semi-Markov decision processes. The resulting testbeds make it possible to compare single- and multi-agent reinforcement learning algorithms in customizable routing environments. We report our preliminary tests to evaluate a hand-crafted policy on some basic scenarios.
Published: 2018

26. How explainable plans can make planning faster

Author: grea, antoine, Matignon, Laëtitia, Aknine, Samir, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), and Gréa, Antoine
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Partial Order Planning, Partial Order Causal Link, Hierarchical planning, Planning Algorithms, POCL, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], Real-time Planning, POP, HTN, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: International audience; In recent years the ubiquity of artificial intelligence has raised concerns among the uninitiated. The misunderstanding is further increased since most advances do not have explainable results. For automated planning, the research often targets speed, quality, or expressivity. Most existing solutions focus on one criterion while not addressing the others. However, human-related applications require a complex combination of all those criteria at different levels. We present a new method to compromise on these aspects while staying explainable. We aim to leave the range of potential applications as wide as possible but our main targets are human intent recognition and assistive robotics. We propose the HEART planner, a real-time decompositional planner based on a hierarchical version of Partial Order Causal Link (POCL). It cyclically explores the plan space while making sure that intermediary high-level plans are valid, and returns them as approximate solutions when interrupted. These plans are proven to be a guarantee of solvability. This paper aims to evaluate that process and its results compared to classical approaches in terms of efficiency and quality.
Published: 2018

27. Multi-Robot Simultaneous Coverage and Mapping of Complex Scene

Author: Matignon, Laëtitia, Simonin, Olivier, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), Université de Lyon-Institut National des Sciences Appliquées (INSA), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), and Université de 
Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO], [INFO]Computer Science [cs]
Abstract: International audience; In this demonstration, participants will explore a system for multi-robot observation of a complex scene involving the activity of a person. Mobile robots have to cooperate to find a position around the scene maximizing its coverage, i.e. allowing a complete view of the human skeleton. Simultaneously, they have to map the unknown environment around the scene. We developed a simulator, presented in this paper, that can generate an environment and a scene and simulate the robots' observations and motion. During the demonstration, users will be able to test our simulator, including setting up a scenario and a decision algorithm, monitoring the movements, observations and maps of the robots, and visualizing the performance of the team.
Published: 2018

28. Multi-Robot Simultaneous Coverage and Mapping of Complex Scene - Comparison of Different Strategies

Author: Matignon, Laëtitia, Simonin, Olivier, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), M. Dastani, G. Sukthankar, E. Andre, S. 
Koenig, Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), and Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO]
Abstract: International audience; This paper addresses the problem of optimizing the observation of a human scene using several mobile robots. Mobile robots have to cooperate to find a position around the scene maximizing its coverage. The scene coverage is defined as the observation of the human pose skeleton. It is assumed that the robots can communicate but have no map of the environment. Thus the robots have to simultaneously cover and map the scene and the environment. We consider an incremental approach to master state-space complexity. Robots build a hybrid metric-topological map while evaluating the observation of the human pose skeleton. To this end we propose and evaluate different online optimization strategies exploiting local versus global information. We discuss the differences in performance and cost. Experiments are performed both in simulation and with real robots.
Published: 2018

29. HEART: HiErarchical Abstraction for Real-Time Partial Order Causal Link Planning

Author: grea, antoine, Matignon, Laëtitia, Aknine, Samir, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), and matignon, laetitia
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO]Computer Science [cs], [INFO] Computer Science [cs], ComputingMilieux_MISCELLANEOUS, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: International audience
Published: 2018

30. On the Study of Cooperative Multi-Agent Policy Gradient

Author: Bono, Guillaume, Dibangoye, Jilles, Matignon, Laëtitia, Pereyron, Florian, Simonin, Olivier, Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Volvo Group, INSA Lyon, INRIA, Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université 
de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
Subjects: [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Actor Critic, Decentralized Control, Multi-Agent Systems, Partial Observable Markov Decision Processes, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Reinforcement Learning (RL) for decentralized partially observable Markov decision processes (Dec-POMDPs) is lagging behind the spectacular breakthroughs of single-agent RL. That is because assumptions that hold in single-agent settings are often obsolete in decentralized multi-agent systems. To tackle this issue, we investigate the foundations of policy gradient methods within the centralized training for decentralized control (CTDC) paradigm. In this paradigm, learning can be accomplished in a centralized manner while execution can still be independent. Using this insight, we establish a policy gradient theorem and compatible function approximations for decentralized multi-agent systems. Resulting actor-critic methods preserve the decentralized control at the execution phase, but can also estimate the policy gradient from collective experiences guided by a centralized critic at the training phase. Experiments demonstrate our policy gradient methods compare favorably against standard RL techniques in benchmarks from the literature.
Published: 2018

31. Classification des problèmes stochastiques et dynamiques de collectes et de livraisons par des véhicules intelligents

Author: Bono, Guillaume, Dibangoye, Jilles Steeve, Matignon, Laëtitia, Pereyron, Florian, Simonin, Olivier, Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Volvo Group, Zanuttini, Bruno, Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de 
Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), and Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Recherche Opérationnelle, [INFO.INFO-RO] Computer Science [cs]/Operations Research [cs.RO], Systèmes Multi-Agents, [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], Véhicules autonomes, [INFO.INFO-MA] Computer Science [cs]/Multiagent Systems [cs.MA], [INFO.INFO-RO]Computer Science [cs]/Operations Research [cs.RO], Optimisation dans l’incertain, Transport Intelligent Collaboratif, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: National audience; Vehicle routing problems are extremely rich and have been refined in many ways in the literature. Recent progress on autonomous vehicles opens up prospects for their use in freight transport. We studied the classification of this family of problems, and the models that derive from it, in order to position ourselves in this field with this new approach, which integrates the use of a fleet of autonomous and intelligent vehicles.
Published: 2017

32. Multi-Robot Navigation and Cooperative Mapping in a Circular Topology

Author: Bultmann, Simon, Matignon, Laëtitia, Simonin, Olivier, CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), INSA Lyon, Université de Lyon-Institut National des Sciences 
Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
Subjects: [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO]
Abstract: Cooperative mapping of an environment by a team of multiple robots is an important problem for advancing autonomous robot tasks, for example in the field of service robotics or emergency assistance. A precise, global overview of the area the robots are working in, and the ability to navigate this area while avoiding obstacles and collisions between robots, is a fundamental requirement for a large number of higher-level robot tasks in those domains. A cooperative mapping, navigation and communication framework that assumes unknown initial relative robot positions is developed in this project based on the ROS libraries. It realizes robot displacement, localization and mapping under realistic real-world conditions. As such, the framework provides the underlying functions needed to realize a task of human activity observation in the future. Initially, local maps are individually constructed by the robots using the common gmapping SLAM algorithm from the ROS libraries. The robots move on circles around the scene, keeping a constant distance to it, or they can change radius, for example to circumvent obstacles. The framework continuously attempts to align the local maps in order to compute a joint, global representation of the environment. The hypothesis of a common center point shared between the robots greatly facilitates this task, as the translation between local maps is inherently known and only the rotation has to be found. The map merging is realized by adapting several methods known in the literature to our specific topology. The developed framework is verified and evaluated in real-world scenarios using a team of three robots. Commonly available low-cost robot hardware is utilized. Good performance is reached in multiple scenarios, allowing the robots to construct a global overview by merging their limited local views of the scene.
Published: 2017

33. LOLLIPOP: Generating and using proper plan and negative refinements for online partial order planning

Author: grea, antoine, Aknine, Samir, Matignon, Laëtitia, Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), and aknine, samir
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], ComputingMilieux_MISCELLANEOUS, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: National audience
Published: 2016

34. Concentric and Incremental Multi-Robot Mapping to Observe Complex Scenes

Author: Cohen, Jonathan, Matignon, Laëtitia, Simonin, Olivier, CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria), Robots coopératifs et adaptés à la présence humaine en environnements dynamiques (CHROMA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA), Systèmes Cognitifs et Systèmes Multi-Agents (SyCoSMA), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Université de Lyon-Institut National des Sciences Appliquées 
(INSA), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
Subjects: Mapping, Multi-robot exploration, [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], Scene observation, [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO], Multi-robot coordination
Abstract: International audience; The observation and recognition of complex scenes can benefit from the use of multiple mobile cameras. In this paper we study a fleet of mobile robots where each robot controls the point of view of its embedded camera. The objective is to manage the cooperation between the robots to find a joint position that maximizes the joint observation of a scene, defined as the activity of one person. It is assumed that the robots can communicate but have no map of the environment and no external localisation. This paper presents a spatial concentric modeling of the environment well adapted to the navigation of two-wheeled non-holonomic robots. To reduce the complexity of finding the best solution in the search space, we propose an incremental mapping based on this model and some heuristics to search for the optimal observation in an online context. Experimental results in simulation are presented that show in particular the anytime aspects of the proposed algorithms.
Published: 2015

35. SOaN : un algorithme pour la coordination d'agents apprenants et non communicants

Author: Matignon, Laëtitia, Laurent, Guillaume J., Le Fort-Piat, Nadine, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Apprentissage par renforcement, systèmes multi-agents, agents non communicants, jeux de Markov d'équipe, [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics
Abstract: National audience; Reinforcement learning in multi-agent systems is a very active research area, as shown by recent surveys [Busoniu et al., 2008, Sandholm, 2007, Bab & Brafman, 2008, Vlassis, 2007]. Lauer and Riedmiller notably showed that, under certain assumptions, agents learning simultaneously can coordinate their actions without any communication and without perceiving the actions of their peers [Lauer & Riedmiller, 2000]. This property is particularly useful for finding cooperation strategies in large multi-agent systems.
Published: 2009

36. Coordination of independent learners in cooperative Markov games

Author: Matignon, Laëtitia, Laurent, Guillaume J., Le Fort-Piat, Nadine, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS)
Subjects: Cooperative Markov games, Distributed Q-learning, Multi-agent systems, Reinforcement learning, Frequency Maximum Q-value, Independent learners, [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics
Abstract: In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination or the impact of exploration. The study of these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties encountered by these approaches, we focus on two main skills: optimistic agents, which manage the coordination in deterministic environments, and the detection of the stochasticity of a game. Indeed, the key difficulty in stochastic environments is to distinguish between various causes of noise. We therefore introduce the SOoN algorithm, standing for “Swing between Optimistic or Neutral”, in which independent learners can adapt automatically to the environment stochasticity. Empirical results on various cooperative Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of other agents.
Published: 2009

37. Synthèse d'agents adaptatifs et coopératifs par apprentissage par renforcement. Application à la commande d'un système distribué de micromanipulation

Author: Matignon, Laëtitia, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), Université de Franche-Comté, and Nadine Lefort-Piat(nadine.piat@ens2m.fr)
Subjects: decentralized control, reinforcement learning, independent learners, systèmes multiagents, smart surface, adaptative agents, agents independants, team Markov games, apprentissage par renforcement, système distribué de micromanipulation, [SPI.AUTO]Engineering Sciences [physics]/Automatic, cooperative agents, jeux de Markov d'équipe, multi-agent systems, distributed micromanipulation systems, agents coopératifs, agents adaptatifs, commande décentralisée
Abstract: Numerous applications can be formulated in terms of distributed systems, whether out of necessity given a physical distribution of entities (networks, mobile robotics) or as a means of coping with the complexity of solving a problem globally. The objective is to combine reinforcement learning methods with multi-agent systems, so that cooperative, autonomous agents can learn to solve complex problems in a decentralized way, adapting to them in order to achieve a joint objective. Reinforcement learning methods do not need any a priori knowledge about the dynamics of the system, which can be stochastic and nonlinear. To improve learning speed, knowledge incorporation methods are studied in the context of goal-directed tasks, and a generic goal bias function is proposed. We then consider independent learners in team Markov games. In this framework, agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. Studying these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods. Then, given the difficulties encountered by these approaches, two algorithms are proposed. The first, called hysteretic Q-learning, is based on agents with an "adjustable optimistic tendency". The second is Swing between Optimistic or Neutral (SOoN), in which independent agents can adapt automatically to the environment stochasticity. Experiments on various team Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of the other agents. Finally, an extension of this work to the decentralized control of a distributed micromanipulation system (smart surface) in a partially observable case is proposed.
Published: 2008

38. Synthesis of adaptative and cooperative agents by reinforcement learning. Application to the control of a distributed micromanipulation system

Author: Matignon, Laëtitia, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), Université de Franche-Comté, and Nadine Lefort-Piat(nadine.piat@ens2m.fr)
Subjects: decentralized control, reinforcement learning, independent learners, systèmes multiagents, smart surface, adaptative agents, agents independants, team Markov games, apprentissage par renforcement, système distribué de micromanipulation, [SPI.AUTO]Engineering Sciences [physics]/Automatic, cooperative agents, jeux de Markov d'équipe, multi-agent systems, distributed micromanipulation systems, agents coopératifs, agents adaptatifs, commande décentralisée
Abstract: Numerous applications can be formulated in terms of distributed systems, whether out of necessity given a physical distribution of entities (networks, mobile robotics) or as a means of coping with the complexity of solving a problem globally. The objective is to combine reinforcement learning methods with multi-agent systems, so that cooperative, autonomous agents can learn to solve complex problems in a decentralized way, adapting to them in order to achieve a joint objective. Reinforcement learning methods do not need any a priori knowledge about the dynamics of the system, which can be stochastic and nonlinear. To improve learning speed, knowledge incorporation methods are studied in the context of goal-directed tasks, and a generic goal bias function is proposed. We then consider independent learners in team Markov games. In this framework, agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. Studying these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods. Then, given the difficulties encountered by these approaches, two algorithms are proposed. The first, called hysteretic Q-learning, is based on agents with an "adjustable optimistic tendency". The second is Swing between Optimistic or Neutral (SOoN), in which independent agents can adapt automatically to the environment stochasticity. Experiments on various team Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of the other agents. Finally, an extension of this work to the decentralized control of a distributed micromanipulation system (smart surface) in a partially observable case is proposed.
Published: 2008

39. A study of FMQ heuristic in cooperative multi-agent games

Author: Matignon, Laëtitia, Laurent, Guillaume, Le Fort - Piat, Nadine, Franche-Comté Électronique Mécanique, Thermique et Optique - Sciences et Technologies (UMR 6174) (FEMTO-ST), Université de Technologie de Belfort-Montbeliard (UTBM)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS), Jiaying Shen, Pradeep Varakantham, and Rajiv Maheswaran.
Subjects: [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics
Abstract: International audience; The article focuses on decentralized reinforcement learning (RL) in cooperative multi-agent games, where a team of independent learning agents (ILs) tries to coordinate their individual actions to reach an optimal joint action. Within this framework, several algorithms based on Q-learning have been proposed in recent work. In particular, we are interested in Distributed Q-learning, which finds optimal policies in deterministic games, and in the Frequency Maximum Q value (FMQ) heuristic, which is able, in partially stochastic matrix games, to distinguish whether a poor reward received for the same action is due to miscoordination or to a noisy reward function. Making this distinction is one of the main difficulties in solving stochastic games. Our objective is to find an algorithm able to switch between updates according to the detected cause of noise. In this paper, a modified version of the FMQ heuristic is proposed that achieves this detection and adapts the update accordingly. Moreover, this modified FMQ version is more robust and very easy to tune.
Published: 2008

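40. Un algorithme décentralisé d'apprentissage par renforcement multi-agents coopératifs : le Q-Learning Hystérétique [A decentralized reinforcement learning algorithm for cooperative multi-agent systems: hysteretic Q-learning]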

Author: Matignon, Laëtitia, Laurent, Guillaume, Le Fort - Piat, Nadine, Laboratoire d'automatique de Besançon (LAB), Centre National de la Recherche Scientifique (CNRS)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC), Cépaduès Editions, and Azema, Martine
Subjects: [SPI.AUTO]Engineering Sciences [physics]/Automatic, DEC-POMPD, [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA], Q-Learning, Apprentissage par renforcement multi-agents, jeux matriciels répétés
Abstract: National audience; We are interested in reinforcement learning techniques in cooperative multi-agent systems. We present a new algorithm for independent agents that learns the optimal joint action in games where coordination is difficult. We motivate our approach by the decentralized nature of this algorithm, which requires no communication between agents and uses Q-tables whose size is independent of the number of agents. Conclusive tests are also carried out on repeated cooperative games, as well as on a pursuit game.
Published: 2007

41. Contrôle distribué d'une Smart surface par Apprentissage par Renforcement [Distributed control of a smart surface by reinforcement learning]

Author: Matignon, Laëtitia, Laurent, Guillaume, Le Fort - Piat, Nadine, Azema, Martine, Laboratoire d'automatique de Besançon (LAB), Centre National de la Recherche Scientifique (CNRS)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
Subjects: Apprentissage par renforcement, [SPI.NANO]Engineering Sciences [physics]/Micro and nanotechnologies/Microelectronics, coopération, jeux stochastiques, système multiagent
Abstract: National audience; The objective is to develop new algorithms for the distributed control of a smart surface and for cooperation between the micro-actuators (pneumatic valves), in order to sort and position parts on the smart surface optimally and robustly. These algorithms must be able to control a large number of micro-actuators, account for the heterogeneity of their behaviors, and manage cooperation between actuators. The approach we propose for controlling this massively distributed system is based on reinforcement learning and multi-agent environments. The distributed point of view makes it possible to overcome the combinatorial explosion in the number of possible commands. Applying reinforcement learning to a multi-agent environment is a real challenge, because the agents learn simultaneously, which breaks the Markov assumption on which reinforcement learning methods traditionally rely.
Published: 2007

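42. Choix de la fonction de renforcement et des valeurs initiales pour accélérer les problèmes d'Apprentissage par Renforcement de plus court chemin stochastique [Choosing the reward function and initial values to accelerate stochastic shortest-path reinforcement learning problems]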

Author: Matignon, Laëtitia, Laurent, Guillaume, Le Fort-Piat, Nadine, Azema, Martine, Laboratoire d'automatique de Besançon (LAB), Centre National de la Recherche Scientifique (CNRS)-Ecole Nationale Supérieure de Mécanique et des Microtechniques (ENSMM)-Université de Franche-Comté (UFC), and Université Bourgogne Franche-Comté [COMUE] (UBFC)-Université Bourgogne Franche-Comté [COMUE] (UBFC)
Subjects: [SPI.AUTO]Engineering Sciences [physics]/Automatic, fonction de renforcement, fonction d'influence, Apprentissage par renforcement de type goal-directed, initialisation de la table Q, reward shaping
Abstract: National audience; An important issue in reinforcement learning (RL) is improving the convergence speed of the learning process. In this article, we study the influence of certain RL parameters on learning speed. Indeed, although the convergence properties of RL have been widely studied, few precise rules exist for correctly choosing the reward function and the initial values of the Q-table. Our method helps in choosing these parameters for goal-directed problems, that is, problems whose objective is to reach a goal in minimal time. We develop a theoretical study and then provide experimental justifications for choosing, on the one hand, the reward function and, on the other hand, particular initial values of the Q-table based on a goal bias function.
Published: 2006

43. The world of independent learners is not Markovian

Author: Laurent, Guillaume J., Matignon, Laëtitia, and Le Fort-Piat, N.
Published: 2011

44. Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning.

Author: Kollias, Stefanos, Stafylopatis, Andreas, Duch, Włodzisław, Oja, Erkki, Matignon, Laëtitia, Laurent, Guillaume J., and Fort-Piat, Nadine
Abstract: An important issue in Reinforcement Learning (RL) is to accelerate or improve the learning process. In this paper, we study the influence of some RL parameters on learning speed. Indeed, although RL convergence properties have been widely studied, no precise rules exist for correctly choosing the reward function and initial Q-values. Our method helps in choosing these RL parameters in the context of reaching a goal in minimal time. We develop a theoretical study and also provide experimental justifications for choosing, on the one hand, the reward function and, on the other hand, particular initial Q-values based on a goal bias function.
Published: 2006
