1. Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems
- Author
Nicholas Zerbel, Gaurav Dixit, and Kagan Tumer
- Subjects
Counterfactual thinking, Computer science, Process (engineering), Multi-agent system, Exploration problem, Machine learning, Dirichlet distribution, Domain knowledge, Artificial intelligence, Baseline (configuration management), Selection (genetic algorithm)
- Abstract
Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that “stepping stone” rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose Dirichlet-Multinomial Counterfactual Selection (DMCS), a new mechanism for inferring hypothetical partners in tightly coupled, heterogeneous systems. Testing in a modified multi-rover exploration problem, we show that agents using DMCS can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards. We also show that DMCS outperforms a random partner selection baseline by over 40%, and we demonstrate how domain knowledge can be used to induce a prior that guides the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, whereas the performance of the baseline degrades rapidly.
- Published
- 2019