Meta-Reinforcement Learning With Dynamic Adaptiveness Distillation
- Author
- Shiji Song, Hangkai Hu, Xiang Li, and Gao Huang
- Subjects
- Computer Networks and Communications, Computer Science Applications, Artificial Intelligence, Machine Learning, Reinforcement Learning, Software
- Abstract
Deep reinforcement learning suffers from sample inefficiency and poor transfer across tasks. Meta-reinforcement learning (meta-RL) lets a meta-learner reuse task-solving skills acquired on similar tasks and adapt quickly to new ones. However, existing meta-RL methods rarely examine the relationship between task-agnostic exploitation of data and the task-related knowledge introduced by the latent context, which limits their effectiveness and generalization ability. In this article, we develop an off-policy meta-RL algorithm that gives the meta-learner a self-oriented view of how it adapts to the family of tasks. In our approach, dynamic task-adaptiveness distillation describes how the meta-learner adjusts its exploration strategy during meta-training. The approach also lets the meta-learner balance the influence of task-agnostic self-oriented adaptation and task-related information through latent context reorganization. In our experiments, our method achieves 10%-20% higher asymptotic reward than probabilistic embeddings for actor-critic RL (PEARL).
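The abstract only describes the method at a high level. Below is a minimal sketch, not the authors' code, of the core idea as stated: a PEARL-style latent-context encoder whose output mixes a task-related posterior sample with a task-agnostic prior sample through a learned gate. The class name `GatedContextEncoder`, the scalar gating mechanism, and all dimensions are illustrative assumptions standing in for the paper's "latent context reorganization"; the actual architecture is not given in this record.

```python
import torch
import torch.nn as nn


class GatedContextEncoder(nn.Module):
    """Encodes transitions from one task into a latent variable z, mixing a
    task-related posterior with a task-agnostic prior (hypothetical sketch)."""

    def __init__(self, transition_dim: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # outputs mean and log-variance
        )
        # Learned scalar gate in (0, 1): an assumed stand-in for balancing
        # task-agnostic and task-related influence on the latent context.
        self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (num_transitions, transition_dim) collected on one task.
        stats = self.net(transitions).mean(dim=0)  # permutation-invariant pooling
        mu, log_var = stats.split(self.latent_dim)
        posterior = torch.distributions.Normal(mu, log_var.mul(0.5).exp())
        prior = torch.distributions.Normal(torch.zeros_like(mu),
                                           torch.ones_like(mu))
        z_task = posterior.rsample()   # task-related information
        z_agnostic = prior.sample()    # task-agnostic baseline
        gate = torch.sigmoid(self.gate_logit)
        return gate * z_task + (1.0 - gate) * z_agnostic


# Usage: condition an actor-critic policy on the mixed latent context.
encoder = GatedContextEncoder(transition_dim=10, latent_dim=5)
context = torch.randn(32, 10)  # 32 transitions from the current task
z = encoder(context)           # shape (5,), fed to the policy alongside states
```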
- Published
- 2023