Accelerating Model-Free Reinforcement Learning With Imperfect Model Knowledge in Dynamic Spectrum Access
- Authors
- Hao-Hsuan Chang, Jonathan Ashdown, Lianjun Li, Yang Yi, Lingjia Liu, Jianan Bai, Hao Chen, and Jianzhong Zhang
- Subjects
- Computer Networks and Communications, Computer science, Networking and telecommunications, Computer Science Applications, Computer engineering, Hardware and Architecture, Signal Processing, Reinforcement learning, Wireless, Information Systems
- Abstract
- Current studies that apply reinforcement learning (RL) to dynamic spectrum access (DSA) problems in wireless communications systems mainly focus on model-free RL (MFRL). In practice, however, MFRL requires a large number of samples to achieve good performance, which makes it impractical for real-time applications such as DSA. Combining model-free and model-based RL can reduce the sample complexity while achieving a level of performance similar to MFRL, provided the learned model is sufficiently accurate. In a complex environment, however, the learned model is never perfect. In this article, we combine model-free and model-based RL and introduce an algorithm that can work with an imperfectly learned model to accelerate MFRL. Results show that our algorithm achieves higher sample efficiency than both the standard MFRL algorithm and the Dyna algorithm (a standard algorithm integrating model-based RL and MFRL), with much lower computational complexity than the Dyna algorithm. In the extreme case where the learned model is highly inaccurate, the Dyna algorithm performs even worse than the MFRL algorithm, while our algorithm still outperforms it.
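To illustrate the Dyna-style integration of model-free and model-based RL that the abstract uses as its baseline, here is a minimal tabular Dyna-Q sketch: real transitions update the Q-table directly (the model-free step), and a learned one-step model replays simulated transitions as extra planning updates (the model-based step). The environment, state/action sizes, and hyperparameters below are purely illustrative assumptions, not the authors' DSA setup or their proposed algorithm.

```python
import random
from collections import defaultdict

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
N_STATES, N_ACTIONS = 10, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
PLANNING_STEPS = 5  # extra model-based updates per real environment step

Q = defaultdict(float)  # Q[(state, action)] -> value estimate
model = {}              # model[(state, action)] -> (reward, next_state)

def epsilon_greedy(s):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next):
    """Standard one-step Q-learning update."""
    best_next = max(Q[(s_next, a2)] for a2 in range(N_ACTIONS))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def dyna_q_step(s, env_step):
    """One Dyna-Q iteration; env_step(s, a) must return (reward, next_state)."""
    a = epsilon_greedy(s)
    r, s_next = env_step(s, a)      # real experience: model-free update
    q_update(s, a, r, s_next)
    model[(s, a)] = (r, s_next)     # learn a deterministic one-step model
    for _ in range(PLANNING_STEPS): # planning with the (possibly imperfect) model
        sp, ap = random.choice(list(model))
        rp, sp_next = model[(sp, ap)]
        q_update(sp, ap, rp, sp_next)
    return s_next

# Toy usage with a hypothetical random-walk environment:
def toy_env(s, a):
    s_next = (s + (1 if a % 2 else -1)) % N_STATES
    return (1.0 if s_next == N_STATES - 1 else 0.0), s_next

s = 0
for _ in range(1000):
    s = dyna_q_step(s, toy_env)
```

The planning loop is where an imperfect model matters: every simulated update propagates the model's errors into the Q-table, which is why, as the abstract notes, Dyna can underperform plain MFRL when the learned model is highly inaccurate.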
- Published
- 2020