Pessimistic Model Selection for Offline Deep Reinforcement Learning
- Publication Year :
- 2021
- Publisher :
- arXiv, 2021.
- Abstract :
- Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is over-fitting, which leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is a challenging task because no ground truth is available for evaluating performance, in contrast with the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of DRL models in identifying the optimal policy. Numerical studies demonstrate the superior performance of our approach over existing methods.
- Comment :
- Preprint. A non-archival, preliminary version was presented at the NeurIPS 2021 Offline Reinforcement Learning Workshop.
- Subjects :
- Computational Engineering, Finance, and Science (cs.CE)
FOS: Computer and information sciences
Computer Science - Machine Learning
Artificial Intelligence (cs.AI)
Computer Science - Artificial Intelligence
FOS: Electrical engineering, electronic engineering, information engineering
Computer Science - Neural and Evolutionary Computing
Neural and Evolutionary Computing (cs.NE)
Systems and Control (eess.SY)
Computer Science - Computational Engineering, Finance, and Science
Electrical Engineering and Systems Science - Systems and Control
Machine Learning (cs.LG)
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....673a9c96d0567c8fba278cae55aefc81
- Full Text :
- https://doi.org/10.48550/arxiv.2111.14346
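
The abstract describes selecting the best policy among candidate models pessimistically but does not give the procedure. As a rough illustration only, one generic reading of "pessimistic" selection is to rank candidates by a lower confidence bound on their estimated value rather than by the point estimate. The sketch below assumes that reading; the function names, the normal-approximation bound, the `z` multiplier, and the toy value estimates are all hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def lower_confidence_bound(value_estimates, z=1.645):
    """One-sided normal-approximation lower bound on the mean value.

    `value_estimates` is a hypothetical list of off-policy value estimates
    for a single candidate policy (for example, one per data split).
    """
    n = len(value_estimates)
    if n < 2:
        raise ValueError("need at least two estimates to form a bound")
    standard_error = stdev(value_estimates) / math.sqrt(n)
    return mean(value_estimates) - z * standard_error


def select_pessimistic(candidates, z=1.645):
    """Return the candidate name with the largest lower confidence bound."""
    bounds = {
        name: lower_confidence_bound(estimates, z)
        for name, estimates in candidates.items()
    }
    best = max(bounds, key=bounds.get)
    return best, bounds


# Toy data (made up): policy_b has the higher average estimate,
# but its estimates are far noisier than policy_a's.
candidates = {
    "policy_a": [10.0, 10.5, 9.5, 10.2, 9.8],
    "policy_b": [2.0, 25.0, 1.0, 30.0, 0.0],
}

best, bounds = select_pessimistic(candidates)
print(best)
```

With these toy numbers the noisier candidate has the higher mean but the lower bound, so the pessimistic rule prefers the more stable one. A rule based only on point estimates would pick the other candidate, which is the over-fitting risk the abstract alludes to.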