Provably efficient RL with Rich Observations via Latent State Decoding

Authors :
Du, Simon S.
Krishnamurthy, Akshay
Jiang, Nan
Agarwal, Alekh
Dudík, Miroslav
Langford, John
Publication Year :
2019
Publisher :
arXiv, 2019.

Abstract

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps (where previously decoded latent states provide labels for later regression problems) and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.

Comment: The ICML 2019 version omitted the second constraint on $\epsilon$ in Theorem 4.1. We thank Yonathan Efroni for calling this to our attention.

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....23991c08ce83c0c8abda64b7487658a9
Full Text :
https://doi.org/10.48550/arxiv.1901.09018