Sharp high-probability sample complexities for policy evaluation with linear function approximation
- Publication Year :
- 2023
- Publisher :
- arXiv, 2023.
Abstract
- This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite-horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with a high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhibit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.
- Comment: The first two authors contributed equally
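- For readers unfamiliar with the two algorithms the abstract analyzes, the sketch below shows the TD(0) and two-timescale TDC updates with linear value features. It is a minimal illustration, not code from the paper: the interfaces (env, phi, rho) and the step sizes alpha and beta are assumptions chosen for readability.

```python
# Minimal sketch of the two updates analyzed in the paper, under assumed
# interfaces. env.reset() -> state; env.step(s) -> (action, reward, next_state);
# phi maps a state to a d-dimensional feature vector. Step sizes are arbitrary.
import numpy as np

def td0_linear(env, phi, num_steps, gamma=0.99, alpha=0.01):
    """On-policy TD(0): fit theta so that V(s) is approximated by phi(s) @ theta."""
    s = env.reset()
    theta = np.zeros(len(phi(s)))
    for _ in range(num_steps):
        _, r, s_next = env.step(s)       # transition sampled from the target policy
        # TD error: delta = r + gamma * phi(s')^T theta - phi(s)^T theta
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * delta * phi(s)  # semi-gradient update along phi(s)
        s = s_next
    return theta

def tdc_linear(env, phi, rho, num_steps, gamma=0.99, alpha=0.005, beta=0.05):
    """Off-policy TDC with gradient correction (one common form of the update).

    rho(s, a) -- importance ratio pi(a|s) / b(a|s) between target and behavior
    policies; beta > alpha puts the auxiliary weights w on the faster timescale.
    """
    s = env.reset()
    d = len(phi(s))
    theta, w = np.zeros(d), np.zeros(d)
    for _ in range(num_steps):
        a, r, s_next = env.step(s)       # transition sampled from the behavior policy
        f, f_next = phi(s), phi(s_next)
        rho_t = rho(s, a)
        delta = r + gamma * f_next @ theta - f @ theta
        # Slow timescale: importance-weighted TD step plus the gradient correction term
        theta += alpha * rho_t * (delta * f - gamma * (f @ w) * f_next)
        # Fast timescale: w tracks a least-squares estimate of the expected TD error
        w += beta * (rho_t * delta - f @ w) * f
        s = s_next
    return theta
```

- The paper's contribution is not these updates themselves but high-probability sample complexity bounds for them; the sketch only fixes the iteration being analyzed.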
- Subjects :
- FOS: Computer and information sciences
FOS: Mathematics
Machine Learning (cs.LG)
Machine Learning (stat.ML)
Information Theory (cs.IT)
Optimization and Control (math.OC)
Statistics Theory (math.ST)
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....3b628456ba92c347052d7eddfd099e5e
- Full Text :
- https://doi.org/10.48550/arxiv.2305.19001