Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm
- Source :
- Machine Learning, Springer Verlag, 2014, 97 (3), pp.327-351. ⟨10.1007/s10994-014-5458-8⟩
- Publication Year :
- 2014
- Publisher :
- HAL CCSD, 2014.
Abstract
- We introduce a novel approach to preference-based reinforcement learning, namely a preference-based variant of a direct policy search method based on evolutionary optimization. The core of our approach is a preference-based racing algorithm that selects the best among a given set of candidate policies with high probability. To this end, the algorithm operates on a suitable ordinal preference structure and only uses pairwise comparisons between sample rollouts of the policies. Embedding the racing algorithm in a rank-based evolutionary search procedure, we show that approximations of the so-called Smith set of optimal policies can be produced with certain theoretical guarantees. Apart from a formal performance and complexity analysis, we present first experimental studies showing that our approach performs well in practice.
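For readers unfamiliar with the Smith set mentioned in the abstract, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes the pairwise preference probabilities between policies are already known exactly, whereas the paper estimates them from sample rollouts. The function name `smith_set` and the matrix `p` are hypothetical names chosen for this example. The Smith set is the smallest non-empty set of candidates in which every member beats every non-member in a pairwise comparison.

```python
from typing import List, Set


def smith_set(p: List[List[float]]) -> Set[int]:
    """Return the Smith set for a pairwise preference matrix.

    p[i][j] is assumed to be the probability that a rollout of policy i
    is preferred to a rollout of policy j (an illustrative input; in
    practice these values would have to be estimated by sampling).
    """
    n = len(p)

    def beats(i: int, j: int) -> bool:
        # Policy i beats policy j if it is preferred more often than not.
        return p[i][j] > p[j][i]

    # Count strict pairwise wins for each policy.
    wins = [sum(beats(i, j) for j in range(n) if j != i) for i in range(n)]

    # Policies with the maximal number of wins always lie in the Smith set.
    best = max(wins)
    members = {i for i in range(n) if wins[i] == best}

    # Grow the set until every member beats every non-member.
    changed = True
    while changed:
        changed = False
        for c in range(n):
            if c not in members and any(not beats(m, c) for m in members):
                members.add(c)
                changed = True
    return members


# A single dominant policy: the Smith set is just that policy.
dominant = [[0.5, 0.7, 0.8],
            [0.3, 0.5, 0.6],
            [0.2, 0.4, 0.5]]

# A preference cycle among policies 0, 1, 2, all of which beat policy 3.
cycle = [[0.5, 0.6, 0.4, 0.9],
         [0.4, 0.5, 0.6, 0.9],
         [0.6, 0.4, 0.5, 0.9],
         [0.1, 0.1, 0.1, 0.5]]
```

With `dominant`, the result contains only policy 0. With `cycle`, no single policy beats all others, so the result contains the three policies in the cycle. This is why a set-valued notion of optimality is useful when preferences are not transitive.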
- Subjects :
- Preference learning
Computer science
Rank (computer programming)
Sample (statistics)
Smith set
Preference
Set (abstract data type)
[STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
Artificial Intelligence
Reinforcement learning
Pairwise comparison
Algorithm
Software
Details
- Language :
- English
- ISSN :
- 0885-6125 and 1573-0565
- Database :
- OpenAIRE
- Journal :
- Machine Learning, Springer Verlag, 2014, 97 (3), pp.327-351. ⟨10.1007/s10994-014-5458-8⟩
- Accession number :
- edsair.doi.dedup.....2b22e617b3040b502a5c2ea3da3e4bb2