225 results for "Michal Valko"
Search Results
2. Decoding-time Realignment of Language Models.
3. Generalized Preference Optimization: A Unified Approach to Offline Alignment.
4. Nash Learning from Human Feedback.
5. Human Alignment of Large Language Models through Online Preference Optimisation.
6. Demonstration-Regularized RL.
7. Unlocking the Power of Representations in Long-term Novelty-based Exploration.
8. Identification of Microbial and Proteomic Biomarkers in Early Childhood Caries.
9. Preference Optimization with Multi-Sample Comparisons.
10. A New Bound on the Cumulant Generating Function of Dirichlet Processes.
11. Optimal Design for Reward Modeling in RLHF.
12. Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving.
13. Understanding the performance gap between online and offline alignment algorithms.
14. Adapting to game trees in zero-sum imperfect information games.
15. Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments.
16. Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
17. DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.
18. VA-learning as a more efficient alternative to Q-learning.
19. Fast Rates for Maximum Entropy Exploration.
20. Understanding Self-Predictive Learning for Reinforcement Learning.
21. Half-Hop: A graph upsampling approach for slowing down message passing.
22. Quantile Credit Assignment.
23. Marginalized Operators for Off-policy Reinforcement Learning.
24. Adaptive Multi-Goal Exploration.
25. From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.
26. Retrieval-Augmented Reinforcement Learning.
27. Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times.
28. Model-free Posterior Sampling via Learning Rate Randomization.
29. Local and adaptive mirror descents in extensive-form games.
30. A General Theoretical Paradigm to Understand Learning from Human Preferences.
31. Broaden Your Views for Self-Supervised Video Learning.
32. Learning in two-player zero-sum partially observable Markov games with perfect recall.
33. A Provably Efficient Sample Collection Strategy for Reinforcement Learning.
34. Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity.
35. Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.
36. Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
37. Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model.
38. Adaptive Reward-Free Exploration.
39. Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited.
40. A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces.
41. Fast active learning for pure exploration in reinforcement learning.
42. Revisiting Peng's Q(λ) for Modern Reinforcement Learning.
43. Online A-Optimal Design and Active Linear Regression.
44. Taylor Expansion of Discount Factors.
45. Kernel-Based Reinforcement Learning: A Finite-Time Analysis.
46. UCB Momentum Q-learning: Correcting the bias without forgetting.
47. Covariance-adapting algorithm for semi-bandits with application to sparse outcomes.