123 results for "Aldo Pacchiano"
Search Results
2. Provable Interactive Learning with Hindsight Instruction Feedback.
3. Improving Offline RL by Blending Heuristics.
4. An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit.
5. Dueling RL: Reinforcement Learning with Trajectory Preferences.
6. Leveraging Offline Data in Online Reinforcement Learning.
7. State-free Reinforcement Learning.
8. ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization.
9. Second Order Bounds for Contextual Bandits with Function Approximation.
10. Learning Rate-Free Reinforcement Learning: A Case for Model Selection with Non-Stationary Objectives.
11. Estimating Optimal Policy Value in Linear Contextual Bandits Beyond Gaussianity.
12. Provably Sample Efficient RLHF via Active Preference Optimization.
13. Multiple-policy Evaluation via Density Estimation.
14. Experiment Planning with Function Approximation.
15. A Framework for Partially Observed Reward-States in RLHF.
16. Contextual Bandits with Stage-wise Constraints.
17. Meta Learning MDPs with linear transition models.
18. Towards an Understanding of Default Policies in Multitask Policy Optimization.
19. Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback.
20. Neural Design for Genetic Perturbation Experiments.
21. Anytime Model Selection in Linear Bandits.
22. Supervised Pretraining Can Learn In-Context Reinforcement Learning.
23. Experiment Planning with Function Approximation.
24. A Unified Model and Dimension for Interactive Estimation.
25. Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem.
26. Supervised Pretraining Can Learn In-Context Reinforcement Learning.
27. Estimating Optimal Policy Value in General Linear Contextual Bandits.
28. Data-Driven Regret Balancing for Online Model Selection in Bandits.
29. Improving Offline RL by Blending Heuristics.
30. Neural Pseudo-Label Optimism for the Bank Loan Problem.
31. Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection.
32. Tactical Optimism and Pessimism for Deep Reinforcement Learning.
33. Near Optimal Policy Optimization via REPS.
34. On the Theory of Reinforcement Learning with Once-per-Episode Feedback.
35. Learning the Truth From Only One Side of the Story.
36. Stochastic Bandits with Linear Constraints.
37. Online Model Selection for Reinforcement Learning with Function Approximation.
38. Towards tractable optimism in model-based reinforcement learning.
39. Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.
40. Dynamic Balancing for Model Selection in Bandits and RL.
41. Robustness Guarantees for Mode Estimation with an Application to Bandits.
42. Best of Both Worlds Model Selection.
43. Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity.
44. Learning General World Models in a Handful of Reward-Free Deployments.
45. Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference.
46. Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes.
47. Accelerated Message Passing for Entropy-Regularized MAP Inference.
48. Learning to Score Behaviors for Guided Policy Optimization.
49. Stochastic Flows and Geometric Optimization on the Orthogonal Group.
50. On Approximate Thompson Sampling with Langevin Algorithms.
Discovery Service for Jio Institute Digital Library