Search

Your search for "Lambert, Nathan" returned 37 results

Search Constraints

Author: "Lambert, Nathan"
Publication Year Range: Last 3 years

Search Results

1. OLMoE: Open Mixture-of-Experts Language Models

2. Self-Directed Synthetic Dialogues and Revisions Technical Report

3. WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

4. Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

5. Towards a Framework for Openness in Foundation Models: Proceedings from the Columbia Convening on Openness in Artificial Intelligence

6. D2PO: Discriminator-Guided DPO with Response Evaluation Models

7. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

8. RewardBench: Evaluating Reward Models for Language Modeling

9. A Survey on Data Selection for Language Models

10. OLMo: Accelerating the Science of Language Models

11. Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

12. Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

13. The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

14. Zephyr: Direct Distillation of LM Alignment

15. Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback

16. A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

17. Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings

18. BLISS: Interplanetary Exploration with Swarms of Low-Cost Spacecraft

19. Measuring Data

20. Reward Reports for Reinforcement Learning

21. Investigating Compounding Prediction Errors in Learned Dynamics Models

22. Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

23. The Challenges of Exploration for Offline Reinforcement Learning

25. Femoral Artery Closure Devices vs Manual Compression During Cardiac Catheterization and Percutaneous Coronary Intervention

29. Synergy of Prediction and Control in Model-based Reinforcement Learning

32. BotNet: A Simulator for Studying the Effects of Accurate Communication Models on Multi-Agent and Swarm Control
