Search

Author search for "Kravec, Shauna": 19 results

Search Results

1. Sabotage Evaluations for Frontier Models

2. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

3. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

4. Evaluating and Mitigating Discrimination in Language Model Decisions

5. Specific versus General Principles for Constitutional AI

6. Towards Understanding Sycophancy in Language Models

7. The Capacity for Moral Self-Correction in Large Language Models

8. Discovering Language Model Behaviors with Model-Written Evaluations

9. Constitutional AI: Harmlessness from AI Feedback

10. Measuring Progress on Scalable Oversight for Large Language Models

11. Toy Models of Superposition

12. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

13. Language Models (Mostly) Know What They Know

14. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

15. Predictability and Surprise in Large Generative Models

16. High Energy Problems, Low Energy Solutions

17. Discovering Language Model Behaviors with Model-Written Evaluations

18. Predictability and Surprise in Large Generative Models
