Search

Your search for Author: "Krueger, David" returned 638 results.


Search Results

5. Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

6. Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

7. Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders

8. Predicting Future Actions of Reinforcement Learning Agents

9. Integrating uncertainty quantification into randomized smoothing based robustness guarantees

10. Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

11. Microscopic theory of spin friction and dissipative spin dynamics

12. Influence Functions for Scalable Data Attribution in Diffusion Models

13. Analyzing (In)Abilities of SAEs via Formal Languages

14. PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

15. Exploring the design space of deep-learning-based weather forecasting systems

16. Towards Interpreting Visual Information Processing in Vision-Language Models

17. Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

18. Permissive Information-Flow Analysis for Large Language Models

19. Input Space Mode Connectivity in Deep Neural Networks

20. Protecting against simultaneous data poisoning attacks

21. A deeper look at depth pruning of LLMs

22. The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

23. IDs for AI Systems

25. Stress-Testing Capability Elicitation With Password-Locked Models

26. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

27. Affirmative safety: An approach to risk management for high-risk AI

28. Safety Cases: How to Justify the Safety of Advanced AI Systems

29. A Generative Model of Symmetry Transformations

30. Black-Box Access is Insufficient for Rigorous AI Audits

31. Visibility into AI Agents

32. Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

34. Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

35. Managing extreme AI risks amid rapid progress

36. Implicit meta-learning may lead language models to trust more reliable sources

37. Interpreting Learned Feedback Patterns in Large Language Models

38. Reward Model Ensembles Help Mitigate Overoptimization

39. Thinker: Learning to Plan and Act

40. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

41. Investigating the Nature of 3D Generalization in Deep Neural Networks

42. Geometrical torque on magnetic moments coupled to a correlated antiferromagnet

43. Characterizing Manipulation from AI Systems

44. Unifying Grokking and Double Descent

45. Harms from Increasingly Agentic Algorithmic Systems

46. Blockwise Self-Supervised Learning at Scale

47. On The Fragility of Learned Reward Functions

48. Domain Generalization for Robust Model-Based Offline Reinforcement Learning

49. Mechanistic Mode Connectivity

50. Broken Neural Scaling Laws
