Search

Your search for author "Guo, Phillip" returned 6 results


Search Results

1. Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

2. Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

3. Eight Methods to Evaluate Robust Unlearning in LLMs

4. Representation Engineering: A Top-Down Approach to AI Transparency

5. Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
