Search

Your search for Author "Tramer, Florian" returned 115 results; the first 50 are listed below.


Search Results

1. International AI Safety Report

2. Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

3. Consistency Checks for Language Model Forecasters

4. SoK: Watermarking for AI-Generated Content

5. Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust

6. Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

7. International Scientific Report on the Safety of Advanced AI (Interim Report)

8. Persistent Pre-Training Poisoning of LLMs

9. Gradient-based Jailbreak Images for Multimodal Fusion Models

10. Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

11. An Adversarial Perspective on Machine Unlearning for AI Safety

12. Extracting Training Data from Document-Based VQA Models

13. Adversarial Search Engine Optimization for Large Language Models

14. Blind Baselines Beat Membership Inference Attacks for Foundation Models

15. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

16. Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI

17. Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

18. Evaluations of Machine Learning Privacy Defenses are Misleading

19. Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

20. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

21. Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

22. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

23. Stealing Part of a Production Language Model

24. Query-Based Adversarial Prompt Generation

25. Universal Jailbreak Backdoors from Poisoned Human Feedback

26. Privacy Side Channels in Machine Learning Systems

27. Backdoor Attacks for In-Context Learning with Language Models

28. Are aligned neural networks adversarially aligned?

29. Evaluating Superhuman Models with Consistency Checks

30. Evading Black-box Classifiers Without Breaking Eggs

31. Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators

32. Poisoning Web-Scale Training Datasets is Practical

33. Tight Auditing of Differentially Private Machine Learning

34. Extracting Training Data from Diffusion Models

35. Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

36. Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

37. Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

38. Red-Teaming the Stable Diffusion Safety Filter

39. SNAP: Efficient Extraction of Private Properties with Poisoning

40. Measuring Forgetting of Memorized Training Examples

41. Increasing Confidence in Adversarial Robustness Evaluations

42. (Certified!!) Adversarial Robustness for Free!

43. The Privacy Onion Effect: Memorization is Relative

44. Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets

45. Debugging Differential Privacy: A Case Study for Privacy Auditing

46. Quantifying Memorization Across Neural Language Models

47. What Does it Mean for a Language Model to Preserve Privacy?

48. Counterfactual Memorization in Neural Language Models

49. Membership Inference Attacks From First Principles

50. Large Language Models Can Be Strong Differentially Private Learners
