171 results for "Florian Tramèr"
Search Results
2. Poisoning Web-Scale Training Datasets is Practical.
3. Evaluating Superhuman Models with Consistency Checks.
4. Evading Black-box Classifiers Without Breaking Eggs.
5. Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining.
6. Stealing part of a production language model.
7. Extracting Training Data From Document-Based VQA Models.
8. Privacy Backdoors: Stealing Data with Corrupted Pretrained Models.
9. Universal Jailbreak Backdoors from Poisoned Human Feedback.
10. Privacy Side Channels in Machine Learning Systems.
11. Extracting Training Data from Diffusion Models.
12. Tight Auditing of Differentially Private Machine Learning.
13. SNAP: Efficient Extraction of Private Properties with Poisoning.
14. Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems.
15. Preventing Generation of Verbatim Memorization in Language Models Gives a False Sense of Privacy.
16. Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data.
17. Persistent Pre-Training Poisoning of LLMs.
18. Gradient-based Jailbreak Images for Multimodal Fusion Models.
19. Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI.
20. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models.
21. Adversarial Search Engine Optimization for Large Language Models.
22. Query-Based Adversarial Prompt Generation.
23. Blind Baselines Beat Membership Inference Attacks for Foundation Models.
24. Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
25. Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs.
26. AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
27. Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition.
28. What Does it Mean for a Language Model to Preserve Privacy?
29. Membership Inference Attacks From First Principles.
30. Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets.
31. Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them.
32. Quantifying Memorization Across Neural Language Models.
33. Measuring Forgetting of Memorized Training Examples.
34. (Certified!!) Adversarial Robustness for Free!
35. Counterfactual Memorization in Neural Language Models.
36. Are aligned neural networks adversarially aligned?
37. Students Parrot Their Teachers: Membership Inference on Model Distillation.
38. AISec '22: 15th ACM Workshop on Artificial Intelligence and Security.
39. Poisoning Web-Scale Training Datasets is Practical.
40. Scalable Extraction of Training Data from (Production) Language Models.
41. Backdoor Attacks for In-Context Learning with Language Models.
42. Privacy Side Channels in Machine Learning Systems.
43. Randomness in ML Defenses Helps Persistent Attackers and Hinders Evaluators.
44. Universal Jailbreak Backdoors from Poisoned Human Feedback.
45. Evading Black-box Classifiers Without Breaking Eggs.
46. Evaluating Superhuman Models with Consistency Checks.
47. Antipodes of Label Differential Privacy: PATE and ALIBI.
48. Extracting Training Data from Large Language Models.
49. Is Private Learning Possible with Instance Encoding?
50. Label-Only Membership Inference Attacks.