Search

Your search for author "Dou, Shihan" returned 46 results

Search Results

1. Multi-Programming Language Sandbox for LLMs

2. RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

3. TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

4. What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

5. SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

6. Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

7. MetaRM: Shifted Distributions Alignment via Meta-Learning

8. CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection

9. EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

10. CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

11. Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

12. Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

13. StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

14. MouSi: Poly-Visual-Expert Vision-Language Models

15. Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

16. Rethinking Jailbreaking through the Lens of Representation Engineering

17. Secrets of RLHF in Large Language Models Part II: Reward Modeling

18. ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

19. LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

20. Gitor: Scalable Code Clone Detection by Building Global Sample Graph

21. Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

22. Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

23. Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

24. The Rise and Potential of Large Language Model Based Agents: A Survey

25. Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey

26. Secrets of RLHF in Large Language Models Part I: PPO

27. On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection

28. DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

29. CausalAPM: Generalizable Literal Disentanglement for NLU Debiasing

30. Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding

31. MINER: Improving Out-of-Vocabulary Named Entity Recognition from an Information Theoretic Perspective

32. Decorrelate Irrelevant, Purify Relevant: Overcome Textual Spurious Correlations from a Feature Perspective

33. Boosting the Capability of Intelligent Vulnerability Detection by Training in a Human-Learning Manner

34. Contrastive Learning for Robust Android Malware Familial Classification

35. Open the Pandora's Box of LLMs: Jailbreaking LLMs through Representation Engineering

41. VulCNN

45. IntDroid

46. SCDetector
