Search

Your search for "Wang, Wenhai" returned 940 results.

Search Results

1. Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

2. Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

3. Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents

4. MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

5. Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori

6. Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

7. ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

8. Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs

9. MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

10. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

11. Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization

12. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

13. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

14. Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

15. Needle In A Multimodal Haystack

16. LLMs Meet Multimodal Generation and Editing: A Survey

17. Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

18. S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

19. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

20. InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

21. Does Knowledge Graph Really Matter for Recommender Systems?

23. Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

24. Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

25. The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

26. RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

27. LuaTaint: A Static Analysis System for Web Configuration Interface Vulnerability of Internet of Things Devices

28. FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

29. MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

30. Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

31. Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

32. The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

33. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

34. A Survey of Reasoning with Foundation Models

35. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

36. Prompting Frameworks for Large Language Models: A Survey

37. Exploring ChatGPT's Capabilities on Vulnerability Management

38. ControlLLM: Augment Language Models with Tools by Searching on Graphs

39. Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

40. CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

41. Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

42. Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

44. SyzTrust: State-aware Fuzzing on Trusted OS Designed for IoT Devices

45. FB-BEV: BEV Representation from Forward-Backward View Transformations

46. The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

47. AVSegFormer: Audio-Visual Segmentation with Transformer

49. Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

50. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
