
Your search for "Wang, Wenhai" returned 97 results.

Search Constraints

You searched for: Author "Wang, Wenhai"; Database: arXiv

Search Results

1. Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

2. Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

3. Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents

4. MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

5. Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori

6. Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

7. ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

8. Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs

9. MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

10. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

11. Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization

12. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

13. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

14. Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

15. Needle In A Multimodal Haystack

16. LLMs Meet Multimodal Generation and Editing: A Survey

17. Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

18. S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

19. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

20. InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

21. Does Knowledge Graph Really Matter for Recommender Systems?

22. Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

23. Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

24. The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

25. RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

26. LuaTaint: A Static Analysis System for Web Configuration Interface Vulnerability of Internet of Things Devices

27. FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution

28. MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

29. Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

30. Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion

31. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

32. A Survey of Reasoning with Foundation Models

33. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

34. Prompting Frameworks for Large Language Models: A Survey

35. Exploring ChatGPT's Capabilities on Vulnerability Management

36. ControlLLM: Augment Language Models with Tools by Searching on Graphs

37. Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

38. CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

39. Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

40. Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

41. SyzTrust: State-aware Fuzzing on Trusted OS Designed for IoT Devices

42. FB-BEV: BEV Representation from Forward-Backward View Transformations

43. The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

44. AVSegFormer: Audio-Visual Segmentation with Transformer

45. Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

46. EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

47. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

48. Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

49. VideoChat: Chat-Centric Video Understanding

50. InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
