Search

Your search for "Li, Chunyuan" returned 834 results.


Search Results

1. MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

2. SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

3. LLaVA-OneVision: Easy Visual Task Transfer

4. LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

5. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

6. Long Context Transfer from Language to Vision

7. Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

8. MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

9. Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

10. Graphic Design with Large Multimodal Model

11. Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

12. Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

13. TrustLLM: Trustworthiness in Large Language Models

14. LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

15. Visual In-Context Prompting

16. LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

17. Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

18. LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

19. Large Language Models are Visual Reasoning Coordinators

20. Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

21. BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

22. Improved Baselines with Visual Instruction Tuning

23. HallE-Control: Controlling Object Hallucination in Large Multimodal Models

24. MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

25. A whole-slide foundation model for digital pathology from real-world data

28. Aligning Large Multimodal Models with Factually Augmented RLHF

29. Multimodal Foundation Models: From Specialists to General-Purpose Assistants

30. An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

31. Benchmarking and Analyzing Generative Data for Visual Recognition

32. Semantic-SAM: Segment and Recognize Anything at Any Granularity

35. Large Multimodal Models: Notes on CVPR 2023 Tutorial

36. MIMIC-IT: Multi-Modal In-Context Instruction Tuning

37. LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

38. OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

39. Towards Building the Federated GPT: Federated Instruction Tuning

40. Visual Instruction Tuning

41. Instruction Tuning with GPT-4

43. A Simple Framework for Open-Vocabulary Segmentation and Detection

44. Scaling Vision-Language Models with Sparse Mixture of Experts

45. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

46. Learning Customized Visual Models with Retrieval-Augmented Knowledge

47. GLIGEN: Open-Set Grounded Text-to-Image Generation

48. Generalized Decoding for Pixel, Image, and Language

49. Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics

50. Lafite2: Few-shot Text-to-Image Generation
