
Your search for author "Li, Chunyuan" returned 303 results.

Search Constraints

Author: "Li, Chunyuan" · Publication Year Range: Last 3 years

Search Results

1. Video Instruction Tuning With Synthetic Data

2. LLaVA-Critic: Learning to Evaluate Multimodal Models

3. MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

4. SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

5. LLaVA-OneVision: Easy Visual Task Transfer

6. LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

7. MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

8. LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

9. Long Context Transfer from Language to Vision

10. Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

11. MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

12. Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

13. Graphic Design with Large Multimodal Model

14. Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

15. Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

16. TrustLLM: Trustworthiness in Large Language Models

17. LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

18. A whole-slide foundation model for digital pathology from real-world data

19. Visual In-Context Prompting

20. LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

21. Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

22. LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

23. Large Language Models are Visual Reasoning Coordinators

24. Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

25. BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

26. Improved Baselines with Visual Instruction Tuning

27. HallE-Control: Controlling Object Hallucination in Large Multimodal Models

28. MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts

29. Aligning Large Multimodal Models with Factually Augmented RLHF

30. Multimodal Foundation Models: From Specialists to General-Purpose Assistants

31. An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models

33. Benchmarking and Analyzing Generative Data for Visual Recognition

34. Semantic-SAM: Segment and Recognize Anything at Any Granularity

35. Large Multimodal Models: Notes on CVPR 2023 Tutorial

36. MIMIC-IT: Multi-Modal In-Context Instruction Tuning

37. LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

40. OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models

41. Towards Building the Federated GPT: Federated Instruction Tuning

42. Visual Instruction Tuning

43. Instruction Tuning with GPT-4

44. A Simple Framework for Open-Vocabulary Segmentation and Detection

45. Scaling Vision-Language Models with Sparse Mixture of Experts

46. Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

47. Learning Customized Visual Models with Retrieval-Augmented Knowledge

48. GLIGEN: Open-Set Grounded Text-to-Image Generation

49. Generalized Decoding for Pixel, Image, and Language
