Search

Your search for author "Li, Hongsheng" returned a total of 2,199 results.

Search Constraints

Author: "Li, Hongsheng"

Search Results

1. Stable Consistency Tuning: Understanding and Improving Consistency Models

2. PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

3. A foundation model for generalizable disease diagnosis in chest X-ray images

4. SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction

5. MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

6. CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection

7. I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow

8. Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow

9. Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

10. MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

11. Delving Deep into Engagement Prediction of Short Videos

12. UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

13. MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation

14. SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

15. PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

16. MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines

17. LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

18. GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

19. Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

20. MAVIS: Mathematical Visual Instruction Tuning

21. DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

22. AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

23. Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

24. MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

25. Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

26. UniZero: Generalized and Efficient Planning with Scalable Latent World Models

27. A3VLM: Actionable Articulation-Aware Vision Language Model

28. Trim 3D Gaussian Splatting for Accurate Geometry Representation

29. Learning 1D Causal Visual Representation with De-focus Attention Networks

30. Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

31. Enhancing Vision-Language Model with Unmasked Token Alignment

32. Phased Consistency Model

33. Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

34. Empowering Character-level Text Infilling by Eliminating Sub-Tokens

35. ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

36. SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

37. TerDiT: Ternary Diffusion Models with Transformers

38. Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

39. Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

40. MoVA: Adapting Mixture of Vision Experts to Multimodal Context

41. GLID: Pre-training a Generalist Encoder-Decoder Vision Model

42. Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

43. Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

44. CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

45. CameraCtrl: Enabling Camera Control for Text-to-Video Generation

46. Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

47. ECNet: Effective Controllable Text-to-Image Diffusion Models

48. Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

49. MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

50. Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
