Search

Your search for "Zhang, Shanghang" returned 133 results

Search Constraints

Author: "Zhang, Shanghang"; Database: arXiv
Search Results

1. Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

2. [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster

3. Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

4. Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

5. EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

6. MC-LLaVA: Multi-Concept Personalized Vision-Language Model

7. Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation

8. Training-free Regional Prompting for Diffusion Transformers

9. Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

10. Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

11. EVA: An Embodied World Model for Future Video Anticipation

12. SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

13. Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

14. Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

15. Discovering Long-Term Effects on Parameter Efficient Fine-tuning

16. FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

17. MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

18. Multimodal Large Language Models for Bioimage Analysis

19. MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

20. Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

21. MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

22. RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

23. $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

24. Implicit Neural Image Field for Biological Microscopy Image Compression

25. Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

26. Compositional Few-Shot Class-Incremental Learning

27. CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

28. Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

29. Unveiling the Tapestry of Consistency in Large Vision-Language Models

30. Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

31. LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

32. Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

33. Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

34. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

35. Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

36. Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

37. DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

38. DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

39. A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

40. Building Flexible Machine Learning Models for Scientific Computing at Scale

41. Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

42. VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

43. RustNeRF: Robust Neural Radiance Field with Low-Quality Images

44. VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

45. PiGW: A Plug-in Generative Watermarking Framework

46. A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

47. Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

48. Cloud-Device Collaborative Learning for Multimodal Large Language Models

49. Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training

50. FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection
