Showing 144 results for author "Zhang, Shanghang", limited to the topic "computer science - computer vision and pattern recognition"

Search Results

1. Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

2. [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster

3. Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

4. Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

5. EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

6. MC-LLaVA: Multi-Concept Personalized Vision-Language Model

7. Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation

8. Training-free Regional Prompting for Diffusion Transformers

9. Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

10. EVA: An Embodied World Model for Future Video Anticipation

11. SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

12. Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

13. MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

14. MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

15. Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

16. MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

17. RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

18. $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

19. Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

20. Compositional Few-Shot Class-Incremental Learning

21. CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

22. Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

23. Unveiling the Tapestry of Consistency in Large Vision-Language Models

24. Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

25. LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

26. Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

27. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

28. Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

29. Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

30. DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

31. DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

32. A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

33. Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

34. VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

35. RustNeRF: Robust Neural Radiance Field with Low-Quality Images

36. VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

37. A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

38. Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

39. Cloud-Device Collaborative Learning for Multimodal Large Language Models

40. Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training

41. FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

42. LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

43. Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

44. Gradient-based Parameter Selection for Efficient Fine-Tuning

45. Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

46. Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

47. MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning

48. MoEC: Mixture of Experts Implicit Neural Compression

49. M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

50. Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
