Search

Your search for author "Zhang, Shanghang" returned 426 results

Search Results

1. Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

2. Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

3. EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting

4. MC-LLaVA: Multi-Concept Personalized Vision-Language Model

5. Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation

6. Training-free Regional Prompting for Diffusion Transformers

7. Subgraph Aggregation for Out-of-Distribution Generalization on Graphs

8. Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

9. EVA: An Embodied World Model for Future Video Anticipation

10. SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

11. Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

12. Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

13. Discovering Long-Term Effects on Parameter Efficient Fine-tuning

14. FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models

15. MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

16. Multimodal Large Language Models for Bioimage Analysis

17. MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine

18. Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

19. MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

20. RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation

21. $\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

22. Implicit Neural Image Field for Biological Microscopy Image Compression

23. Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation

24. Compositional Few-Shot Class-Incremental Learning

25. CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

26. Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

27. Unveiling the Tapestry of Consistency in Large Vision-Language Models

28. Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention

29. LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

30. Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

31. Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

32. SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera

33. Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

34. Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

35. DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

36. DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

37. A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track

38. Building Flexible Machine Learning Models for Scientific Computing at Scale

39. Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

40. VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

41. RustNeRF: Robust Neural Radiance Field with Low-Quality Images

42. VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

43. PiGW: A Plug-in Generative Watermarking Framework

44. A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models

45. LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model

46. Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

47. Cloud-Device Collaborative Learning for Multimodal Large Language Models

48. Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training

49. FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

50. LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
