Search

Your search for Author: "Zhu, Xizhou" returned 202 results.


Search Results

1. Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

2. Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance

3. Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

4. MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

5. MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

6. TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

7. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

8. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

9. Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

10. Needle In A Multimodal Haystack

11. Learning 1D Causal Visual Representation with De-focus Attention Networks

12. Parameter-Inverted Image Pyramid Networks

13. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

14. Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

15. The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

16. MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

17. Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

18. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

19. DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

20. Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

21. ControlLLM: Augment Language Models with Tools by Searching on Graphs

22. Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

23. The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

24. ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

25. Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

26. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

27. InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

28. Planning-oriented Autonomous Driving

29. BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

30. Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

31. Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

32. Demystify Transformers & Convolutions in Modern Image Deep Networks

33. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

34. Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

35. Siamese Image Modeling for Self-Supervised Vision Representation Learning

36. DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

37. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

38. Searching Parameterized AP Loss for Object Detection

39. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

40. VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

41. Collaborative Visual Navigation

42. AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

43. Unsupervised Object Detection with LiDAR Clues

44. Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

45. Deformable DETR: Deformable Transformers for End-to-End Object Detection

46. Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

47. Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

48. VL-BERT: Pre-training of Generic Visual-Linguistic Representations

49. An Empirical Study of Spatial Attention Mechanisms in Deep Networks
