Search

Your search for author "Yuan, Zehuan" returned 219 results in total.

Search Results

1. ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

2. HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling

3. OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

4. Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

5. Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

6. Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

7. Generative Region-Language Pretraining for Open-Ended Object Detection

8. UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

9. General Object Foundation Model for Images and Videos at Scale

10. Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

11. Optimization Efficient Open-World Visual Region Recognition

12. CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

13. EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

14. Exploring Transformers for Open-world Instance Segmentation

15. ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

16. Meta Compositional Referring Expression Segmentation

17. Token Boosting for Robust Self-Supervised Visual Transformer Pre-training

18. Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

19. Multi-Level Contrastive Learning for Dense Prediction Task

20. EGC: Image Generation and Classification via a Diffusion Energy-Based Model

21. Universal Instance Perception as Object Discovery and Retrieval

22. Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling

23. QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query

24. Learning Object-Language Alignments for Open-Vocabulary Object Detection

25. MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

26. Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

27. Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

28. Rethinking Resolution in the Context of Efficient Video Recognition

29. MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation

30. Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization

31. You Should Look at All Objects

32. Towards Grand Unification of Object Tracking

33. Masked Generative Distillation

34. Birds of A Feather Flock Together: Category-Divergence Guidance for Domain Adaptive Segmentation

35. MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

36. Content-Variant Reference Image Quality Assessment via Knowledge Distillation

37. Language as Queries for Referring Video Object Segmentation

38. Trimap-guided Feature Mining and Fusion Network for Natural Image Matting

39. DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

40. Focal and Global Knowledge Distillation for Detectors

41. Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

42. ByteTrack: Multi-Object Tracking by Associating Every Detection Box

43. Objects in Semantic Topology

44. Weakly Supervised Person Search with Region Siamese Networks

45. Memory Based Video Scene Parsing

46. Center Prediction Loss for Re-identification

47. Conditional Hyper-Network for Blind Super-Resolution with Multiple Degradations

48. TransTrack: Multiple Object Tracking with Transformer

49. Slimmable Generative Adversarial Networks

50. What Makes for End-to-End Object Detection?
