Search

Your search for author "Xu, Haiyang" returned 1,311 results.

Search Results

1. SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing

2. mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

3. MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model

4. mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

5. MIBench: Evaluating Multimodal Large Language Models over Multiple Images

6. OmniControlNet: Dual-stage Integration for Conditional Image Generation

7. Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

8. TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

9. mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

10. Bayesian Diffusion Models for 3D Shape Reconstruction

11. Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

12. Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

13. Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

14. Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

15. Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection

17. TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

18. Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

19. mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

20. AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

21. mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

22. UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

23. ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

24. Evaluation and Analysis of Hallucination in Large Vision-Language Models

25. COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment

26. BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

27. mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

28. Vision Transformer with Attention Map Hallucination and FFN Compaction

29. Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

30. Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning

31. Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation

32. Transforming Visual Scene Graphs to Image Captions

33. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

34. ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

35. mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

36. Adaptively Clustering Neighbor Elements for Image-Text Generation

37. Learning Trajectory-Word Alignments for Video-Language Tasks

38. Quantitative analysis of cervical vertebral maturation in Chinese adolescents based on three-dimensional morphology of cervical vertebrae

39. HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

40. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

41. Image Captioning In the Transformer Age

43. EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

44. Achieving Human Parity on Visual Question Answering

48. Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
