You searched for: Author "He, Conghui" (228 results)

Search Results

1. Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

2. MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

3. PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

4. DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

5. LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

6. Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

7. Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning

8. Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

9. MinerU: An Open-Source Solution for Precise Document Content Extraction

10. BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search

11. Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

12. CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

13. UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

14. CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

15. Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning

16. Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network

17. SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm

18. Synth-Empathy: Towards High-Quality Synthetic Empathy Data

19. SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

20. Navigating the Data Trading Crossroads: An Interdisciplinary Survey

21. InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

22. KeyVideoLLM: Towards Large-scale Video Keyframe Selection

23. LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

24. DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

25. OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

26. OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

27. DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

28. A Survey of Multimodal Large Language Model from A Data-centric Perspective

29. FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

30. How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

31. UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

32. InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

33. 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

34. SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

35. VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

36. InternLM2 Technical Report

37. Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

38. LOCR: Location-Guided Transformer for Optical Character Recognition

39. WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

40. ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training

41. SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

42. LongWanjuan: Towards Systematic Measurement for Long Text Quality

43. SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

44. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

45. Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

46. Parrot Captions Teach CLIP to Spot Text

47. ShareGPT4V: Improving Large Multi-modal Models with Better Captions

48. Parrot Captions Teach CLIP to Spot Text

49. MMBench: Is Your Multi-modal Model an All-Around Player?

50. OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation