Search

Your search for "LUO Ping" returned 352 results in total

Search Constraints

You searched for:
Author: "LUO Ping"
Topic: computer science - computer vision and pattern recognition
Search Results

1. Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing

2. MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

3. Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

4. Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

5. TCFormer: Visual Recognition via Token Clustering Transformer

6. When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

7. IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

8. PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

9. Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

10. GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

11. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

12. Needle In A Multimodal Haystack

13. Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

14. Learning Manipulation by Predicting Interaction

15. Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

16. Part123: Part-aware 3D Reconstruction from a Single-view Image

17. SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge

18. Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

19. UniFS: Universal Few-shot Instance Perception with Point Representations

20. MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

21. Adapting LLaMA Decoder to Vision Transformer

22. End-to-End Autonomous Driving through V2X Cooperation

23. FlashFace: Human Image Personalization with High-fidelity Identity Preservation

24. DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving

25. AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

26. GenAD: Generalized Predictive Model for Autonomous Driving

27. PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

28. RegionGPT: Towards Region Understanding Vision Language Model

29. Position: Towards Implicit Prompt For Text-To-Image Models

30. RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

31. AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

32. RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

33. OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

34. PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

35. ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

36. Video Understanding with Large Language Models: A Survey

37. UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

38. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

39. DriveLM: Driving with Graph Visual Question Answering

40. Cached Transformers: Improving Transformers with Differentiable Memory Cache

41. SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

42. A Survey of Reasoning with Foundation Models

43. You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

44. GenTron: Diffusion Transformers for Image and Video Generation

45. MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

46. MLLMs-Augmented Visual-Language Representation Learning

47. MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

48. Large Language Models as Automated Aligners for benchmarking Vision-Language Models

49. DiffusionMat: Alpha Matting as Sequential Refinement Learning

50. Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
