Search

Your search for "LUO Ping" returned 284 results

Search Constraints

You searched for: Author "LUO Ping"; Database: arXiv

Search Results

1. RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)

2. Federated Prediction-Powered Inference from Decentralized Data

3. Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing

4. HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

5. MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

6. AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

7. Low-Latency Privacy-Preserving Deep Learning Design via Secure MPC

8. Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

9. Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

10. Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

11. TCFormer: Visual Recognition via Token Clustering Transformer

12. When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

13. EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

14. IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

15. PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

16. DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

17. Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

18. GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

19. VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

20. Needle In A Multimodal Haystack

21. Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

22. Uncovering Limitations of Large Language Models in Information Seeking from Tables

23. Learning Manipulation by Predicting Interaction

24. Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

25. Part123: Part-aware 3D Reconstruction from a Single-view Image

26. AnalogCoder: Analog Circuit Design via Training-Free Code Generation

27. SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge

28. Score-based Generative Models with Adaptive Momentum

29. KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

30. Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

31. Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs

32. UniFS: Universal Few-shot Instance Perception with Point Representations

33. MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

34. Adapting LLaMA Decoder to Vision Transformer

35. End-to-End Autonomous Driving through V2X Cooperation

36. DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

37. ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models

38. FlashFace: Human Image Personalization with High-fidelity Identity Preservation

39. DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving

40. Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients

41. Zero-shot Generative Linguistic Steganography

42. AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

43. GenAD: Generalized Predictive Model for Autonomous Driving

44. ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation

45. PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

46. RegionGPT: Towards Region Understanding Vision Language Model

47. Position: Towards Implicit Prompt For Text-To-Image Models

48. RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

49. AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

50. RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
