Author search for "LUO Ping": 4,675 results in total

Search Results

51. Position: Towards Implicit Prompt For Text-To-Image Models

52. RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

53. AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

54. RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

55. BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

56. OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

57. PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

58. LLaMA Pro: Progressive LLaMA with Block Expansion

59. ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

63. Structural Elucidation and NMR Spectral Assignments of Two C19-Diterpenoid Alkaloids

64. Video Understanding with Large Language Models: A Survey

65. UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

66. InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

67. DriveLM: Driving with Graph Visual Question Answering

68. Cached Transformers: Improving Transformers with Differentiable Memory Cache

69. SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

70. A Survey of Reasoning with Foundation Models

71. You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

72. GenTron: Diffusion Transformers for Image and Video Generation

73. MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

74. MLLMs-Augmented Visual-Language Representation Learning

75. MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

76. Large Language Models as Automated Aligners for benchmarking Vision-Language Models

77. DiffusionMat: Alpha Matting as Sequential Refinement Learning

78. Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

79. Harvest Video Foundation Models via Efficient Post-Pretraining

80. Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

81. MeanAP-Guided Reinforced Active Learning for Object Detection

82. Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

83. Guideline Learning for In-context Information Extraction

84. LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

88. PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

89. SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations

90. StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

91. MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

92. GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

93. Beyond One-to-One: Rethinking the Referring Image Segmentation

94. OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

95. RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs

96. Foundation Model is Efficient Multimodal Multitask Model Selector

97. RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

98. Exploring Transformers for Open-world Instance Segmentation

99. TinyLVLM-eHub: Towards Comprehensive and Efficient Evaluation for Large Vision-Language Models

100. InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
