Search

Your search for "Zhou, Pan" returned 126 results

Search Constraints

You searched for: Author: "Zhou, Pan"; Topic: computer science - computer vision and pattern recognition
Search Results

1. MoExtend: Tuning New Experts for Modality and Task Extension

2. A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

3. GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

4. MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

5. LOVA3: Learning to Visual Question Answering, Asking and Assessment

6. Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World

7. Diffusion Time-step Curriculum for One Image to 3D Generation

8. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

9. What Makes Good Collaborative Views? Contrastive Mutual Information Maximization for Multi-Agent Perception

10. Few-shot Learner Parameterization by Diffusion Time-steps

11. MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

12. Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

13. Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator

14. Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation

15. MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning

16. Instant3D: Instant Text-to-3D Generation

17. F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

18. ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

19. 3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

20. Fast Diffusion Model

21. Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

22. InceptionNeXt: When Inception Meets ConvNeXt

23. MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer

24. You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

25. Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

26. Contrastive Video Question Answering via Video Graph Transformer

27. Tracking Objects and Activities with Attention for Temporal Sentence Grounding

28. STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition

29. Hypotheses Tree Building for One-Shot Temporal Sentence Localization

30. Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

31. Position-guided Text Prompt for Vision-Language Pre-training

32. MetaFormer Baselines for Vision

33. Towards Sustainable Self-supervised Learning

34. LPT: Long-tailed Prompt Tuning for Image Classification

35. Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

36. Hierarchical Local-Global Transformer for Temporal Sentence Grounding

37. Backdoor Attacks on Crowd Counting

38. Video Graph Transformer for Video Question Answering

39. Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

40. Towards Understanding Why Mask-Reconstruction Pretraining Helps in Downstream Tasks

41. Inception Transformer

42. Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees

43. Mugs: A Multi-Granular Self-Supervised Learning Framework

44. Self-Promoted Supervision for Few-Shot Transformer

45. Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

46. Unsupervised Temporal Video Grounding with Deep Semantic Clustering

47. Exploring Motion and Appearance Information for Temporal Sentence Grounding

48. Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

49. SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

50. DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
