Search

Your search for "vision-language model" returned 119 results in total

Search Results

8. IMSearch 2.0: Toward User-Centric and Efficient Interactive Multimedia Retrieval System

9. OneDiff: A Generalist Model for Image Difference Captioning

10. Generalizing to Unseen Domains via Text-Guided Augmentation: A Training-Free Approach

11. Quantized Prompt for Efficient Generalization of Vision-Language Models

12. 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

13. Zero-Shot Spatio-Temporal Action Detection by Enhancing Context-Relation Capability of Vision-Language Models

14. Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models

16. Integrating Vision-Tool to Enhance Visual-Question-Answering in Special Domains

17. Federated Prompt Tuning: When is it Necessary?

18. Improving Anomaly Scene Recognition with Large Vision-Language Models

19. LG-Gaze: Learning Geometry-Aware Continuous Prompts for Language-Guided Gaze Estimation

20. E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation

21. Leveraging Temporal Contextualization for Video Action Recognition

22. FlexAttention for Efficient High-Resolution Vision-Language Models

23. Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks

24. TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-spoofing

25. A vision-language model for predicting potential distribution land of soybean double cropping.

26. A vision-language model with multi-granular knowledge fusion in medical imaging.

27. Auto-Rad: End-to-End Report Generation from Lumbar Spine MRI Using Vision–Language Model.

28. Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

29. Multimodal fusion: advancing medical visual question-answering.

30. CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model.

31. LEVIOSA: Natural Language-Based Uncrewed Aerial Vehicle Trajectory Generation.

32. Robotic environmental state recognition with pre-trained vision-language models and black-box optimization.

33. Integrating Vision‐Language Models for Accelerated High‐Throughput Nutrition Screening.

34. A Vision–Language Model-Based Traffic Sign Detection Method for High-Resolution Drone Images: A Case Study in Guyuan, China.

35. A vision-language model for predicting potential distribution land of soybean double cropping

36. Visual information guided multi-modal model for plant disease anomaly detection

38. IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models

39. Application of CLIP for efficient zero-shot learning.

40. CLIP feature-based randomized control using images and text for multiple tasks and robots.

41. IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models.

42. Synergistic Fusion: Vision-Language Models in Advancing Autonomous Driving and Intelligent Transportation Systems

43. LGA: A Language Guide Adapter for Advancing the SAM Model’s Capabilities in Medical Image Segmentation

44. fTSPL: Enhancing Brain Analysis with FMRI-Text Synergistic Prompt Learning

45. Towards a Text-Based Quantitative and Explainable Histopathology Image Analysis

46. Can LLMs’ Tuning Methods Work in Medical Multimodal Domain?

47. BrainSCK: Brain Structure and Cognition Alignment via Knowledge Injection and Reactivation for Diagnosing Brain Disorders

48. KDNet: Leveraging Vision-Language Knowledge Distillation for Few-Shot Object Detection

49. Centered Masking for Language-Image Pre-training

50. MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion
