119 results on '"vision-language model"'
Search Results
2. Enhancing visual representation for text-based person searching
3. An enhanced domain generalization method for object detection based on text guided feature disentanglement
4. IndVisSGG: VLM-based scene graph generation for industrial spatial intelligence
5. Large language model-augmented learning for auto-delineation of treatment targets in head-and-neck cancer radiotherapy
6. Context-aware prompt learning for test-time vision recognition with frozen vision-language model
7. Vision–language pre-training for graph-based handwritten mathematical expression recognition
8. IMSearch 2.0: Toward User-Centric and Efficient Interactive Multimedia Retrieval System
9. OneDiff: A Generalist Model for Image Difference Captioning
10. Generalizing to Unseen Domains via Text-Guided Augmentation: A Training-Free Approach
11. Quantized Prompt for Efficient Generalization of Vision-Language Models
12. 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
13. Zero-Shot Spatio-Temporal Action Detection by Enhancing Context-Relation Capability of Vision-Language Models
14. Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
15. CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-Shot Anomaly Detection
16. Integrating Vision-Tool to Enhance Visual-Question-Answering in Special Domains
17. Federated Prompt Tuning: When is it Necessary?
18. Improving Anomaly Scene Recognition with Large Vision-Language Models
19. LG-Gaze: Learning Geometry-Aware Continuous Prompts for Language-Guided Gaze Estimation
20. E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
21. Leveraging Temporal Contextualization for Video Action Recognition
22. FlexAttention for Efficient High-Resolution Vision-Language Models
23. Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
24. TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-spoofing
25. A vision-language model for predicting potential distribution land of soybean double cropping
26. A vision-language model with multi-granular knowledge fusion in medical imaging
27. Auto-Rad: End-to-End Report Generation from Lumber Spine MRI Using Vision–Language Model
28. Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
29. Multimodal fusion: advancing medical visual question-answering
30. CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model
31. LEVIOSA: Natural Language-Based Uncrewed Aerial Vehicle Trajectory Generation
32. Robotic environmental state recognition with pre-trained vision-language models and black-box optimization
33. Integrating Vision-Language Models for Accelerated High-Throughput Nutrition Screening
34. A Vision–Language Model-Based Traffic Sign Detection Method for High-Resolution Drone Images: A Case Study in Guyuan, China
36. Visual information guided multi-modal model for plant disease anomaly detection
37. GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection
38. IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models
39. Application of CLIP for efficient zero-shot learning
40. CLIP feature-based randomized control using images and text for multiple tasks and robots
42. Synergistic Fusion: Vision-Language Models in Advancing Autonomous Driving and Intelligent Transportation Systems
43. LGA: A Language Guide Adapter for Advancing the SAM Model’s Capabilities in Medical Image Segmentation
44. fTSPL: Enhancing Brain Analysis with FMRI-Text Synergistic Prompt Learning
45. Towards a Text-Based Quantitative and Explainable Histopathology Image Analysis
46. Can LLMs’ Tuning Methods Work in Medical Multimodal Domain?
47. BrainSCK: Brain Structure and Cognition Alignment via Knowledge Injection and Reactivation for Diagnosing Brain Disorders
48. KDNet: Leveraging Vision-Language Knowledge Distillation for Few-Shot Object Detection
49. Centered Masking for Language-Image Pre-training
50. MixPrompt: Enhancing Generalizability and Adversarial Robustness for Vision-Language Models via Prompt Fusion