33,140 results for "image retrieval"
Search Results
2. Exploiting deep cross-semantic features for image retrieval
- Author
-
Lu, Zhou, Liu, Guang-Hai, Hu, Jin-Kun, and Li, Zuoyong
- Published
- 2025
- Full Text
- View/download PDF
3. Integrating conceptual and visual representations with domain expertise for scalable visual plagiarism detection
- Author
-
Cui, Shenglan, Liu, Zhixiong, Liu, Fang, Ye, Yunfan, and Zhang, Mohan
- Published
- 2025
- Full Text
- View/download PDF
4. Deep hashing with mutual information: A comprehensive strategy for image retrieval
- Author
-
Chen, Yinqi, Lu, Zhiyi, Zheng, Yangting, Li, Peiwen, Luo, Weijian, and Kang, Shuo
- Published
- 2025
- Full Text
- View/download PDF
5. Deep multi-negative supervised hashing for large-scale image retrieval
- Author
-
Liu, Yingfan, Qiao, Xiaotian, Liu, Zhaoqing, Xia, Xiaofang, Zhang, Yinlong, and Cui, Jiangtao
- Published
- 2025
- Full Text
- View/download PDF
6. Convolution-based Visual Transformer with Dual Space Shifting Attention hashing for image retrieval
- Author
-
Ren, Huan, Cheng, Shuli, Wang, Liejun, and Li, Yongming
- Published
- 2025
- Full Text
- View/download PDF
7. Frequency Decoupling Enhancement and Mamba Depth Extraction-Based Feature Fusion in Transformer Hashing Image Retrieval
- Author
-
Chen, Jiayi, Cheng, Shuli, Wang, Liejun, Li, Yongming, and Zou, Qiang
- Published
- 2025
- Full Text
- View/download PDF
8. High precision 3D reconstruction and target location based on the fusion of visual features and point cloud registration
- Author
-
Chen, Junliang, Wei, Xiaolong, Liang, Xiaoqing, Xu, Haojun, Zhou, Liucheng, He, Weifeng, Ma, Yunpeng, and Yin, Yizhen
- Published
- 2025
- Full Text
- View/download PDF
9. Design of CSD based bi-orthogonal wavelet filter bank for medical image retrieval
- Author
-
Samantaray, Aswini Kumar, Gorre, Pradeep, and Sahoo, Prabodh Kumar
- Published
- 2023
- Full Text
- View/download PDF
10. Unsupervised Effectiveness Estimation Measure Based on Rank Correlation for Image Retrieval
- Author
-
Almeida, Thiago César Castilho, Valem, Lucas Pascotti, and Pedronette, Daniel Carlos Guimarães
- Published
- 2025
- Full Text
- View/download PDF
11. Cross-View Geo-Localization via Learning Correspondence Semantic Similarity Knowledge
- Author
-
Chen, Guanli, Huang, Guoheng, Yuan, Xiaochen, Chen, Xuhang, Zhong, Guo, and Pun, Chi-Man
- Published
- 2025
- Full Text
- View/download PDF
12. ULTRON: Unifying Local Transformer and Convolution for Large-Scale Image Retrieval
- Author
-
Kweon, Minseong and Park, Jinsun
- Published
- 2025
- Full Text
- View/download PDF
13. Oracle Bone Inscription Image Retrieval Based on Improved ResNet Network
- Author
-
Ding, Jun, Wang, Jiaoyan, Aysa, Alimjan, Xu, Xuebin, and Ubul, Kurban
- Published
- 2025
- Full Text
- View/download PDF
14. Fashion Image Retrieval with Occlusion
- Author
-
Sohn, Jimin, Jung, Haeji, Yan, Zhiwen, Masti, Vibha, Li, Xiang, and Raj, Bhiksha
- Published
- 2025
- Full Text
- View/download PDF
15. Long-Tailed Hashing with Wasserstein Quantization
- Author
-
Fu, Zujun, Lai, Hanjiang, and Pan, Yan
- Published
- 2025
- Full Text
- View/download PDF
16. CMAEH: Contrastive Masked Autoencoder Based Hashing for Efficient Image Retrieval
- Author
-
Kumar, Mehul, Sharma, Aditya, Mukherjee, Prerana, and Jerripothula, Koteswar Rao
- Published
- 2025
- Full Text
- View/download PDF
17. Learning-Based Sub-image Retrieval in Historical Document Images
- Author
-
Assaker, Joseph, Nicolas, Stéphane, and Heutte, Laurent
- Published
- 2025
- Full Text
- View/download PDF
18. Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection
- Author
-
Shah, Harsh, Mittal, Kashish, and Rajwade, Ajit
- Published
- 2025
- Full Text
- View/download PDF
19. IRGen: Generative Modeling for Image Retrieval
- Author
-
Zhang, Yidan, Zhang, Ting, Chen, Dong, Wang, Yujing, Chen, Qi, Xie, Xing, Sun, Hao, Deng, Weiwei, Zhang, Qi, Yang, Fan, Yang, Mao, Liao, Qingmin, Wang, Jingdong, and Guo, Baining
- Published
- 2025
- Full Text
- View/download PDF
20. CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching
- Author
-
Shafique, Samia, Kong, Shu, and Fowlkes, Charless
- Published
- 2025
- Full Text
- View/download PDF
21. UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
- Author
-
Wei, Cong, Chen, Yang, Chen, Haonan, Hu, Hexiang, Zhang, Ge, Fu, Jie, Ritter, Alan, and Chen, Wenhu
- Published
- 2025
- Full Text
- View/download PDF
22. MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
- Author
-
Berton, Gabriele, Junglas, Lorenz, Zaccone, Riccardo, Pollok, Thomas, Caputo, Barbara, and Masone, Carlo
- Published
- 2025
- Full Text
- View/download PDF
23. Recovering Latent Hierarchical Relationships in Image Datasets Through Hyperbolic Embeddings
- Author
-
Roberts, Ian, Araya, Mauricio, Ñanculef, Ricardo, and Mallea, Mario
- Published
- 2025
- Full Text
- View/download PDF
24. Statewide Visual Geolocalization in the Wild
- Author
-
Fervers, Florian, Bullinger, Sebastian, Bodensteiner, Christoph, Arens, Michael, and Stiefelhagen, Rainer
- Published
- 2025
- Full Text
- View/download PDF
25. Unsupervised Multi-criteria Adversarial Detection in Deep Image Retrieval
- Author
-
Xiao, Yanru, Wang, Cong, and Gao, Xing
- Published
- 2025
- Full Text
- View/download PDF
26. Performance analysis of image retrieval system using deep learning techniques.
- Author
-
Selvalakshmi, B., Hemalatha, K., Kumarganesh, S., and Vijayalakshmi, P.
- Abstract
Image retrieval is the process of retrieving the images relevant to a query image with minimal search time on the internet. The problem with conventional Content-Based Image Retrieval (CBIR) systems is that they produce retrieval results for either colour images or grey-scale images alone. Moreover, such CBIR systems are complex and consume a long time to produce significant retrieval results. These problems are overcome through the methodologies proposed in this work. In this paper, General Images (GI) and Medical Images (MI) are retrieved using a deep learning architecture. The proposed system is designed with a feature computation module, a Retrieval Convolutional Neural Network (RETCNN) module, and a distance computation algorithm. The distance computation algorithm computes the distances between the query image and the images in the datasets and produces the retrieval results. The average precision and recall for the proposed RETCNN-based CBIRS are 98.98% and 99.15% respectively for the GI category, and 99.04% and 98.89% respectively for the MI category. These experimental results demonstrate the high image retrieval rate of the proposed system. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
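The distance-computation step this abstract describes — ranking dataset images by the distance between deep feature vectors — reduces to nearest-neighbour ranking. A minimal sketch; the toy 2-D vectors and the function name are illustrative stand-ins, not the paper's RETCNN features:

```python
import numpy as np

def retrieve(query_vec, db_vecs, k=3):
    """Rank database feature vectors by Euclidean distance to the query
    and return the indices of the k closest images."""
    d = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(d)[:k]

# Toy "deep features" for three database images.
db = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
top2 = retrieve(np.array([0.0, 0.1]), db, k=2)
```

Real systems would use high-dimensional CNN embeddings and an approximate index, but the ranking logic is the same.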
27. Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval
- Author
-
Yuan, Shi-hao, Feng, Yong, Qiu, A-Gen, Duan, Guo-fan, Zhou, Ming-liang, Qiang, Bao-hua, and Wang, Yong-heng
- Subjects
DEEP learning, IMAGE retrieval, ARTIFICIAL intelligence, COGNITIVE psychology, IMAGE processing
- Abstract
Accurately and swiftly retrieving from fine-grained images is a critical and challenging task. As the key technology for fine-grained image retrieval, deep metric learning aims to learn a mapping space, where samples exhibit two properties: positive concentration and negative separation, facilitating the measurement of similarities between samples. Unsupervised deep metric learning, which obviates the need for labels during training, has garnered widespread attention compared to its supervised counterparts due to its convenience. Current methods in unsupervised deep metric learning face issues such as imbalance in sample construction, difficulty in sample differentiation, and neglect of intrinsic image features. To address these challenges, we propose Manifold and Patch-based Unsupervised Deep Metric Learning (MPUDML) for Fine-Grained Image Retrieval. Specifically, we adopt a manifold similarity-based balanced sampling strategy for constructing more balanced mini-batch samples. Moreover, we leverage soft supervision information obtained from the manifold and cosine similarities between unlabeled images for sample differentiation, effectively reducing the impact of noisy samples. Additionally, we utilize the rich feature information between internal image patches through image patch-level clustering and localization tasks to guide the acquisition of a more comprehensive feature embedding representation, thereby enhancing retrieval performance. Our method, MPUDML, was evaluated against various state-of-the-art unsupervised deep metric learning approaches in fine-grained image retrieval and clustering tasks. Experimental findings indicate that our MPUDML method exceeds other advanced methods in recall (R@K) and Normalized Mutual Information (NMI). [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
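The "soft supervision information obtained from ... cosine similarities between unlabeled images" can be illustrated by turning pairwise cosine similarities into a soft neighbour distribution per sample. The temperature `tau` and toy embeddings below are assumptions, not the paper's exact formulation:

```python
import numpy as np

def soft_labels(embs, tau=0.1):
    """Soft supervision from cosine similarity: each row is a softmax over
    similarities to the other (unlabeled) samples, excluding self."""
    e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sim = e @ e.T
    np.fill_diagonal(sim, -np.inf)   # exp(-inf) = 0: drop self-similarity
    w = np.exp(sim / tau)
    return w / w.sum(axis=1, keepdims=True)

embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
P = soft_labels(embs)                # rows sum to 1; similar pairs dominate
```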
28. Deep deterministic policy gradients with a self-adaptive reward mechanism for image retrieval
- Author
-
Ahmad, Farooq, Zhang, Xinfeng, Tang, Zifang, Sabah, Fahad, Azam, Muhammad, and Sarwar, Raheem
- Abstract
Traditional image retrieval methods often face challenges in adapting to varying user preferences and dynamic datasets. To address these limitations, this research introduces a novel image retrieval framework utilizing deep deterministic policy gradients (DDPG) augmented with a self-adaptive reward mechanism (SARM). The DDPG-SARM framework dynamically adjusts rewards based on user feedback and retrieval context, enhancing the learning efficiency and retrieval accuracy of the agent. Key innovations include dynamic reward adjustment based on user feedback, context-aware reward structuring that considers the specific characteristics of each retrieval task, and an adaptive learning rate strategy to ensure robust and efficient model convergence. Extensive experimentation with three distinct datasets demonstrates that the proposed framework significantly outperforms traditional methods, achieving the highest retrieval accuracy with overall improvements of 3.38%, 5.26%, and 0.21% over mainstream models on the DermaMNIST, PneumoniaMNIST, and OrganMNIST datasets, respectively. The findings contribute to the advancement of reinforcement learning applications in image retrieval, providing a user-centric solution adaptable to various dynamic environments. The proposed method also offers a promising direction for future developments in intelligent image retrieval systems. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
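The self-adaptive reward idea — adjusting the reward signal from user feedback — can be sketched in a few lines. The update rule, rate, and floor below are illustrative guesses, not the paper's SARM:

```python
def adaptive_reward(base, feedback, scale=1.0, rate=0.1):
    """Toy self-adaptive reward: a running scale grows with positive user
    feedback (+1) and shrinks with negative feedback (-1), floored at 0.1,
    then modulates the base reward handed to the RL agent."""
    scale = max(0.1, scale + rate * feedback)
    return base * scale, scale

r, s = adaptive_reward(1.0, +1)   # positive feedback boosts the reward
```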
29. Developing a model semantic‐based image retrieval by combining KD‐Tree structure with ontology.
- Author
-
Le, Thanh Manh, Dinh, Nguyen Thi, and Van, Thanh The
- Subjects
IMAGE retrieval, DATA structures, MACHINE learning, DATA modeling
- Abstract
The paper proposes an alternative approach to improve the performance of image retrieval. In this work, a framework for image retrieval based on machine learning and semantic retrieval is proposed. In the preprocessing phase, the image is segmented into objects using Graph-cut, and the feature vectors of the objects present in the image and their visual relationships are extracted using R-CNN. The feature vectors, visual relationships, and their symbolic labels are stored in KD-Tree data structures, which can later be used to predict the labels of objects and their visual relationships. To facilitate semantic queries, the images are described with the RDF data model and an ontology is created for the annotated symbolic labels. For each query image, after extracting its feature vectors, the KD-Tree is used to classify the objects and predict their relationships. After that, a SPARQL query is built to extract a set of similar images; it consists of triple statements describing the objects and their previously predicted relationships. Evaluation of the framework on the MS-COCO and Flickr datasets showed that precision reached 0.9218 and 0.9370, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
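The KD-Tree label-prediction step described here — store labelled feature vectors, then classify a query object by its nearest stored neighbour — can be sketched with SciPy's `KDTree`. The toy 2-D features and labels are stand-ins for the paper's R-CNN vectors:

```python
import numpy as np
from scipy.spatial import KDTree

# Feature vectors of previously annotated objects and their symbolic labels.
feats  = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = ["cat", "cat", "car", "car"]

tree = KDTree(feats)

def predict_label(query_feat):
    """Predict an object's symbolic label from its nearest stored feature."""
    _, idx = tree.query(query_feat, k=1)
    return labels[idx]
```

The predicted labels would then be assembled into SPARQL triple patterns, as the abstract describes.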
30. Image retrieval by aggregating deep orientation structure features.
- Author
-
Lu, Fen and Liu, Guang-Hai
- Abstract
Aggregating deep features for image retrieval has shown excellent performance in terms of accuracy. However, exploring visual perception properties to activate the dormant discriminative cues of deep convolutional feature maps has received little attention. To address this issue, we present a novel representation, namely the deep orientation aggregation histogram, for image retrieval via aggregating deep orientation structure features. Its main highlights are: (1) A statistical orientation computation model is proposed to detect candidate directions, helping exploit various orientations in the feature maps to provide a robust representation. (2) A computation module is proposed to activate the discriminative orientation cues hidden in the deep convolutional feature maps; it boosts the representation of deep features with the aid of the statistical orientations and their orientation structures. (3) The proposed method can stimulate the orientation-selectivity mechanism to provide a strongly discriminative yet compact representation. Experimental results on five popular benchmark datasets demonstrate that the proposed method improves retrieval performance in terms of mAP scores. Furthermore, it outperforms some existing state-of-the-art methods without complex fine-tuning, and is particularly beneficial for retrieving scene images with varied color and direction details. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
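As a rough illustration of aggregating orientation structure from a convolutional feature map, one can histogram gradient orientations weighted by gradient magnitude. This is a generic sketch, not the paper's statistical orientation computation model:

```python
import numpy as np

def orientation_histogram(fmap, bins=8):
    """Aggregate a 2-D feature map into an L1-normalised histogram of
    gradient orientations, each pixel weighted by gradient magnitude."""
    gy, gx = np.gradient(fmap)
    ang = np.arctan2(gy, gx)                     # orientation in [-pi, pi]
    mag = np.hypot(gy, gx)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

fmap = np.outer(np.arange(5.0), np.ones(5))      # vertical ramp: one dominant direction
h = orientation_histogram(fmap)
```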
31. Optimal Illumination Distance Metrics for Person Re-Identification in Complex Lighting Conditions.
- Author
-
Wang, Chao, Wang, Zhongyuan, Hu, Ruimin, Wang, Xiaochen, and Zhou, Wen
- Subjects
IMAGE retrieval, PEDESTRIANS, LIGHTING
- Abstract
Person re-identification is extensively applied in public security and surveillance. However, environmental factors like time and location often lead to varying lighting conditions in captured pedestrian images, significantly impacting identification accuracy. Current approaches mitigate this issue through lighting transformation techniques, aiming to normalize images to a standard lighting condition for consistent person re-identification results. Yet, these methods overlook the fact that different content may hold distinct identification values under diverse lighting conditions. To address this, we conducted an analysis on the identification distance between images of the same or different pedestrians under pre-defined lighting conditions. From this analysis, we introduce the concept of optimal lighting: a condition where the distance between image pairs is minimized compared to other lighting scenarios. We propose utilizing this optimal lighting distance in the image retrieval process for final ranking. Our study, validated on synthetic datasets Market-IA and Duke-IA, demonstrates that optimal lighting is independent of image texture information. Each image pair exhibits a unique optimal lighting, yet consistently shows a minimum distance value. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
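The "optimal lighting" distance this abstract introduces — the minimum pairwise distance across a set of pre-defined lighting conditions — can be sketched directly. The toy features below stand in for features of re-lit pedestrian images:

```python
import numpy as np

def optimal_lighting_distance(feats_a, feats_b):
    """Given features of the same image pair rendered under several
    pre-defined lighting conditions (one row per condition), keep the
    minimum pairwise distance for the final ranking."""
    d = np.linalg.norm(feats_a - feats_b, axis=1)   # one distance per condition
    return d.min()

a = np.array([[1.0, 0.0], [0.8, 0.1], [0.5, 0.5]])  # 3 lighting conditions
b = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 1.0]])
dmin = optimal_lighting_distance(a, b)
```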
32. Multi-source wafer map retrieval based on contrastive learning for root cause analysis in semiconductor manufacturing.
- Author
-
Hong, Wei-Jyun, Shen, Chia-Yu, and Wu, Pei-Yuan
- Subjects
ROOT cause analysis, IMAGE retrieval, TEXTURE mapping, LABOR supply, MULTIPLICATION, SEMICONDUCTOR manufacturing
- Abstract
In semiconductor manufacturing, wafer yield is a key factor in determining profits. Since yield improvement involves identifying the faulty steps among hundreds of wafer manufacturing steps, which requires much time and manpower, previous studies employ wafer bin map fault-type recognition models or wafer bin map retrieval models to guess which manufacturing steps are likely to fail and examine them first. However, existing methods still lack the ability to directly capture the faulty manufacturing steps. Capturing them is critical, as engineers can then correct the erroneous step among hundreds of manufacturing steps to increase yield at minimal cost. The objective of our study is to find historical wafer defect maps that are highly reflective of erroneous manufacturing steps by querying a wafer bin map. This calls for a retrieval model that directly ranks wafer defect maps to capture the most likely faulty manufacturing step, namely a multi-source wafer map retrieval model. Although existing multi-source image retrieval models can eliminate the visual gap between wafer maps from different sources, most focus on finding semantically related samples (e.g. of the same defect fault type) and ignore the spatial significance that should be considered in multi-source wafer map retrieval, such as the various sizes, shapes, and locations of defect patterns on wafer maps. This study applies a contrastive-learning-based approach that addresses the visual gap between wafer maps from different sources while taking spatial information into account. It also adopts a mixed-sample strategy to address the visual gap and applies pixel-wise multiplication between wafer maps from different sources to indicate common defect locations. The effectiveness of our proposed retrieval approach is supported by testing results on in-line production wafer map datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
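The pixel-wise multiplication the abstract mentions, used to indicate defect locations common to wafer maps from different sources, is straightforward to sketch on binary defect maps:

```python
import numpy as np

# Binary defect maps from two sources (1 = defective die/pixel).
map_a = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])
map_b = np.array([[1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

# Pixel-wise multiplication keeps only locations defective in BOTH sources,
# highlighting the shared spatial defect pattern before computing similarity.
common = map_a * map_b
print(int(common.sum()))  # → 2 shared defect locations
```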
33. Automatic Image Annotation and Retrieval Using Fuzzy C-Means Clustering with Gaussian Bare Bones White Shark Optimization.
- Author
-
Shivashankar, Janani, Shivashankar, Jahnavi, Bhaskar, Anil Kumar, Rajegowda, Manjunath Doddegowdanakppalu, Kumar, Sampath Satheesh, and Reddy, Shiva Sumanth
- Subjects
FEATURE extraction, IMAGE retrieval, EXTRACTION techniques, HIERARCHICAL clustering (Cluster analysis), RESCUE work
- Abstract
Image annotation is the process of assigning meaningful labels to specific parts of an image, making it easier for systems to categorize, interpret, and retrieve visual data. These annotations are then used in the retrieval process to search for and recover images based on tagged data, improving the accuracy and efficiency of identifying relevant images. However, inadequate or inconsistent annotations lead to poor image retrieval performance, decreasing overall effectiveness. This research proposes the Fuzzy C-Means Clustering-Gaussian Bare Bones White Shark Optimization (FCM-GBWSO) algorithm for automatic image annotation and retrieval. Gaussian Bare Bones (GB) is incorporated into White Shark Optimization (WSO) to prevent premature convergence and avoid local-optima issues when generating cluster centroids and clustered images. Various feature extraction techniques, such as ResNet50 and Color Moments, are applied to extract features effectively. Feature transformation and Neighbourhood Component Analysis (NCA) are then used to bring the features to comparable scales and select the best ones. The proposed FCM-GBWSO achieves an average precision of 0.98, 0.96, and 0.97 on the Corel-10k, Caltech-256, and Corel-1k datasets respectively, outperforming existing methods such as hierarchical clustering and Deep Neural Network-based Deep Search and Rescue (DNN-SAR). [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
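For reference, the plain fuzzy C-means baseline that FCM-GBWSO builds on alternates membership and centroid updates. A compact sketch of standard FCM, without the GBWSO centroid optimization the paper adds:

```python
import numpy as np

def fcm(X, c=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy C-means: alternate fuzzy-membership and centroid
    updates (fuzzifier m > 1) for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))         # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[0.0], [0.1], [5.0], [5.1]])     # two obvious 1-D clusters
centers, U = fcm(X)
```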
34. Multi-granularity representation learning for sketch-based dynamic face image retrieval.
- Author
-
Wang, Liang, Dai, Dawei, and Fu, Shiyu
- Abstract
In some specific scenarios, a face sketch can be used to identify a person. However, drawing a face sketch often requires excellent skills and is time-consuming, which seriously hinders its widespread use in actual scenarios. The sketch-less face image retrieval (SLFIR) framework (Dai et al. 2023) explores providing some form of interaction between humans and machines during the drawing process to break the above barriers. Under the SLFIR framework, there is a large gap between a partial sketch with only a few strokes and a whole face photo, resulting in poor performance at the early stages. In this study, we propose multi-granularity (MG) representation learning (MGRL) to address the SLFIR problem, in which we learn representations for the different-granularity regions of a partial sketch and its target image. Specifically, (1) a classical triplet network is first adopted to learn the joint embedding space shared between a complete sketch and its target face photo; (2) the partial sketch in each drawing episode is then divided into MG regions, and another learnable branch of the triplet network is designed to optimize the representations of these regions; finally, by combining all the MG regions of the sketches and photos, the final distance is determined. In the experiments, our method outperformed state-of-the-art baselines in terms of early retrieval performance on two publicly accessible datasets. Codes are available at [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
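The "classical triplet network" step maps to the standard triplet objective: pull the anchor toward the positive and push it away from the negative by at least a margin. A minimal sketch of that loss (the margin value and toy embeddings are assumed):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Classical triplet objective on embedding vectors: penalise cases
    where the positive is not closer than the negative by the margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0]); p = np.array([0.1, 0.0]); n = np.array([1.0, 1.0])
loss = triplet_loss(a, p, n)   # zero: the negative is already far enough
```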
35. Image retrieval of colored spun fabrics based on variable weight and semantic features.
- Author
-
Gu, Qian, Xiao, Yaowen, Yuan, Li, Shen, Mande, Zhang, Faquan, Liao, Haibing, and Liu, Junping
- Subjects
IMAGE retrieval, NEW product development, PRODUCT management, STATISTICAL correlation, ALGORITHMS
- Abstract
Multi-level features of images contribute differently to image retrieval. Fixed-weight feature fusion cannot give full play to the effectiveness of shallow visual features and semantic features in image retrieval of colored spun fabrics. In view of this, this paper proposes a colored spun fabrics image retrieval algorithm based on variable weights and semantic features. The algorithm constructs a similarity correlation coefficient based on the contribution of features at different levels to image retrieval, and obtains the fusion weight coefficients of the multi-level features from the change rate of the similarity scores between the retrieved images and their k nearest neighbors in the sample base. Experimental results show that the proposed variable-weight feature fusion brings all features into full play. Compared with retrieval methods based only on shallow visual feature fusion or on fixed-weight feature fusion, the Recall rate of the proposed algorithm increases by 13.15% and 0.54% respectively, and the mean Average Precision over the whole class increases by 7.21% and 1.15% respectively. This research provides technical support for the digital management and product development of colored spun fabrics imagery, and has important theoretical value and broad application prospects. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
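The core fusion idea — combining per-level similarity scores with weights rather than a fixed scheme — can be sketched as a weighted sum. Here the weights are simply given, whereas the paper derives them from similarity-score change rates:

```python
import numpy as np

def fuse_similarities(sims, weights):
    """Fuse per-level similarity scores (rows = images, cols = feature
    levels) with normalised weights into one score per image."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                       # weights sum to 1
    return np.asarray(sims) @ w

# Similarity of 3 database images at 2 feature levels (shallow, semantic).
sims = np.array([[0.9, 0.2], [0.4, 0.8], [0.5, 0.5]])
fused = fuse_similarities(sims, [1.0, 3.0])   # semantic level weighted higher
ranking = np.argsort(-fused)                  # best match first
```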
36. Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification.
- Author
-
Ding, Yongkang, Li, Jiechen, Wang, Hao, Liu, Ziang, and Wang, Anqi
- Abstract
Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features, including the inconsistency of clothing and the difficulty of learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate the interference of color in recognition; and clothing-irrelevant features derived from semantic segmentation force the model to learn features independent of the person's clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model's ability to capture both detailed and global structures, thereby improving recognition accuracy and robustness. Extensive experimental results demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes. On the PRCC and LTCC datasets, AE-Net achieves Top-1 accuracy rates of 60.4% and 42.9%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
37. Convolutional neural network-based strategies for efficient content-based image retrieval.
- Author
-
Kamatchi, Chinnathambi, Rajendran, Rathiya, Nagarajan, Kopperundevi, Palanisamy, Brinda, Jeyabalan, Deepika, and Paperananthamurugesan, Rama Subramanian
- Subjects
MACHINE learning, COMPUTER vision, CONTENT-based image retrieval, IMAGE retrieval, CONVOLUTIONAL neural networks, DEEP learning
- Abstract
Recent years have seen a meteoric rise in the usage of enormous image databases due to advancements in multimedia technologies, making image retrieval one of the most critical technologies in image processing. This study uses convolutional neural networks (CNNs) for content-based image retrieval (CBIR). With the ever-growing number of digital photos, practical methods for retrieving these images are crucial, and CNNs are highly effective in many computer vision applications. Improving the efficacy and precision of image retrieval systems is the primary goal of our research into deep learning. The paper starts with a thorough analysis of the current state of CBIR methods and the difficulties they face. It then explores CNN design and operation, focusing on the capacity of CNNs to learn hierarchical features from images autonomously. The paper also examines how model performance changes as hyperparameters, transfer-learning techniques, and CNN topologies are varied; the insights obtained from these experiments enhance understanding of the factors affecting CNN effectiveness in CBIR. Finally, our study shows that CNNs can transform CBIR systems. This research adds to the expanding body of knowledge on using cutting-edge deep learning algorithms to make image retrieval more efficient and accurate. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
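As a minimal illustration of the CBIR pipeline the study builds on, CNN features (here replaced by random vectors) can be ranked by cosine similarity to a query feature; `retrieve` is a hypothetical helper, not code from the paper:

```python
import numpy as np

def retrieve(query_feat, db_feats, k=3):
    """Rank database images by cosine similarity to the query feature
    and return the indices and scores of the top-k matches."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]                  # descending similarity
    return order, sims[order]

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 64))                    # stand-in for CNN feature vectors
query = db[42] + 0.05 * rng.normal(size=64)        # near-duplicate of database item 42
top, scores = retrieve(query, db)
```

With real CNN features the same ranking step applies unchanged; only the feature extractor differs.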
38. Deep shared proxy construction hashing for cross-modal remote sensing image fast target retrieval.
- Author
-
Han, Lirong, Paoletti, Mercedes E., Moreno-Álvarez, Sergio, Haut, Juan M., and Plaza, Antonio
- Subjects
- *
IMAGE retrieval , *REMOTE sensing , *OPTICAL images , *DEEP learning , *LAND use - Abstract
The diversity of remote sensing (RS) image modalities has expanded alongside advancements in RS technologies. A plethora of optical, multispectral, and hyperspectral RS images offer rich geographic class information. The ability to swiftly access multiple RS image modalities is crucial for fully harnessing the potential of RS imagery. In this work, an innovative method, called Deep Shared Proxy Construction Hashing (DSPCH), is introduced for cross-modal hyperspectral scene target retrieval using accessible RS images such as optical and sketch. Initially, a shared proxy hash code is generated in the hash space for each land use class. Subsequently, an end-to-end deep hash network is built to generate hash codes for hyperspectral pixels and accessible RS images. Furthermore, a proxy hash loss function is designed to optimize the proposed deep hashing network, aiming to generate hash codes that closely resemble the corresponding proxy hash code. Finally, two benchmark datasets are established for cross-modal hyperspectral and accessible RS image retrieval, allowing us to conduct extensive experiments with these datasets. Our experimental results validate that the novel DSPCH method can efficiently and effectively achieve RS image cross-modal target retrieval, opening up new avenues in the field of cross-modal RS image retrieval. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
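The proxy hash loss idea, pulling a sample's relaxed code toward its class proxy and hinge-pushing it away from the other proxies, can be sketched as follows. The relaxed Hamming distance and the margin value are assumptions for illustration; DSPCH's exact loss is not reproduced here:

```python
import numpy as np

def proxy_hash_loss(codes, labels, proxies, margin=0.5):
    """Toy proxy hash loss: each relaxed code (tanh-activated network output)
    is pulled toward its class proxy code and hinge-pushed away from the rest,
    measured with a relaxed Hamming distance d = (bits - p . tanh(z)) / 2."""
    bits = proxies.shape[1]
    total = 0.0
    for z, y in zip(codes, labels):
        d = 0.5 * (bits - proxies @ np.tanh(z))    # relaxed Hamming distance to every proxy
        total += d[y] + np.maximum(0.0, margin * bits - np.delete(d, y)).sum()
    return total / len(codes)

rng = np.random.default_rng(2)
proxies = np.sign(rng.normal(size=(4, 16)))        # one shared binary proxy code per class
loss_good = proxy_hash_loss(3.0 * proxies[[0, 1]], [0, 1], proxies)   # codes near their proxies
loss_bad = proxy_hash_loss(-3.0 * proxies[[0, 1]], [0, 1], proxies)   # codes far from their proxies
```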
39. Enhanced Semantic Natural Scenery Retrieval System Through Novel Dominant Colour and Multi‐Resolution Texture Feature Learning Model.
- Author
-
Pavithra, L. K., Subbulakshmi, P., Paramanandham, Nirmala, Vimal, S., Alghamdi, Norah Saleh, and Dhiman, Gaurav
- Subjects
- *
ENSEMBLE learning , *IMAGE databases , *IMAGE retrieval , *IMAGING systems , *COLOR - Abstract
ABSTRACT A conventional content‐based image retrieval system (CBIR) extracts image features from every pixel of the images, and its depiction of the feature is entirely different from human perception. Additionally, it takes a significant amount of time for retrieval. An optimal combination of appropriate image features is necessary to bridge the semantic gap between user queries and retrieval responses. Furthermore, users should require minimal interactions with the CBIR system to obtain accurate responses. Therefore, the proposed work focuses on extracting highly relevant feature information from a set of images in various natural image databases. Subsequently, a feature‐based learning/classification model is introduced before similarity measure calculations, aiming to minimise retrieval time and the number of comparisons. The proposed work analyses the learning models based on the retrieval system's performance separately for the following features: (i) dominant colour, (ii) multi‐resolution radial difference texture patterns, and a combination of both. The developed work is assessed with other techniques, and the results are reported. The results demonstrate that the implemented ensemble learning model‐based CBIR outperforms the recent CBIR techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. GroWNBB: Gromov–Wasserstein learning of neural best buddies for cross-domain correspondence.
- Author
-
Tang, Ruolan, Wang, Weiwei, Han, Yu, and Feng, Xiangchu
- Subjects
- *
BEST friends , *COMPUTER vision , *IMAGE retrieval , *SEMANTICS - Abstract
Identifying pixel correspondences between two images is a fundamental task in computer vision and has been widely used for 3D reconstruction, image morphing, and image retrieval. The neural best buddies (NBB) method finds sparse correspondences between cross-domain images that share semantically related local structures, even though they may differ considerably in semantics and appearance. This paper presents a new method for cross-domain image correspondence, called GroWNBB, which incorporates Gromov–Wasserstein learning into the NBB framework. Specifically, we use the NBB as the backbone to search for feature matches in deep layers and propagate them to shallower layers. At each layer, we modify the NBB strategy by mapping the matching pairs obtained within and across images into graphs, formulating the matches as optimal transport between graphs, and using Gromov–Wasserstein learning to establish matches between these graphs. Consequently, our approach considers the relationships between images as well as the relationships within images, which makes the correspondence more stable. Our experiments demonstrate that GroWNBB achieves state-of-the-art performance on cross-domain correspondence and outperforms other popular methods in intra-class and same-object correspondence estimation. Our code is available at https://github.com/NolanInLowland/GroWNBB. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
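The Gromov–Wasserstein objective that this line of work optimizes compares intra-graph distances under a coupling between two graphs. A direct (brute-force) evaluation of that cost, feasible only for tiny graphs, looks like this; the coupling matrices below are hand-picked examples, not learned:

```python
import numpy as np

def gw_cost(C1, C2, T):
    """Gromov-Wasserstein objective for a coupling T between two graphs with
    intra-graph distance matrices C1 (n x n) and C2 (m x m):
    sum_{i,j,k,l} (C1[i,k] - C2[j,l])**2 * T[i,j] * T[k,l]."""
    n = T.shape[0]
    cost = 0.0
    for i in range(n):
        for k in range(n):
            diff = (C1[i, k] - C2) ** 2            # (m, m) grid over (j, l)
            cost += T[i] @ diff @ T[k]
    return cost

# Two identical 3-node graphs: the identity coupling has zero cost,
# while a coupling that swaps two structurally different nodes does not.
C = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
T_id = np.eye(3) / 3
T_perm = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1.]]) / 3
cost_id, cost_perm = gw_cost(C, C, T_id), gw_cost(C, C, T_perm)
```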
41. A survey on person and vehicle re‐identification.
- Author
-
Wang, Zhaofa, Wang, Liyang, Shi, Zhiping, Zhang, Miaomiao, Geng, Qichuan, and Jiang, Na
- Subjects
- *
COMPUTER vision , *IMAGE registration , *VIDEO surveillance , *CRIMINAL investigation , *IMAGE retrieval , *DEEP learning - Abstract
Person/vehicle re‐identification aims to associate the same person (or vehicle) across surveillance videos and images captured by different cameras at different locations and times, so as to achieve cross‐surveillance image matching, person retrieval, and trajectory tracking. It plays an extremely important role in fields such as intelligent security and criminal investigation. In recent years, the rapid development of deep learning technology has significantly propelled the advancement of re‐identification (Re‐ID) technology, and an increasing number of methods have emerged to enhance Re‐ID performance. This paper summarises four popular research areas in the field of re‐identification, focusing on current research hotspots: the multi‐task learning domain, the generalisation learning domain, the cross‐modality domain, and the optimisation learning domain. Specifically, the paper analyses the challenges faced within these domains and elaborates on the deep learning frameworks and networks that address them. A comparative analysis of re‐identification tasks from various classification perspectives is provided, introducing mainstream research directions and current achievements. Finally, insights into future development trends are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Image searching in an open photograph archive: search tactics and faced barriers in historical research.
- Author
-
Late, Elina, Ruotsalainen, Hille, and Kumpulainen, Sanna
- Subjects
- *
DIGITAL humanities , *DIGITAL photography , *PHOTOGRAPH collections , *INTERNET content , *IMAGE retrieval - Abstract
During the last decades, cultural heritage collections have been digitized, for example for the use of academic scholars. However, earlier studies have mainly focused on the use of textual materials, so little is known about how digitized photographs are used and searched in the digital humanities. The aim of this paper is to investigate the search tactics applied and the barriers perceived when looking for historical photographs in a digital image archive for research and writing tasks. The case archive of this study contains approximately 160,000 historical wartime photographs that are openly available. The study is based on qualitative interview and demonstration data from 15 expert users of the image collection searching for photographs for research and writing tasks. Critical incident questions yielded a total of 37 detailed real-life search examples and 158 expressed barriers to searching. Results show that expert users apply and combine different tactics (keywords, filtering, and browsing) for image searching, and rarely is a single tactic enough. During searching, users face various barriers, most of them concerning keyword searching due to the shortcomings of image metadata. Barriers arose mostly in the context of the collection and its tools. Although scholars have benefited from the efforts put into digitizing cultural heritage collections, providing digitized content openly online is not enough if there are no sufficient means for accessing it. Automatic annotation methods are one option for creating metadata to improve the findability of the images. However, a better understanding of human information interaction with image data is needed to better support digitalization in the humanities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Secure and efficient content-based image retrieval using dominant local patterns and watermark encryption in cloud computing.
- Author
-
Sucharitha, G., Godavarthi, Deepthi, Ramesh, Janjhyam Venkata Naga, and Khan, M. Ijaz
- Subjects
- *
CONTENT-based image retrieval , *IMAGE encryption , *DATABASES , *IMAGE retrieval , *IMAGE databases - Abstract
The relevance of images in people's daily lives is growing, and content-based image retrieval (CBIR) has received a lot of attention in research. Images are much better at communicating information than text documents. This paper deals with the secure and efficient retrieval of images in the cloud based on texture features extracted from the dominant local patterns of an image. We propose a method that supports secure and efficient image retrieval over the cloud: the images are encrypted with a watermark before the image database is deployed to the cloud, which prevents the outflow of sensitive information. A reduced-dimension feature vector database is created for all images using relative directional edge patterns (RDEP), facilitating efficient storage and retrieval. The significance of the specified local pattern for effectively extracting texture information has been demonstrated, and a notable level of accuracy has been established when compared to existing algorithms in terms of precision and recall. Additionally, a watermark-based system is proposed to prevent unauthorized query users from illicitly copying and distributing the acquired images. A unique watermark is embedded into the image by the encryption module before it is stored in the cloud. Hence, when an image copy is discovered, watermark extraction can be used to track down the illegal query user who circulated the image. The proposed method's significance is assessed by comparing it to other existing feature extractors incorporating watermark encryption, and its effectiveness is demonstrated across various numbers of watermark bits. Trials and security analyses affirm that the suggested approach is both robust and efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
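The abstract does not specify the watermark embedding scheme, so as a stand-in, a minimal least-significant-bit (LSB) embed/extract pair illustrates how a watermark can be hidden in an image and later recovered to trace an image copy:

```python
import numpy as np

def embed_watermark(image, bits):
    """Embed watermark bits into the least-significant bit of the first
    len(bits) pixels (a minimal stand-in for the paper's scheme)."""
    out = image.copy().ravel()
    out[:len(bits)] = (out[:len(bits)] & ~np.uint8(1)) | np.asarray(bits, dtype=np.uint8)
    return out.reshape(image.shape)

def extract_watermark(image, n):
    """Read back the first n least-significant bits."""
    return (image.ravel()[:n] & 1).tolist()

img = np.arange(64, dtype=np.uint8).reshape(8, 8)   # toy 8x8 grayscale image
mark = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_watermark(img, mark)
recovered = extract_watermark(stego, len(mark))
```

Because only the lowest bit changes, no pixel moves by more than one intensity level, so retrieval features are essentially unaffected.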
44. A novel retrieval strategy for colored spun yarn fabrics based on hand-crafted features and support vector machine.
- Author
-
Lin, Zhongbo, Zhang, Xia, Zhang, Ning, Xiang, Jun, and Pan, Ruru
- Subjects
SPUN yarns ,FEATURE extraction ,SUPPORT vector machines ,IMAGE retrieval ,IMAGE databases - Abstract
Traditional retrieval methods for colored spun yarn fabrics are low in efficiency and accuracy. To solve this problem, this paper proposes a novel retrieval strategy for colored spun yarn fabrics based on hand-crafted features and a support vector machine (SVM). First, an SVM classifier is built to automatically classify fabric images. Second, different texture and color features are extracted for solid-color fabrics and stylized fabrics using hand-crafted feature extraction methods. Third, a retrieval strategy is formulated to realize image retrieval of colored spun yarn fabrics based on the output probability of the predicted sample category. An image database containing over 6000 images was built to verify the proposed method. Experiments show that the mAP of the proposed method reaches 0.837 and the elapsed time is 0.171 s. The results indicate that the proposed method is feasible and effective, and superior to other methods for image retrieval of colored spun yarn fabrics. It can provide references for production crews in the factory to shorten the production cycle and reduce manual labor. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
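The classify-then-retrieve strategy, predicting the fabric class first and then ranking only within that class, can be sketched with a nearest-centroid classifier standing in for the SVM. The features, helper name, and classifier choice are all illustrative assumptions:

```python
import numpy as np

def classify_then_retrieve(query, feats, labels, k=2):
    """Toy classify-then-retrieve: predict the query's fabric class with a
    nearest-centroid classifier (stand-in for the SVM), then rank only
    same-class images by Euclidean distance."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    pred = classes[np.argmin(np.linalg.norm(centroids - query, axis=1))]
    idx = np.flatnonzero(labels == pred)            # restrict search to predicted class
    dists = np.linalg.norm(feats[idx] - query, axis=1)
    return pred, idx[np.argsort(dists)[:k]]

rng = np.random.default_rng(3)
feats = np.vstack([rng.normal(0, 0.1, size=(5, 8)),   # class 0 fabrics
                   rng.normal(5, 0.1, size=(5, 8))])  # class 1 fabrics
labels = np.array([0] * 5 + [1] * 5)
pred, hits = classify_then_retrieve(feats[7] + 0.01, feats, labels)
```

Restricting the similarity search to the predicted class is what cuts the number of comparisons and the elapsed time.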
45. Semantic enhancement and multi-level alignment network for cross-modal retrieval.
- Author
-
Chen, Jia and Zhang, Hong
- Subjects
INFORMATION storage & retrieval systems ,IMAGE registration ,ARTIFICIAL intelligence ,IMAGE processing ,IMAGE retrieval ,SEMANTICS - Abstract
Cross-modal retrieval aims to address heterogeneity and cross-modal semantic associations between multimedia data of different modalities. Image-text retrieval is a key challenge for cross-modal retrieval, which has made great progress through global alignment between images and text, or local alignment between regions and words. However, this challenge still faces three problems. Firstly, text data usually contains words without semantic meaning, and this redundant information interferes with local alignment between text words and image regions. Secondly, existing attention mechanisms focus only on visual features of image regions, while ignoring information about the spatial relationships between individual detected objects in an image, such as relative position and size; this information is often critical for understanding content features in an image. Finally, text words or image regions may have different semantics in different global contexts, so overall semantic matching should be considered and deeper semantic information expressed by images and texts should be mined. To solve these problems, we propose a Semantic Enhancement and Multi-level Alignment Network (SEMAN) for cross-modal retrieval. Firstly, a multi-head self-attention mechanism after word embedding is introduced to filter out words without semantic meaning in text sentences. Secondly, an image position relation embedding is proposed by modifying the self-attention weight matrix to incorporate the spatial relationship information between image regions. Finally, we introduce a multi-level alignment matching module to understand complex correlations between images and text. Extensive experiments on two benchmark datasets, i.e., Flickr30K and MSCOCO, demonstrate the effectiveness of our SEMAN, achieving state-of-the-art performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
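Filtering low-content words with self-attention can be illustrated as follows: each word's score is the average attention it receives under scaled dot-product self-attention over the word embeddings, and low-scoring words can be dropped. The scoring rule is a toy proxy, not SEMAN's actual mechanism:

```python
import numpy as np

def attention_word_scores(E):
    """Scaled dot-product self-attention over word embeddings E (n_words x d);
    a word's score is the average attention it receives across all queries,
    used here as a toy proxy for filtering low-content words."""
    d = E.shape[1]
    logits = E @ E.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)          # numerically stable softmax
    A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return A.mean(axis=0)

rng = np.random.default_rng(4)
content = rng.normal(size=(4, 16))                       # four content-bearing words
E = np.vstack([content, np.zeros((1, 16))])              # last row: a "stopword" with no signal
scores = attention_word_scores(E)
```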
46. Anchor-based Domain Adaptive Hashing for unsupervised image retrieval.
- Author
-
Chen, Yonghao, Fang, Xiaozhao, Liu, Yuanyuan, Hu, Xi, Han, Na, and Kang, Peipei
- Abstract
Traditional image retrieval methods suffer from a significant performance degradation when the model is trained on the target dataset and run on another dataset. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization capabilities; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH aims to exploit the commonalities among domains with the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs, and then reconstructs the similarity matrix with these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes. This preserves the similarity structure of the similarity matrix in the hash code. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. Experimental results on four datasets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
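The anchor-graph reconstruction of a cross-domain similarity matrix can be sketched as below. The degree normalization follows the common anchor-graph construction S = Z diag(deg)^{-1} Z^T and is an assumption for illustration, not necessarily ADAH's exact formulation:

```python
import numpy as np

def anchor_similarity(Z_src, Z_tgt):
    """Reconstruct a cross-domain similarity matrix from anchor graphs
    Z (n_samples x n_anchors, rows sum to 1) over a set of domain-shared
    anchors, normalizing by anchor degrees pooled over both domains."""
    deg = Z_src.sum(axis=0) + Z_tgt.sum(axis=0)          # anchor degrees over both domains
    return (Z_src / deg) @ Z_tgt.T

rng = np.random.default_rng(5)
Z_src = rng.dirichlet(np.ones(4), size=6)                # 6 source samples, 4 shared anchors
Z_tgt = rng.dirichlet(np.ones(4), size=5)                # 5 target samples, same anchors
S = anchor_similarity(Z_src, Z_tgt)
```

Because similarities flow only through the small shared anchor set, the construction stays linear in the number of samples, which is what makes the approach scalable.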
47. An indoor visual localization system based on blurred images.
- Author
-
和法旭, 吕梦妍, 张坤鹏, and 张立晔
- Abstract
To address the challenges posed by image blurring in indoor visual localization, this study proposes an indoor visual localization system based on blurred images. Firstly, the blurriness level of images is assessed using a blurriness model, and the images are then deblurred using an improved image restoration model to obtain clear images. Secondly, the performance of image retrieval is enhanced using a deep learning-based image retrieval model to improve the accuracy of the localization system. Furthermore, a visual localization algorithm based on the essential matrix, adaptable to multiple features, is employed in conjunction with the SuperPoint + SuperGlue feature extraction algorithm to enhance the system's robustness to changes in illumination and viewpoint and to improve processing efficiency. The local feature transformers (LoFTR) algorithm is integrated to address scenes with no or weak texture. The system extracts features and matching information from pairs of original and retrieved images and estimates the essential matrix using the 5-point method and the random sample consensus (RANSAC) algorithm. Finally, the absolute camera pose of the query image is obtained. Experimental results demonstrate that the system achieves precise localization, with an average median localization error of 0.067 m and a camera angle error of 1.826°. The proposed method thus effectively addresses the challenges of image blurring in indoor visual localization, significantly improving localization accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
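After the 5-point method and RANSAC produce an essential matrix, the camera pose is recovered by decomposing E = [t]×R. A standard numpy sketch of that decomposition step (with a synthetic pose in place of real matches) is:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x of a 3-vector t."""
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0.]])

def decompose_essential(E):
    """Recover the two candidate rotations and the translation direction from
    an essential matrix E = [t]x R (the step after 5-point + RANSAC)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U[:, -1] *= -1     # enforce proper rotations
    if np.linalg.det(Vt) < 0: Vt[-1] *= -1
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                                 # translation up to sign and scale
    return R1, R2, t

# Build an E from a known pose and check one candidate matches the true rotation.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta), np.cos(theta), 0], [0, 0, 1.]])
t_true = np.array([1.0, 0.5, 0.2]); t_true /= np.linalg.norm(t_true)
E = skew(t_true) @ R_true
R1, R2, t = decompose_essential(E)
```

In practice the correct (R, t) pair among the four candidates is selected by the cheirality check, triangulating matches and keeping the pose that puts points in front of both cameras.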
48. CLIP-Llama: A New Approach for Scene Text Recognition with a Pre-Trained Vision-Language Model and a Pre-Trained Language Model.
- Author
-
Zhao, Xiaoqing, Xu, Miaomiao, Silamu, Wushour, and Li, Yanbing
- Subjects
- *
LANGUAGE models , *TEXT recognition , *ARTIFICIAL intelligence , *IMAGE retrieval , *INCORPORATION , *INTELLIGENT transportation systems - Abstract
This study focuses on Scene Text Recognition (STR), which plays a crucial role in various applications of artificial intelligence such as image retrieval, office automation, and intelligent transportation systems. Currently, pre-trained vision-language models have become the foundation for various downstream tasks. CLIP exhibits robustness in recognizing both regular (horizontal) and irregular (rotated, curved, blurred, or occluded) text in natural images. As research in scene text recognition requires substantial linguistic knowledge, we introduce the pre-trained vision-language model CLIP and the pre-trained language model Llama. Our approach builds upon CLIP's image and text encoders, featuring two encoder–decoder branches: one visual branch and one cross-modal branch. The visual branch provides initial predictions based on image features, while the cross-modal branch refines these predictions by addressing the differences between image features and textual semantics. We incorporate the large language model Llama2-7B in the cross-modal branch to assist in correcting erroneous predictions generated by the decoder. To fully leverage the potential of both branches, we employ a dual prediction and refinement decoding scheme during inference, resulting in improved accuracy. Experimental results demonstrate that CLIP-Llama achieves state-of-the-art performance on 11 STR benchmark tests, showcasing its robust capabilities. We firmly believe that CLIP-Llama lays a solid and straightforward foundation for future research in scene text recognition based on vision-language models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
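The dual prediction and refinement decoding can be illustrated with a toy rescoring rule: the visual branch proposes candidate transcriptions with confidences, and a language-model score refines the ranking. The dictionary-lookup "LM", the candidate strings, and the mixing weight alpha are placeholders for the Llama2-7B cross-modal branch:

```python
def dual_decode(visual_scores, lm_score, alpha=0.5):
    """Toy dual prediction/refinement: pick the candidate maximizing a
    mixture of the visual branch confidence and a language-model score."""
    return max(visual_scores,
               key=lambda s: (1 - alpha) * visual_scores[s] + alpha * lm_score(s))

# Hypothetical candidates: the visual branch slightly prefers a misspelling,
# but the LM score (here: a toy lexicon check) corrects it.
candidates = {"strcet": 0.52, "street": 0.48}
lexicon = {"street", "stream", "strict"}
pred = dual_decode(candidates, lambda s: 1.0 if s in lexicon else 0.0)
```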
49. Image–Text Matching Model Based on CLIP Bimodal Encoding.
- Author
-
Zhu, Yihuan, Xu, Honghua, Du, Ailin, and Wang, Bin
- Subjects
NATURAL language processing ,TRANSFORMER models ,COMPUTER vision ,IMAGE retrieval ,VECTOR spaces ,IMAGE registration - Abstract
Image–text matching is a fundamental task in the multimodal research field, connecting computer vision and natural language processing by aligning visual content with corresponding textual descriptions. Accurate matching is critical for applications such as image captioning and text-based image retrieval yet remains challenging due to the differences in data modalities. This paper addresses these challenges by proposing a robust image–text matching model inspired by Contrastive Language–Image Pre-training (CLIP). Our approach employs the Vision Transformer (ViT) model as the image encoder and Bidirectional Encoder Representations from Transformers (Bert) as the text encoder, integrating these into a shared vector space to measure semantic similarity. We enhance the model's training efficiency using the LiT-tuning paradigm to optimize learning through a cosine decay strategy for dynamic adjustment of the learning rate. We validate our method on two benchmark datasets, WuKong and Flickr30k, demonstrating that our model achieves superior performance and significantly improves key evaluation metrics. The results underscore the model's effectiveness in achieving accurate and robust image–text alignment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
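The shared-vector-space matching at the core of this model can be sketched as temperature-scaled cosine similarity between L2-normalized image and text embeddings, the same form CLIP uses; the dimensions, temperature, and synthetic embeddings here are illustrative:

```python
import numpy as np

def similarity_matrix(img_emb, txt_emb, temperature=0.07):
    """CLIP-style matching: L2-normalize image and text embeddings into the
    shared space and compute temperature-scaled cosine similarities."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return img @ txt.T / temperature

rng = np.random.default_rng(6)
txt = rng.normal(size=(4, 32))                   # stand-in for Bert text embeddings
img = txt + 0.05 * rng.normal(size=(4, 32))      # each "ViT" image near its paired caption
logits = similarity_matrix(img, txt)
matches = logits.argmax(axis=1)                  # retrieved caption per image
```

During training, a softmax cross-entropy over each row and column of these logits pulls matched pairs together, which is the contrastive objective the model inherits from CLIP.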
50. Deep Multi-Similarity Hashing with Spatial-Enhanced Learning for Remote Sensing Image Retrieval.
- Author
-
Zhang, Huihui, Qin, Qibing, Ge, Meiling, and Huang, Jianyong
- Subjects
REMOTE sensing ,IMAGE retrieval ,DISTANCE education ,STATISTICAL sampling ,PROBLEM solving - Abstract
Remote sensing image retrieval (RSIR) plays a crucial role in remote sensing applications, focusing on retrieving a collection of items that closely match a specified query image. Due to the advantages of low storage cost and fast search speed, deep hashing has been one of the most active research problems in remote sensing image retrieval. However, remote sensing images contain many content-irrelevant backgrounds or noises, and existing methods often fail to capture essential fine-grained features. In addition, existing hash learning often relies on random sampling or semi-hard negative mining strategies to form training batches, which can be overwhelmed by redundant pairs that slow down model convergence and compromise retrieval performance. To solve these problems effectively, a novel Deep Multi-similarity Hashing with Spatial-enhanced Learning, termed DMsH-SL, is proposed to learn compact yet discriminative binary descriptors for remote sensing image retrieval. Specifically, to suppress interfering information and accurately localize the target location, a spatial group-enhanced hierarchical network is first designed to learn the spatial distribution of different semantic sub-features, capturing a noise-robust semantic embedding representation. Furthermore, to fully explore the similarity relationships of data points in the embedding space, a multi-similarity loss is proposed to construct informative and representative training batches; it is based on pairwise mining and weighting to compute the self-similarity and relative similarity of image pairs, effectively mitigating the effects of redundant and unbalanced pairs. Experimental results on three benchmark datasets validate the superior performance of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
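The multi-similarity loss the paper builds on (pairwise mining and weighting) can be sketched on a precomputed cosine-similarity matrix. The hyperparameters follow common choices from the multi-similarity literature, and the mining step is simplified to using all positives and negatives:

```python
import numpy as np

def multi_similarity_loss(S, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss on a similarity matrix S: for each anchor,
    softly weight positive pairs below the margin lam and negative pairs
    above it (log-sum-exp weighting over mined pairs)."""
    n = len(labels)
    loss = 0.0
    for i in range(n):
        pos = np.array([S[i, j] for j in range(n) if j != i and labels[j] == labels[i]])
        neg = np.array([S[i, j] for j in range(n) if labels[j] != labels[i]])
        if len(pos):
            loss += np.log1p(np.exp(-alpha * (pos - lam)).sum()) / alpha
        if len(neg):
            loss += np.log1p(np.exp(beta * (neg - lam)).sum()) / beta
    return loss / n

labels = np.array([0, 0, 1, 1])
S_good = np.array([[1, .9, .1, .0], [.9, 1, .0, .1],
                   [.1, .0, 1, .9], [.0, .1, .9, 1.]])   # positives close, negatives far
S_bad = 1 - S_good + np.eye(4)                           # positives far, negatives close
loss_good = multi_similarity_loss(S_good, labels)
loss_bad = multi_similarity_loss(S_bad, labels)
```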