4,152 results for "Contrastive learning"
Search Results
2. Region-Aware Distribution Contrast: A Novel Approach to Multi-task Partially Supervised Learning
- Author
-
Li, Meixuan, Li, Tianyu, Wang, Guoqing, Wang, Peng, Yang, Yang, Zou, Jie, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
3. FlowCon: Out-of-Distribution Detection Using Flow-Based Contrastive Learning
- Author
-
Aathreya, Saandeep, Canavan, Shaun, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
4. Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
- Author
-
Choi, Seokhun, Song, Hyeonseop, Kim, Jaechul, Kim, Taehyeong, Do, Hoseok, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
5. Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
- Author
-
Nguyen, Thong, Bin, Yi, Wu, Xiaobao, Dong, Xinshuai, Hu, Zhiyuan, Le, Khoi, Nguyen, Cong-Duy, Ng, See-Kiong, Tuan, Luu Anh, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
6. Contrastive Ground-Level Image and Remote Sensing Pre-training Improves Representation Learning for Natural World Imagery
- Author
-
Huynh, Andy V., Gillespie, Lauren E., Lopez-Saucedo, Jael, Tang, Claire, Sikand, Rohan, Expósito-Alonso, Moisés, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
7. Counterfactual Contrastive Learning: Robust Representations via Causal Image Synthesis
- Author
-
Roschewitz, Mélanie, de Sousa Ribeiro, Fabio, Xia, Tian, Khara, Galvin, Glocker, Ben, Bhattarai, Binod, editor, Ali, Sharib, editor, Rau, Anita, editor, Caramalau, Razvan, editor, Nguyen, Anh, editor, Gyawali, Prashnna, editor, Namburete, Ana, editor, and Stoyanov, Danail, editor
- Published
- 2025
8. GarmentAligner: Text-to-Garment Generation via Retrieval-Augmented Multi-level Corrections
- Author
-
Zhang, Shiyue, Chong, Zheng, Zhang, Xujie, Li, Hanhui, Cheng, Yuhao, Yan, Yiqiang, Liang, Xiaodan, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
9. QuIIL at T3 Challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View
- Author
-
Vuong, Trinh T. L., Bui, Doanh C., Kwak, Jin Tae, Bao, Rina, editor, Grant, Ellen, editor, Kirkpatrick, Andrew, editor, Wachs, Juan, editor, and Ou, Yangming, editor
- Published
- 2025
10. Semi-Supervised Proxy Contrastive Generalization Network for Bearing Fault Diagnosis
- Author
-
Song, Qiuyu, Jiang, Xingxing, Wang, Qian, Wang, Jun, Huang, Weiguo, Zhu, Zhongkui, Ceccarelli, Marco, Series Editor, Corves, Burkhard, Advisory Editor, Glazunov, Victor, Advisory Editor, Hernández, Alfonso, Advisory Editor, Huang, Tian, Advisory Editor, Jauregui Correa, Juan Carlos, Advisory Editor, Takeda, Yukio, Advisory Editor, Agrawal, Sunil K., Advisory Editor, Wang, Zuolu, editor, Zhang, Kai, editor, Feng, Ke, editor, Xu, Yuandong, editor, and Yang, Wenxian, editor
- Published
- 2025
11. Efficient Multi-modal Human-Centric Contrastive Pre-training with a Pseudo Body-Structured Prior
- Author
-
Meng, Yihang, Cheng, Hao, Wang, Zihua, Zhu, Hongyuan, Lao, Xiuxian, Zhang, Yu, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
12. Robust Contrastive Learning Against Audio-Visual Noisy Correspondence
- Author
-
Zhao, Yihan, Xi, Wei, Bai, Gairui, Liu, Xinhui, Zhao, Jizhong, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
13. A Multi-modal Framework with Contrastive Learning and Sequential Encoding for Enhanced Sleep Stage Detection
- Author
-
Wang, Zehui, Zhang, Zhihan, Wang, Hongtao, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
14. Uncertainty-Aware with Negative Samples for Video-Text Retrieval
- Author
-
Song, Weitao, Chen, Weiran, Xu, Jialiang, Ji, Yi, Li, Ying, Liu, Chunping, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
15. ConD2: Contrastive Decomposition Distilling for Multimodal Sentiment Analysis
- Author
-
Yu, Xi, Huang, Wenti, Long, Jun, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
16. Multi-level Distributional Discrepancy Enhancement for Cross Domain Face Forgery Detection
- Author
-
Qiu, Lingyu, Jiang, Ke, Liu, Sinan, Tan, Xiaoyang, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
17. Coarse-to-Fine Domain Adaptation for Cross-Subject EEG Emotion Recognition with Contrastive Learning
- Author
-
Ran, Shuang, Zhong, Wei, Hu, Fei, Ye, Long, Zhang, Qin, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
18. Dynamic Feature Fusion Based on Consistency and Complementarity of Brain Atlases
- Author
-
Lin, Qiye, Zhao, Jiaqi, Fan, Ruiwen, Zhou, Xuezhong, Xia, Jianan, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
19. Anchored Supervised Contrastive Learning for Long-Tailed Medical Image Regression
- Author
-
Li, Zhaoying, Xing, Zhaohu, Liu, Hongying, Zhu, Lei, Wan, Liang, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
20. Enhancing Large Foundation Models to Identify Fundus Diseases Based on Contrastive Enhanced Low-Rank Adaptation Prompt
- Author
-
Wang, Meng, Lin, Tian, Xu, Ting, Zou, Ke, Chen, Haoyu, Fu, Huazhu, Cheng, Ching-Yu, Bhavna, Antony, editor, Chen, Hao, editor, Fang, Huihui, editor, Fu, Huazhu, editor, and Lee, Cecilia S., editor
- Published
- 2025
21. ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading
- Author
-
Yang, Zhiyuan, Zhang, Bo, Shi, Yufei, Zhong, Ningze, Loh, Johnathan, Fang, Huihui, Xu, Yanwu, Yeo, Si Yong, Bhavna, Antony, editor, Chen, Hao, editor, Fang, Huihui, editor, Fu, Huazhu, editor, and Lee, Cecilia S., editor
- Published
- 2025
22. Fetal Ultrasound Video Representation Learning Using Contrastive Rubik’s Cube Recovery
- Author
-
Zhang, Kangning, Jiao, Jianbo, Noble, J. Alison, Gomez, Alberto, editor, Khanal, Bishesh, editor, King, Andrew, editor, and Namburete, Ana, editor
- Published
- 2025
23. FISHing in Uncertainty: Synthetic Contrastive Learning for Genetic Aberration Detection
- Author
-
Gutwein, Simon, Kampel, Martin, Taschner-Mandl, Sabine, Licandro, Roxane, Sudre, Carole H., editor, Mehta, Raghav, editor, Ouyang, Cheng, editor, Qin, Chen, editor, Rakic, Marianne, editor, and Wells, William M., editor
- Published
- 2025
24. Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
- Author
-
Yang, Hunmin, Jeong, Jongoh, Yoon, Kuk-Jin, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
25. Contrastive Learning with Counterfactual Explanations for Radiology Report Generation
- Author
-
Li, Mingjie, Lin, Haokun, Qiu, Liang, Liang, Xiaodan, Chen, Ling, Elsaddik, Abdulmotaleb, Chang, Xiaojun, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
26. Mitigating Dimensional Collapse and Model Drift in Non-IID Data of Federated Learning
- Author
-
Jiang, Ming, Li, Yun, Lu, Yao, Guo, Biao, Zhang, Feng, Ghosh, Ashish, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
27. DRLN: Disentangled Representation Learning Network for Multimodal Sentiment Analysis
- Author
-
Hou, Jingming, Omar, Nazlia, Tiun, Sabrina, Saad, Saidah, He, Qian, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
28. Clothes Image Retrieval via Learnable FashionCLIP
- Author
-
Sun, Yuan, Zhao, Mingbo, Zhang, Haijun, editor, Li, Xianxian, editor, Hao, Tianyong, editor, Meng, Weizhi, editor, Wu, Zhou, editor, and He, Qian, editor
- Published
- 2025
29. Universal representation learning for multivariate time series using the instance-level and cluster-level supervised contrastive learning
- Author
-
Moradinasab, Nazanin, Sharma, Suchetha, Bar-Yoseph, Ronen, Radom-Aizik, Shlomit, Bilchick, Kenneth C., Cooper, Dan M., Weltman, Arthur, and Brown, Donald E.
- Subjects
Information and Computing Sciences, Machine Learning, Brain Disorders, Multivariate time series data, Contrastive learning, Classification, Interpretability, Artificial Intelligence and Image Processing, Data Format, Information Systems - Abstract
The multivariate time series classification (MTSC) task aims to predict a class label for a given time series. Recently, modern deep learning-based approaches have achieved promising performance over traditional methods on MTSC tasks. The success of these approaches relies on access to massive amounts of labeled data (i.e., annotating or assigning tags to each sample to show its corresponding category). However, obtaining massive amounts of labeled data is usually very time-consuming and expensive in many real-world applications such as medicine, because it requires domain experts' knowledge to annotate data. Insufficient labeled data prevents these models from learning discriminative features, resulting in poor margins that reduce generalization performance. To address this challenge, we propose a novel approach: supervised contrastive learning for time series classification (SupCon-TSC). This approach improves classification performance by learning discriminative low-dimensional representations of multivariate time series, and its end-to-end structure allows for interpretable outcomes. It is based on the supervised contrastive (SupCon) loss to learn the inherent structure of multivariate time series. First, two separate augmentation families, comprising strong and weak augmentation methods, are used to generate augmented data for the source and target networks, respectively. Second, we propose instance-level and cluster-level SupCon learning approaches to capture contextual information and learn discriminative, universal representations for multivariate time series datasets. In the instance-level SupCon learning approach, for each anchor instance coming from the source network, the low-variance output encodings from the target network are sampled as positive and negative instances based on their labels. The cluster-level approach, in contrast, is performed between each instance and the cluster centers among batches: the cluster-level SupCon loss attempts to maximize the similarities between each instance and the cluster centers among batches. We tested this approach on two small cardiopulmonary exercise testing (CPET) datasets and the real-world UEA multivariate time series archive. The results on the CPET datasets indicate the model's capability to learn more discriminative features than existing approaches when the dataset is small. Moreover, the results on the UEA archive show that training a classifier on top of the universal representation features learned by our method outperforms state-of-the-art approaches.
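The SupCon objective this abstract builds on (Khosla et al.'s supervised contrastive loss) can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation; the embeddings, labels, and temperature below are placeholder assumptions:

```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive (SupCon) loss over L2-normalised embeddings.

    For each anchor, every other sample sharing its label is a positive;
    all remaining samples act as negatives in the softmax denominator.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / tau  # pairwise cosine similarities, temperature-scaled
    n = len(labels)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        denom = np.sum([np.exp(sim[i, a]) for a in range(n) if a != i])
        losses.append(-np.mean([np.log(np.exp(sim[i, p]) / denom) for p in positives]))
    return float(np.mean(losses))
```

When same-class embeddings lie close together the loss is small; assigning labels that split nearby points apart makes it grow, which is what drives representation learning here.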
- Published
- 2024
30. Contrastive dissimilarity: optimizing performance on imbalanced and limited data sets.
- Author
-
Teixeira, Lucas O., Bertolini, Diego, Oliveira, Luiz S., Cavalcanti, George D. C., and Costa, Yandre M. G.
- Subjects
Random forest algorithms, Scarcity, Classification, Databases - Abstract
A primary challenge in pattern recognition is imbalanced datasets, which result in skewed and biased predictions. This problem is exacerbated by limited data availability, increasing the reliance on expensive expert data labeling. This study introduces a novel method called contrastive dissimilarity, which combines dissimilarity-based representation with contrastive learning to improve classification performance under class imbalance and data scarcity. Based on pairwise sample differences, dissimilarity representation excels in situations with numerous overlapping classes and limited samples per class. Unlike traditional methods that use fixed distance functions such as Euclidean or cosine distance, our proposal employs metric learning with a contrastive loss to estimate a custom dissimilarity function. We conducted extensive evaluations on 13 databases across multiple training–test splits. The results show that this approach outperforms traditional models such as SVM, random forest, and naive Bayes, particularly in settings with limited training data. [ABSTRACT FROM AUTHOR]
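The core idea in the abstract above, representing each sample by its distances to a reference set, with the distance function itself learned via a contrastive pair loss, can be sketched as below. This is a toy NumPy illustration: the fixed Euclidean metric stands in for the learned one, and all names and shapes are assumptions rather than the paper's code:

```python
import numpy as np

def dissimilarity_representation(X, prototypes, metric=None):
    """Map samples into dissimilarity space: each sample becomes the
    vector of its distances to a fixed reference (prototype) set."""
    if metric is None:
        metric = lambda a, b: float(np.linalg.norm(a - b))  # stand-in for a learned metric
    return np.array([[metric(x, p) for p in prototypes] for x in X])

def contrastive_pair_loss(d, same_class, margin=1.0):
    """Classic contrastive pair loss: pull same-class pairs together,
    push different-class pairs at least `margin` apart."""
    return d ** 2 if same_class else max(0.0, margin - d) ** 2
```

Minimizing `contrastive_pair_loss` over labeled pairs is what would shape the custom metric; the dissimilarity vectors then feed any standard classifier.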
- Published
- 2024
- Full Text
- View/download PDF
31. Advancing entity alignment with dangling cases: a structure-aware approach through optimal transport learning and contrastive learning.
- Author
-
Xu, Jin, Li, Yangning, Xie, Xiangjin, Hu, Niu, Li, Yinghui, Zheng, Hai-Tao, and Jiang, Yong
- Subjects
Graph neural networks, Knowledge graphs - Abstract
Entity alignment (EA) aims to discover equivalent entities in different knowledge graphs (KGs) and plays an important role in knowledge engineering. Recently, EA with dangling entities has been proposed as a more realistic setting, which assumes that not all entities have corresponding equivalents. In this paper, we focus on this setting. Some work has explored this problem by leveraging translation APIs, pre-trained word embeddings, and other off-the-shelf tools. However, these approaches over-rely on side information (e.g., entity names) and fail when that information is absent. Moreover, they insufficiently exploit the most fundamental graph-structure information in KGs. To improve the exploitation of structural information, we propose a novel entity alignment framework called Structure-aware Wasserstein Graph Contrastive Learning (SWGCL), which is refined along three dimensions: (i) Model: we propose a novel Gated Graph Attention Network to capture local and global graph structure attention. (ii) Training: two learning objectives, contrastive learning and optimal transport learning, are designed to obtain distinguishable entity representations. (iii) Inference: in the inference phase, a PageRank-based method, HOSS (Higher-Order Structural Similarity), is proposed to calculate higher-order graph structural similarity. Extensive experiments on two dangling benchmarks demonstrate that SWGCL outperforms the current state-of-the-art methods with pure structural information in both traditional (relaxed) and dangling (consolidated) settings. [ABSTRACT FROM AUTHOR]
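The inference step above relies on PageRank-style propagation. The abstract does not spell out HOSS, but the generic power-iteration PageRank such a higher-order similarity would build on looks roughly like this (an illustrative sketch over a toy adjacency matrix, not the paper's code):

```python
import numpy as np

def pagerank(adj, damping=0.85, n_iter=100):
    """Power-iteration PageRank over an adjacency matrix.

    Rows are normalised into a transition matrix; dangling nodes
    (zero out-degree) fall back to a uniform jump distribution.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    P = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg[:, None], 1e-12),
                 1.0 / n)
    r = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(n_iter):
        r = (1 - damping) / n + damping * (P.T @ r)
    return r
```

The resulting scores form a probability distribution over nodes; structural similarities between entities can then be derived from how such propagated mass overlaps.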
- Published
- 2024
- Full Text
- View/download PDF
32. Integrating pseudo labeling with contrastive clustering for transformer-based semi-supervised action recognition.
- Author
-
Li, Nannan, Huang, Kan, Wu, Qingtian, and Zhao, Yang
- Subjects
Data augmentation, Transformer models, Recognition (psychology), Generalization, Algorithms - Abstract
Video action recognition with semi-supervised learning is a challenging research topic due to the low labeling ratio. Previous works mainly tackle the problem with two kinds of approaches: pseudo labeling and contrastive learning. Unlike existing approaches that often treat the two parts separately, we propose an integrated learning framework that incorporates pseudo labeling and contrastive clustering in a coherent and mutually beneficial way. On one hand, contrastive learning aggregates data from the same class into clusters, yielding more reliable pseudo labels for training the classifier; on the other hand, the re-trained classifier predicts categories for unlabeled data, thereby guiding contrastive learning to establish discriminative representations. We theoretically prove that the two iterative operations can be formulated as an E-M algorithm and validate its generalization ability on the semi-supervised classification task with experiments. Specifically, we construct a MoCo-like structure to implement the proposed learning framework and explore the potential of employing the video transformer for semi-supervised action recognition. Furthermore, we devise a global-local view sampling strategy for video data augmentation, which is shown to facilitate representation learning and advance performance. We conduct extensive experiments on three video action recognition datasets with a series of data labeling ratios. Compared with state-of-the-art (SOTA) methods, the proposed approach achieves superior or competitive performance. For example, with a 1% labeling ratio, top-1 accuracy increases to 49.1% and 52.4% on the UCF-101 and Kinetics-400 datasets, respectively, surpassing SOTA by 2.8% and 3.3%. [ABSTRACT FROM AUTHOR]
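The pseudo-labeling/clustering alternation that the abstract casts as an E-M algorithm can be caricatured with class centroids standing in for the contrastive encoder and classifier. Everything below (function name, toy data, nearest-centroid rule) is an illustrative assumption, not the authors' method:

```python
import numpy as np

def pseudo_label_em(X_lab, y_lab, X_unlab, n_iter=10):
    """Toy E-M analogue of the pseudo-labelling loop: the E-step assigns
    unlabelled points to the nearest class centroid (pseudo labels); the
    M-step re-estimates centroids from labelled + pseudo-labelled data."""
    y_lab = np.asarray(y_lab)
    classes = sorted(set(y_lab.tolist()))
    centers = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    pseudo = np.zeros(len(X_unlab), dtype=int)
    for _ in range(n_iter):
        # E-step: nearest-centroid pseudo labels for the unlabelled pool
        dists = np.linalg.norm(X_unlab[:, None, :] - centers[None, :, :], axis=2)
        pseudo = dists.argmin(axis=1)
        # M-step: recompute each centroid from all points assigned to it
        for k, c in enumerate(classes):
            pts = np.vstack([X_lab[y_lab == c], X_unlab[pseudo == k]])
            centers[k] = pts.mean(axis=0)
    return pseudo, centers
```

In the paper's setting, the E-step would be the classifier's predictions and the M-step the contrastive re-training; the alternation structure is the same.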
- Published
- 2024
33. Dual Contrastive Learning for Cross-Domain Named Entity Recognition.
- Author
-
Xu, Jingyun, Yu, Junnan, Cai, Yi, and Chua, Tat-Seng
- Abstract
The article presents a dual contrastive learning model, Dual-C, that enhances cross-domain named entity recognition (NER) by refining token-level and sentence-level representations to improve entity span detection (ESD) and entity type classification (ETC). It highlights the challenges existing NER methods face in distinguishing entities from non-entities, and the decomposition of NER into subtasks for better performance.
- Published
- 2024
34. Soft Contrastive Sequential Recommendation.
- Author
-
Zhang, Yabin, Wang, Zhenlei, Yu, Wenhui, Hu, Lantao, Jiang, Peng, Gai, Kun, and Chen, Xu
- Abstract
The article presents a novel soft contrastive mathematical framework for sequential recommendation, addressing the limitations of traditional models that rely on human-designed positive and negative samples. Topics include the introduction of an adversarial contrastive loss for improved robustness, the exploration of perturbation strategies at the sequence and item levels, and extensive experiments conducted on five real-world datasets to demonstrate the model's effectiveness and potential.
- Published
- 2024
35. Domain disentanglement and contrastive learning with source-guided sampling for unsupervised domain adaptation person re-identification.
- Author
-
Wu, Cheng-Hsuan, Liu, An-Sheng, Chen, Chiung-Tao, and Fu, Li-Chen
- Abstract
In recent years, fully supervised person re-identification (re-ID) methods have become well developed, but they cannot be easily applied in real-life settings because of the domain gap between real-world data and training datasets. Annotating ground-truth labels for an entire surveillance system with multiple cameras and videos is labor-intensive and impractical in real applications. Moreover, as awareness of the right to privacy rises, it becomes more challenging to collect sufficient training data from the public. The difficulty of constructing a new dataset for deployment thus arises not only from the labeling cost but also because raw public data are hard to come by. To better adapt to real-life deployment, we propose an unsupervised domain adaptation method comprising a Domain Disentanglement Network (DD-Net) and Source-Guided Contrastive Learning (SGCL). DD-Net first narrows the domain gap between the two datasets, and SGCL then uses the labeled source dataset as a clue to guide training on the target domain. With these two modules, knowledge transfer from the training dataset to real-world scenarios can be completed successfully. Experiments show that the proposed method is competitive with state-of-the-art methods on two public datasets and even outperforms them in the small-scale target dataset setting. Therefore, not only person re-ID but also object tracking in video surveillance systems can benefit from our approach when deployed to different environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
36. Deep question generation model based on dual attention guidance.
- Author
-
Li, Jinhong, Zhang, Xuejie, Wang, Jin, and Zhou, Xiaobing
- Abstract
Question generation refers to the automatic generation of questions by computer systems based on given paragraphs and answers, which is one of the research hotspots in natural language processing. Although previous work has made great progress, there are still some limitations: (1) The rich structural information hidden in word sequences is ignored. (2) Current studies focus on sequence-to-sequence-based neural networks to maximize the use of question-and-answer information in the context. However, the context often contains a large number of redundant and irrelevant sentences, and these models fail to filter redundant information or focus on key sentences. To address these limitations, we use a Graph Convolutional Network (GCN) and a Bidirectional Long Short Term Memory (Bi-LSTM) Network to capture the structure and sequence information of the context simultaneously. Then, we use a contrastive learning strategy for content selection to fuse the document-level and graph-level representations. We also use a dual attention mechanism for the passage and answer. Next, we use the gating mechanism to dynamically assign weights and merge them into context information to support the question decoding by modeling their interaction. We also conduct qualitative and quantitative evaluations on the HotpotQA deep question-centric dataset, and the experimental results show that the proposed model is effective. [ABSTRACT FROM AUTHOR]
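The gating step described above, dynamically weighting the graph-level (GCN) and sequence-level (Bi-LSTM) context vectors before decoding, can be sketched as a learned sigmoid gate. The specific form, shapes, and names below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(graph_repr, seq_repr, W, b):
    """Gated fusion: a sigmoid gate computed from both views decides,
    per dimension, how much graph vs. sequence context to keep."""
    gate = sigmoid(np.concatenate([graph_repr, seq_repr]) @ W + b)
    return gate * graph_repr + (1.0 - gate) * seq_repr
```

With `W` of shape `(2d, d)` and `b` of shape `(d,)`, a strongly positive pre-activation keeps the graph view and a strongly negative one keeps the sequence view; training learns where in between each dimension should sit.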
- Published
- 2024
37. Temporal knowledge graph reasoning based on evolutional representation and contrastive learning.
- Author
-
Ma, Qiuying, Zhang, Xuan, Ding, ZiShuo, Gao, Chen, Shang, Weiyi, Nong, Qiong, Ma, Yubin, and Jin, Zhi
- Subjects
Knowledge graphs, Knowledge representation (information theory), Data reduction, Data distribution, Data modeling, Ambiguity - Abstract
Temporal knowledge graphs (TKGs) are a form of knowledge representation constructed from the evolution of events at different time points. They provide an additional perspective by extending the temporal dimension for a range of downstream tasks. Given the evolving nature of events, it is essential for TKGs to reason about non-existent or future events. Most existing models divide the graph into multiple time snapshots and predict future events by modeling information within and between snapshots. However, since knowledge graphs inherently suffer from missing data and uneven data distribution, this time-based division drastically reduces the available data within each snapshot, making it difficult to learn high-quality representations of entities and relationships. In addition, the contribution of historical information changes over time, so its importance to the final results must be distinguished when capturing information that evolves over time. In this paper, we introduce CH-TKG (Contrastive Learning and Historical Information Learning for TKG Reasoning) to address issues related to data sparseness and the ambiguity of historical information weights. First, we obtain embedding representations of entities and relationships with evolutionary dependencies via an R-GCN and a GRU. On this foundation, we introduce a novel contrastive learning method to optimize the representation of entities and relationships within individual snapshots of sparse data. Then we utilize self-attention and copy mechanisms to learn the effects of different historical data on the final inference results. We conduct extensive experiments on four datasets, and the results demonstrate the effectiveness of our proposed model with sparse data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
38. A multi-view mask contrastive learning graph convolutional neural network for age estimation.
- Author
-
Zhang, Yiping, Shou, Yuntao, Meng, Tao, Ai, Wei, and Li, Keqin
- Subjects
Convolutional neural networks, Graph neural networks, Machine learning, Transformer models, Feature extraction - Abstract
The age estimation task aims to use facial features to predict a person's age and is widely used in public security, marketing, identification, and other fields. However, the relevant features are concentrated mainly around facial keypoints, and existing CNN- and Transformer-based methods are inflexible and redundant when modeling such complex irregular structures. This paper therefore proposes a multi-view mask contrastive learning graph convolutional neural network (MMCL-GCN) for age estimation. Specifically, the overall MMCL-GCN network comprises a feature extraction stage and an age estimation stage. In the feature extraction stage, we introduce a graph structure to construct face images as input and then design a multi-view mask contrastive learning (MMCL) mechanism to learn complex structural and semantic information about face images. The learning mechanism employs an asymmetric Siamese network architecture, which uses an online encoder–decoder structure to reconstruct the missing information from the original graph and a target encoder to learn latent representations for contrastive learning. Furthermore, to make the two learning mechanisms more compatible and complementary, we adopt two augmentation strategies and optimize the joint losses. In the age estimation stage, we design a multi-layer extreme learning machine with identity mapping (ML-IELM) to make full use of the features extracted by the online encoder. A classifier and a regressor are then built on ML-IELM to identify the age-group interval and accurately estimate the final age. Extensive experiments show that MMCL-GCN effectively reduces age estimation error on benchmark datasets such as Adience, MORPH-II, and LAP-2016. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
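In asymmetric online/target designs like the one described above, the target encoder is typically kept as an exponential moving average (EMA) of the online encoder rather than trained by gradient descent. A minimal sketch of that update, assuming BYOL/MoCo-style momentum (the paper may use a different rule):

```python
import numpy as np

def ema_update(online, target, momentum=0.99):
    """Momentum (EMA) update of the target encoder's parameters:
    target <- m * target + (1 - m) * online, per parameter tensor."""
    return [momentum * t + (1.0 - momentum) * o for o, t in zip(online, target)]
```

The high momentum keeps the target encoder slowly moving, which stabilizes the contrastive targets the online branch is trained against.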
39. Advancing Incremental Few-Shot Video Action Recognition with Cluster Compression and Generative Separation.
- Author
-
Qin, Yanfei, Chu, Renxin, and Liu, Baolin
- Abstract
Few-Shot Class Incremental Learning (FSCIL) is a trending topic in deep learning, addressing the need for models to incrementally learn novel classes, particularly in real-world scenarios where continuously emerging classes come with limited labeled samples. However, the majority of FSCIL research has been dedicated to image classification and object recognition tasks, with limited attention given to video action classification. In this paper, we present a new Cluster Compression and Generative Separation (CCGS) method for Incremental Few-Shot Video Action Recognition (iFSVAR), which introduces contrastive learning to increase the degree of class separation in the base session. Simultaneously, it creates numerous fine-grained classes with diverse semantics, effectively filling the unallocated representation space. Experimental results on UCF101, Kinetics, and Something-Something-V2 demonstrate the effectiveness of the framework. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Identifying fraudulent identity documents by analyzing imprinted guilloche patterns.
- Author
-
Al-Ghadi, Musab, Mondal, Tanmoy, Ming, Zuheng, Gomez-Krämer, Petra, Coustaty, Mickaël, Sidere, Nicolas, and Burie, Jean-Christophe
- Subjects
CONVOLUTIONAL neural networks ,PATTERN recognition systems ,DIGITAL technology ,INFORMATION technology security ,TRUST - Abstract
Identity document (ID) verification is crucial in fostering trust in the digital realm, especially with the increasing shift of transactions to online platforms. Our research, building upon our previous work (Al-Ghadi et al. 2023), delves deeper into ID verification by focusing on guilloche patterns. We present two innovative ID verification models leveraging contrastive and adversarial learning. These models enhance guilloche pattern detection, offering new insights into identifying counterfeit IDs. Each approach comprises two main components: (i) guilloche pattern recognition and feature generation using a convolutional neural network (CNN), and (ii) precise classification of input data as authentic or forged. We evaluate our models extensively on the MIDV and FMIDV datasets, achieving accuracies of 68-92% and F1-scores of 75-100%. Our study, incorporating contrastive and adversarial learning, contributes significantly to the ongoing discourse on ID verification, specifically in analyzing guilloche patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Learning a compact embedding for fine-grained few-shot static gesture recognition.
- Author
-
Hu, Zhipeng, Qiu, Feng, Sun, Haodong, Zhang, Wei, Ding, Yu, Lv, Tangjie, and Fan, Changjie
- Subjects
GESTURE ,ACQUISITION of data ,ANNOTATIONS - Abstract
Gesture recognition and its applications have been widely studied and have received much attention in recent years. Existing work on hand gesture recognition trains classification models on a few discrete categories, which suffers from time-consuming data collection and low perceptual granularity. In contrast, this work proposes a contrastive framework for fine-grained few-shot gesture recognition. To achieve this, we construct a general and compact gesture embedding space that can represent arbitrary intricate hand gestures. The embedding distance between hand gestures is consistent with their similarity, accurately reflecting subtle variations. To learn such an embedding space, we build a large-scale hand gesture similarity dataset named SimGesture, comprising 944,482 hand-image triplets with gesture-comparison annotations. Based on SimGesture, we use contrastive learning to train a neural network named SimGesNet that projects arbitrary hand images into a compact gesture embedding space. Our experimental results demonstrate that the learned embedding works very well for few-shot gesture recognition and achieves state-of-the-art results. We also show that our proposed gesture embedding outperforms existing embeddings in representing fine-grained gestures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
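Learning an embedding from similarity-annotated triplets, as SimGesture's annotations suggest, is commonly done with a margin-based triplet loss; a minimal NumPy sketch (the margin value and squared-Euclidean distance are assumptions, not details from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss: pull the anchor toward the more-similar
    gesture (positive) and push it away from the less-similar one (negative)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distances
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    # hinge: only penalize triplets where the margin is violated
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```

The loss is zero once every positive sits closer to its anchor than the corresponding negative by at least the margin.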
42. A Depth Awareness and Learnable Feature Fusion Network for Enhanced Geometric Perception in Semantic Correspondence.
- Author
-
Li, Fazeng, Zou, Chunlong, Yun, Juntong, Huang, Li, Liu, Ying, Tao, Bo, and Xie, Yuanmin
- Abstract
Deep learning is becoming the most widely used technology for multi-sensor data fusion. Semantic correspondence has recently emerged as a foundational task, enabling a range of downstream applications, such as style or appearance transfer, robot manipulation, and pose estimation, through its ability to provide robust correspondence in RGB images with semantic information. However, the representations produced by current self-supervised learning and generative models are often limited in their ability to capture and understand the geometric structure of objects, which is important for matching the correct details in semantic correspondence applications. Furthermore, efficiently fusing these two types of features is itself a challenge: integrating them harmoniously is crucial for improving the expressive power of models across tasks. To tackle these issues, our key idea is to integrate depth information from depth estimation or depth sensors into feature maps and to leverage learnable weights for feature fusion. First, depth information is used to model pixel-wise depth distributions, assigning relative depth weights to feature maps so the network can perceive an object's structural information. Then, under a contrastive learning objective, a series of weights is optimized to combine feature maps from self-supervised learning and generative models. Depth features are thus naturally embedded into the feature maps, guiding the network to learn the geometric structure of objects and alleviating depth ambiguity. Experiments on the SPair-71K and AP-10K datasets show that the proposed method achieves percentage-of-correct-keypoints (PCK) scores of 81.8 and 83.3 at the 0.1 threshold, respectively. Our approach not only shows clear advantages in the experimental results but also introduces a depth awareness module and a learnable feature fusion module, which enhance the understanding of object structure through depth information and make full use of features from various pre-trained models, opening new possibilities for applying deep learning to RGB and depth data fusion. We will also continue to focus on accelerating model inference and reducing model size so that our model can run faster. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
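As a rough illustration of depth-weighted, learnable feature fusion, consider weighting two feature maps with softmax-normalized learnable scalars and then modulating each pixel by a relative-depth weight. The median-centered Gaussian weighting below is purely hypothetical; the paper's actual weighting scheme is not reproduced here.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def depth_aware_fuse(feat_ssl, feat_gen, depth, fusion_logits, sigma=1.0):
    """Fuse a self-supervised and a generative feature map (H, W, C) with
    learnable weights, then modulate by a per-pixel relative-depth weight."""
    alpha = softmax(np.asarray(fusion_logits))          # learnable fusion weights
    fused = alpha[0] * feat_ssl + alpha[1] * feat_gen   # (H, W, C)
    # hypothetical relative-depth weight: emphasis near the median depth
    w = np.exp(-((depth - np.median(depth)) ** 2) / (2.0 * sigma ** 2))
    return fused * w[..., None]
```

In a real model `fusion_logits` would be optimized end-to-end under the contrastive objective rather than set by hand.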
43. Infrared Image Generation Based on Visual State Space and Contrastive Learning.
- Author
-
Li, Bing, Ma, Decao, He, Fang, Zhang, Zhili, Zhang, Daqiao, and Li, Shaopeng
- Abstract
The preparation of infrared reference images is of great significance for improving the accuracy and precision of infrared imaging guidance. However, collecting infrared data on-site is difficult and time-consuming. Fortunately, infrared images can be obtained from the corresponding visible-light images to enrich the infrared data. To this end, this work proposes an image translation algorithm that converts visible-light images to infrared images. The algorithm, named V2IGAN, is founded on a visual state space attention module and a multi-scale feature contrastive learning loss. First, we introduce a visual state space attention module designed to sharpen the generative network's focus on critical regions within visible-light images. This enhancement not only improves feature extraction but also strengthens the generator's capacity to model features accurately, ultimately enhancing the quality of the generated images. Furthermore, the method incorporates a multi-scale feature contrastive learning loss function, which serves to improve the robustness of the model and refine the detail of the generated images. Experimental results show that the V2IGAN method outperforms existing infrared image generation techniques in both subjective visual assessments and objective metric evaluations. This suggests that V2IGAN is adept at enhancing feature representation, refining the details of the generated infrared images, and yielding reliable, high-quality results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification.
- Author
-
Yang, Minghao, Wang, Zehua, Yan, Zizhuo, Wang, Wenxiang, Zhu, Qian, and Jin, Changlong
- Subjects
- *
ARTIFICIAL neural networks , *CONVOLUTIONAL neural networks , *SUPERVISED learning , *MICROBIAL genes , *FEATURE extraction - Abstract
Background: The rapid advancement of deep neural network models has significantly enhanced the ability to extract features from microbial sequence data, which is critical for addressing biological challenges. However, the scarcity and complexity of labeled microbial data pose substantial difficulties for supervised learning approaches. To address these issues, we propose DNASimCLR, an unsupervised framework designed for efficient feature extraction from gene sequence data. Results: DNASimCLR leverages convolutional neural networks and the contrastive-learning-based SimCLR framework to extract intricate features from diverse microbial gene sequences. Pre-training was conducted on two classic large-scale unlabeled datasets encompassing metagenome and viral gene sequences, and subsequent classification tasks were performed by fine-tuning the pre-trained model. Our experiments demonstrate that DNASimCLR is at least comparable to state-of-the-art techniques for gene sequence classification, and that it surpasses the latest CNN-based feature extraction methods. Furthermore, the model exhibits superior performance across diverse biological sequence analysis tasks, showcasing its robust adaptability. Conclusions: DNASimCLR represents a robust and database-agnostic solution for gene sequence classification. Its versatility allows it to perform well on novel or previously unseen gene sequences, making it a valuable tool for diverse applications in genomics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
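DNASimCLR builds on SimCLR, whose core is the NT-Xent loss over two augmented views of each sample: the two views of the same sequence are positives, everything else in the batch is a negative. A minimal NumPy version of that loss (batch construction and temperature are illustrative):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) over a batch of
    paired views z1, z2 with shape (n, d)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    n = z1.shape[0]
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # a view is not its own negative
    # log-softmax over each row's similarities
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positive for row i is its paired view in the other half of the batch
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return -logp[np.arange(2 * n), pos].mean()
```

When the two views of each sample are close in embedding space, the positive dominates each row's softmax and the loss is low.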
45. CJE-PCHF: Chinese Joint Entity and Relation Extraction Model Based on Progressive Contrastive Learning and Heterogeneous Feature Fusion.
- Author
-
He, Meng, Bai, Yunli, and Wei, Dongye
- Abstract
The joint extraction of entities and relations is a critical task in information extraction, and its performance directly affects the performance of downstream tasks. However, existing joint extraction models based on deep learning exhibit weak processing capabilities for the phenomenon of multiple pronunciations of one character and multiple characters of one pronunciation when processing Chinese texts, resulting in a performance loss. To address these issues, this paper introduces part-of-speech (POS) and pinyin features to aid the model in learning semantic features that are more contextually appropriate. We propose a Chinese Joint Entity and Relation Extraction Model based on progressive contrastive learning and heterogeneous feature fusion (CJE-PCHF). During model training, an interactive fusion network based on progressive contrastive learning is employed to learn the dependencies between pinyin, POS, and semantic features. This guides the model in heterogeneous feature fusion, capturing higher-order semantic associations between heterogeneous features. On the commonly used DuIE evaluation dataset for joint extraction, our model achieved a significant improvement, with the F1 score increasing by 5.4% compared to the benchmark model CasRel. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Enhanced knowledge graph recommendation algorithm based on multi-level contrastive learning.
- Author
-
Rong, Zhang, Yuan, Liu, and Yang, Li
- Subjects
- *
GRAPH neural networks , *KNOWLEDGE graphs , *RECOMMENDER systems , *LEARNING strategies , *ALGORITHMS - Abstract
Integrating knowledge graphs (KGs) into recommendation systems enhances personalization and accuracy. However, the long-tail distribution of knowledge graphs often leads to data sparsity, which limits their effectiveness in practical applications. To address this challenge, this study proposes a knowledge-aware recommendation framework that incorporates multi-level contrastive learning. The framework augments the Collaborative Knowledge Graph (CKG) through random edge dropout and constructs feature representations at three levels: user-user, item-item, and user-item interactions. A dynamic attention mechanism is employed in the Graph Attention Network (GAT) for modeling the KG. Combined with a nonlinear transformation and a Momentum Contrast (MoCo) strategy for contrastive learning, it can effectively extract high-quality feature information. Additionally, multi-level contrastive learning, as an auxiliary self-supervised task, is trained jointly with the primary supervised task, further enhancing recommendation performance. Experimental results on the MovieLens and Amazon-books datasets demonstrate that this framework effectively improves the performance of knowledge-graph-based recommendation, addresses data sparsity, and outperforms baseline models across multiple evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
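The random edge dropout used above to build augmented views of the CKG can be sketched in a few lines; the edge representation (a list of node pairs) and drop rate are illustrative:

```python
import numpy as np

def edge_dropout(edges, drop_rate=0.1, seed=0):
    """Randomly drop edges to create one augmented view of the graph.
    Calling this twice with different seeds yields a contrastive pair."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(edges)) >= drop_rate
    return [e for e, k in zip(edges, keep) if k]
```

Each contrastive view then runs through the same encoder, and representations of the same node across views are treated as positives.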
47. Emotion Recognition Using EEG Signals and Audiovisual Features with Contrastive Learning.
- Author
-
Lee, Ju-Hwan, Kim, Jin-Young, and Kim, Hyoung-Gook
- Abstract
Multimodal emotion recognition has emerged as a promising approach to capture the complex nature of human emotions by integrating information from various sources such as physiological signals, visual behavioral cues, and audio-visual content. However, current methods often struggle with effectively processing redundant or conflicting information across modalities and may overlook implicit inter-modal correlations. To address these challenges, this paper presents a novel multimodal emotion recognition framework which integrates audio-visual features with viewers' EEG data to enhance emotion classification accuracy. The proposed approach employs modality-specific encoders to extract spatiotemporal features, which are then aligned through contrastive learning to capture inter-modal relationships. Additionally, cross-modal attention mechanisms are incorporated for effective feature fusion across modalities. The framework, comprising pre-training, fine-tuning, and testing phases, is evaluated on multiple datasets of emotional responses. The experimental results demonstrate that the proposed multimodal approach, which combines audio-visual features with EEG data, is highly effective in recognizing emotions, highlighting its potential for advancing emotion recognition systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
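The cross-modal attention fusion mentioned above is typically scaled dot-product attention with one modality providing queries and the other providing keys and values; a minimal single-head sketch (names and shapes are assumptions, and real implementations add learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_eeg, kv_av):
    """EEG tokens (T_eeg, d) attend over audio-visual tokens (T_av, d);
    each output row is a convex combination of audio-visual features."""
    d = q_eeg.shape[-1]
    attn = softmax(q_eeg @ kv_av.T / np.sqrt(d), axis=-1)  # (T_eeg, T_av)
    return attn @ kv_av                                    # (T_eeg, d)
```

Running the same operation with roles swapped gives the other fusion direction (audio-visual attending over EEG).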
48. MGACL: Prediction Drug–Protein Interaction Based on Meta-Graph Association-Aware Contrastive Learning.
- Author
-
Zhang, Pinglu, Lin, Peng, Li, Dehai, Wang, Wanchun, Qi, Xin, Li, Jing, and Xiong, Jianshe
- Abstract
The identification of drug–target interactions (DTI) is crucial for drug discovery. However, reducing the false positives that graph neural networks produce, owing to bias and negative transfer in the original bipartite graph, remains an open problem. Considering that the impact of heterogeneous auxiliary information on DTI varies by drug and target, we established an adaptively enhanced, personalized meta-knowledge transfer network named Meta Graph Association-Aware Contrastive Learning (MGACL), which can transfer personalized heterogeneous auxiliary information from different nodes and reduce data bias. Meanwhile, we propose a novel DTI association-aware contrastive learning strategy that aligns high-frequency drug representations with the learned auxiliary graph representations to prevent negative transfer. Our study improves DTI prediction performance by about 3% in area under the curve (AUC) and area under the precision–recall curve (AUPRC) compared with existing methods, which is more conducive to accurately identifying drug targets for the development of new drugs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Semi-Supervised Interior Decoration Style Classification with Contrastive Mutual Learning.
- Author
-
Guo, Lichun, Zeng, Hao, Shi, Xun, Xu, Qing, Shi, Jinhui, Bai, Kui, Liang, Shuang, and Hang, Wenlong
- Subjects
- *
CONFIRMATION bias , *INTERIOR decoration , *AUTOMATIC classification , *COGNITIVE styles , *SUPERVISED learning , *SCARCITY - Abstract
Precisely identifying interior decoration styles holds substantial significance in directing interior decoration practices. Nevertheless, constructing accurate models for the automatic classification of interior decoration styles remains challenging due to the scarcity of expert annotations. To address this problem, we propose a novel pseudo-label-guided contrastive mutual learning framework (PCML) for semi-supervised interior decoration style classification by harnessing large amounts of unlabeled data. Specifically, PCML introduces two distinct subnetworks and selectively utilizes the diversified pseudo-labels generated by each for mutual supervision, thereby mitigating the issue of confirmation bias. For labeled images, the inconsistent pseudo-labels generated by the two subnetworks are employed to identify images that are prone to misclassification. We then devise an inconsistency-aware relearning (ICR) regularization model to perform a review training process. For unlabeled images, we introduce a class-aware contrastive learning (CCL) regularization to learn their discriminative feature representations using the corresponding pseudo-labels. Since the use of distinct subnetworks reduces the risk of both models producing identical erroneous pseudo-labels, CCL can reduce the possibility of noise data sampling to enhance the effectiveness of contrastive learning. The performance of PCML is evaluated on five interior decoration style image datasets. For the average AUC, accuracy, sensitivity, specificity, precision, and F1 scores, PCML obtains improvements of 1.67%, 1.72%, 3.65%, 1.0%, 4.61%, and 4.66% in comparison with the state-of-the-art method, demonstrating the superiority of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
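Using pseudo-label disagreement between the two subnetworks to flag misclassification-prone images, as described above, might look like the following; the thresholdless argmax comparison is an illustrative simplification of whatever criterion the paper actually uses:

```python
import numpy as np

def split_by_agreement(logits_a, logits_b):
    """Split sample indices by whether the two subnetworks agree on the
    pseudo-label; disagreeing samples go to the relearning pass."""
    pa = logits_a.argmax(axis=1)
    pb = logits_b.argmax(axis=1)
    agree = pa == pb
    return np.flatnonzero(agree), np.flatnonzero(~agree)
```

Because the two subnetworks are trained differently, they rarely make identical mistakes, so the disagreement set concentrates the hard or noisy samples.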
50. Contrastive Speaker Representation Learning with Hard Negative Sampling for Speaker Recognition.
- Author
-
Go, Changhwan, Lee, Young Han, Kim, Taewoo, Park, Nam In, and Chun, Chanjun
- Subjects
- *
ERROR rates , *SPEECH , *SECURITY systems , *CLASSIFICATION , *DEEP learning - Abstract
Speaker recognition is a technology that identifies the speaker in an input utterance by extracting speaker-distinguishable features from the speech signal. Because speaker recognition is used for system security and authentication, it is crucial to extract unique features of the speaker to achieve high recognition rates. Representative methods either take a classification approach or use contrastive learning to learn the relationships between speaker representations, then use embeddings extracted from a specific layer of the model. This paper introduces a framework for developing robust speaker recognition models through contrastive learning. The approach minimizes similarity to hard negative samples: genuine negatives whose features are extremely similar to the positives, leading to potential misidentification. Specifically, our proposed method trains the model by estimating hard negative samples within a mini-batch during contrastive learning, and then uses a cross-attention mechanism to determine speaker agreement for pairs of utterances. To demonstrate the effectiveness of the proposed method, we compared a deep learning model trained with a conventional speaker recognition loss against one trained with our method, measured by the equal error rate (EER), an objective performance metric. When trained on the VoxCeleb2 dataset, the proposed method achieved an EER of 0.98% on VoxCeleb1-E and 1.84% on VoxCeleb1-H. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
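Estimating hard negatives within a mini-batch, as the abstract describes, commonly means picking, for each anchor, the most similar embedding that carries a different speaker label; a minimal NumPy sketch (the cosine-similarity choice is an assumption):

```python
import numpy as np

def hardest_negatives(emb, labels):
    """Return, for each anchor, the index of the most similar embedding
    with a different speaker label (the in-batch hard negative)."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T
    # mask out same-speaker pairs (including self-similarity on the diagonal)
    sim[labels[:, None] == labels[None, :]] = -np.inf
    return sim.argmax(axis=1)
```

The contrastive loss then pushes each anchor away from exactly these negatives, which are the pairs most likely to cause misidentification.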