Descriptor: "Contrastive learning" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Contrastive learning"' showing total 123 results

Start Over Descriptor "Contrastive learning" Publication Type Electronic Resources

123 results on '"Contrastive learning"'

1. An Experimental Assessment of the Stability of Graph Contrastive Learning

Author: Sebus, Siert (author) and Sebus, Siert (author)
Abstract: The Deep Neural Network (DNN) has become a widely popular machine learning architecture thanks to its ability to learn complex behaviors from data. Standard learning strategies for DNNs however rely on the availability of large, labeled datasets. Self-Supervised Learning (SSL) is a style of learning that allows models to also use unlabeled data for training, which is typically much more abundant. SSL is being applied many different data domains such as images and natural language. One such a domain is the domain of graph data. A graph is a data structure describing a network of nodes connected by edges. Graphs are a natural way of presenting many forms of data such as molecules, social networks, and 3D meshes. The style of SSL that has found the most success on graphs is Contrastive Learning (CL). In CL, an encoder is trained to produce semantically rich representations from unlabeled input data by smartly separating task-relevant information in the input from task-irrelevant information. The encoder backbone most commonly used for Graph Contrastive Learning (GCL) is the Graph Convolutional Neural Network (GCNN). While GCNNs are the state of the art on many graph data tasks, they suffer from underfitting when made too deep. This is especially a problem for GCL as it prevents encoder complexity to scale with the large availability of unlabeled data. In this thesis, we investigate this underfitting behavior through the lens of GCNN stability. Stability refers to a model's ability to continue producing consistent outputs, even when its inputs are perturbed slightly. Theoretical work has shown that stability guarantees for GCNNs weaken when their complexity is increased. We confirm experimentally that, in many cases, GCNNs indeed grow less stable when made more complex. This a relevant finding given that learning stable representations is a prerequisite to CL. Additionally, we show in our experiments that, even when trained using CL, stability discrepanc, Computer Science
Published: 2024

2. Fault identification with limited labeled data

Author: Berenji, Amirhossein, Taghiyarrenani, Zahra, Rohani Bastami, Abbas, Berenji, Amirhossein, Taghiyarrenani, Zahra, and Rohani Bastami, Abbas
Abstract: Intelligent fault diagnosis (IFD) based on deep learning methods has shown excellent performance, however, the fact that their implementation requires massive amount of data and lack of sufficient labeled data, limits their real-world application. In this paper, we propose a two-step technique to extract fault discriminative features using unlabeled and a limited number of labeled samples for classification. To this end, we first train an Autoencoder (AE) using unlabeled samples to extract a set of potentially useful features for classification purpose and consecutively, a Contrastive Learning-based post-training is applied to make use of limited available labeled samples to improve the feature set discriminability. Our Experiments—on SEU bearing dataset—show that unsupervised feature learning using AEs improves classification performance. In addition, we demonstrate the effectiveness of the employment of contrastive learning to perform the post-training process; this strategy outperforms Cross-Entropy based post-training in limited labeled information cases. © The Author(s) 2023.
Published: 2024
Full Text: View/download PDF

3. Tracking with Joint-Embedding Predictive Architectures : Learning to track through representation learning

Author: Maus, Rickard and Maus, Rickard
Abstract: Multi-object tracking is a classic engineering problem wherein a system must keep track of the identities of a set of a priori unknown objects through a sequence, for example video. Perfect execution of this task would mean no spurious or missed detections or identities, neither swapped identities. To measure performance of tracking systems, the Higher Order Tracking Accuracy metric is often used, which takes into account both detection and association accuracy. Prior work in monocular vision-based multi-object tracking has integrated deep learning to various degrees, with deep learning based detectors and visual feature extractors being commonplace alongside motion models of varying complexities. These methods have historically combined the usage of position and appearance in their association stage using hand-crafted heuristics, featuring increasingly complex algorithms to achieve higher performance tracking. With an interest in simplifying tracking algorithms, we turn to the field of representation learning. Presenting a novel method using a Joint-Embedding Predictive Architecture, trained through a contrastive objective, we learn object feature embeddings initialized by detections from a pre-trained detector. The results are features that fuse both positional and visual features. Comparing the performance of our method on the complex DanceTrack and relatively simpler MOT17 datasets to that of the most performant heuristic-based alternative, Deep OC-SORT, we see a significant improvement of 66.1 HOTA compared to the 61.3 HOTA of Deep OC-SORT on DanceTrack. On MOT17, which features less complex motion and less training data, heuristics-based methods outperform the proposed and prior learned tracking methods. While the method lags behind the state of the art in complex scenes, which follows the tracking-by-attention paradigm, it presents a novel approach and brings with it a new avenue of possible research., Spårning av multipla objekt är ett typiskt ingenjörsproblem där ett system måste hålla reda på identiteterna hos en uppsättning på förhand okända objekt genom en sekvens, till exempel video. Att perfekt utföra denna uppgift skulle innebära inga felaktiga eller missade detektioner eller identiteter, inte heller utbytta identiteter. För att mäta prestanda hos spårningssystem används ofta metriken HOTA, som tar hänsyn till både detektions- och associationsnoggrannhet. Tidigare arbete inom monokulär vision-baserad flerobjektsspårning har integrerat djupinlärning i olika grad, med detektorer baserade på djupinlärning och visuella funktionsutdragare som är vanliga tillsammans med rörelsemodeller av varierande komplexitet. Dessa metoder har historiskt kombinerat användningen av position och utseende i deras associationsfas med hjälp av handgjorda heuristiker, med alltmer komplexa algoritmer för att uppnå högre prestanda i spårningen. Med ett intresse för att förenkla spårningsalgoritmer, vänder vi oss till fältet för representationsinlärning. Vi presenterar en ny metod som använder en prediktiv arkitektur med gemensam inbäddning, tränad genom ett kontrastivt mål, där vi lär oss objekt representationer initierade av detektioner från en förtränad detektor. Resultatet är en funktion som sammansmälter både position och visuel information. När vi jämför vår metod på det komplexa DanceTrack och det relativt enklare MOT17-datasetet med det mest presterande heuristikbaserade alternativet, Deep OC-SORT, ser vi en betydande förbättring på 66,1 HOTA jämfört med 61,3 HOTA för Deep OC-SORT på DanceTrack. På MOT17, som har mindre komplex rörelse och mindre träningsdata, presterar heuristikbaserade metoder bättre än den föreslagna och tidigare lärande spårningsmetoderna. Även om metoden ligger efter den senaste utvecklingen i komplexa scener, som följer paradigm för spårning-genom-uppmärksamhet, presenterar den ett nytt tillvägagångssätt och för med sig möjligheter för ny forskning.
Published: 2024

4. Using Satellite Images And Self-supervised Deep Learning To Detect Water Hidden Under Vegetation

Author: Iakovidis, Ioannis and Iakovidis, Ioannis
Abstract: In recent years the wide availability of high-resolution satellite images has made the remote monitoring of water resources all over the world possible. While the detection of open water from satellite images is relatively easy, a significant percentage of the water extent of wetlands is covered by vegetation. Convolutional Neural Networks have shown great success in the task of detecting wetlands in satellite images. However, these models require large amounts of manually annotated satellite images, which are slow and expensive to produce. In this paper we use self-supervised training methods to train a Convolutional Neural Network to detect water from satellite images without the use of annotated data. We use a combination of deep clustering and negative sampling based on the paper ”Unsupervised Single-Scene Semantic Segmentation for Earth Observation”, and we expand the paper by changing the clustering loss, the model architecture and implementing an ensemble model. Our final ensemble of self-supervised models outperforms a single supervised model, showing the power of self-supervision., Under de senaste åren har den breda tillgången på högupplösta satellitbilder möjliggjort fjärrövervakning av vattenresurser över hela världen. Även om det är relativt enkelt att upptäcka öppet vatten från satellitbilder, täcks en betydande andel av våtmarkernas vattenutbredning av vegetation. Lyckligtvis kan radarsignaler tränga igenom vegetation, vilket gör det möjligt för oss att upptäcka vatten gömt under vegetation från satellitradarbilder. Under de senaste åren har Convolutional Neural Networks visat stor framgång i denna uppgift. Tyvärr kräver dessa modeller stora mängder manuellt annoterade satellitbilder, vilket är långsamt och dyrt att producera. Självövervakad inlärning är ett område inom maskininlärning som syftar till att träna modeller utan användning av annoterade data. I den här artikeln använder vi självövervakad träningsmetoder för att träna en Convolutional Neural Network-baserad modell för att detektera vatten från satellitbilder utan användning av annoterade data. Vi använder en kombination av djup klustring och kontrastivt lärande baserat på artikeln ”Unsupervised Single-Scene Semantic Segmentation for Earth Observation”. Dessutom utökar vi uppsatsen genom att modifiera klustringsförlusten och modellarkitekturen som används. Efter att ha observerat hög varians i våra modellers prestanda implementerade vi också en ensemblevariant av vår modell för att få mer konsekventa resultat. Vår slutliga ensemble av självövervakade modeller överträffar en enda övervakad modell, vilket visar kraften i självövervakning.
Published: 2024

5. PromptStream: self-supervised news story discovery using topic-aware article representations

Author: Hatefi, Arezoo, Eklund, Anton, Forsman, Mona, Hatefi, Arezoo, Eklund, Anton, and Forsman, Mona
Abstract: Given the importance of identifying and monitoring news stories within the continuous flow of news articles, this paper presents PromptStream, a novel method for unsupervised news story discovery. In order to identify coherent and comprehensive stories across the stream, it is crucial to create article representations that incorporate as much topic-related information from the articles as possible. PromptStream constructs these article embeddings using cloze-style prompting. These representations continually adjust to the evolving context of the news stream through self-supervised learning, employing a contrastive loss and a memory of the most confident article-story assignments from the most recent days. Extensive experiments with real news datasets highlight the notable performance of our model, establishing a new state of the art. Additionally, we delve into selected news stories to reveal how the model’s structuring of the article stream aligns with story progression., Also part of series: LREC proceedings, ISBN: 2522-2686
Published: 2024

6. Leveraging Large Language Models for Firm-Intelligence: A RAG Framework Approach

Author: Wölner-Hanssen, Niclas and Wölner-Hanssen, Niclas
Abstract: In the wake of OpenAI's release of ChatGPT in November 2022, powered by the 175 billion parameter neural network GPT-3, the potential applications of Large Language Models (LLMs) in various sectors have become evident. One such application lies in hedge funds and trading desks where knowledge sharing is paramount. These entities often possess a wealth of firm-specific knowledge that spans different research areas and personnel expertise. Leveraging LLMs on this knowledge is challenging due to its proprietary nature, the immense data and computational demands of training LLMs, and the inherent limitations of LLMs, such as the tendency to fabricate facts. The Retrieval Augmented Generation (RAG) framework, which has recently gained traction (Shi et al., 2023), presents a solution. This thesis explores the potential of creating a firm intelligence unit using the RAG framework, leveraging research reports from Lund University Finance Society's Trading & Quantitative Research (TQR) department as a representative dataset. The envisioned AI Assistant aims to answer questions based on the TQR reports, admit ignorance when necessary, and provide detailed answer sources. This study provides insights into the theory behind LLMs and the implementation of the RAG framework and offers a comprehensive evaluation, discussing results, limitations, and future prospects for firm intelligence units.
Published: 2024

7. KFCC: A differentiation-aware and keyword-guided fine-grain code comment generation model

Author: Zhang, Rui, Qiao, Ziyue, Zhang, Chenghao, Yu, Jianjun, Zhang, Rui, Qiao, Ziyue, Zhang, Chenghao, and Yu, Jianjun
Abstract: An efficient and accurate understanding of the intent of code is an indispensable skill in computer technology, especially in collaborative engineering and experimental reproduction. AI-assisted automated code comment generator, with the goal of generating programmer-readable explanations, has been an emerging hot topic for software project comprehension. Despite promising performances, three critical issues emerged: 1) The summary comment is limited in understanding the fine-grain details of the code. 2) key-word level guidance in the model should be included for better comments generation. 3) performance of the generative model may be dampened by noises in the manual annotation. In response, we propose a novel fine-grain comment generation, a scenario of generating the statement-level comment with the assistance of method-level comment. We also propose KFCC, a differentiation-aware and keyword-guided fine-grain comment generation model. Specifically, the proposed KFCC model generates the statement-level comments by incorporating the key information extracted by the keyword extractor in a gate fusion way. To enhance the effectiveness and robustness of the proposed KFCC model, we propose a differentiation-aware enhancing encoder comprehension, letting the model distinguish significant knowledge via contrastive learning. Extensive experiments conducted on open-source projects demonstrate that the KFCC model achieves outstanding performance in six programming languages (including Ruby, Python, JavaScript, Java, etc.) on the CodeSearchNet benchmark. © 2024 The Authors
Published: 2024

8. Estimation of Degradation Degree in Road Infrastructure Based on Multi-Modal ABN Using Contrastive Learning

Author: Higashi, Takaaki, Ogawa, Naoki, Maeda, Keisuke, Ogawa, Takahiro, Haseyama, Miki, Higashi, Takaaki, Ogawa, Naoki, Maeda, Keisuke, Ogawa, Takahiro, and Haseyama, Miki
Abstract: This study presents a method for distress image classification in road infrastructures introducing self-supervised learning. Self-supervised learning is an unsupervised learning method that does not require class labels. This learning method can reduce annotation efforts and allow the application of machine learning to a large number of unlabeled images. We propose a novel distress image classification method using contrastive learning, which is a type of self-supervised learning. Contrastive learning provides image domain-specific representation, constraining such that similar images are embedded nearby in the latent space. We augment the single input distress image into multiple images by image transformations and construct the latent space, in which the augmented images are embedded close to each other. This provides a domain-specific representation of the damage in road infrastructure using a large number of unlabeled distress images. Finally, the representation obtained by contrastive learning is used to improve the distress image classification performance. The obtained contrastive learning model parameters are used for the distress image classification model. We realize the successful distress image representation by utilizing unlabeled distress images, which have been difficult to use in the past. In the experiments, we use the distress images obtained from the real world to verify the effectiveness of the proposed method for various distress types and confirm the performance improvement.
Published: 2023

9. Representation Learning and Information Fusion : Applications in Biomedical Image Processing

Author: Wetzer, Elisabeth and Wetzer, Elisabeth
Abstract: In recent years Machine Learning and in particular Deep Learning have excelled in object recognition and classification tasks in computer vision. As these methods extract features from the data itself by learning features that are relevant for a particular task, a key aspect of this remarkable success is the amount of data on which these methods train. Biomedical applications face the problem that the amount of training data is limited. In particular, labels and annotations are usually scarce and expensive to obtain as they require biological or medical expertise. One way to overcome this issue is to use additional knowledge about the data at hand. This guidance can come from expert knowledge, which puts focus on specific, relevant characteristics in the images, or geometric priors which can be used to exploit the spatial relationships in the images. This thesis presents machine learning methods for visual data that exploit such additional information and build upon classic image processing techniques, to combine the strengths of both model- and learning-based approaches. The thesis comprises five papers with applications in digital pathology. Two of them study the use and fusion of texture features within convolutional neural networks for image classification tasks. The other three papers study rotational equivariant representation learning, and show that learned, shared representations of multimodal images can be used for multimodal image registration and cross-modality image retrieval.
Published: 2023

10. Information Extraction for Test Identification in Repair Reports in the Automotive Domain

Author: Jie, Huang and Jie, Huang
Abstract: The knowledge of tests conducted on a problematic vehicle is essential for enhancing the efficiency of mechanics. Therefore, identifying the tests performed in each repair case is of utmost importance. This thesis explores techniques for extracting data from unstructured repair reports to identify component tests. The main emphasis is on developing a supervised multi-class classifier to categorize data and extract sentences that describe repair diagnoses and actions. It has been shown that incorporating a category-aware contrastive learning objective can improve the repair report classifier’s performance. The proposed approach involves training a sentence representation model based on a pre-trained model using a category-aware contrastive learning objective. Subsequently, the sentence representation model is further trained on the classification task using a loss function that combines the cross-entropy and supervised contrastive learning losses. By applying this method, the macro F1-score on the test set is increased from 90.45 to 90.73. The attempt to enhance the performance of the repair report classifier using a noisy data classifier proves unsuccessful. The noisy data classifier is trained using a prompt-based fine-tuning method, incorporating open-ended questions and two examples in the prompt. This approach achieves an F1-score of 91.09 and the resulting repair report classification datasets are found easier to classify. However, they do not contribute to an improvement in the repair report classifier’s performance. Ultimately, the repair report classifier is utilized to aid in creating the input necessary for identifying component tests. An information retrieval method is used to conduct the test identification. The incorporation of this classifier and the existing labels when creating queries leads to an improvement in the mean average precision at the top 3, 5, and 10 positions by 0.62, 0.81, and 0.35, respectively, although with a slight decrease of 0.14
Published: 2023

11. Tuned Contrastive Learning

Author: Animesh, Chaitanya, Chandraker, Manmohan1, Animesh, Chaitanya, Animesh, Chaitanya, Chandraker, Manmohan1, and Animesh, Chaitanya
Abstract: In recent times, contrastive learning has become increasingly popular for visual self-supervised representation learning owing to their state-of-the-art (SOTA) performance. Most of the modern contrastive learning methods generalize only to one positive and multiple negatives per anchor. A recent state-of-the-art, supervised contrastive (SupCon) loss, extends self-supervised contrastive learning to supervised setting by generalizing to multiple positives and negatives in a batch and improves upon the cross-entropy loss. In this thesis, we propose a novel contrastive loss function – Tuned Contrastive Learning (TCL) loss, that generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives. We provide theoretical analysis of our loss function's gradient response and show mathematically how it is better than that of SupCon loss. We empirically compare our loss function with SupCon loss and cross-entropy loss in a supervised setting on multiple classification-task datasets to show its effectiveness. We also show the stability of our loss function to a range of hyperparameter settings. Unlike SupCon loss that is only applied to supervised setting, we show how to extend TCL to self-supervised setting and empirically compare it with various SOTA self-supervised learning methods. Hence, we show that TCL achieves performance on par with SOTA methods in both supervised and self-supervised settings.
Published: 2023

12. ContrastNER : Contrastive-based Prompt Tuning for Few-shot NER

Author: Layegh, Amirhossein, Payberah, Amir H., Soylu, Ahmet, Roman, Dumitru, Matskin, Mihhail, Layegh, Amirhossein, Payberah, Amir H., Soylu, Ahmet, Roman, Dumitru, and Matskin, Mihhail
Abstract: Prompt-based language models have produced encouraging results in numerous applications, including Named Entity Recognition (NER) tasks. NER aims to identify entities in a sentence and provide their types. However, the strong performance of most available NER approaches is heavily dependent on the design of discrete prompts and a verbalizer to map the model-predicted outputs to entity categories, which are complicated undertakings. To address these challenges, we present ContrastNER, a prompt-based NER framework that employs both discrete and continuous tokens in prompts and uses a contrastive learning approach to learn the continuous prompts and forecast entity types. The experimental results demonstrate that ContrastNER obtains competitive performance to the state-of-the-art NER methods in high-resource settings and outperforms the state-of-the-art models in low-resource circumstances without requiring extensive manual prompt engineering and verbalizer design., Part of ISBN 9798350326970QC 20230919
Published: 2023
Full Text: View/download PDF

13. Utilizing contrastive learning for graph-based active learning of SAR data

Author: Brown, Jason, Zelnio, Edmund1, Garber, Frederick D, Brown, Jason, O'Neill, Riley, Calder, Jeff, Bertozzi, Andrea L, Brown, Jason, Zelnio, Edmund1, Garber, Frederick D, Brown, Jason, O'Neill, Riley, Calder, Jeff, and Bertozzi, Andrea L
Published: 2023

14. Stable Motion Primitives via Imitation and Contrastive Learning

Author: Pérez-Dattari, Rodrigo (author), Kober, J. (author), Pérez-Dattari, Rodrigo (author), and Kober, J. (author)
Abstract: Learning from humans allows nonexperts to program robots with ease, lowering the resources required to build complex robotic solutions. Nevertheless, such data-driven approaches often lack the ability to provide guarantees regarding their learned behaviors, which is critical for avoiding failures and/or accidents. In this work, we focus on reaching/point-to-point motions, where robots must always reach their goal, independently of their initial state. This can be achieved by modeling motions as dynamical systems and ensuring that they are globally asymptotically stable. Hence, we introduce a novel Contrastive Learning loss for training deep neural networks (DNN) that, when used together with an Imitation Learning loss, enforces the aforementioned stability in the learned motions. Differently from previous work, our method does not restrict the structure of its function approximator, enabling its use with arbitrary DNNs and allowing it to learn complex motions with high accuracy. We validate it using datasets and a real robot. In the former case, motions are two- and four-dimensional, modeled as first- and second-order dynamical systems. In the latter, motions are three, four, and six-dimensional, of first and second order, and are used to control a 7-DoF robot manipulator in its end effector space and joint space., Learning & Autonomous Control
Published: 2023
Full Text: View/download PDF

15. Supervised and Unsupervised Deep Learning Models for Flood Detection

Author: Yadav, Ritu and Yadav, Ritu
Abstract: Human civilization has an increasingly powerful influence on the earthsystem. Affected by climate change and land-use change, floods are occurringacross the globe and are expected to increase in the coming years. Currentsituations urge more focus on efficient monitoring of floods and detecting impactedareas. Earth observations are an invaluable source for monitoring theEarth’s surface at a large scale. In particular, the Sentinel-1 Synthetic ApertureRadar (SAR) and Sentinel-2 MultiSpectral Instrument (MSI) missionsoffer high-resolution data with frequent global revisits that are widely usedfor flood detection.Current solutions such as Copernicus Emergency Management Services(CEMS), MODIS (Moderate Resolution Imaging Spectroradiometer) globalflood product, and many others use data from Sentinel and multiple othersatellites to detect floods. Although existing solutions are helpful, they alsohave several limitations. For instance, solutions like MODIS global floodproduct detect floods solely on optical images causing poor or no detection incloudy areas. In addition, these solutions are threshold-based and often requirecriteria-based adjustments. Furthermore, these solutions do not leveragerich spatial information between neighboring pixels and don’t use temporalfeatures of time series data. Therefore, advanced processing algorithms areneeded to provide a reliable method for flood detection.This thesis presents three Deep Learning (DL) models for flood detection.The first two models are supervised segmentation models proposed todetect floods on uni-temporal Sentinel-1 SAR data. The study sites containfloods from Bolivia, Ghana, India, Mekong, Nigeria, Pakistan, Paraguay,Somalia, Spain, Sri Lanka and USA. The third model is an unsupervised spatiotemporalchange detection (CD) model that detects floods on time series ofSentinel-1 SAR data. The study sites contain floods from Slovakia, Somalia,Spain, Bolivia, Mekong, Bosnia, Australia, Scotland and Germany.The two supervise, Den mänskliga civilisationen har ett allt starkare inflytande på jordsystemet. Påverkad av klimatförändringar och förändringar i markanvändningen sker översvämningar över hela världen och förväntas öka under de kommande åren. Nuvarande situationer kräver mer fokus på effektiv övervakning av översvämningar och upptäckt av drabbade områden. Jordobservationer är en ovärderlig källa för att övervaka jordens yta i stor skala. Särskilt Sentinel- 1 Synthetic Aperture Radar (SAR) och Sentinel-2 MultiSpectral Instrument (MSI)-uppdrag erbjuder högupplösta data med frekventa globala återbesök som används ofta för att detektera översvämningar. Aktuella lösningar som Copernicus Emergency Management Services (CEMS), MODIS (Moderate Resolution Imaging Spectroradiometer) globala översvämningsprodukter och många andra använder data från Sentinel och flera andra satelliter för att upptäcka översvämningar. Även om befintliga lösningar är användbara, har de också flera begränsningar. Till exempel upptäcker lösningar som MODIS globala översvämningsprodukt översvämningar enbart på optiska bilder som orsakar dålig eller ingen detektering i molniga områden. Dessutom är dessa lösningar tröskelbaserade och kräver ofta kriteriebaserade justeringar. Dessutom utnyttjar dessa lösningar inte rik rumslig information mellan angränsande pixlar och använder inte tidsseriedata. Därför behövs avancerade bearbetningsalgoritmer för att tillhandahålla en tillförlitlig metod för översvämningsdetektering. Denna avhandling presenterar tre modeller för djupinlärning (DL) för översvämningsdetektering. De två första modellerna är övervakade segmenteringsmodeller som föreslagits för att upptäcka översvämningar på uni-temporala Sentinel-1 SAR-data. Studieplatserna innehåller översvämningar från Bolivia, Ghana, Indien, Mekong, Nigeria, Pakistan, Paraguay, Somalia, Spanien, Sri Lanka och USA. Den tredje modellen är en oövervakad modell för upptäckt av spatiotemporal förändring (CD) som detekterar översvämningar på, QC 20231030
Published: 2023

16. Unsupervised Wafer Map Failure Pattern Recognition with Contrastive Learning

Author: Liu, Kevin (author) and Liu, Kevin (author)
Abstract: This master’s thesis explores the application of Self-Supervised Contrastive Learning (SSCL), specifically the SimCLR algorithm, to enhance feature representation learning from Wafer Bin Maps (WBM) in the semiconductor manufacturing process. The motivation stems from the industry’s growing need for automated defect detection and root-cause analysis as electronic devices become more complex. Traditional manual inspection methods fall short in meeting these demands due to cost and time constraints. The study successfully leverages SSCL to extract meaningful feature representations, optimizing label efficiency and improving defect pattern recognition. Furthermore, a comprehensive pipeline for analysis on Nexperia’s data is established, including data acquisition, preprocessing, training, testing, and interactive visualization of feature spaces. The research contributes to the automation of wafer map inspection, resulting in potential cost savings and enhanced process control in semiconductor manufacture, Electrical Engineering | Embedded Systems
Published: 2023

17. Bayesian Contrastive Learning on Topological Structures

Author: Möllers, Alex (author) and Möllers, Alex (author)
Abstract: In this thesis we develop a Bayesian approach to graph contrastive learning and propose a new uncertainty measure based on the disagreement in likelihood due to different positive samples. Moreover, we extend contrastive learning to simplicial complexes and show that it can be used to generate high-quality representations of edge flow data., Applied Mathematics | Stochastics
Published: 2023

18. Messing With The Gap: On The Modality Gap Phenomenon In Multimodal Contrastive Representation Learning

Author: Al-Jaff, Mohammad and Al-Jaff, Mohammad
Abstract: In machine learning, a sub-field of computer science, a two-tower architecture model is a specialised type of neural network model that encodes paired data from different modalities (like text and images, sound and video, or proteomics and gene expression profiles) into a shared latent representation space. However, when training these models using a specific contrastive loss function, known as the multimodalinfoNCE loss, seems to often lead to a unique geometric phenomenon known as the modality gap. This gap is a clear geometric separation of the embeddings of the modalities in the joint contrastive latent space. This thesis investigates the modality gap in multimodal machine learning, specifically in two-tower neural networks trained with multimodal-infoNCE loss. We examine the adequacy of the current definition of the modality gap, the conditions under which the modality gap phenomenon manifests, and its impact on representation quality and downstream task performance. The approach to address these questions consists of a two-phase experimental strategy. Phase I involves a series of experiments, ranging from toy synthetic simulations to true multimodal machine learning with complex datasets, to explore and characterise the modality gap under varying conditions. Phase II focuses on modifying the modality gap and analysing representation quality, evaluating different loss functions and their impact on the modality gap. This methodical exploration allows us to systematically dissect the emergence and implications of the modality gap phenomenon, providing insights into its impact on downstream tasks, measured with proxy metrics based on semantic clustering in the shared latent representation space and modality-specific linear probe evaluation. Our findings reveal that the modality gap definition proposed by W. Liang et al. 2022, is insufficient. We demonstrate that similar modality gap magnitudes can exhibit varying linear separability between modality embeddings in
Published: 2023

19. ProtoCLIP: Prototypical Contrastive Language Image Pretraining

Author: Chen, Delong, Wu, Zhao, Liu, Fan, Yang, Zaiquan, Zheng, Shaoqiu, Tan, Ying, Zhou, Erjin, Chen, Delong, Wu, Zhao, Liu, Fan, Yang, Zaiquan, Zheng, Shaoqiu, Tan, Ying, and Zhou, Erjin
Abstract: Contrastive language image pretraining (CLIP) has received widespread attention since its learned representations can be transferred well to various downstream tasks. During the training process of the CLIP model, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. Based on this understanding, in this article, prototypical contrastive language image pretraining (ProtoCLIP) is introduced to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher level structural knowledge. Furthermore, prototypical back translation (PBT) is proposed to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under a large modality gap. The PBT also enables us to introduce additional external teachers with richer prior language knowledge. ProtoCLIP is trained with an online episodic training strategy, which means it can be scaled up to unlimited amounts of data. We trained our ProtoCLIP on conceptual captions (CCs) and achieved an + 5.81% ImageNet linear probing improvement and an + 2.01% ImageNet zero-shot classification improvement. On the larger YFCC-15M dataset, ProtoCLIP matches the performance of CLIP with 33% of training time.
Published: 2023

20. Equivariant Contrastive Learning for Sequential Recommendation

Author: Zhou, Peilin, Gao, Jingqi, Xie, Yueqi, Ye, Qichen, Hua, Yining, Kim, Jae Boum, Wang, Shoujin, Kim, Sung Hun, Zhou, Peilin, Gao, Jingqi, Xie, Yueqi, Ye, Qichen, Hua, Yining, Kim, Jae Boum, Wang, Shoujin, and Kim, Sung Hun
Abstract: Contrastive learning (CL) benefits the training of sequential recommendation models with informative self-supervision signals. Existing solutions apply general sequential data augmentation strategies to generate positive pairs and encourage their representations to be invariant. However, due to the inherent properties of user behavior sequences, some augmentation strategies, such as item substitution, can lead to changes in user intent. Learning indiscriminately invariant representations for all augmentation strategies might be sub-optimal. Therefore, we propose Equivariant Contrastive Learning for Sequential Recommendation (ECL-SR), which endows SR models with great discriminative power, making the learned user behavior representations sensitive to invasive augmentations (e.g., item substitution) and insensitive to mild augmentations (e.g., feature-level dropout masking). In detail, we use the conditional discriminator to capture differences in behavior due to item substitution, which encourages the user behavior encoder to be equivariant to invasive augmentations. Comprehensive experiments on four benchmark datasets show that the proposed ECL-SR framework achieves competitive performance compared to state-of-the-art SR models. The source code is available at https://github.com/Tokkiu/ECL. © 2023 ACM.
Published: 2023

21. Semantic-enhanced Contrastive Learning for Session-based Recommendation

Author: Liu, Zhicheng, Wang, Yulong, Liu, Tongcun, Zhang, Lei, Li, Wei, Liao, Jianxin, He, Ding, Liu, Zhicheng, Wang, Yulong, Liu, Tongcun, Zhang, Lei, Li, Wei, Liao, Jianxin, and He, Ding
Abstract: Session-based recommendation aims to predict the next clicked item based on the short-term behavior sequence of an anonymous user, which is a challenging task owing to data sparsity. Although contrastive learning has been used extensively to address this problem, existing methods generally consider all other sessions in the mini-batch as negative samples, leading to a limited negative sample space and a failure to distinguish real negative samples from false negative samples. Thus, Semantic-enhanced Contrastive Learning for Session-based Recommendation (SCLRec) is proposed in this study. Specifically, a queue is designed to store session samples, and a momentum encoder is used to ensure high consistency in the large-capacity sample space. Furthermore, a novel semantic-enhanced mechanism is devised to filter out false negative samples and increase the weights of high-confidence negative samples according to semantic similarity scores between sessions, effectively reducing noise and enhancing contrastive learning. Moreover, the gated attention unit is used as the encoder to obtain excellent performance and efficiency compared with traditional attention networks. Extensive experiments on three real-world public datasets demonstrate that the proposed method achieves state-of-the-art performance with a relatively low time complexity. © 2023 Elsevier B.V.
Published: 2023

22. Deep Cross-Modal Retrieval Between Spatial Image and Acoustic Speech

Author: Qian, Xinyuan, Xue, Wei, Zhang, Qiquan, Tao, Ruijie, Li, Haizhou, Qian, Xinyuan, Xue, Wei, Zhang, Qiquan, Tao, Ruijie, and Li, Haizhou
Abstract: Cross-modal Retrieval (CMR) is formulated for the scenarios where the queries and retrieval results are of different modalities. Existing CMR studies mainly focus on the common contextualized information between text transcripts and images, and the synchronized event information in audio-visual recordings. Different from all previous works, in this paper, we investigate the geometric correspondence between images and speech recordings captured in the same space and formulate a novel CMR task, called Spatial Image-Acoustic Retrieval (SIAR). To this end, we first design a novel speech encoder that consists of convolution neural networks and transformer layers, to learn space-aware speech representations. Then, to eliminate the crossmodal inherent discrepancy, we propose the Contrastive Speech Image Retrieval (CSIR) method which uses supervised contrastive learning to attract the same-space cross-modal features while repelling the ones from different spaces. Finally, image and speech features are directly compared and we predict the SIAR result with the maximum similarity. Extensive experiments demonstrate that our proposed speech encoder can recognize space from human speeches with superior performance over the other prevailing networks. It also sets our penultimate goal of speech-to-speech retrieval. Furthermore, our CSIR proposal can successfully perform bi-directional SIAR between spatial images and reverberant speeches with promising results. Code and data will be available. IEEE
Published: 2023

23. A Primer on Contrastive Pretraining in Language Processing:Methods, Lessons Learned, and Perspectives

Author: Rethmeier, Nils, Augenstein, Isabelle, Rethmeier, Nils, and Augenstein, Isabelle
Abstract: Modern natural language processing (NLP) methods employ self-supervised pretraining objectives such as masked language modeling to boost the performance of various downstream tasks. These pretraining methods are frequently extended with recurrence, adversarial, or linguistic property masking. Recently, contrastive self-supervised training objectives have enabled successes in image representation pretraining by learning to contrast input-input pairs of augmented images as either similar or dissimilar. In NLP however, a single token augmentation can invert the meaning of a sentence during input-input contrastive learning, which led to input-output contrastive approaches that avoid the issue by instead contrasting over input-label pairs. In this primer, we summarize recent self-supervised and supervised contrastive NLP pretraining methods and describe where they are used to improve language modeling, zero to few-shot learning, pretraining data-efficiency, and specific NLP tasks. We overview key contrastive learning concepts with lessons learned from prior research and structure works by applications. Finally, we point to open challenges and future directions for contrastive NLP to encourage bringing contrastive NLP pretraining closer to recent successes in image representation pretraining.
Published: 2023

24. Graph Self-supervised Learning with Augmentation-aware Contrastive Learning

Author: Chen, Dong, Zhao, Xiang, Wang, Wei, Tan, Zhen, Xiao, Weidong, Chen, Dong, Zhao, Xiang, Wang, Wei, Tan, Zhen, and Xiao, Weidong
Abstract: Graph self-supervised learning aims to mine useful information from unlabeled graph data, and has been successfully applied to pre-train graph representations. Many existing approaches use contrastive learning to learn powerful embeddings by learning contrastively from two augmented graph views. However, none of these graph contrastive methods fully exploits the diversity of different augmentations, and hence is prone to overfitting and limited generalization ability of learned representations. In this paper, we propose a novel Graph Self-supervised Learning method with Augmentation-aware Contrastive Learning. Our method is based on the finding that the pre-trained model after adding augmentation diversity can achieve better generalization ability. To make full use of the information from the diverse augmentation method, this paper constructs new augmentation-aware prediction task which complementary with the contrastive learning task. Similar to how pre-training requires fast adaptation to different downstream tasks, we simulate train-test adaptation on the constructed tasks for further enhancing the learning ability; this strategy can be deemed as a form of meta-learning. Experimental results show that our method outperforms previous methods and learns better representations for a variety of downstream tasks. © 2023 ACM.
Published: 2023

25. HomoGCL: Rethinking Homophily in Graph Contrastive Learning

Author: Li, Wen-Zhi, Wang, Chang-Dong, Xiong, Hui, Lai, Jian-Huang, Li, Wen-Zhi, Wang, Chang-Dong, Xiong, Hui, and Lai, Jian-Huang
Abstract: Contrastive learning (CL) has become the de-facto learning paradigm in self-supervised learning on graphs, which generally follows the "augmenting-contrasting'' learning scheme. However, we observe that unlike CL in computer vision domain, CL in graph domain performs decently even without augmentation. We conduct a systematic analysis of this phenomenon and argue that homophily, i.e., the principle that "like attracts like'', plays a key role in the success of graph CL. Inspired to leverage this property explicitly, we propose HomoGCL, a model-agnostic framework to expand the positive set using neighbor nodes with neighbor-specific significances. Theoretically, HomoGCL introduces a stricter lower bound of the mutual information between raw node features and node embeddings in augmented views. Furthermore, HomoGCL can be combined with existing graph CL models in a plug-and-play way with light extra computational overhead. Extensive experiments demonstrate that HomoGCL yields multiple state-of-the-art results across six public datasets and consistently brings notable performance improvements when applied to various graph CL methods. Code is avilable at https://github.com/wenzhilics/HomoGCL. © 2023 ACM.
Published: 2023

26. Contrastive Enhanced Slide Filter Mixer for Sequential Recommendation

Author: Du, Xinyu, Yuan, Huanhuan, Zhao, Pengpeng, Fang, Junhua, Liu, Guanfeng, Liu, Yanchi, Sheng, Victor S., Zhou, Xiaofang, Du, Xinyu, Yuan, Huanhuan, Zhao, Pengpeng, Fang, Junhua, Liu, Guanfeng, Liu, Yanchi, Sheng, Victor S., and Zhou, Xiaofang
Abstract: Sequential recommendation (SR) aims to model user preferences by capturing behavior patterns from their item historical interaction data. Most existing methods model user preference in the time domain, omitting the fact that users' behaviors are also influenced by various frequency patterns that are difficult to separate in the entangled chronological items. However, few attempts have been made to train SR in the frequency domain, and it is still unclear how to use the frequency components to learn an appropriate representation for the user. To solve this problem, we shift the viewpoint to the frequency domain and propose a novel Contrastive Enhanced SLIde Filter MixEr for Sequential Recommendation, named SLIME4Rec. Specifically, we design a frequency ramp structure to allow the learnable filter slide on the frequency spectrums across different layers to capture different frequency patterns. Moreover, a Dynamic Frequency Selection (DFS) and a Static Frequency Split (SFS) module are proposed to replace the self-attention module for effectively extracting frequency information in two ways. DFS is used to select helpful frequency components dynamically, and SFS is combined with the dynamic frequency selection module to provide a more fine-grained frequency division. Finally, contrastive learning is utilized to improve the quality of user embedding learned from the frequency domain. Extensive experiments conducted on five widely used benchmark datasets demonstrate our proposed model performs significantly better than the state-of-the-art approaches. Our code is available at https://github.com/sudaada/SLIME4Rec. © 2023 IEEE.
Published: 2023

27. Region-Aware Hierarchical Graph Contrastive Learning for Ride-Hailing Driver Profiling

Author: Chen, Kehua, Han, Jindong, Feng, Siyuan, Zhu, Meixin, Yang, Hai, Chen, Kehua, Han, Jindong, Feng, Siyuan, Zhu, Meixin, and Yang, Hai
Abstract: Driver profiling, which is the process of extracting driver preferences and behavioral patterns from collected driving data, can be performed on a microscopic or macroscopic scale. Microscopic driver profiling, which uses onboard data, can be incorporated into advanced driver assistance systems or used by insurance companies to implement pay-as-you-drive programs. However, transportation network companies (TNCs) typically cannot access onboard data owing to privacy and cost concerns. Therefore, TNCs typically perform macroscopic driver profiling using the raw GPS trajectories generated by smartphones. Accurate profiling of ride-hailing drivers can enhance the user experience and order dispatching process by allowing the TNCs to accurately predict various downstream tasks such as time of arrival of rides. Notably, most of the approaches that use raw GPS trajectories for driver profiling directly leverage trajectory data without any comprehensive analysis, resulting in neglect of the rich regional semantic information and underlying correlations among trajectories. Therefore, we study the ride-hailing driver profiling problem and propose a Hierarchical Graph Contrastive Learning (HGCL) framework that can automatically learn low-dimensional embeddings encoding driver behaviors from raw GPS data. The HGCL framework consists of a hierarchical graph neural network for capturing the regional features of trajectories and a hierarchical contrastive learning strategy aimed at learning high-quality representations at different levels. The effectiveness of the proposed model is evaluated using driver representation embeddings learned from a real-world large-scale dataset for three downstream tasks. The results of extensive experiments demonstrate the efficacy of the proposed HGCL framework for driver profiling. © 2023 Elsevier Ltd
Published: 2023

28. Exploring Feature Representation Learning for Semi-supervised Medical Image Segmentation

Author: Wu, Huimin, Li, Xiaomeng, Cheng, Kwang Ting, Wu, Huimin, Li, Xiaomeng, and Cheng, Kwang Ting
Abstract: This article presents a simple yet effective two-stage framework for semi-supervised medical image segmentation. Unlike prior state-of-the-art semi-supervised segmentation methods that predominantly rely on pseudo supervision directly on predictions, such as consistency regularization and pseudo labeling, our key insight is to explore the feature representation learning with labeled and unlabeled (i.e., pseudo labeled) images to regularize a more compact and better-separated feature space, which paves the way for low-density decision boundary learning and therefore enhances the segmentation performance. A stage-adaptive contrastive learning method is proposed, containing a boundary-aware contrastive loss that takes advantage of the labeled images in the first stage, as well as a prototype-aware contrastive loss to optimize both labeled and pseudo labeled images in the second stage. To obtain more accurate prototype estimation, which plays a critical role in prototype-aware contrastive learning, we present an aleatoric uncertainty-aware method to generate higher quality pseudo labels. Aleatoric-uncertainty adaptive (AUA) adaptively regularizes prediction consistency by taking advantage of image ambiguity, which, given its significance, is underexplored by existing works. Our method achieves the best results on three public medical image segmentation benchmarks.
Published: 2023

29. Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Author: Chen, Long, Zheng, Yuhang, Niu, Yulei, Zhang, Hanwang, Xiao, Jun, Chen, Long, Zheng, Yuhang, Niu, Yulei, Zhang, Hanwang, and Xiao, Jun
Abstract: Today's VQA models still tend to capture superficial linguistic correlations in the training set and fail to generalize to the test set with different QA distributions. To reduce these language biases, recent VQA works introduce an auxiliary question-only model to regularize the training of targeted VQA model, and achieve dominating performance on diagnostic benchmarks for out-of-distribution testing. However, due to the complex model design, ensemble-based methods are unable to equip themselves with two indispensable characteristics of an ideal VQA model: 1) Visual-explainable: The model should rely on the right visual regions when making decisions. 2) Question-sensitive: The model should be sensitive to the linguistic variations in questions. To this end, we propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy. After training with CSST, VQA models are forced to focus on all critical objects and words, which significantly improves both visual-explainable and question-sensitive abilities. Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST). CSS generates counterfactual samples by carefully masking critical objects in images or words in questions and assigning pseudo ground-truth answers. CST not only trains the VQA models with both complementary samples to predict respective ground-truth answers, but also urges the VQA models to further distinguish the original samples and superficially similar counterfactual ones. To facilitate the CST training, we propose two variants of supervised contrastive loss for VQA, and design an effective positive and negative sample selection mechanism based on CSS. Extensive experiments have shown the effectiveness of CSST. Particularly, by building on top of model LMH+SAR [1], [2], we achieve record-breaking performance on all out-of-distribution benchmarks ( e.g. , VQA-CP v2, VQA-CP v1, and GQA-OOD).
Published: 2023

30. Meta-Learning with label noise: A step towards label few-shot meta-learning with label noise.

Author: Galjaard, Jeroen (author) and Galjaard, Jeroen (author)
Abstract: Few-shot learning presents the challenging problem of learning a task with only a few provided examples. Gradient-Based Meta-Learners (GBML) offer a solution for learning such few-shot problems. These learners approach the few-shot problem by learning an initial parameterization that requires only a few adaptation steps for new tasks. Although these GMBLs are well-studied with correct training data, few have studied the impact of training them with noisy labels. In this thesis, we show that GMBLs are negatively affected by label noise. We propose a training strategy (BatMan-CLR) leveraging a novel subsampling approach to address the impact of meta-training with label noise. To train and evaluate the different GMBLs, we implement nmfw, a novel framework for extensible training loop definition and few-shot data generation. Our results show that BatMan-CLR is capable of learning few-shot classification models. We show that our approach can effectively mitigate the impact of meta-training label noise. Even with 60% wrong labels BatMan and Man can limit the meta-testing accuracy drop to 2.5, 9.4, and 1.1 percent points, respectively, with existing meta-learners across the Omniglot, CifarFS, and MiniImagenet datasets., https://github.com/JMGaljaard/differentiable-approx, Computer Science
Published: 2023

31. Self-supervised Learning for Tumor Microenvironment Analysis: Addressing Label Scarcity in Multiplexed Immunofluorescence Imaging with Novel Feature Extraction Techniques

Author: Spengler, Daniel (author) and Spengler, Daniel (author)
Abstract: The study of tumor microenvironments (TMEs) and immune cell composition in cancer, a disease characterized by uncontrolled growth and spread of tumor cells, has become increasingly important for understanding tumor progression and patient outcomes. Tools such as the TME-Analyzer enable this kind of research, but their manual workflows highlight a common problem in medical imaging: the scarcity of labeled data. This limits the efficiency and applicability of supervised learning algorithms to improve such medical image analysis tools. Self-supervised learning algorithms offer a promising alternative by learning feature representations without requiring labeled data. This thesis aims to address the issue of label scarcity by exploring the potential of self-supervised learning models for TME analysis involving the classification of individual cells in multiplex immunofluorescence (MxIF) microscopy images of triple-negative breast cancer (TNBC) tissue. To enable the learning of feature representations from MxIF images with an arbitrary number of color channels, this thesis proposes to pre-train an encoder network on every image channel separately according to the SimCLR algorithm and perform classification of multi-channel images by feeding the concatenated feature representation outputs of every channel to a classifier network — referred to as the Siamese configuration. A hyperparameter search is conducted to optimize the SimCLR encoder’s ability to learn high-quality feature representations of individual cells in MxIF images of TNBC tissue. Upon obtaining an optimal set of hyperparameters, the effectiveness of the learned feature representations in improving label-efficiency for individual cell classification is assessed. The results demonstrate that the proposed Siamese configuration improves the accuracy of classifying the inflammation status of TNBC tumor sections by 2.63%. Additionally, the optimal set of hyperparameters identified through the sea, Mechanical Engineering | Systems and Control
Published: 2023

32. Multi-modal learning for predicting the genotype of glioma

Author: Wei, Yiran, Chen, Xi, Zhu, Lei, Zhang, Lipei, Schönlieb, Carola-Bibiane, Price, Stephen, Li, Chao, Wei, Yiran, Chen, Xi, Zhu, Lei, Zhang, Lipei, Schönlieb, Carola-Bibiane, Price, Stephen, and Li, Chao
Abstract: The isocitrate dehydrogenase (IDH) gene mutation is an essential biomarker for the diagnosis and prognosis of glioma. It is promising to better predict glioma genotype by integrating focal tumor image and geometric features with brain network features derived from MRI. Convolutional neural networks show reasonable performance in predicting IDH mutation, which, however, cannot learn from non-Euclidean data, e.g., geometric and network data. In this study, we propose a multi-modal learning framework using three separate encoders to extract features of focal tumor image, tumor geometrics and global brain networks. To mitigate the limited availability of diffusion MRI, we develop a self-supervised approach to generate brain networks from anatomical multi-sequence MRI. Moreover, to extract tumor-related features from the brain network, we design a hierarchical attention module for the brain network encoder. Further, we design a bi-level multi-modal contrastive loss to align the multi-modal features and tackle the domain gap at the focal tumor and global brain. Finally, we propose a weighted population graph to integrate the multi-modal features for genotype prediction. Experimental results on the testing set show that the proposed model outperforms the baseline deep learning models. The ablation experiments validate the performance of different components of the framework. The visualized interpretation corresponds to clinical knowledge with further validation. In conclusion, the proposed learning framework provides a novel approach for predicting the genotype of glioma.
Published: 2023

33. Feature-Level Deeper Self-Attention Network With Contrastive Learning for Sequential Recommendation

Author: Hao, Yongjing, Zhang, Tingting, Zhao, Pengpeng, Liu, Yanchi, Sheng, Victor S., Xu, Jiajie, Liu, Guanfeng, Zhou, Xiaofang, Hao, Yongjing, Zhang, Tingting, Zhao, Pengpeng, Liu, Yanchi, Sheng, Victor S., Xu, Jiajie, Liu, Guanfeng, and Zhou, Xiaofang
Abstract: Sequential recommendation, which aims to recommend next item that the user will likely interact in a near future, has become essential in various Internet applications. Existing methods usually consider the transition patterns between items, but ignore the transition patterns between features of items. We argue that only the item-level sequences cannot reveal the full sequential patterns, while explicit and implicit feature-level sequences can help extract the full sequential patterns. Meanwhile, the item-level sequential recommendation also suffers from limited supervised signal issues. In this paper, we propose a novel model Feature-level Deeper Self-Attention Network with Contrastive Learning (FDSA-CL) for sequential recommendation. Specifically, FDSA-CL first integrates various heterogeneous features of items into feature-level sequences with different weights through a vanilla attention mechanism. After that, FDSA-CL applies separated self-attention blocks on item-level sequences and feature-level sequences, respectively, to model item transition patterns and feature transition patterns. Moreover, we propose contrastive learning and item feature recommendation tasks to capture the embedding commonality and further utilize the beneficial interaction among the two levels, so as to alleviate the sparsity of the supervised signal and extract the most critical information. Finally, we jointly optimize the above tasks. We evaluate the proposed model using two real-world datasets and experimental results show that our model significantly outperforms the state-of-the-art approaches.
Published: 2023

34. Uncertainty-aware Contrastive Distillation for Incremental Semantic Segmentation

Author: Yang, Guanglei, Fini, Enrico, Xu, Dan, Rota, Paolo, Ding, Mingli, Nabi, Moin, Alameda-Pineda, Xavier, Ricci, Elisa, Yang, Guanglei, Fini, Enrico, Xu, Dan, Rota, Paolo, Ding, Mingli, Nabi, Moin, Alameda-Pineda, Xavier, and Ricci, Elisa
Abstract: A fundamental and challenging problem in deep learning is catastrophic forgetting, the tendency of neural networks to fail to preserve the knowledge acquired from old tasks when learning new tasks. This problem has been widely investigated in the research community and several Incremental Learning approaches have been proposed in the past years. While earlier works in computer vision have mostly focused on image classification and object detection, more recently some IL approaches for semantic segmentation have been introduced. These previous works showed that, despite its simplicity, knowledge distillation can be effectively employed to alleviate catastrophic forgetting. In this paper, we follow this research direction and, inspired by recent literature on contrastive learning, we propose a novel distillation framework, Uncertainty-aware Contrastive Distillation. In a nutshell, is operated by introducing a novel distillation loss that takes into account all the images in a mini-batch, enforcing similarity between features associated to all the pixels from the same classes, and pulling apart those corresponding to pixels from different classes. Our experimental results demonstrate the advantage of the proposed distillation technique, which can be used in synergy with previous IL approaches, and leads to state-of-art performance on three commonly adopted benchmarks. IEEE
Published: 2023

35. Improving Cross-View Matching with Self-Supervised Learning

Author: Cui, Jianfeng (author) and Cui, Jianfeng (author)
Abstract: We explored the possibility of improving cross-view matching performance with self-supervised learning techniques and perform interpretations in terms of the embedding space of image features. The effect of pre-training by contrastive learning is verified quantitatively by experiments, and also exhibited by visualization of the feature space., Mechanical Engineering | Vehicle Engineering
Published: 2023

36. Learnable Model Augmentation Contrastive Learning for Sequential Recommendation

Author: Hao, Yongjing, Zhao, Pengpeng, Xian, Xuefeng, Liu, Guanfeng, Zhao, Lei, Liu, Yanchi, Sheng, Victor S., Zhou, Xiaofang, Hao, Yongjing, Zhao, Pengpeng, Xian, Xuefeng, Liu, Guanfeng, Zhao, Lei, Liu, Yanchi, Sheng, Victor S., and Zhou, Xiaofang
Abstract: Sequential Recommendation (SR) methods play a crucial role in recommender systems, which aims to capture users' dynamic interest from their historical interactions. Recently, Contrastive Learning (CL), which has emerged as a successful method for sequential recommendation, utilizes various data augmentations to generate contrastive views to mine supervised signals from data to alleviate data sparsity issues. However, most existing sequential data augmentation methods may destroy semantic sequential interaction characteristics. Meanwhile, they often adopt random operations when generating contrastive views leading to suboptimal performance. To this end, in this paper, we propose a Learnable Model Augmentation Contrastive learning for sequential Recommendation (LMA4Rec). Specifically, LMA4Rec first takes the model-based augmentation method to generate constructive views. Then, LMA4Rec uses Learnable Bernoulli Dropout (LBD) to implement learnable model augmentation operations. Next, contrastive learning is used between the contrastive views to extract supervised signals. Furthermore, a novel multi-positive contrastive learning loss alleviates the supervised sparsity issue. Finally, experiments on public datasets show that our LMA4Rec method effectively improved sequential recommendation performance compared with the state-of-the-art baseline methods. IEEE
Published: 2023

37. What Really Matters for Graph Contrastive Learning-Based Recommendations? A Unified Learning Strategy

Author: Zhou, Hongwei, Gao, Min, Wang, Zongwei, Guo, Linxin, Tao, Yinghui, Li, Wentao, Zhou, Hongwei, Gao, Min, Wang, Zongwei, Guo, Linxin, Tao, Yinghui, and Li, Wentao
Abstract: Graph contrastive learning (GCL) has become increasingly popular in recommendation due to its remarkable ability to reduce reliance on labels. Typically, GCL employs data augmentation methods, e.g., structure perturbation and representation perturbation, and CL loss to enhance performance. Recent studies have shown that structural perturbation plays a minor role in GCL, but there is still a lack of exploration on the representation perturbation. Therefore, we compare the two data perturbations in detail and reveal that both of them have a limited impact on performance. Simply combining recommendation loss and CL loss can produce comparable improvements. Besides, we identify a shared principle between the designs of recommendation loss and CL loss: both aim to optimize representation by increasing similarity between a target node and its positive samples while decreasing similarity with negative samples. Based on these findings, we propose Compact Graph Contrastive Learning (CGCL), a streamlined strategy that eliminates the data augmentation and deeply unifies recommendation loss and CL loss by elegantly incorporating their respective contributions. Leveraging the benefits of the unified strategy, we discover that our model can learn a concentrated mode length distribution of representation, which can enhance the ability to debias and thus improve the performance of the recommendation. This is a novel perspective on representation learning, and we also validate its rationality through rigorous experiments. Our comprehensive study on multiple benchmark datasets demonstrates that CGCL outperforms existing GCL methods. © 2023 IEEE.
Published: 2023

38. A Co-training Approach for Noisy Time Series Learning

Author: Zhang, Weiqi, Zhang, Jianfeng, Li, Jia, Tsung, Fu-gee, Zhang, Weiqi, Zhang, Jianfeng, Li, Jia, and Tsung, Fu-gee
Abstract: In this work, we focus on robust time series representation learning. Our assumption is that real-world time series is noisy and complementary information from different views of the same time series plays an important role while analyzing noisy input. Based on this, we create two views for the input time series through two different encoders. We conduct co-training based contrastive learning iteratively to learn the encoders. Our experiments demonstrate that this co-training approach leads to a significant improvement in performance. Especially, by leveraging the complementary information from different views, our proposed TS-CoT method can mitigate the impact of data noise and corruption. Empirical evaluations on four time series benchmarks in unsupervised and semi-supervised settings reveal that TS-CoT outperforms existing methods. Furthermore, the representations learned by TS-CoT can transfer well to downstream tasks through fine-tuning. © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Published: 2023

39. Spatio-temporal Contrastive Learning-enhanced GNNs for Session-based Recommendation

Author: Wan, Zhongwei, Liu, Xin, Wang, Benyou, Qiu, Jiezhong, Li, Boyu, Guo, Ting, Chen, Guangyong, Wang, Yang, Wan, Zhongwei, Liu, Xin, Wang, Benyou, Qiu, Jiezhong, Li, Boyu, Guo, Ting, Chen, Guangyong, and Wang, Yang
Abstract: Session-based recommendation (SBR) systems aim to utilize the user’s short-term behavior sequence to predict the next item without the detailed user profile. Most recent works try to model the user preference by treating the sessions as between-item transition graphs and utilize various graph neural networks (GNNs) to encode the representations of pair-wise relations among items and their neighbors. Some of the existing GNN-based models mainly focus on aggregating information from the view of spatial graph structure, which ignores the temporal relations within neighbors of an item during message passing and the information loss results in a sub-optimal problem. Other works embrace this challenge by incorporating additional temporal information but lack sufficient interaction between the spatial and temporal patterns. To address this issue, inspired by the uniformity and alignment properties of contrastive learning techniques, we propose a novel framework called Session-based Recommendation with Spatio-temporal Contrastive Learning-enhanced GNNs (RESTC). The idea is to supplement the GNN-based main supervised recommendation task with the temporal representation via an auxiliary cross-view contrastive learning mechanism. Furthermore, a novel global collaborative filtering graph embedding is leveraged to enhance the spatial view in the main task. Extensive experiments demonstrate the significant performance of RESTC compared with the state-of-the-art baselines. We release our source code at https://github.com/SUSTechBruce/RESTC-Source-code. © 2023 Association for Computing Machinery. All rights reserved.
Published: 2023

40. CounterCLR: Counterfactual Contrastive Learning with Non-random Missing Data in Recommendation

Author: Wang, Jun, Li, Haoxuan, Zhang, Chi, Liang, Dongxu, Yu, Enyun, Ou, Wenwu, Wang, Wenjia, Wang, Jun, Li, Haoxuan, Zhang, Chi, Liang, Dongxu, Yu, Enyun, Ou, Wenwu, and Wang, Wenjia
Abstract: Recommender systems are designed to learn user preferences from observed feedback and comprise many fundamental tasks, such as rating prediction and post-click conversion rate (pCVR) prediction. However, the observed feedback usually suffer from two issues: selection bias and data sparsity, where biased and insufficient feedback seriously degrade the performance of recommender systems in terms of accuracy and ranking. Existing solutions for handling the issues, such as data imputation and inverse propensity score, are highly susceptible to additional trained imputation or propensity models. In this work, we propose a novel counterfactual contrastive learning framework for recommendation, named CounterCLR, to tackle the problem of non-random missing data by exploiting the advances in contrast learning. Specifically, the proposed CounterCLR employs a deep representation network, called CauNet, to infer non-random missing data in recommendations and perform user preference modeling by further introducing a selfsupervised contrastive learning task. Our CounterCLR mitigates the selection bias problem without the need for additional models or estimators, while also enhancing the generalization ability in cases of sparse data. Experiments on real -world datasets demonstrate the effectiveness and superiority of our method.
Published: 2023

41. Advancement of Mathematical Methods in Feature Representation Learning for Artificial Intelligence, Data Mining and Robotics.

Author: Gou, Jianping, Du, Lan, Gou, Jianping, Ou, Weihua, and Zeng, Shaoning
Subjects: Computer science, Information technology industries, 3D reconstruction, ADMM, Aspect Level Sentiment Classification, C-MAPSS, Contrasitve Learning, DCNN-BiLSTM, Dempster-Shafer evidence theory, GAT, GCN, Graph Convolutional Networks, KGE, MMD, NMS, Soft-NMS, XSS attack, YOLOX, YoloV4, adversarial equilibrium, adversarial example, adversarial learning, anchor-free, anomaly detection, anti-noise performance, aspect-based sentiment analysis, aspect-level sentiment classification, attention mechanism, background matting, black-box attack, blind image deblurring, collaborative-representation-based classification, commonsense knowledge graph, computer vision, confidence score, contrastive learning, correlation filters, cost-weighted, cross-domain classification, cross-domain sentiment classification, cross-working, cyber-physical, data analysis, decoupling, deep learning, deep neural network, deep reinforcement learning, dependency trees, dependency types, discriminative feature learning, domain adaptation, elastic optical networks, end-to-end, ensemble attack, extension theory, external knowledge, face recognition, feature extraction, feature reuse, feature transformation, fine-tuning, fusion verification, fuzzy k-means, gait adjustment, garbage quantity identification, gated learning, geometric mean metric, graph attention mechanism, graph convolutional networks, graph neural networks, hate speech detection, head detection, hypergraph matching, image aesthetic assessment, image classification, image gradient orientations, image prior, image super-resolution, industrial control systems, information-theoretic metric learning, intelligent design, iterative majorization algorithm, joint semantic learning, kNN, knowledge distillation, knowledge graph embedding, label propagation, large-margin technique, license plate recognition, logarithm norm, low-high level joint task, machine learning, matrix nuclear norm, metric learning, mixed noise removal, models and algorithms, motion deblurring, multi-order attention, multi-output, multi-source domain adaptation, multi-task learning, multi-view stereo, multidimensional scaling, n/a, object detection, pairwise constraint propagation, payloads, pedestrian detection, people counting, plug-and-play, power load forecasting, rainy image recovery, robustness, routing, modulation and spectrum assignment, scheme design, second-order fitting, second-order gradient, semantic, semi-supervised learning, similarity metric, small sample, soft-NMS, sparse channel, sparsity, stability, state reconstruction, state-dependent switching, structure from motion, switched system, syntactic, temporal knowledge graph, time delay, traffic detection, transferability quantification, uncertain temporal knowledge graph, vehicle color recognition, vehicle re-identification, video surveillance, visual tracking, word embedding
Abstract: Summary: The present reprint contains 33 articles accepted and published in the Special Issue entitled "Advancement of Mathematical Methods in Feature Representation Learning for Artificial Intelligence, Data Mining and Robotics, 2022" in the MDPI journal, Mathematics, which covers a wide range of topics connected to the theory and applications of feature representation learning for image processing, artificial intelligence, data mining and robotics. These topics include, among others, elements from image blurring, image aesthetic quality assessment, pedestrian detection, visual tracking, vehicle re-identification, face recognition, 3D reconstruction, the stability of switched systems, domain adaption, deep reinforcement, sentiment analysis, graph convolutional networks, knowledge graphs, geometric metric learning, etc. It is hoped that this reprint will be interesting and useful for those working in the area of image processing, computer vision, machine learning, natural language processing and robotics, as well as for those with backgrounds in machine learning who are willing to become familiar with recent advancements in artificial intelligence, which, today, is present in almost all aspects of human life and activities.

42. Uncertainty-aware Contrastive Distillation for Incremental Semantic Segmentation

Author: Yang, Guanglei, Fini, Enrico, Xu, Dan, Rota, Paolo, Ding, Mingli, Nabi, Moin, Alameda-Pineda, Xavier, Ricci, Elisa, Yang, Guanglei, Fini, Enrico, Xu, Dan, Rota, Paolo, Ding, Mingli, Nabi, Moin, Alameda-Pineda, Xavier, and Ricci, Elisa
Abstract: A fundamental and challenging problem in deep learning is catastrophic forgetting, the tendency of neural networks to fail to preserve the knowledge acquired from old tasks when learning new tasks. This problem has been widely investigated in the research community and several Incremental Learning approaches have been proposed in the past years. While earlier works in computer vision have mostly focused on image classification and object detection, more recently some IL approaches for semantic segmentation have been introduced. These previous works showed that, despite its simplicity, knowledge distillation can be effectively employed to alleviate catastrophic forgetting. In this paper, we follow this research direction and, inspired by recent literature on contrastive learning, we propose a novel distillation framework, Uncertainty-aware Contrastive Distillation. In a nutshell, is operated by introducing a novel distillation loss that takes into account all the images in a mini-batch, enforcing similarity between features associated to all the pixels from the same classes, and pulling apart those corresponding to pixels from different classes. Our experimental results demonstrate the advantage of the proposed distillation technique, which can be used in synergy with previous IL approaches, and leads to state-of-art performance on three commonly adopted benchmarks. IEEE
Published: 2022

43. Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment

Author: Dai, Weihang, Li, Xiaomeng, Chiu, Wan Hang Keith, Kuo, Michael D., Cheng, Kwang Ting, Dai, Weihang, Li, Xiaomeng, Chiu, Wan Hang Keith, Kuo, Michael D., and Cheng, Kwang Ting
Abstract: Image regression tasks for medical applications, such as bone mineral density (BMD) estimation and left-ventricular ejection fraction (LVEF) prediction, play an important role in computer-aided disease assessment. Most deep regression methods train the neural network with a single regression loss function like MSE or L1 loss. In this paper, we propose the first contrastive learning framework for deep image regression, namely AdaCon, which consists of a feature learning branch via a novel adaptive-margin contrastive loss and a regression prediction branch. Our method incorporates label distance relationships as part of the learned feature representations, which allows for better performance in downstream regression tasks. Moreover, it can be used as a plug-and-play module to improve performance of existing regression methods. We demonstrate the effectiveness of AdaCon on two medical image regression tasks, i.e., bone mineral density estimation from X-ray images and left-ventricular ejection fraction prediction from echocardiogram videos. AdaCon leads to relative improvements of 3.3% and 5.9% in MAE over state-of-the-art BMD estimation and LVEF prediction methods, respectively.
Published: 2022

44. CLIP-RS: A Cross-modal Remote Sensing Image Retrieval Based on CLIP, a Northern Virginia Case Study

Author: Djoufack Basso, Larissa and Djoufack Basso, Larissa
Abstract: Satellite imagery research used to be an expensive research topic for companies and organizations due to the limited data and compute resources. As the computing power and storage capacity grows exponentially, a large amount of aerial and satellite images are generated and analyzed everyday for various applications. Current technological advancement and extensive data collection by numerous Internet of Things (IOT) devices and platforms have amplified labeled natural images. Such data availability catalyzed the development and performance of current state-of-the-art image classification and cross-modal models. Despite the abundance of publicly available remote sensing images, very few remote sensing (RS) images are labeled and even fewer are multi-captioned.These scarcities limit the scope of fine tuned state of the art models to at most 38 classes, based on the PatternNet data, one of the largest publicly available labeled RS data. Recent state-of-the art image-to-image retrieval and detection models in RS have shown great results. Because the text-to-image retrieval of RS images is still emerging, it still faces some challenges in the retrieval of those images.These challenges are based on the inaccurate retrieval of image categories that were not present in the training dataset and the retrieval of images from descriptive input. Motivated by those shortcomings in current cross-modal remote sensing image retrieval, we proposed CLIP-RS, a cross-modal remote sensing image retrieval platform. Our proposed framework CLIP-RS is a framework that combines a fine-tuned implementation of a recent state of the art cross-modal and text-based image retrieval model, Contrastive Language Image Pre-training (CLIP) and FAISS (Facebook AI similarity search), a library for efficient similarity search. Our implementation is deployed on a Web App for inference task on text-to-image and image-to-image retrieval of RS images collected via the Mapbox GL JS API. We used the free tier opt
Published: 2022

45. Supervised contrastive learning over prototype-label embeddings for network intrusion detection

Author: López Martín, Manuel and López Martín, Manuel
Abstract: Producción Científica, Contrastive learning makes it possible to establish similarities between samples by comparing their distances in an intermediate representation space (embedding space) and using loss functions designed to attract/repel similar/dissimilar samples. The distance comparison is based exclusively on the sample features. We propose a novel contrastive learning scheme by including the labels in the same embedding space as the features and performing the distance comparison between features and labels in this shared embedding space. Following this idea, the sample features should be close to its ground-truth (positive) label and away from the other labels (negative labels). This scheme allows to implement a supervised classification based on contrastive learning. Each embedded label will assume the role of a class prototype in embedding space, with sample features that share the label gathering around it. The aim is to separate the label prototypes while minimizing the distance between each prototype and its same-class samples. A novel set of loss functions is proposed with this objective. Loss minimization will drive the allocation of sample features and labels in embedding space. Loss functions and their associated training and prediction architectures are analyzed in detail, along with different strategies for label separation. The proposed scheme drastically reduces the number of pair-wise comparisons, thus improving model performance. In order to further reduce the number of pair-wise comparisons, this initial scheme is extended by replacing the set of negative labels by its best single representative: either the negative label nearest to the sample features or the centroid of the cluster of negative labels. This idea creates a new subset of models which are analyzed in detail. The outputs of the proposed models are the distances (in embedding space) between each sample and the label prototypes. These distances can be used to perform classification (minimum distance lab, Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación - Fondo Europeo de Desarrollo Regional (grant RTI2018-098958-B-I00)
Published: 2022

46. GMC - Geometric Multimodal Contrastive Representation Learning

Author: Poklukar, Petra, Miguel, Vasco, Yin, Hang, Melo, Francisco S., Paiva, Ana, Kragic, Danica, Poklukar, Petra, Miguel, Vasco, Yin, Hang, Melo, Francisco S., Paiva, Ana, and Kragic, Danica
Abstract: Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprised of two main components: i) a two level architecture consisting of modality-specific base encoder, allowing to process an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks., QC 20220614
Published: 2022

47. Evaluating the effects of data augmentations for specific latent features : Using self-supervised learning

Author: Ingemarsson, Markus, Henningsson, Jacob, Ingemarsson, Markus, and Henningsson, Jacob
Abstract: Supervised learning requires labeled data which is cumbersome to produce, making it costly and time-consuming. SimCLR is a self-supervising framework that uses data augmentations to learn without labels. This thesis investigates how well cropping and color distorting augmentations work for two datasets, MPI3D and Causal3DIdent. The representations learned are evaluated using representation similarity analysis. The data augmentations were meant to make the model learn invariant representations of the object shape in the images regarding it as content while ignoring unnecessary features and regarding them as style. As a result, 8 models were created, models A-H. A and E were trained using supervised learning as a benchmark for the remaining self-supervised models. B and C learned invariant features of style instead of learning invariant representations of shape. Model D learned invariant representations of shape. Although, it also regarded style-related factors as content. Model F, G, and H managed to learn invariant representations of shape with varying intensities while regarding the rest of the features as style. The conclusion was that models can learn invariant representations of features related to content using self-supervised learning with the chosen augmentations. However, the augmentation settings must be suitable for the dataset., Övervakad maskininlärning kräver annoterad data, vilket är dyrt och tidskrävande att producera. SimCLR är ett självövervakande maskininlärningsramverk som använder datamodifieringar för att lära sig utan annoteringar. Detta examensarbete utvärderar hur väl beskärning och färgförvrängande datamodifieringar fungerar för två dataset, MPI3D och Causal3DIdent. De inlärda representationerna utvärderas med hjälp av representativ likhetsanalys. Syftet med examensarbetet var att få de självövervakande maskininlärningsmodellerna att lära sig oföränderliga representationer av objektet i bilderna. Meningen med datamodifieringarna var att påverka modellens lärande så att modellen tolkar objektets form som relevant innehåll, men resterande egenskaper som icke-relevant innehåll. Åtta modeller skapades (A-H). A och E tränades med övervakad inlärning och användes som riktmärke för de självövervakade modellerna. B och C lärde sig oföränderliga representationer som bör ha betraktas som irrelevant istället för att lära sig form. Modell D lärde sig oföränderliga representationer av form men också irrelevanta representationer. Modellerna F, G och H lyckades lära sig oföränderliga representationer av form med varierande intensitet, samtidigt som de resterande egenskaperna betraktades som irrelevant. Beskärning och färgförvrängande datamodifieringarna gör således att självövervakande modeller kan lära sig oföränderliga representationer av egenskaper relaterade till relevant innehåll. Specifika inställningar för datamodifieringar måste dock vara lämpliga för datasetet.
Published: 2022

48. Improving BERTScore for Machine Translation Evaluation Through Contrastive Learning

Author: Yousuf, Oreen and Yousuf, Oreen
Abstract: Since the advent of automatic evaluation, tasks within Natural Language Processing (NLP), including Machine Translation, have been able to better utilize both time and labor resources. Later, multilingual pre-trained models (MLMs)have uplifted many languages’ capacity to participate in NLP research. Contextualized representations generated from these MLMs are both influential towards several downstream tasks and have inspired practitioners to better make sense of them. We propose the adoption of BERTScore, coupled with contrastive learning, for machine translation evaluation in lieu of BLEU - the industry leading metric. While BERTScore computes a similarity score for each token in a candidate and reference sentence, it does away with exact matches in favor of computing token similarity using contextual embeddings. We improve BERTScore via contrastive learning-based fine-tuning on MLMs. We use contrastive learning to improve BERTScore across different language pairs in both high and low resource settings (English-Hausa, English-Chinese), across three models (XLM-R, mBERT, and LaBSE) and across three domains (news,religious, combined). We also investigated both the effects of pairing relatively linguistically similar low-resource languages (Somali-Hausa), and data size on BERTScore and the corresponding Pearson correlation to human judgments. We found that reducing the distance between cross-lingual embeddings via contrastive learning leads to BERTScore having a substantially greater correlation to system-level human evaluation than BLEU for mBERT and LaBSE in all language pairs in multiple domains.
Published: 2022

49. Contrastive and attention-based multiple instance learning for the prediction of sentinel lymph node status from histopathologies of primary melanoma tumours

Author: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group, Hernández Pérez, Carlos, Combalia Escudero, Marc, Puig Sardá, Susana, Malvehy Guilera, Josep, Vilaplana Besler, Verónica, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group, Hernández Pérez, Carlos, Combalia Escudero, Marc, Puig Sardá, Susana, Malvehy Guilera, Josep, and Vilaplana Besler, Verónica
Abstract: Sentinel lymph node status is a crucial prognosis factor for melanomas; nonetheless, the invasive surgery required to obtain it always puts the patient at risk. In this study, we develop a Deep Learning-based approach to predict lymph node metastasis from Whole Slide Images of primary tumours. Albeit very informative, these images come with complexities that hamper their use in machine learning applications, namely their large size and limited datasets. We propose a pre-training strategy based on self-supervised contrastive learning to extract better image feature representations and an attention-based Multiple Instance Learning approach to enhance the model’s performance. With this work, we quantitatively demonstrate that combining both methods improves various classification metrics and qualitatively show that contrastive learning encourages the network to output higher attention scores to tumour tissue and lower scores to image artifacts., Work supported by the Spanish Research Agency (AEI) under project PID2020-116907RB-I00 of the call MCIN/AEI/10.13039/501100011033 and the project 718/C/ 2019 funded by Fundació la Marato de TV3., Peer Reviewed, Postprint (author's final draft)
Published: 2022

50. Finding duplicate offers in the online marketplace catalogue using transformer based methods : An exploration of transformer based methods for the task of entity resolution

Author: Damian, Robert-Andrei and Damian, Robert-Andrei
Abstract: The amount of data available on the web is constantly growing, and e-commerce websites are no exception. Considering the abundance of available information, finding offers for the same product in the catalogue of different retailers represents a challenge. This problem is an interesting one and addresses the needs of multiple actors. A customer is interested in finding the best deal for the product they want to buy. A retailer wants to keep up to date with the competition and adapt its pricing strategy accordingly. Various services already offer the possibility of finding duplicate products in catalogues of e-commerce retailers, but their solutions are based on matching a Global Trade Identification Number (GTIN). This strategy is limited because a GTIN may not be made publicly available by a competitor, may be different for the same product exported by the manufacturer to different markets or may not even exist for low-value products. The field of Entity Resolution (ER), a sub-branch of Natural Language Processing (NLP), focuses on solving the issue of matching duplicate database entries when a deterministic identifier is not available. We investigate various solutions from the the field and present a new model called Spring R-SupCon that focuses on low volume datasets. Our work builds upon the recently introduced model, R-SupCon, introducing a new learning scheme that improves R-SupCon’s performance by up to 74.47% F1 score, and surpasses Ditto by up 12% F1 score for low volume datasets. Moreover, our experiments show that smaller language models can be used for ER with minimal loss in performance. This has the potential to extend the adoption of Transformer-based solutions to companies and markets where datasets are difficult to create, like it is the case for the Swedish marketplace Fyndiq., Mängden data på internet växer konstant och e-handeln är inget undantag. Konsumenter har idag många valmöjligheter varifrån de väljer att göra sina inköp från. Detta gör att det blir svårare och svårare att hitta det bästa erbjudandet. Även för återförsäljare ökar svårigheten att veta vilken konkurrent som har lägst pris. Det finns tillgängliga lösningar på detta problem men de använder produktunika identifierare såsom Global Trade Identification Number (förkortat “GTIN”). Då det finns en rad utmaningar att bara förlita sig på lösningar som baseras på GTIN behövs ett alternativt tillvägagångssätt. GTIN är exempelvis inte en offentlig information och identifieraren kan dessutom vara en annan när samma produkt erbjuds på en annan marknad. Det här projektet undersöker alternativa lösningar som inte är baserade på en deterministisk identifierare. Detta projekt förlitar sig istället på text såsom produktens namn för att fastställa matchningar mellan olika erbjudanden. En rad olika implementeringar baserade på maskininlärning och djupinlärning studeras i detta projekt. Projektet har dock ett särskilt fokus på “Transformer”-baserade språkmodeller såsom BERT. Detta projekt visar hur man generera proprietär data. Projektet föreslår även ett nytt inlärningsschema och bevisar dess fördelar., Le volume des données qui se trouve sur l’internet est en une augmentation constante et les commerces électroniques ne font pas note discordante. Le consommateur a aujourd’hui beaucoup des options quand il decide d’où faire son achat. Trouver le meilleur prix devient de plus en plus difficile. Les entreprises qui gerent cettes plates-formes ont aussi la difficulté de savoir en tous moments lesquels de ses concurrents ont le meilleur prix. Il y-a déjà des solutions en ligne qui ont l’objectif de résoudre ce problème, mais ils utilisent un identifiant de produit unique qui s’appelle Global Trade identification number (ou GTIN). Plusieurs difficultés posent des barriers sur cette solution. Par exemple, GTIN n’est pas public peut-être, ou des GTINs différents peut-être assigne par la fabricante au même produit pour distinguer des marchés différents. Ce projet étudie des solutions alternatives qui ne sont pas basées sur avoir un identifiant unique. On discute des methods qui font la décision en fonction du nom des produits, en utilisant des algorithmes d’apprentissage automatique ou d’apprentissage en profondeur. Le projet se concentre sur des solutions avec ”Transformer” modèles de langages, comme BERT. On voit aussi comme peut-on créer un ensemble de données propriétaire pour enseigner le modèle. Finalement, une nouvelle method d’apprentissage est proposée et analysée.
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

Publisher

123 results on '"Contrastive learning"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources