Author: "Van De Weijer, Joost" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Van De Weijer, Joost"' showing total 877 results

Start Over Author "Van De Weijer, Joost"

877 results on '"Van De Weijer, Joost"'

1. Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier

Author: Wang, Kai, Yang, Fei, Raducanu, Bogdan, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the advent of large pre-trained vision-language models such as CLIP, prompt learning methods aim to enhance the transferability of the CLIP model. They learn the prompt given few samples from the downstream task given the specific class names as prior knowledge, which we term as semantic-aware classification. However, in many realistic scenarios, we only have access to few samples and knowledge of the class names (e.g., when considering instances of classes). This challenging scenario represents the semantic-agnostic discriminative case. Text-to-Image (T2I) personalization methods aim to adapt T2I models to unseen concepts by learning new tokens and endowing these tokens with the capability of generating the learned concepts. These methods do not require knowledge of class names as a semantic-aware prior. Therefore, in this paper, we first explore Textual Inversion and reveal that the new concept tokens possess both generation and classification capabilities by regarding each category as a single concept. However, learning classifiers from single-concept textual inversion is limited since the learned tokens are suboptimal for the discriminative tasks. To mitigate this issue, we propose Multi-Class textual inversion, which includes a discriminative regularization term for the token updating process. Using this technique, our method MC-TI achieves stronger Semantic-Agnostic Classification while preserving the generation capability of these modifier tokens given only few samples per category. In the experiments, we extensively evaluate MC-TI on 12 datasets covering various scenarios, which demonstrates that MC-TI achieves superior results in terms of both classification and generation outcomes., Comment: Accepted in WACV 2025. Code link: https://github.com/wangkai930418/mc_ti
Published: 2024

2. Assessing Open-world Forgetting in Generative Image Model Customization

Author: Laria, Héctor, Gomez-Villa, Alex, Marouf, Imad Eddine, Wang, Kai, Raducanu, Bogdan, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Recent advances in diffusion models have significantly enhanced image generation capabilities. However, customizing these models with new classes often leads to unintended consequences that compromise their reliability. We introduce the concept of open-world forgetting to emphasize the vast scope of these unintended alterations, contrasting it with the well-studied closed-world forgetting, which is measurable by evaluating performance on a limited set of classes or skills. Our research presents the first comprehensive investigation into open-world forgetting in diffusion models, focusing on semantic and appearance drift of representations. We utilize zero-shot classification to analyze semantic drift, revealing that even minor model adaptations lead to unpredictable shifts affecting areas far beyond newly introduced concepts, with dramatic drops in zero-shot classification of up to 60%. Additionally, we observe significant changes in texture and color of generated content when analyzing appearance drift. To address these issues, we propose a mitigation strategy based on functional regularization, designed to preserve original capabilities while accommodating new concepts. Our study aims to raise awareness of unintended changes due to model customization and advocates for the analysis of open-world forgetting in future research on model customization and finetuning methods. Furthermore, we provide insights for developing more robust adaptation methodologies., Comment: Project page: https://hecoding.github.io/open-world-forgetting/
Published: 2024

3. Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning

Author: Yu, Lu, Tao, Zhe, Yao, Hantao, Van de Weijer, Joost, and Xu, Changsheng
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge available in the label information of the images, offers important semantic information that can be related with previously acquired knowledge of semantic classes. Consequently, effectively leveraging this information throughout continual learning is expected to be beneficial. To address this, we propose integrating semantic guidance within and across tasks by capturing semantic similarity using text embeddings. We start from a pre-trained CLIP model, employ the \emph{Semantically-guided Representation Learning (SG-RL)} module for a soft-assignment towards all current task classes, and use the Semantically-guided Knowledge Distillation (SG-KD) module for enhanced knowledge transfer. Experimental results demonstrate the superiority of our method on general and fine-grained datasets. Our code can be found in https://github.com/aprilsveryown/semantically-guided-continual-learning.
Published: 2024

4. Exemplar-free Continual Representation Learning via Learnable Drift Compensation

Author: Gomez-Villa, Alex, Goswami, Dipam, Wang, Kai, Bagdanov, Andrew D., Twardowski, Bartlomiej, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Exemplar-free class-incremental learning using a backbone trained from scratch and starting from a small first task presents a significant challenge for continual representation learning. Prototype-based approaches, when continually updated, face the critical issue of semantic drift due to which the old class prototypes drift to different positions in the new feature space. Through an analysis of prototype-based continual learning, we show that forgetting is not due to diminished discriminative power of the feature extractor, and can potentially be corrected by drift compensation. To address this, we propose Learnable Drift Compensation (LDC), which can effectively mitigate drift in any moving backbone, whether supervised or unsupervised. LDC is fast and straightforward to integrate on top of existing continual learning approaches. Furthermore, we showcase how LDC can be applied in combination with self-supervised CL methods, resulting in the first exemplar-free semi-supervised continual learning approach. We achieve state-of-the-art performance in both supervised and semi-supervised settings across multiple datasets. Code is available at \url{https://github.com/alviur/ldc}., Comment: Accepted to ECCV 2024
Published: 2024

5. ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

Author: Butt, Muhammad Atif, Wang, Kai, Vazquez-Corral, Javier, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-Image (T2I) generation has made significant advancements with the advent of diffusion models. These models exhibit remarkable abilities to produce images based on textual prompts. Current T2I models allow users to specify object colors using linguistic color names. However, these labels encompass broad color ranges, making it difficult to achieve precise color matching. To tackle this challenging task, named color prompt learning, we propose to learn specific color prompts tailored to user-selected colors. Existing T2I personalization methods tend to result in color-shape entanglement. To overcome this, we generate several basic geometric objects in the target color, allowing for color and shape disentanglement during the color prompt learning. Our method, denoted as ColorPeel, successfully assists the T2I models to peel off the novel color prompts from these colored shapes. In the experiments, we demonstrate the efficacy of ColorPeel in achieving precise color generation with T2I models. Furthermore, we generalize ColorPeel to effectively learn abstract attribute concepts, including textures, materials, etc. Our findings represent a significant step towards improving precision and versatility of T2I models, offering new opportunities for creative applications and design tasks. Our project is available at https://moatifbutt.github.io/colorpeel/., Comment: Accepted in ECCV 2024
Published: 2024

6. The Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous Tasks

Author: Kamath, Sandesh, Soutif-Cormerais, Albin, van de Weijer, Joost, and Raducanu, Bogdan
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent research identified a temporary performance drop on previously learned tasks when transitioning to a new one. This drop is called the stability gap and has great consequences for continual learning: it complicates the direct employment of continually learning since the worse-case performance at task-boundaries is dramatic, it limits its potential as an energy-efficient training paradigm, and finally, the stability drop could result in a reduced final performance of the algorithm. In this paper, we show that the stability gap also occurs when applying joint incremental training of homogeneous tasks. In this scenario, the learner continues training on the same data distribution and has access to all data from previous tasks. In addition, we show that in this scenario, there exists a low-loss linear path to the next minima, but that SGD optimization does not choose this path. We perform further analysis including a finer batch-wise analysis which could provide insights towards potential solution directions., Comment: Accepted at CVPR 2024 Workshop on Continual Learning in Computer Vision (CLVision)
Published: 2024

7. Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Author: Goswami, Dipam, Soutif--Cormerais, Albin, Liu, Yuyang, Kamath, Sandesh, Twardowski, Bartłomiej, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Continual learning methods are known to suffer from catastrophic forgetting, a phenomenon that is particularly hard to counter for methods that do not store exemplars of previous tasks. Therefore, to reduce potential drift in the feature extractor, existing exemplar-free methods are typically evaluated in settings where the first task is significantly larger than subsequent tasks. Their performance drops drastically in more challenging settings starting with a smaller first task. To address this problem of feature drift estimation for exemplar-free methods, we propose to adversarially perturb the current samples such that their embeddings are close to the old class prototypes in the old model embedding space. We then estimate the drift in the embedding space from the old to the new model using the perturbed images and compensate the prototypes accordingly. We exploit the fact that adversarial samples are transferable from the old to the new feature space in a continual learning setting. The generation of these images is simple and computationally cheap. We demonstrate in our experiments that the proposed approach better tracks the movement of prototypes in embedding space and outperforms existing methods on several standard continual learning benchmarks as well as on fine-grained datasets. Code is available at https://github.com/dipamgoswami/ADC., Comment: Accepted at CVPR 2024
Published: 2024

8. An Empirical Analysis of Forgetting in Pre-trained Models with Incremental Low-Rank Updates

Author: Soutif--Cormerais, Albin, Magistri, Simone, van de Weijer, Joost, and Bagdanov, Andew D.
Subjects: Computer Science - Machine Learning
Abstract: Broad, open source availability of large pretrained foundation models on the internet through platforms such as HuggingFace has taken the world of practical deep learning by storm. A classical pipeline for neural network training now typically consists of finetuning these pretrained network on a small target dataset instead of training from scratch. In the case of large models this can be done even on modest hardware using a low rank training technique known as Low-Rank Adaptation (LoRA). While Low Rank training has already been studied in the continual learning setting, existing works often consider storing the learned adapter along with the existing model but rarely attempt to modify the weights of the pretrained model by merging the LoRA with the existing weights after finishing the training of each task. In this article we investigate this setting and study the impact of LoRA rank on the forgetting of the pretraining foundation task and on the plasticity and forgetting of subsequent ones. We observe that this rank has an important impact on forgetting of both the pretraining and downstream tasks. We also observe that vision transformers finetuned in that way exhibit a sort of ``contextual'' forgetting, a behaviour that we do not observe for residual networks and that we believe has not been observed yet in previous continual learning works.
Published: 2024

9. LocInv: Localization-aware Inversion for Text-Guided Image Editing

Author: Tang, Chuanming, Wang, Kai, Yang, Fei, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts. However, existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area, primarily due to inaccuracies in cross-attention maps. To address this problem, we propose Localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps in the denoising phases of the diffusion process. Through the dynamic updating of tokens corresponding to noun words in the textual input, we are compelling the cross-attention maps to closely align with the correct noun and adjective words in the text prompt. Based on this technique, we achieve fine-grained image editing over particular objects while preventing undesired changes to other regions. Our method LocInv, based on the publicly available Stable Diffusion, is extensively evaluated on a subset of the COCO dataset, and consistently obtains superior results both quantitatively and qualitatively.The code will be released at https://github.com/wangkai930418/DPL, Comment: Accepted by CVPR 2024 Workshop AI4CC
Published: 2024

10. Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers

Author: Goswami, Dipam, Twardowski, Bartłomiej, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot class-incremental learning (FSCIL) aims to adapt the model to new classes from very few data (5 samples) without forgetting the previously learned classes. Recent works in many-shot CIL (MSCIL) (using all available training data) exploited pre-trained models to reduce forgetting and achieve better plasticity. In a similar fashion, we use ViT models pre-trained on large-scale datasets for few-shot settings, which face the critical issue of low plasticity. FSCIL methods start with a many-shot first task to learn a very good feature extractor and then move to the few-shot setting from the second task onwards. While the focus of most recent studies is on how to learn the many-shot first task so that the model generalizes to all future few-shot tasks, we explore in this work how to better model the few-shot data using pre-trained models, irrespective of how the first task is trained. Inspired by recent works in MSCIL, we explore how using higher-order feature statistics can influence the classification of few-shot classes. We identify the main challenge of obtaining a good covariance matrix from few-shot data and propose to calibrate the covariance matrix for new classes based on semantic similarity to the many-shot base classes. Using the calibrated feature statistics in combination with existing methods significantly improves few-shot continual classification on several FSCIL benchmarks. Code is available at https://github.com/dipamgoswami/FSCIL-Calibration., Comment: Accepted at CLVision workshop (CVPR 2024)
Published: 2024

11. Auxiliary Classifiers Improve Stability and Efficiency in Continual Learning

Author: Szatkowski, Filip, Yang, Fei, Twardowski, Bartłomiej, Trzciński, Tomasz, and van de Weijer, Joost
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Continual learning is crucial for applications in dynamic environments, where machine learning models must adapt to changing data distributions while retaining knowledge of previous tasks. Despite significant advancements, catastrophic forgetting - where performance on earlier tasks degrades as new information is learned - remains a key challenge. In this work, we investigate the stability of intermediate neural network layers during continual learning and explore how auxiliary classifiers (ACs) can leverage this stability to improve performance. We show that early network layers remain more stable during learning, particularly for older tasks, and that ACs applied to these layers can outperform standard classifiers on past tasks. By integrating ACs into several continual learning algorithms, we demonstrate consistent and significant performance improvements on standard benchmarks. Additionally, we explore dynamic inference, showing that AC-augmented continual learning methods can reduce computational costs by up to 60\% while maintaining or exceeding the accuracy of standard methods. Our findings suggest that ACs offer a promising avenue for enhancing continual learning models, providing both improved performance and the ability to adapt the network computation in environments where such flexibility might be required.
Published: 2024

12. Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

Author: Li, Senmao, van de Weijer, Joost, Hu, Taihang, Khan, Fahad Shahbaz, Hou, Qibin, Wang, Yaxing, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The success of recent text-to-image diffusion models is largely due to their capacity to be guided by a complex text prompt, which enables users to precisely describe the desired content. However, these models struggle to effectively suppress the generation of undesired content, which is explicitly requested to be omitted from the generated image in the prompt. In this paper, we analyze how to manipulate the text embeddings and remove unwanted content from them. We introduce two contributions, which we refer to as $\textit{soft-weighted regularization}$ and $\textit{inference-time text embedding optimization}$. The first regularizes the text embedding matrix and effectively suppresses the undesired content. The second method aims to further suppress the unwanted content generation of the prompt, and encourages the generation of desired content. We evaluate our method quantitatively and qualitatively on extensive experiments, validating its effectiveness. Furthermore, our method is generalizability to both the pixel-space diffusion models (i.e. DeepFloyd-IF) and the latent-space diffusion models (i.e. Stable Diffusion)., Comment: ICLR 2024. Our code is available in https://github.com/sen-mao/SuppressEOT
Published: 2024

13. Elastic Feature Consolidation for Cold Start Exemplar-Free Incremental Learning

Author: Magistri, Simone, Trinci, Tomaso, Soutif-Cormerais, Albin, van de Weijer, Joost, and Bagdanov, Andrew D.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Exemplar-Free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, which results in feature drift which is difficult to compensate for in the exemplar-free setting. To address this problem, we propose a simple and effective approach that consolidates feature representations by regularizing drift in directions highly relevant to previous tasks and employs prototypes to reduce task-recency bias. Our method, called Elastic Feature Consolidation (EFC), exploits a tractable second-order approximation of feature drift based on an Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes used in a novel asymmetric cross entropy loss which effectively balances prototype rehearsal with data from new tasks. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset and ImageNet-1K demonstrate that Elastic Feature Consolidation is better able to learn new tasks by maintaining model plasticity and significantly outperform the state-of-the-art., Comment: Accepted at Twelfth International Conference on Learning Representations (ICLR 2024)
Published: 2024

14. Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

Author: Li, Senmao, Hu, Taihang, van de Weijer, Joost, Khan, Fahad Shahbaz, Liu, Tao, Li, Linxuan, Yang, Shiqi, Wang, Yaxing, Cheng, Ming-Ming, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: One of the main drawback of diffusion models is the slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods. However, these methods require considerable computational resources. In this paper, we take another approach to diffusion model acceleration. We conduct a comprehensive study of the UNet encoder and empirically analyze the encoder features. This provides insights regarding their changes during the inference process. In particular, we find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps. This insight motivates us to omit encoder computation at certain adjacent time-steps and reuse encoder features of previous time-steps as input to the decoder in multiple time-steps. Importantly, this allows us to perform decoder computation in parallel, further accelerating the denoising process. Additionally, we introduce a prior noise injection method to improve the texture details in the generated image. Besides the standard text-to-image task, we also validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation. Without utilizing any knowledge distillation technique, our approach accelerates both the Stable Diffusion (SD) and DeepFloyd-IF model sampling by 41$\%$ and 24$\%$ respectively, and DiT model sampling by 34$\%$, while maintaining high-quality generation performance., Comment: NeurIPS 2024
Published: 2023

15. ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement

Author: Butt, Muhammad Atif, Wang, Kai, Vazquez-Corral, Javier, van de Weijer, Joost, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

16. Exemplar-Free Continual Representation Learning via Learnable Drift Compensation

Author: Gomez-Villa, Alex, Goswami, Dipam, Wang, Kai, Bagdanov, Andrew D., Twardowski, Bartlomiej, van de Weijer, Joost, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

17. Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models

Author: Rota, Claudio, Buzzelli, Marco, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we address the problem of enhancing perceptual quality in video super-resolution (VSR) using Diffusion Models (DMs) while ensuring temporal consistency among frames. We present StableVSR, a VSR method based on DMs that can significantly enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details. We introduce the Temporal Conditioning Module (TCM) into a pre-trained DM for single image super-resolution to turn it into a VSR method. TCM uses the novel Temporal Texture Guidance, which provides it with spatially-aligned and detail-rich texture information synthesized in adjacent frames. This guides the generative process of the current frame toward high-quality and temporally-consistent results. In addition, we introduce the novel Frame-wise Bidirectional Sampling strategy to encourage the use of information from past to future and vice-versa. This strategy improves the perceptual quality of the results and the temporal consistency across frames. We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR. The project page is available at https://github.com/claudiom4sir/StableVSR., Comment: Accepted to ECCV 2024
Published: 2023

18. Continual Learning: Applications and the Road Forward

Author: Verwimp, Eli, Aljundi, Rahaf, Ben-David, Shai, Bethge, Matthias, Cossu, Andrea, Gepperth, Alexander, Hayes, Tyler L., Hüllermeier, Eyke, Kanan, Christopher, Kudithipudi, Dhireesha, Lampert, Christoph H., Mundt, Martin, Pascanu, Razvan, Popescu, Adrian, Tolias, Andreas S., van de Weijer, Joost, Liu, Bing, Lomonaco, Vincenzo, Tuytelaars, Tinne, and van de Ven, Gido M.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they might seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model editing, personalization and specialization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.
Published: 2023

19. AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking

Author: Tang, Chuanming, Wang, Kai, van de Weijer, Joost, Zhang, Jianlin, and Huang, Yongmei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual object tracking is a fundamental component of transportation systems, especially for intelligent driving. Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline in visual tracking. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to design a customised tracking method. This method bridges the single-branch network with discriminative models for the first time. Specifically, in the proposed encoder AViT encoder, we introduce a tracking-tailored Adaptor module for vanilla ViT and a joint target state embedding to enrich the target-prior embedding paradigm. Then, we combine the AViT encoder with a discriminative transformer-specific model predictor to predict the accurate location. Furthermore, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. In the experiments, we evaluated AViTMP on eight tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that, under fair comparison, AViTMP achieves state-of-the-art performance, especially in terms of long-term tracking and robustness. The source code will be released at https://github.com/Tchuanm/AViTMP., Comment: IEEE Transactions on Intelligent Vehicles
Published: 2023
Full Text: View/download PDF

20. IterInv: Iterative Inversion for Pixel-Level T2I Models

Author: Tang, Chuanming, Wang, Kai, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques predominantly hinge on DDIM inversion as a prevalent practice rooted in Latent Diffusion Models (LDM). However, the large pretrained T2I models working on the latent space suffer from losing details due to the first compression stage with an autoencoder mechanism. Instead, other mainstream T2I pipeline working on the pixel level, such as Imagen and DeepFloyd-IF, circumvents the above problem. They are commonly composed of multiple stages, typically starting with a text-to-image stage and followed by several super-resolution stages. In this pipeline, the DDIM inversion fails to find the initial noise and generate the original image given that the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this category of T2I models and verify IterInv with the open-source DeepFloyd-IF model.Specifically, IterInv employ NTI as the inversion and reconstruction of low-resolution image generation. In stages 2 and 3, we update the latent variance at each timestep to find the deterministic inversion trace and promote the reconstruction process. By combining our method with a popular image editing method, we prove the application prospects of IterInv. The code will be released upon acceptance. The code is available at \url{https://github.com/Tchuanm/IterInv.git}., Comment: Accepted paper at ICME 2024
Published: 2023

21. Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation

Author: Sójka, Damian, Liu, Yuyang, Goswami, Dipam, Cygert, Sebastian, Twardowski, Bartłomiej, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The goal of the challenge is to develop a test-time adaptation (TTA) method, which could adapt the model to gradually changing domains in video sequences for semantic segmentation task. It is based on a synthetic driving video dataset - SHIFT. The source model is trained on images taken during daytime in clear weather. Domain changes at test-time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images come one by one and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts with the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, in which there are only six sequences that start and end with the source domain. We conduct an analysis specifically on those sequences. Ground truth data for test split, on which the developed TTA methods are evaluated for leader board ranking, are not publicly available. The proposed solution secured a 3rd place in a challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solutions as general as possible. We have focused on analyzing the distributional shift and developing a method that could adapt to changing data dynamics and generalize across different scenarios.
Published: 2023

22. Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

Author: Wang, Kai, Yang, Fei, Yang, Shiqi, Butt, Muhammad Atif, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large-scale text-to-image generative models have been a ground-breaking development in generative AI, with diffusion models showing their astounding ability to synthesize convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are susceptible to unintended modifications of regions outside the targeted area, such as on the background or on distractor objects which have some semantic or visual relationship with the targeted object. According to our experimental findings, inaccurate cross-attention maps are at the root of this problem. Based on this observation, we propose Dynamic Prompt Learning (DPL) to force cross-attention maps to focus on correct noun words in the text prompt. By updating the dynamic tokens for nouns in the textual input with the proposed leakage repairment losses, we achieve fine-grained image editing over particular objects while preventing undesired changes to other image regions. Our method DPL, based on the publicly available Stable Diffusion, is extensively evaluated on a wide range of images, and consistently obtains superior results both quantitatively (CLIP score, Structure-Dist) and qualitatively (on user-evaluation). We show improved prompt editing results for Word-Swap, Prompt Refinement, and Attention Re-weighting, especially for complex multi-object scenes., Comment: Neurips 2023. The code page: https://github.com/wangkai930418/DPL
Published: 2023

23. FeCAM: Exploiting the Heterogeneity of Class Distributions in Exemplar-Free Continual Learning

Author: Goswami, Dipam, Liu, Yuyang, Twardowski, Bartłomiej, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks and thus suffers from catastrophic forgetting. Recent approaches to incrementally learning the classifier by freezing the feature extractor after the first task have gained much attention. In this paper, we explore prototypical networks for CIL, which generate new class prototypes using the frozen feature extractor and classify the features based on the Euclidean distance to the prototypes. In an analysis of the feature distributions of classes, we show that classification based on Euclidean metrics is successful for jointly trained features. However, when learning from non-stationary data, we observe that the Euclidean metric is suboptimal and that feature distributions are heterogeneous. To address this challenge, we revisit the anisotropic Mahalanobis distance for CIL. In addition, we empirically show that modeling the feature covariance relations is better than previous attempts at sampling features from normal distributions and training a linear classifier. Unlike existing methods, our approach generalizes to both many- and few-shot CIL settings, as well as to domain-incremental settings. Interestingly, without updating the backbone network, our method obtains state-of-the-art results on several standard continual learning benchmarks. Code is available at https://github.com/dipamgoswami/FeCAM., Comment: Accepted at NeurIPS 2023
Published: 2023

24. Plasticity-Optimized Complementary Networks for Unsupervised Continual Learning

Author: Gomez-Villa, Alex, Twardowski, Bartlomiej, Wang, Kai, and van de Weijer, Joost
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Continuous unsupervised representation learning (CURL) research has greatly benefited from improvements in self-supervised learning (SSL) techniques. As a result, existing CURL methods using SSL can learn high-quality representations without any labels, but with a notable performance drop when learning on a many-tasks data stream. We hypothesize that this is caused by the regularization losses that are imposed to prevent forgetting, leading to a suboptimal plasticity-stability trade-off: they either do not adapt fully to the incoming data (low plasticity), or incur significant forgetting when allowed to fully adapt to a new SSL pretext-task (low stability). In this work, we propose to train an expert network that is relieved of the duty of keeping the previous knowledge and can focus on performing optimally on the new tasks (optimizing plasticity). In the second phase, we combine this new knowledge with the previous network in an adaptation-retrospection phase to avoid forgetting and initialize a new expert with the knowledge of the old network. We perform several experiments showing that our proposed approach outperforms other CURL exemplar-free methods in few- and many-task split settings. Furthermore, we show how to adapt our approach to semi-supervised continual learning (Semi-SCL) and show that we surpass the accuracy of other exemplar-free Semi-SCL methods and reach the results of some others that use exemplars., Comment: Accepted at WACV2024
Published: 2023

25. Continual Evidential Deep Learning for Out-of-Distribution Detection

Author: Aguilar, Eduardo, Raducanu, Bogdan, Radeva, Petia, and Van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Uncertainty-based deep learning models have attracted a great deal of interest for their ability to provide accurate and reliable predictions. Evidential deep learning stands out achieving remarkable performance in detecting out-of-distribution (OOD) data with a single deterministic neural network. Motivated by this fact, in this paper we propose the integration of an evidential deep learning method into a continual learning framework in order to perform simultaneously incremental object classification and OOD detection. Moreover, we analyze the ability of vacuity and dissonance to differentiate between in-distribution data belonging to old classes and OOD data. The proposed method, called CEDL, is evaluated on CIFAR-100 considering two settings consisting of 5 and 10 tasks, respectively. From the obtained results, we could appreciate that the proposed method, in addition to provide comparable results in object classification with respect to the baseline, largely outperforms OOD detection compared to several posthoc methods on three evaluation metrics: AUROC, AUPR and FPR95., Comment: Accepted at Visual Continual Learning workshop (ICCV2023)
Published: 2023

26. Trust your Good Friends: Source-free Domain Adaptation by Reciprocal Neighborhood Clustering

Author: Yang, Shiqi, Wang, Yaxing, van de Weijer, Joost, Herranz, Luis, Jui, Shangling, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g. due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors. To aggregate information with more context, we consider expanded neighborhoods with small affinity values. Furthermore, we consider the density around each target sample, which can alleviate the negative impact of potential outliers. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets., Comment: Accepted by IEEE TPAMI, extended version of conference paper arXiv:2110.04202
Published: 2023

27. ScrollNet: Dynamic Weight Importance for Continual Learning

Author: Yang, Fei, Wang, Kai, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The principle underlying most existing continual learning (CL) methods is to prioritize stability by penalizing changes in parameters crucial to old tasks, while allowing for plasticity in other parameters. The importance of weights for each task can be determined either explicitly through learning a task-specific mask during training (e.g., parameter isolation-based approaches) or implicitly by introducing a regularization term (e.g., regularization-based approaches). However, all these methods assume that the importance of weights for each task is unknown prior to data exposure. In this paper, we propose ScrollNet as a scrolling neural network for continual learning. ScrollNet can be seen as a dynamic network that assigns the ranking of weight importance for each task before data exposure, thus achieving a more favorable stability-plasticity tradeoff during sequential task learning by reassigning this ranking for different tasks. Additionally, we demonstrate that ScrollNet can be combined with various CL methods, including regularization-based and replay-based approaches. Experimental results on CIFAR100 and TinyImagenet datasets show the effectiveness of our proposed method. We release our code at https://github.com/FireFYF/ScrollNet.git., Comment: Accepted at Visual Continual Learning workshop (ICCV2023)
Published: 2023

28. A Comprehensive Empirical Evaluation on Online Continual Learning

Author: Soutif--Cormerais, Albin, Carta, Antonio, Cossu, Andrea, Hurtado, Julio, Hemati, Hamed, Lomonaco, Vincenzo, and Van de Weijer, Joost
Subjects: Computer Science - Machine Learning
Abstract: Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at https://github.com/AlbinSou/ocl_survey based on the avalanche framework to reproduce our results and encourage future research., Comment: ICCV Visual Continual Learning Workshop 2023 accepted paper
Published: 2023

29. Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection

Author: Yuyang, Liu, Yang, Cong, Dipam, Goswami, Xialei, Liu, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift only occurs when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that only stores and replays foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain current model to focus on the most important information from old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. Moreover, it considerably reduces the storage requirements when compared to standard image replay. Comprehensive experiments on Pascal-VOC and COCO datasets support the state-of-the-art performance of our model.
Published: 2023

30. Improving Online Continual Learning Performance and Stability with Temporal Ensembles

Author: Soutif--Cormerais, Albin, Carta, Antonio, and Van de Weijer, Joost
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) arXiv:2205.13452 showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspirations from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature., Comment: CoLLAs 2023 accepted paper
Published: 2023

31. Density Map Distillation for Incremental Object Counting

Author: Wu, Chenshen and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We investigate the problem of incremental learning for object counting, where a method must learn to count a variety of object classes from a sequence of datasets. A na\"ive approach to incremental object counting would suffer from catastrophic forgetting, where it would suffer from a dramatic performance drop on previous tasks. In this paper, we propose a new exemplar-free functional regularization method, called Density Map Distillation (DMD). During training, we introduce a new counter head for each task and introduce a distillation loss to prevent forgetting of previous tasks. Additionally, we introduce a cross-task adaptor that projects the features of the current backbone to the previous backbone. This projector allows for the learning of new features while the backbone retains the relevant features for previous tasks. Finally, we set up experiments of incremental learning for counting new objects. Results confirm that our method greatly reduces catastrophic forgetting and outperforms existing methods., Comment: Accepted at CVPR2023: Workshop on Continual Learning in Computer Vision
Published: 2023

32. Projected Latent Distillation for Data-Agnostic Consolidation in Distributed Continual Learning

Author: Carta, Antonio, Cossu, Andrea, Lomonaco, Vincenzo, Bacciu, Davide, and van de Weijer, Joost
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing
Abstract: Distributed learning on the edge often comprises self-centered devices (SCD) which learn local tasks independently and are unwilling to contribute to the performance of other SDCs. How do we achieve forward transfer at zero cost for the single SCDs? We formalize this problem as a Distributed Continual Learning scenario, where SCD adapt to local tasks and a CL model consolidates the knowledge from the resulting stream of models without looking at the SCD's private data. Unfortunately, current CL methods are not directly applicable to this scenario. We propose Data-Agnostic Consolidation (DAC), a novel double knowledge distillation method that consolidates the stream of SC models without using the original data. DAC performs distillation in the latent space via a novel Projected Latent Distillation loss. Experimental results show that DAC enables forward transfer between SCDs and reaches state-of-the-art accuracy on Split CIFAR100, CORe50 and Split TinyImageNet, both in reharsal-free and distributed CL scenarios. Somewhat surprisingly, even a single out-of-distribution image is sufficient as the only source of data during consolidation.
Published: 2023

33. 3D-Aware Multi-Class Image-to-Image Translation with NeRFs

Author: Li, Senmao, van de Weijer, Joost, Wang, Yaxing, Khan, Fahad Shahbaz, Liu, Meiqin, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However no prior works investigate 3D-aware GANs for 3D consistent multi-class image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multi-class I2I translation, we decouple this learning process into a multi-class 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multi-class 3D-aware GAN architecture, that preserves view-consistency, we construct a 3D-aware I2I translation system. To further reduce the view-consistency problems, we propose several new techniques, including a U-net-like adaptor network design, a hierarchical representation constrain and a relative regularization loss. In extensive experiments on two datasets, quantitative and qualitative results demonstrate that we successfully perform 3D-aware I2I translation with multi-view consistency., Comment: Accepted by CVPR2023
Published: 2023

34. StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Author: Li, Senmao, van de Weijer, Joost, Hu, Taihang, Khan, Fahad Shahbaz, Hou, Qibin, Wang, Yaxing, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However, they suffer from two problems: (1) Unsatisfying results for selected regions, and unexpected changes in nonselected regions. (2) They require careful text prompt editing where the prompt should include all visual objects in the input image. To address this, we propose two improvements: (1) Only optimizing the input of the value linear network in the cross-attention layers, is sufficiently powerful to reconstruct a real image. (2) We propose attention regularization to preserve the object-like attention maps after editing, enabling us to obtain accurate style editing without invoking significant structural changes. We further improve the editing technique which is used for the unconditional branch of classifier-free guidance, as well as the conditional one as used by P2P. Extensive experimental prompt-editing results on a variety of images, demonstrate qualitatively and quantitatively that our method has superior editing capabilities than existing and concurrent works.
Published: 2023

35. ICICLE: Interpretable Class Incremental Continual Learning

Author: Rymarczyk, Dawid, van de Weijer, Joost, Zieliński, Bartosz, and Twardowski, Bartłomiej
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual learning enables incremental learning of new tasks without forgetting those previously learned, resulting in positive knowledge transfer that can enhance performance on both new and old tasks. However, continual learning poses new challenges for interpretability, as the rationale behind model predictions may change over time, leading to interpretability concept drift. We address this problem by proposing Interpretable Class-InCremental LEarning (ICICLE), an exemplar-free approach that adopts a prototypical part-based approach. It consists of three crucial novelties: interpretability regularization that distills previously learned concepts while preserving user-friendly positive reasoning; proximity-based prototype initialization strategy dedicated to the fine-grained setting; and task-recency bias compensation devoted to prototypical parts. Our experimental results demonstrate that ICICLE reduces the interpretability concept drift and outperforms the existing exemplar-free methods of common class-incremental learning when applied to concept-based models., Comment: Accepted to ICCV 2023
Published: 2023

36. The male bias can be attenuated in reading: on the resolution of anaphoric expressions following gender-fair forms in French

Author: Tibblin, Julia, Granfeldt, Jonas, van de Weijer, Joost, and Gygax, Pascal
Abstract: Despite the increased use of different types of gender-fair forms in French, studies investigating how they are interpreted when presented in a sentence remain few. To fill this gap, we conducted a pre-registered study using a timed sentence evaluation task to examine the possibility of speakers’ establishing an anaphoric relationship between a gendered anaphoric expression (femmes ‘women’ or hommes ‘men’) and non-stereotyped role nouns as antecedents. The antecedents were presented in their masculine form or in one out of three different gender-fair forms (complete double forms: les voisines et voisins ‘the neighbours.FEM and neighbours.MASC’, contracted double forms: les voisin·es ‘the neighbours.MASC·FEM’, or gender-neutral forms: le voisinage ‘the neighbourhood’). In line with previous findings, the masculine form led to a male bias in the participants’ mental representations of gender. All three examined gender-fair forms resolved this bias, but comparisons of the different forms to each other revealed differences between them. The results show that complete double forms lead to equally balanced mental representations of gender while contracted double forms slightly favour representation of women. Finally, gender-neutral forms result in a small male bias, although significantly smaller than the one produced by the masculine form. The results are discussed in relation to the mental models theory and provide new and important insights on how gender-fair forms in French are interpreted.
Published: 2023

37. Towards Label-Efficient Incremental Learning: A Survey

Author: Kilickaya, Mert, van de Weijer, Joost, and Asano, Yuki M.
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: The current dominant paradigm when building a machine learning model is to iterate over a dataset over and over until convergence. Such an approach is non-incremental, as it assumes access to all images of all categories at once. However, for many applications, non-incremental learning is unrealistic. To that end, researchers study incremental learning, where a learner is required to adapt to an incoming stream of data with a varying distribution while preventing forgetting of past knowledge. Significant progress has been made, however, the vast majority of works focus on the fully supervised setting, making these algorithms label-hungry thus limiting their real-life deployment. To that end, in this paper, we make the first attempt to survey recently growing interest in label-efficient incremental learning. We identify three subdivisions, namely semi-, few-shot- and self-supervised learning to reduce labeling efforts. Finally, we identify novel directions that can further enhance label-efficiency and improve incremental learning scalability. Project website: https://github.com/kilickaya/label-efficient-il.
Published: 2023

38. Recursive Focus Prosody

Author: Ishihara, Shinichiro, van de Weijer, Joost, Kim, Jong-Bok, Editor-in-Chief, Huang, Chu-Ren, Editor-in-Chief, Dita, Shirley N., Associate Editor, Harada, Yasunari, Associate Editor, Rau, Victoria, Associate Editor, Wakefield, John, Associate Editor, Xu, Jie, Associate Editor, Ting, Jen, editor, and Hsu, Yu-Yin, editor
Published: 2024
Full Text: View/download PDF

39. Exemplar-free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation

Author: Cotogni, Marco, Yang, Fei, Cusano, Claudio, Bagdanov, Andrew D., and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a new method for exemplar-free class incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay which can help recalibrate previous task classifiers to the feature drift which occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks which for many applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal based ViT methods.
Published: 2022

40. Attribution-aware Weight Transfer: A Warm-Start Initialization for Class-Incremental Semantic Segmentation

Author: Goswami, Dipam, Schuster, René, van de Weijer, Joost, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In class-incremental semantic segmentation (CISS), deep learning architectures suffer from the critical problems of catastrophic forgetting and semantic background shift. Although recent works focused on these issues, existing classifier initialization methods do not address the background shift problem and assign the same initialization weights to both background and new foreground class classifiers. We propose to address the background shift with a novel classifier initialization method which employs gradient-based attribution to identify the most relevant weights for new classes from the classifier's weights for the previous background and transfers these weights to the new classifier. This warm-start weight initialization provides a general solution applicable to several CISS methods. Furthermore, it accelerates learning of new classes while mitigating forgetting. Our experiments demonstrate significant improvement in mIoU compared to the state-of-the-art CISS methods on the Pascal-VOC 2012, ADE20K and Cityscapes datasets., Comment: Accepted at WACV 2023
Published: 2022

41. Positive Pair Distillation Considered Harmful: Continual Meta Metric Learning for Lifelong Object Re-Identification

Author: Wang, Kai, Wu, Chenshen, Bagdanov, Andy, Liu, Xialei, Yang, Shiqi, Jui, Shangling, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Lifelong object re-identification incrementally learns from a stream of re-identification tasks. The objective is to learn a representation that can be applied to all tasks and that generalizes to previously unseen re-identification tasks. The main challenge is that at inference time the representation must generalize to previously unseen identities. To address this problem, we apply continual meta metric learning to lifelong object re-identification. To prevent forgetting of previous tasks, we use knowledge distillation and explore the roles of positive and negative pairs. Based on our observation that the distillation and metric losses are antagonistic, we propose to remove positive pairs from distillation to robustify model updates. Our method, called Distillation without Positive Pairs (DwoPP), is evaluated on extensive intra-domain experiments on person and vehicle re-identification datasets, as well as inter-domain experiments on the LReID benchmark. Our experiments demonstrate that DwoPP significantly outperforms the state-of-the-art. The code is here: https://github.com/wangkai930418/DwoPP_code, Comment: BMVC 2022
Published: 2022

42. Attention Distillation: self-supervised vision transformer students need more guidance

Author: Wang, Kai, Yang, Fei, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory and compute constraint devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, the existing self-supervised knowledge distillation (SSKD) methods focus on ConvNet based architectures are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism from teacher to student can significantly narrow the performance gap between both. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods learning from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model on self-supervised learning. Moreover, AttnDistill is independent of self-supervised learning algorithms, it can be adapted to ViT based SSL methods to improve the performance in future research. The code is here: https://github.com/wangkai930418/attndistill, Comment: BMVC 2022
Published: 2022

43. MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains

Author: Wang, Yaxing, Gonzalez-Garcia, Abel, Wu, Chenshen, Herranz, Luis, Khan, Fahad Shahbaz, Jui, Shangling, Yang, Jian, and van de Weijer, Joost
Published: 2024
Full Text: View/download PDF

44. Main product detection with graph networks for fashion

Author: Yazici, Vacit Oguz, Yu, Longlong, Ramisa, Arnau, Herranz, Luis, and van de Weijer, Joost
Published: 2024
Full Text: View/download PDF

45. OneRing: A Simple Method for Source-free Open-partial Domain Adaptation

Author: Yang, Shiqi, Wang, Yaxing, Wang, Kai, Jui, Shangling, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In this paper, we investigate Source-free Open-partial Domain Adaptation (SF-OPDA), which addresses the situation where there exist both domain and category shifts between source and target domains. Under the SF-OPDA setting, which aims to address data privacy concerns, the model cannot access source data anymore during target adaptation. We propose a novel training scheme to learn a (n+1)-way classifier to predict the n source classes and the unknown class, where samples of only known source categories are available for training. Furthermore, for target adaptation, we simply adopt a weighted entropy minimization to adapt the source pretrained model to the unlabeled target domain without source data. In experiments, we show our simple method surpasses current OPDA approaches which demand source data during adaptation. When augmented with a closed-set domain adaptation approach during target adaptation, our source-free method further outperforms the current state-of-the-art OPDA method by 2.5%, 7.2% and 13% on Office-31, Office-Home and VisDA respectively., Comment: Updated. It only focuses on source-free open-partial domain adaptation, to avoid any potential misunderstanding
Published: 2022

46. Chapter 5. Can pantomime narrate?

Author: Zlatev, Jordan, primary, Sibierska, Marta, additional, Żywiczyński, Przemysław, additional, van de Weijer, Joost, additional, and Boruta-Żywiczyńska, Monika, additional
Published: 2024
Full Text: View/download PDF

47. MVMO: A Multi-Object Dataset for Wide Baseline Multi-View Semantic Segmentation

Author: Alvarez-Gila, Aitor, van de Weijer, Joost, Wang, Yaxing, and Garrote, Estibaliz
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present MVMO (Multi-View, Multi-Object dataset): a synthetic dataset of 116,000 scenes containing randomly placed objects of 10 distinct classes and captured from 25 camera locations in the upper hemisphere. MVMO comprises photorealistic, path-traced image renders, together with semantic segmentation ground truth for every view. Unlike existing multi-view datasets, MVMO features wide baselines between cameras and high density of objects, which lead to large disparities, heavy occlusions and view-dependent object appearance. Single view semantic segmentation is hindered by self and inter-object occlusions that could benefit from additional viewpoints. Therefore, we expect that MVMO will propel research in multi-view semantic segmentation and cross-view semantic transfer. We also provide baselines that show that new research is needed in such fields to exploit the complementary information of multi-view setups., Comment: 5 pages
Published: 2022

48. Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation

Author: Yang, Shiqi, Wang, Yaxing, Wang, Kai, Jui, Shangling, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in https://github.com/Albert0147/AaD_SFDA., Comment: NeurIPS 2022
Published: 2022

49. Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization

Author: Pelosin, Francesco, Jha, Saurav, Torsello, Andrea, Raducanu, Bogdan, and van de Weijer, Joost
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a) leads to a better overall accuracy, (b) helps enhance the rigidity by maintaining competitive performances. Furthermore, we identify the limitation imposed by the symmetric nature of regularization losses. To alleviate this, we propose an asymmetric variant and apply it to the pooled output distillation (POD) loss adapted for ViTs. Our experiments confirm that introducing asymmetry to POD boosts its plasticity while retaining stability across (a) and (b). Moreover, we acknowledge low forgetting measures for all the compared methods, indicating that ViTs might be naturally inclined continual learner, Comment: Accepted at Continual Learning Workshop (CVPR 2022)
Published: 2022

50. Projected Latent Distillation for Data-Agnostic Consolidation in distributed continual learning

Author: Carta, Antonio, Cossu, Andrea, Lomonaco, Vincenzo, Bacciu, Davide, and van de Weijer, Joost
Published: 2024
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

877 results on '"Van De Weijer, Joost"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources