92 results for "Gonzalez-Garcia, Abel"
Search Results
2. Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking
- Author
-
Sun, Jingxian, Zhang, Lichao, Zha, Yufei, Gonzalez-Garcia, Abel, Zhang, Peng, Huang, Wei, and Zhang, Yanning
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The target representation learned by convolutional neural networks plays an important role in Thermal Infrared (TIR) tracking. Currently, most of the top-performing TIR trackers still employ representations learned by models trained on RGB data. However, this representation does not take into account the information in the TIR modality itself, limiting the performance of TIR tracking. To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data. We take advantage of the two-branch architecture of the baseline tracker, i.e. DiMP, for cross-modal distillation working on two components of the tracker. Specifically, we use one branch as a teacher module to distill the representation learned by the model into the other branch. Benefiting from the powerful model in the RGB modality, the cross-modal distillation learns a TIR-specific representation that promotes TIR tracking. The proposed approach can be conveniently incorporated into different baseline trackers as a generic and independent component. Furthermore, the semantic coherence of paired RGB and TIR images is used as a supervision signal in the distillation loss for cross-modal knowledge transfer. In practice, three different approaches are explored to generate paired RGB-TIR patches with the same semantics for training in an unsupervised way, which makes it easy to scale to even larger amounts of unlabeled training data. Extensive experiments on the LSOTB-TIR and PTB-TIR datasets demonstrate that our proposed cross-modal distillation method effectively learns TIR-specific target representations transferred from the RGB modality. Our tracker outperforms the baseline tracker with absolute gains of 2.3% Success, 2.7% Precision, and 2.5% Normalized Precision. (A minimal code sketch of the distillation step follows this entry.) Comment: Accepted at ACM MM 2021. Code and models are available at https://github.com/zhanglichao/cmdTIRtracking
- Published
- 2021
- Full Text
- View/download PDF
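The distillation step described in entry 2 can be illustrated with a minimal, hedged sketch: a frozen RGB (teacher) branch and a trainable TIR (student) branch encode a paired RGB-TIR patch, and an L2 loss pulls the student features toward the teacher's. The tiny backbone, tensor sizes, and three-channel TIR input below are illustrative assumptions, not the authors' DiMP-based implementation.

    import torch
    import torch.nn as nn

    class SmallBackbone(nn.Module):
        """Toy feature extractor standing in for a tracker backbone (assumption)."""
        def __init__(self, in_ch=3, feat_ch=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            )
        def forward(self, x):
            return self.net(x)

    teacher = SmallBackbone()                  # pretrained on RGB data, kept frozen
    student = SmallBackbone()                  # learns the TIR-specific representation
    for p in teacher.parameters():
        p.requires_grad_(False)

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
    mse = nn.MSELoss()

    # One training step on a paired RGB-TIR batch (random tensors stand in for real patches).
    rgb = torch.randn(8, 3, 128, 128)
    tir = torch.randn(8, 3, 128, 128)          # TIR replicated to three channels (assumption)

    with torch.no_grad():
        teacher_feat = teacher(rgb)            # representation of the RGB patch
    student_feat = student(tir)                # representation of the paired TIR patch

    loss = mse(student_feat, teacher_feat)     # cross-modal distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()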
3. MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, Wu, Chenshen, Herranz, Luis, Khan, Fahad Shahbaz, Jui, Shangling, and van de Weijer, Joost
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Given the often enormous effort required to train GANs, both computationally as well as in dataset collection, the re-use of pretrained GANs largely increases the potential impact of generative models. Therefore, we propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, which restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs. (A minimal code sketch of the mining step follows this entry.) Comment: Accepted at IJCV. arXiv admin note: substantial text overlap with arXiv:1912.05270
- Published
- 2021
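A hedged sketch of the mining step in entry 3: a small miner network warps input noise before it reaches a frozen pretrained generator, and an adversarial critic trained on a handful of target images steers the mined latent codes toward target-like regions. The fully connected generator and critic below are toy stand-ins, not BigGAN, Progressive GAN, or StyleGAN.

    import torch
    import torch.nn as nn

    z_dim = 64

    class Miner(nn.Module):
        """Maps input noise u to a 'mined' latent code z for the frozen generator."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        def forward(self, u):
            return self.net(u)

    # Toy stand-ins for a pretrained generator and a critic on 32x32 RGB images.
    generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                              nn.Linear(256, 3 * 32 * 32), nn.Tanh())
    critic = nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 1))
    for p in generator.parameters():           # the pretrained generator stays frozen
        p.requires_grad_(False)

    miner = Miner(z_dim)
    opt_m = torch.optim.Adam(miner.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    real = torch.rand(16, 3 * 32 * 32) * 2 - 1  # small batch from the target domain

    # Critic step: separate target-domain images from mined generations.
    u = torch.randn(16, z_dim)
    fake = generator(miner(u)).detach()
    loss_c = bce(critic(real), torch.ones(16, 1)) + bce(critic(fake), torch.zeros(16, 1))
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Miner step: push mined samples toward latent regions the critic deems target-like.
    u = torch.randn(16, z_dim)
    loss_m = bce(critic(generator(miner(u))), torch.ones(16, 1))
    opt_m.zero_grad(); loss_m.backward(); opt_m.step()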
4. Semi-supervised Learning for Few-shot Image-to-Image Translation
- Author
-
Wang, Yaxing, Khan, Salman, Gonzalez-Garcia, Abel, van de Weijer, Joost, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In the last few years, unpaired image-to-image translation has witnessed remarkable progress. Although the latest methods are able to generate realistic images, they crucially rely on a large number of labeled images. Recently, some methods have tackled the challenging setting of few-shot image-to-image translation, reducing the labeled data requirements for the target domain during inference. In this work, we go one step further and also reduce the amount of labeled data required from the source domain during training. To do so, we propose applying semi-supervised learning via a noise-tolerant pseudo-labeling procedure. We also apply a cycle consistency constraint to further exploit the information from unlabeled images, either from the same dataset or external. Additionally, we propose several structural modifications to facilitate the image translation task under these circumstances. Our semi-supervised method for few-shot image translation, called SEMIT, achieves excellent results on four different datasets using as little as 10% of the source labels, and matches the performance of the main fully-supervised competitor using only 20% labeled data. Our code and models are made public at: https://github.com/yaxingwang/SEMIT. (A minimal code sketch of the pseudo-labeling step follows this entry.) Comment: CVPR2020
- Published
- 2020
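The semi-supervised ingredient of entry 4 can be sketched as confidence-thresholded pseudo-labeling: a classifier trained on the few labeled source images labels the unlabeled ones, and only confident predictions are kept for further training. The classifier, threshold, and data below are illustrative assumptions, not the SEMIT pipeline (which uses a noise-tolerant variant).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_classes = 10
    # Classifier assumed to have been trained on the small labeled source set (omitted here).
    classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                               nn.Linear(128, num_classes))

    unlabeled = torch.randn(64, 3, 32, 32)     # unlabeled source images (random stand-ins)
    conf_threshold = 0.9                       # keep only confident pseudo-labels (assumption)

    with torch.no_grad():
        probs = F.softmax(classifier(unlabeled), dim=1)
        confidence, pseudo = probs.max(dim=1)
        keep = confidence > conf_threshold

    pseudo_images = unlabeled[keep]
    pseudo_labels = pseudo[keep]
    print(f"kept {keep.sum().item()} of {len(unlabeled)} images as pseudo-labeled data")
    # The kept (image, pseudo-label) pairs would be mixed with the labeled set
    # when training the translation model.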
5. MineGAN: effective knowledge transfer from GANs to target domains with few images
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, Berga, David, Herranz, Luis, Khan, Fahad Shahbaz, and van de Weijer, Joost
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
One of the attractive characteristics of deep neural networks is their ability to transfer knowledge obtained in one domain to other related domains. As a result, high-quality networks can be trained in domains with relatively little training data. This property has been extensively studied for discriminative networks but has received significantly less attention for generative models. Given the often enormous effort required to train GANs, both computationally as well as in the dataset collection, the re-use of pretrained GANs is a desirable objective. We propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods such as mode collapse and lack of flexibility. We perform experiments on several complex datasets using various GAN architectures (BigGAN, Progressive GAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs. Our code is available at: https://github.com/yaxingwang/MineGAN., Comment: CVPR2020
- Published
- 2019
6. Orderless Recurrent Models for Multi-label Classification
- Author
-
Yazici, Vacit Oguz, Gonzalez-Garcia, Abel, Ramisa, Arnau, Twardowski, Bartlomiej, and van de Weijer, Joost
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recurrent neural networks (RNN) are popular for many computer vision tasks, including multi-label classification. Since RNNs produce sequential outputs, labels need to be ordered for the multi-label classification task. Current approaches sort labels according to their frequency, typically in either rare-first or frequent-first order. These imposed orderings do not take into account that the natural order in which to generate the labels can change for each image, e.g. generating the dominant object first, before the smaller objects in the image. Therefore, in this paper, we propose ways to dynamically order the ground-truth labels with the predicted label sequence. This allows for faster training of more optimal LSTM models for multi-label classification. Our analysis shows that our method does not suffer from duplicate generation, which is common for other models. Furthermore, it outperforms other CNN-RNN models, and we show that a standard architecture of an image encoder and language decoder trained with our proposed loss obtains state-of-the-art results on the challenging MS-COCO, WIDER Attribute, and PA-100K datasets and competitive results on NUS-WIDE. (A minimal code sketch of the dynamic label ordering follows this entry.) Comment: Accepted to CVPR 2020
- Published
- 2019
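One way to picture the dynamic ordering of entry 6: before computing the sequence loss, the ground-truth label set is re-ordered to follow the model's own current predictions, so the LSTM is not penalized for emitting correct labels in a different order. The greedy alignment below is a simplified assumption, not the exact loss of the paper.

    import torch
    import torch.nn.functional as F

    # Predicted per-step label scores from an LSTM decoder: batch of 1, 3 steps, 6 labels.
    logits = torch.tensor([[[0.1, 2.0, 0.3, 0.0, 0.2, 0.1],
                            [1.5, 0.1, 0.2, 0.0, 0.3, 0.1],
                            [0.2, 0.1, 0.0, 0.1, 1.8, 0.3]]])

    gt_labels = {0, 1, 4}                      # orderless ground-truth label set for this image

    # Align: at each decoding step, pick the not-yet-used ground-truth label that the
    # model currently scores highest, and use it as that step's target.
    targets, remaining = [], set(gt_labels)
    for step in range(logits.shape[1]):
        scores = logits[0, step]
        best = max(remaining, key=lambda c: scores[c].item())
        targets.append(best)
        remaining.remove(best)

    targets = torch.tensor(targets).unsqueeze(0)      # here: [[1, 0, 4]]
    loss = F.cross_entropy(logits.view(-1, logits.shape[-1]), targets.view(-1))
    print(targets, loss.item())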
7. Active Learning for Deep Detection Neural Networks
- Author
-
Aghdam, Hamed H., Gonzalez-Garcia, Abel, van de Weijer, Joost, and López, Antonio M.
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
The cost of drawing object bounding boxes (i.e. labeling) for millions of images is prohibitively high. For instance, labeling pedestrians in a regular urban image can take 35 seconds on average. Active learning aims to reduce the cost of labeling by selecting only those images that are informative for improving the detection network's accuracy. In this paper, we propose a method to perform active learning of object detectors based on convolutional neural networks. We propose a new image-level scoring process to rank unlabeled images for their automatic selection, which clearly outperforms classical scores. The proposed method can be applied to videos and sets of still images. In the former case, temporal selection rules can complement our scoring process. As a relevant use case, we extensively study the performance of our method on the task of pedestrian detection. Overall, the experiments show that the proposed method performs better than random selection. (A minimal code sketch of the image-level scoring follows this entry.) Our code is publicly available at www.gitlab.com/haghdam/deep_active_learning. Comment: Accepted at ICCV 2019
- Published
- 2019
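A minimal sketch in the spirit of the image-level scoring of entry 7, under the simplifying assumption that per-detection uncertainty peaks at a confidence of 0.5 and that an image's score is the mean uncertainty of its candidate detections; the published score is computed pixel-wise and is more elaborate.

    import numpy as np

    def image_score(det_confidences):
        """Mean per-detection uncertainty; uncertainty peaks when confidence is 0.5."""
        conf = np.asarray(det_confidences, dtype=float)
        if conf.size == 0:
            return 0.0
        uncertainty = 1.0 - 2.0 * np.abs(conf - 0.5)   # 1 at 0.5, 0 at 0 or 1
        return float(uncertainty.mean())

    # Unlabeled pool: image id -> detector confidences for its candidate boxes (toy numbers).
    pool = {
        "img_001": [0.95, 0.97],               # detector is sure: low labeling value
        "img_002": [0.52, 0.48, 0.61],         # very uncertain: good candidate
        "img_003": [0.30, 0.85, 0.55],
    }

    budget = 2
    ranked = sorted(pool, key=lambda k: image_score(pool[k]), reverse=True)
    print(ranked[:budget])                     # most informative images go to the annotator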
8. Temporal Coherence for Active Learning in Videos
- Author
-
Bengar, Javad Zolfaghari, Gonzalez-Garcia, Abel, Villalonga, Gabriel, Raducanu, Bogdan, Aghdam, Hamed H., Mozerov, Mikhail, Lopez, Antonio M., and van de Weijer, Joost
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Autonomous driving systems require huge amounts of data to train. Manual annotation of this data is time-consuming and prohibitively expensive since it involves human resources. Therefore, active learning emerged as an alternative to ease this effort and to make data annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos by exploiting temporal coherence. Our active learning criterion is based on the estimated number of errors in terms of false positives and false negatives. The detections obtained by the object detector are used to define the nodes of a graph and tracked forward and backward to temporally link the nodes. Minimizing an energy function defined on this graphical model provides estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines tested on two datasets., Comment: Accepted at ICCVW 2019 (CVRSUAD-Road Scene Understanding and Autonomous Driving)
- Published
- 2019
9. Multi-Modal Fusion for End-to-End RGB-T Tracking
- Author
-
Zhang, Lichao, Danelljan, Martin, Gonzalez-Garcia, Abel, van de Weijer, Joost, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components of DiMP, i.e. the feature extractor, the target estimation network, and the classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level, and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by taking an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve the performance of the single-modality counterparts. We obtain our best results when fusing at the feature level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset. (A minimal code sketch of feature-level fusion follows this entry.) Comment: Accepted at ICCVW (VOT) 2019
- Published
- 2019
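Feature-level fusion as studied in entry 9 can be sketched as concatenating the RGB and TIR feature maps and projecting them back to the original channel count with a 1x1 convolution before a shared tracking head. The toy per-modality backbones and shapes below are assumptions, not DiMP's architecture.

    import torch
    import torch.nn as nn

    class FeatureFusion(nn.Module):
        """Concatenate RGB and TIR feature maps and fuse them with a 1x1 convolution."""
        def __init__(self, channels=64):
            super().__init__()
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        def forward(self, f_rgb, f_tir):
            return self.fuse(torch.cat([f_rgb, f_tir], dim=1))

    rgb_backbone = nn.Conv2d(3, 64, 3, stride=2, padding=1)   # toy per-modality extractors
    tir_backbone = nn.Conv2d(1, 64, 3, stride=2, padding=1)
    fusion = FeatureFusion(64)

    rgb = torch.randn(2, 3, 128, 128)
    tir = torch.randn(2, 1, 128, 128)
    fused = fusion(rgb_backbone(rgb), tir_backbone(tir))
    print(fused.shape)                         # torch.Size([2, 64, 64, 64]); feeds the tracking head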
10. SDIT: Scalable and Diverse Cross-domain Image Translation
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, van de Weijer, Joost, and Herranz, Luis
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, image-to-image translation research has witnessed remarkable progress. Although current approaches successfully generate diverse outputs or perform scalable image transfer, these properties have not been combined into a single method. To address this limitation, we propose SDIT: Scalable and Diverse image-to-image translation, which combines both properties within a single generator. The diversity is determined by a latent variable which is randomly sampled from a normal distribution. The scalability is obtained by conditioning the network on the domain attributes. Additionally, we exploit an attention mechanism that permits the generator to focus on the domain-specific attribute. We empirically demonstrate the performance of the proposed method on face mapping and other datasets beyond faces. Comment: ACM-MM2019 camera ready
- Published
- 2019
11. Learning the Model Update for Siamese Trackers
- Author
-
Zhang, Lichao, Gonzalez-Garcia, Abel, van de Weijer, Joost, Danelljan, Martin, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time. While such an approach to updating has led to improved results, its simplicity limits the potential gain likely to be obtained by learning to update. Therefore, we propose to replace the handcrafted update function with a method which learns to update. We use a convolutional neural network, called UpdateNet, which, given the initial template, the accumulated template, and the template of the current frame, aims to estimate the optimal template for the next frame. UpdateNet is compact and can easily be integrated into existing Siamese trackers. We demonstrate the generality of the proposed approach by applying it to two Siamese trackers, SiamFC and DaSiamRPN. Extensive experiments on the VOT2016, VOT2018, LaSOT, and TrackingNet datasets demonstrate that our UpdateNet effectively predicts the new target template, outperforming the standard linear update. On the large-scale TrackingNet dataset, our UpdateNet improves the results of DaSiamRPN with an absolute gain of 3.9% in terms of success score. (A minimal code sketch contrasting the linear and learned updates follows this entry.) Comment: Accepted at ICCV 2019
- Published
- 2019
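Entry 11 contrasts the standard linear template update with a learned one. A minimal sketch of both follows; the small convolutional UpdateNet and the residual connection to the initial template are illustrative assumptions rather than the published architecture.

    import torch
    import torch.nn as nn

    def linear_update(accumulated, current, gamma=0.01):
        """Standard running-average template update used by many Siamese trackers."""
        return (1 - gamma) * accumulated + gamma * current

    class UpdateNetSketch(nn.Module):
        """Predicts the next template from the initial, accumulated and current templates."""
        def __init__(self, channels=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3 * channels, channels, 1), nn.ReLU(),
                nn.Conv2d(channels, channels, 1),
            )
        def forward(self, initial, accumulated, current):
            x = torch.cat([initial, accumulated, current], dim=1)
            return self.net(x) + initial       # residual to the initial template (assumption)

    c = 256
    t_init = torch.randn(1, c, 6, 6)           # template from the first frame
    t_acc = t_init.clone()                     # accumulated template so far
    t_cur = torch.randn(1, c, 6, 6)            # template extracted from the current frame

    t_next_linear = linear_update(t_acc, t_cur)
    t_next_learned = UpdateNetSketch(c)(t_init, t_acc, t_cur)
    print(t_next_linear.shape, t_next_learned.shape)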
12. Controlling biases and diversity in diverse image-to-image translation
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, van de Weijer, Joost, and Herranz, Luis
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The task of unpaired image-to-image translation is highly challenging due to the lack of explicit cross-domain pairs of instances. We consider here diverse image translation (DIT), an even more challenging setting in which an image can have multiple plausible translations. This is normally achieved by explicitly disentangling content and style in the latent representation and sampling different style codes while maintaining the image content. Despite the success of current DIT models, they are prone to suffer from bias. In this paper, we study the problem of bias in image-to-image translation. Biased datasets may add undesired changes (e.g. changing gender or race in face images) to the output translations as a consequence of the particular underlying visual distribution in the target domain. In order to alleviate the effects of this problem, we propose the use of semantic constraints that enforce the preservation of desired image properties. Our proposed model is a step towards unbiased diverse image-to-image translation (UDIT), and results in fewer unwanted changes in the translated images while still performing the wanted transformation. Experiments on several heavily biased datasets show the effectiveness of the proposed techniques in different domains such as faces, objects, and scenes. Comment: The paper is under consideration at Computer Vision and Image Understanding
- Published
- 2019
13. Technology and Police: A Way to Create Predicting Policing
- Author
-
Gonzalez-Garcia, Abel, Sanchez, Luis Angel Galindo, Dziech, Andrzej, editor, Mees, Wim, editor, and Niemiec, Marcin, editor
- Published
- 2022
- Full Text
- View/download PDF
14. Saliency for Fine-grained Object Recognition in Domains with Scarce Training Data
- Author
-
Flores, Carola Figueroa, Gonzalez-García, Abel, van de Weijer, Joost, and Raducanu, Bogdan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
This paper investigates the role of saliency in improving the classification accuracy of a Convolutional Neural Network (CNN) when scarce training data is available. Our approach consists of adding a saliency branch to an existing CNN architecture, which is used to modulate the standard bottom-up visual features from the original image input, acting as an attentional mechanism that guides the feature extraction process. The main aim of the proposed approach is to enable the effective training of a fine-grained recognition model with limited training samples and to improve the performance on the task, thereby alleviating the need to annotate large datasets. The vast majority of saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline. Our proposed pipeline allows us to evaluate saliency methods for the high-level task of object recognition. We perform extensive experiments on various fine-grained datasets (Flowers, Birds, Cars, and Dogs) under different conditions and show that saliency can considerably improve the network's performance, especially for the case of scarce training data. Furthermore, our experiments show that saliency methods that obtain improved saliency maps (as measured by traditional saliency benchmarks) also yield improved performance gains when applied in an object recognition pipeline. (A minimal code sketch of the saliency modulation follows this entry.) Comment: Published in Pattern Recognition journal
- Published
- 2018
- Full Text
- View/download PDF
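The saliency branch of entry 14 can be sketched as a spatial map, predicted from the input image, that multiplicatively modulates the backbone's feature maps before pooling and classification. The tiny networks and shapes below are assumptions for illustration.

    import torch
    import torch.nn as nn

    class SaliencyModulatedNet(nn.Module):
        """Backbone features are reweighted by a predicted saliency map before pooling."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
            self.saliency = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                          nn.Conv2d(8, 1, 1), nn.Sigmoid())
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x):
            feats = self.backbone(x)             # (B, 32, H, W) bottom-up features
            sal = self.saliency(x)               # (B, 1, H, W) attention map in [0, 1]
            modulated = feats * sal              # saliency acts as a spatial gate
            pooled = modulated.mean(dim=(2, 3))  # global average pooling
            return self.classifier(pooled)

    logits = SaliencyModulatedNet()(torch.randn(4, 3, 64, 64))
    print(logits.shape)                          # torch.Size([4, 10])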
15. Synthetic data generation for end-to-end thermal infrared tracking
- Author
-
Zhang, Lichao, Gonzalez-Garcia, Abel, van de Weijer, Joost, Danelljan, Martin, and Khan, Fahad Shahbaz
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The use of both off-the-shelf and end-to-end trained deep networks has significantly improved the performance of visual tracking on RGB videos. However, the lack of large labeled datasets hampers the usage of convolutional neural networks for tracking in thermal infrared (TIR) images. Therefore, most state-of-the-art methods for tracking on TIR data are still based on handcrafted features. To address this problem, we propose to use image-to-image translation models. These models allow us to translate the abundantly available labeled RGB data into synthetic TIR data. We explore both the usage of paired and unpaired image translation models for this purpose. These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking. To the best of our knowledge, we are the first to train end-to-end features for TIR tracking. We perform extensive experiments on the VOT-TIR2017 dataset. We show that a network trained on a large dataset of synthetic TIR data obtains better performance than one trained on the available real TIR data. Combining both data sources leads to further improvement. In addition, when we combine the network with motion features, we outperform the state of the art with a relative gain of over 10%, clearly showing the efficiency of using synthetic data to train end-to-end TIR trackers.
- Published
- 2018
- Full Text
- View/download PDF
16. Image-to-image translation for cross-domain disentanglement
- Author
-
Gonzalez-Garcia, Abel, van de Weijer, Joost, and Bengio, Yoshua
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep image translation methods have recently shown excellent results, outputting high-quality images covering multiple modes of the data distribution. There has also been increased interest in disentangling the internal representations learned by deep methods to further improve their performance and achieve finer control. In this paper, we bridge these two objectives and introduce the concept of cross-domain disentanglement. We aim to separate the internal representation into three parts. The shared part contains information common to both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and perform cross-domain retrieval without the need for labeled data, only paired images. We compare our model to the state of the art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets. (A minimal code sketch of the shared/exclusive split follows this entry.) Comment: Accepted to NIPS 2018
- Published
- 2018
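A hedged sketch of the shared/exclusive split in entry 16: each domain has an encoder that produces a shared code and an exclusive code, and translating A to B chains A's shared content with freshly sampled B-exclusive factors through B's decoder. The fully connected encoders and decoders below are toy assumptions, not the GAN-based model of the paper.

    import torch
    import torch.nn as nn

    latent = 32

    class Encoder(nn.Module):
        """Splits an image into a shared code (common to both domains) and an exclusive code."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
            self.to_shared = nn.Linear(128, latent)
            self.to_exclusive = nn.Linear(128, latent)
        def forward(self, x):
            h = self.body(x)
            return self.to_shared(h), self.to_exclusive(h)

    class Decoder(nn.Module):
        """Reconstructs an image of its domain from a (shared, exclusive) code pair."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 * latent, 128), nn.ReLU(),
                                     nn.Linear(128, 3 * 32 * 32), nn.Tanh())
        def forward(self, shared, exclusive):
            return self.net(torch.cat([shared, exclusive], dim=1)).view(-1, 3, 32, 32)

    enc_a, dec_b = Encoder(), Decoder()        # one encoder/decoder per domain in the full model

    x_a = torch.rand(4, 3, 32, 32)             # batch from domain A
    shared_a, exclusive_a = enc_a(x_a)

    # Translate A -> B: keep the shared content of x_a, sample B-exclusive factors.
    exclusive_b = torch.randn(4, latent)
    x_ab = dec_b(shared_a, exclusive_b)
    print(x_ab.shape)                          # torch.Size([4, 3, 32, 32])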
17. Transferring GANs: generating images from limited data
- Author
-
Wang, Yaxing, Wu, Chenshen, Herranz, Luis, van de Weijer, Joost, Gonzalez-Garcia, Abel, and Raducanu, Bogdan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pretrained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when the target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pretrained model was trained without conditioning. Our results also suggest that density may be more important than diversity and a dataset with one or few densely sampled classes may be a better source model than more diverse datasets such as ImageNet or Places., Comment: ECCV2018-camera ready
- Published
- 2018
18. Image context for object detection, object context for part detection
- Author
-
Gonzalez-Garcia, Abel, Ferrari, Vittorio, and Komura, Taku
- Subjects
object detection, automatic image, object class detection, window classifiers, convolutional neural networks
- Abstract
Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image. Similarly, semantic part detection attempts to recognize and localize the object parts. This thesis proposes four contributions. The first two make object detection more efficient by using active search strategies guided by image context. The last two involve parts. One of them explores the emergence of parts in neural networks trained for object detection, whereas the other improves on part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows using a window classifier. Instead, our search sequentially chooses what window to evaluate next based on all the information gathered before. This results in a significant reduction on the number of necessary window evaluations to detect the objects in the image. We guide our search strategy using image context and the score of the classifier. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction on the number of necessary evaluations for the smaller, more challenging classes. In the third contribution of this thesis, we study whether semantic parts emerge in Convolutional Neural Networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of their internal representation by investigating the responses of the network filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insights on the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context. We complement part appearance with the object appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone in this challenging task.
- Published
- 2018
19. Objects as context for detecting their semantic parts
- Author
-
Gonzalez-Garcia, Abel, Modolo, Davide, and Ferrari, Vittorio
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present a semantic part detection approach that effectively leverages object information. We use the object appearance and its class as indicators of what parts to expect. We also model the expected relative location of parts inside the objects based on their appearance. We achieve this with a new network module, called OffsetNet, that efficiently predicts a variable number of part locations within a given object. Our model incorporates all these cues to detect parts in the context of their objects. This leads to considerably higher performance for the challenging task of part detection compared to using part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to other part detection methods on both the PASCAL-Part and CUB200-2011 datasets. (A minimal code sketch of the offset prediction follows this entry.)
- Published
- 2017
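The OffsetNet idea in entry 19 can be sketched as a small network that, from pooled object features, regresses a fixed set of candidate part offsets relative to the object box together with a per-part presence score. The feature dimension, number of parts, and box parameterization below are illustrative assumptions.

    import torch
    import torch.nn as nn

    class OffsetNetSketch(nn.Module):
        """From object features, predict per-part (dx, dy, dw, dh) offsets and a presence score."""
        def __init__(self, feat_dim=256, num_parts=5):
            super().__init__()
            self.num_parts = num_parts
            self.offsets = nn.Linear(feat_dim, num_parts * 4)   # relative box offsets per part
            self.presence = nn.Linear(feat_dim, num_parts)      # which parts are actually present
        def forward(self, obj_feat):
            b = obj_feat.shape[0]
            return (self.offsets(obj_feat).view(b, self.num_parts, 4),
                    torch.sigmoid(self.presence(obj_feat)))

    def offsets_to_boxes(obj_box, offsets):
        """Turn relative offsets into absolute part boxes (cx, cy, w, h) inside the object box."""
        cx, cy, w, h = obj_box
        boxes = offsets.clone()
        boxes[:, 0] = cx + offsets[:, 0] * w
        boxes[:, 1] = cy + offsets[:, 1] * h
        boxes[:, 2] = offsets[:, 2].exp() * w
        boxes[:, 3] = offsets[:, 3].exp() * h
        return boxes

    net = OffsetNetSketch()
    obj_feat = torch.randn(1, 256)             # pooled features of one detected object
    offsets, presence = net(obj_feat)
    part_boxes = offsets_to_boxes(torch.tensor([100.0, 80.0, 60.0, 40.0]), offsets[0])
    print(part_boxes.shape, presence.shape)    # torch.Size([5, 4]) torch.Size([1, 5])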
20. Do semantic parts emerge in Convolutional Neural Networks?
- Author
-
Gonzalez-Garcia, Abel, Modolo, Davide, and Ferrari, Vittorio
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Semantic object parts can be useful for several visual recognition tasks. Lately, these tasks have been addressed using Convolutional Neural Networks (CNN), achieving outstanding results. In this work we study whether CNNs learn semantic parts in their internal representation. We investigate the responses of convolutional filters and try to associate their stimuli with semantic parts. We perform two extensive quantitative analyses. First, we use ground-truth part bounding boxes from the PASCAL-Part dataset to determine how many of those semantic parts emerge in the CNN. We explore this emergence for different layers, network depths, and supervision levels. Second, we collect human judgements in order to study what fraction of all filters systematically fire on any semantic part, even if not annotated in PASCAL-Part. Moreover, we explore several connections between discriminative power and semantics. We find out which are the most discriminative filters for object recognition, and analyze whether they respond to semantic parts or to other image patches. We also investigate the other direction: we determine which semantic parts are the most discriminative and whether they correspond to those parts emerging in the network. This enables us to gain an even deeper understanding of the role of semantic parts in the network. (A minimal code sketch of the filter-part association follows this entry.)
- Published
- 2016
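The first quantitative analysis in entry 20 (associating filter stimuli with annotated parts) can be approximated by checking how often a filter's strongest activation lands inside a ground-truth part box. The containment test and toy coordinates below are simplifying assumptions; the paper's association criterion is more involved.

    def firing_rate_on_part(peak_positions, part_boxes):
        """Fraction of images where a filter's peak activation lands inside the part box."""
        hits = 0
        for (x, y), (x1, y1, x2, y2) in zip(peak_positions, part_boxes):
            if x1 <= x <= x2 and y1 <= y <= y2:
                hits += 1
        return hits / max(len(peak_positions), 1)

    # Toy data: per-image peak position of one conv filter, and the annotated part box.
    peaks = [(34, 50), (60, 12), (35, 48), (10, 90)]
    boxes = [(30, 40, 45, 60), (5, 5, 20, 20), (30, 40, 45, 60), (0, 80, 20, 100)]

    rate = firing_rate_on_part(peaks, boxes)
    print(f"filter fires on the part in {rate:.0%} of images")
    # A filter could then be counted as part-associated if this rate exceeds a threshold.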
21. Controlling biases and diversity in diverse image-to-image translation
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, Herranz, Luis, and van de Weijer, Joost
- Published
- 2021
- Full Text
- View/download PDF
22. An active search strategy for efficient object class detection
- Author
-
Gonzalez-Garcia, Abel, Vezhnevets, Alexander, and Ferrari, Vittorio
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Object class detectors typically apply a window classifier to all the windows in a large set, either in a sliding-window manner or using object proposals. In this paper, we develop an active search strategy that sequentially chooses the next window to evaluate based on all the information gathered before. This results in a substantial reduction in the number of classifier evaluations and in a more elegant approach in general. Our search strategy is guided by two forces. First, we exploit context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. This enables the search to jump across distant regions in the image (e.g. observing a sky region suggests that cars might be far below) and is done efficiently in a Random Forest framework. Second, we exploit the score of the classifier to attract the search to promising areas surrounding a highly scored window, and to keep away from areas near low-scored ones. Our search strategy can be applied on top of any classifier, as it treats it as a black box. In experiments with R-CNN on the challenging SUN2012 dataset, our method matches the detection accuracy of evaluating all windows independently, while evaluating 9x fewer windows. (A toy sketch of the sequential window selection follows this entry.)
- Published
- 2014
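A toy sketch of the sequential selection in entry 22: at each step, the window whose estimated promise is highest gets evaluated next, where promise mixes attraction toward previously observed high-scoring windows and repulsion from low-scoring ones. The heuristic below stands in for the paper's Random Forest context model and is purely an assumption for illustration.

    import random

    random.seed(0)
    # Candidate windows: an (x, y) centre plus a hidden "true" classifier score each.
    windows = [((random.random(), random.random()), random.random()) for _ in range(200)]

    evaluated = {}                             # window index -> observed classifier score
    budget = 20

    def promise(idx):
        """Estimated value of evaluating window idx, given the windows seen so far."""
        (x, y), _ = windows[idx]
        if not evaluated:
            return random.random()             # nothing known yet: explore randomly
        value = 0.0
        for j, score in evaluated.items():     # attraction to high scores, repulsion from low
            (xj, yj), _ = windows[j]
            dist = ((x - xj) ** 2 + (y - yj) ** 2) ** 0.5
            value += (score - 0.5) / (dist + 1e-3)
        return value

    for _ in range(budget):
        candidates = [i for i in range(len(windows)) if i not in evaluated]
        nxt = max(candidates, key=promise)
        evaluated[nxt] = windows[nxt][1]       # "run the window classifier" on this window only

    best = max(evaluated, key=evaluated.get)
    print(f"best window after {budget} evaluations: {best}, score={evaluated[best]:.2f}")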
23. The Sixth Visual Object Tracking VOT2018 Challenge Results
- Author
-
Kristan, Matej, Leonardis, Aleš, Matas, Jiří, Felsberg, Michael, Pflugfelder, Roman, Zajc, Luka Čehovin, Vojír̃, Tomáš, Bhat, Goutam, Lukežič, Alan, Eldesokey, Abdelrahman, Fernández, Gustavo, García-Martín, Álvaro, Iglesias-Arias, Álvaro, Alatan, A. Aydin, González-García, Abel, Petrosino, Alfredo, Memarmoghadam, Alireza, Vedaldi, Andrea, Muhič, Andrej, He, Anfeng, Smeulders, Arnold, Perera, Asanka G., Li, Bo, Chen, Boyu, Kim, Changick, Xu, Changsheng, Xiong, Changzhen, Tian, Cheng, Luo, Chong, Sun, Chong, Hao, Cong, Kim, Daijin, Mishra, Deepak, Chen, Deming, Wang, Dong, Wee, Dongyoon, Gavves, Efstratios, Gundogdu, Erhan, Velasco-Salido, Erik, Khan, Fahad Shahbaz, Yang, Fan, Zhao, Fei, Li, Feng, Battistone, Francesco, De Ath, George, Subrahmanyam, Gorthi R. K. S., Bastos, Guilherme, Ling, Haibin, Galoogahi, Hamed Kiani, Lee, Hankyeol, Li, Haojie, Zhao, Haojie, Fan, Heng, Zhang, Honggang, Possegger, Horst, Li, Houqiang, Lu, Huchuan, Zhi, Hui, Li, Huiyun, Lee, Hyemin, Chang, Hyung Jin, Drummond, Isabela, Valmadre, Jack, Martin, Jaime Spencer, Chahl, Javaan, Choi, Jin Young, Li, Jing, Wang, Jinqiao, Qi, Jinqing, Sung, Jinyoung, Johnander, Joakim, Henriques, Joao, Choi, Jongwon, van de Weijer, Joost, Herranz, Jorge Rodríguez, Martínez, José M., Kittler, Josef, Zhuang, Junfei, Gao, Junyu, Grm, Klemen, Zhang, Lichao, Wang, Lijun, Yang, Lingxiao, Rout, Litu, Si, Liu, Bertinetto, Luca, Chu, Lutao, Che, Manqiang, Maresca, Mario Edoardo, Danelljan, Martin, Yang, Ming-Hsuan, Abdelpakey, Mohamed, Shehata, Mohamed, Kang, Myunggu, Lee, Namhoon, Wang, Ning, Miksik, Ondrej, Moallem, P., Vicente-Moñivar, Pablo, Senna, Pedro, Li, Peixia, Torr, Philip, Raju, Priya Mariam, Ruihe, Qian, Wang, Qiang, Zhou, Qin, Guo, Qing, Martín-Nieto, Rafael, Gorthi, Rama Krishna, Tao, Ran, Bowden, Richard, Everson, Richard, Wang, Runling, Yun, Sangdoo, Choi, Seokeon, Vivas, Sergio, Bai, Shuai, Huang, Shuangping, Wu, Sihang, Hadfield, Simon, Wang, Siwen, Golodetz, Stuart, Ming, Tang, Xu, Tianyang, Zhang, Tianzhu, Fischer, Tobias, Santopietro, Vincenzo, Štruc, Vitomir, Wei, Wang, Zuo, Wangmeng, Feng, Wei, Wu, Wei, Zou, Wei, Hu, Weiming, Zhou, Wengang, Zeng, Wenjun, Zhang, Xiaofan, Wu, Xiaohe, Wu, Xiao-Jun, Tian, Xinmei, Li, Yan, Lu, Yan, Law, Yee Wei, Wu, Yi, Demiris, Yiannis, Yang, Yicai, Jiao, Yifan, Li, Yuhong, Zhang, Yunhua, Sun, Yuxuan, Zhang, Zheng, Zhu, Zheng, Feng, Zhen-Hua, Wang, Zhihui, He, Zhiqun, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Leal-Taixé, Laura, editor, and Roth, Stefan, editor
- Published
- 2019
- Full Text
- View/download PDF
24. Erratum to “Self-Supervised Cross-Modal Distillation for Thermal Infrared Tracking”
- Author
-
Zha, Yufei, primary, Zhang, Lichao, additional, Sun, Jingxian, additional, Gonzalez-Garcia, Abel, additional, Zhang, Peng, additional, and Huang, Wei, additional
- Published
- 2024
- Full Text
- View/download PDF
25. Saliency for fine-grained object recognition in domains with scarce training data
- Author
-
Flores, Carola Figueroa, Gonzalez-Garcia, Abel, van de Weijer, Joost, and Raducanu, Bogdan
- Published
- 2019
- Full Text
- View/download PDF
26. Do Semantic Parts Emerge in Convolutional Neural Networks?
- Author
-
Gonzalez-Garcia, Abel, Modolo, Davide, and Ferrari, Vittorio
- Published
- 2018
- Full Text
- View/download PDF
27. MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains
- Author
-
Wang, Yaxing, Gonzalez-Garcia, Abel, Wu, Chenshen, Herranz, Luis, Khan, Fahad, Jui, Shangling, Yang, Jian, and van de Weijer, Joost
- Abstract
Given the often enormous effort required to train GANs, both computationally as well as in dataset collection, the re-use of pretrained GANs largely increases the potential impact of generative models. Therefore, we propose a novel knowledge transfer method for generative models based on mining the knowledge that is most beneficial to a specific target domain, either from a single or multiple pretrained GANs. This is done using a miner network that identifies which part of the generative distribution of each pretrained GAN outputs samples closest to the target domain. Mining effectively steers GAN sampling towards suitable regions of the latent space, which facilitates the posterior finetuning and avoids pathologies of other methods, such as mode collapse and lack of flexibility. Furthermore, to prevent overfitting on small target domains, we introduce sparse subnetwork selection, which restricts the set of trainable neurons to those that are relevant for the target dataset. We perform comprehensive experiments on several challenging datasets using various GAN architectures (BigGAN, Progressive GAN, and StyleGAN) and show that the proposed method, called MineGAN, effectively transfers knowledge to domains with few target images, outperforming existing methods. In addition, MineGAN can successfully transfer knowledge from multiple pretrained GANs., Funding Agencies|Huawei Kirin Solution; MCIN/AEI [PID2019-104174GB-I00, PID2021-128178OB-I00]; ERDF A way of making Europe; Ramon y Cajal fellowship - MCIN/AEI [RYC2019-027020-I]; CERCA Programme of Generalitat de Catalunya; Youth Foundation [62202243]
- Published
- 2023
- Full Text
- View/download PDF
28. Transferring GANs: Generating Images from Limited Data
- Author
-
Wang, Yaxing, primary, Wu, Chenshen, additional, Herranz, Luis, additional, van de Weijer, Joost, additional, Gonzalez-Garcia, Abel, additional, and Raducanu, Bogdan, additional
- Published
- 2018
- Full Text
- View/download PDF
29. Self-Supervised Cross-Modal Distillation for Thermal Infrared Tracking
- Author
-
Zha, Yufei, primary, Sun, Jingxian, additional, Zhang, Peng, additional, Zhang, Lichao, additional, Gonzalez-Garcia, Abel, additional, and Huang, Wei, additional
- Published
- 2022
- Full Text
- View/download PDF
30. Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking
- Author
-
Sun, Jingxian, primary, Zhang, Lichao, additional, Zha, Yufei, additional, Gonzalez-Garcia, Abel, additional, Zhang, Peng, additional, Huang, Wei, additional, and Zhang, Yanning, additional
- Published
- 2021
- Full Text
- View/download PDF
31. Capacidades prospectivas y de defensa en la lucha contra el Ciberterrorismo [Prospective and defence capabilities in the fight against cyberterrorism]
- Author
-
Gonzalez-Garcia, Abel, primary and Girao González, Francisco José, additional
- Published
- 2020
- Full Text
- View/download PDF
32. Semi-Supervised Learning for Few-Shot Image-to-Image Translation
- Author
-
Wang, Yaxing, primary, Khan, Salman, additional, Gonzalez-Garcia, Abel, additional, van de Weijer, Joost, additional, and Khan, Fahad Shahbaz, additional
- Published
- 2020
- Full Text
- View/download PDF
33. MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images
- Author
-
Wang, Yaxing, primary, Gonzalez-Garcia, Abel, additional, Berga, David, additional, Herranz, Luis, additional, Khan, Fahad Shahbaz, additional, and van de Weijer, Joost, additional
- Published
- 2020
- Full Text
- View/download PDF
34. Orderless Recurrent Models for Multi-Label Classification
- Author
-
Oguz Yazici, Vacit, primary, Gonzalez-Garcia, Abel, additional, Ramisa, Arnau, additional, Twardowski, Bartlomiej, additional, and van de Weijer, Joost, additional
- Published
- 2020
- Full Text
- View/download PDF
35. Emerging technologies for the proposal and design of a MOOC on social entrepreneurship
- Author
-
Ramirez-Montoya, Maria-Soledad, primary, Gonzalez-Padron, Jose-Guadalupe, additional, Muzquiz-Flores, Marlene, additional, Gonzalez-Garcia, Abel, additional, Romero-Rodriguez, Jose-Maria, additional, and Aznar-Diaz, Inmaculada, additional
- Published
- 2020
- Full Text
- View/download PDF
36. Validation of instruments to measure social entrepreneurship competence. The OpenSocialLab project
- Author
-
Gonzalez-Garcia, Abel, primary, Romero-Rodriguez, Luis-Miguel, additional, Romero-Rodriguez, Jose-Maria, additional, and Ramirez-Montoya, Maria-Soledad, additional
- Published
- 2020
- Full Text
- View/download PDF
37. The Seventh Visual Object Tracking VOT2019 Challenge Results
- Author
-
Kristanl, Matej, Matas, Jiri, Leonardis, Ales, Felsberg, Michael, Pflugfelder, Roman, Kamarainen, Joni-Kristian, Zajc, Luka Cehovin, Drbohlav, Ondrej, Lukezic, Alan, Berg, Amanda, Eldesokey, Abdelrahman, Kapyla, Jani, Fernandez, Gustavo, Gonzalez-Garcia, Abel, Memarrnoghadam, Alireza, Lu, Andong, He, Anfeng, Varfolomieiev, Anton, Chan, Antoni, Tripathi, Ardhendu Shekhar, Smeulders, Arnold, Pedasingu, Bala Suraj, Chen, Bao Xin, Zhang, Baopeng, Wu, Baoyuan, Li, Bi, He, Bin, Yan, Bin, Bai, Bing, Li, Bing, Li, Bo, Kim, Bycong Hak, Ma, Chao, Fang, Chen, Qian, Chen, Chen, Cheng, Li, Chenglong, Zhang, Chengquan, Tsai, Chi-Yi, Luo, Chong, Micheloni, Christian, Zhang, Chunhui, Tao, Dacheng, Gupta, Deepak, Song, Dejia, Wang, Dong, Gavves, Efstratios, Yi, Eunu, Khan, Fahad Shahbaz, Zhang, Fangyi, Wang, Fei, Zhao, Fei, De Ath, George, Bhat, Goutam, Chen, Guanqi, Wang, Guangting, Li, Guoxuan, Cevikalp, Hakan, Du, Hao, Zhao, Haojie, Saribas, Hasan, Jung, Ho Min, Bai, Hongliang, Yu, Hongyuan, Peng, Houwen, Lu, Huchuan, Li, Hui, Li, Jiakun, Li, Jianhu, Fu, Jianlong, Chen, Jie, Gao, Jie, Zhao, Jie, Tang, Jin, Li, Jing, Wu, Jingjing, Liu, Jingtuo, Wang, Jinqiao, Qi, Jingqing, Zhang, Jingyue, Tsotsos, John K., Lee, John Hyuk, van de Weijer, Joost, Kittler, Josef, Lee, Jun Ha, Zhuang, Junfei, Zhang, Kangkai, wang, Kangkang, Dai, Kenan, Chen, Lei, Liu, Lei, Guo, Leida, Zhang, Li, Wang, Liang, Wang, Liangliang, Zhang, Lichao, Wang, Lijun, Zhou, Lijun, Zheng, Linyu, Rout, Litu, Van Gool, Luc, Bertinetto, Luca, Danelljan, Martin, Dunnhofer, Matteo, Ni, Meng, Kim, Min Young, Tang, Ming, Yang, Ming-Hsuan, Paluru, Naveen, Martine, Niki, Xu, Pengfei, Zhang, Pengfei, Zheng, Pengkun, Zhang, Pengyu, Torr, Philip H. S., Wang, Qi Zhang Qiang, Gua, Qing, Timofte, Radu, Gorthi, Rama Krishna, Everson, Richard, Han, Ruize, Zhang, Ruohan, You, Shan, Zhao, Shao-Chuan, Zhao, Shengwei, Li, Shihu, Li, Shikun, Ge, Shiming, Bai, Shuai, Guan, Shuosen, Xing, Tengfei, Xu, Tianyang, Yang, Tianyu, Zhang, Ting, Vojir, Tomas, Feng, Wei, Hu, Weiming, Wang, Weizhao, Tang, Wenjie, Zeng, Wenjun, Liu, Wenyu, Chen, Xi, Qiu, Xi, Bai, Xiang, Wu, Xiao-Jun, Yang, Xiaoyun, Chen, Xier, Li, Xin, Sun, Xing, Chen, Xingyu, Tian, Xinmei, Tang, Xu, Zhu, Xue-Feng, Huang, Yan, Chen, Yanan, Lian, Yanchao, Gu, Yang, Liu, Yang, Chen, Yanjie, Zhang, Yi, Xu, Yinda, Wang, Yingming, Li, Yingping, Zhou, Yu, Dong, Yuan, Xu, Yufei, Zhang, Yunhua, Li, Yunkun, Luo, Zeyu Wang Zhao, Zhang, Zhaoliang, Feng, Zhen-Hua, He, Zhenyu, Song, Zhichao, Chen, Zhihao, Zhang, Zhipeng, Wu, Zhirong, Xiong, Zhiwei, Huang, Zhongjian, Teng, Zhu, Ni, Zihan, Kristanl, Matej, Matas, Jiri, Leonardis, Ales, Felsberg, Michael, Pflugfelder, Roman, Kamarainen, Joni-Kristian, Zajc, Luka Cehovin, Drbohlav, Ondrej, Lukezic, Alan, Berg, Amanda, Eldesokey, Abdelrahman, Kapyla, Jani, Fernandez, Gustavo, Gonzalez-Garcia, Abel, Memarrnoghadam, Alireza, Lu, Andong, He, Anfeng, Varfolomieiev, Anton, Chan, Antoni, Tripathi, Ardhendu Shekhar, Smeulders, Arnold, Pedasingu, Bala Suraj, Chen, Bao Xin, Zhang, Baopeng, Wu, Baoyuan, Li, Bi, He, Bin, Yan, Bin, Bai, Bing, Li, Bing, Li, Bo, Kim, Bycong Hak, Ma, Chao, Fang, Chen, Qian, Chen, Chen, Cheng, Li, Chenglong, Zhang, Chengquan, Tsai, Chi-Yi, Luo, Chong, Micheloni, Christian, Zhang, Chunhui, Tao, Dacheng, Gupta, Deepak, Song, Dejia, Wang, Dong, Gavves, Efstratios, Yi, Eunu, Khan, Fahad Shahbaz, Zhang, Fangyi, Wang, Fei, Zhao, Fei, De Ath, George, Bhat, Goutam, Chen, Guanqi, Wang, Guangting, Li, Guoxuan, Cevikalp, Hakan, Du, Hao, Zhao, Haojie, Saribas, Hasan, 
Jung, Ho Min, Bai, Hongliang, Yu, Hongyuan, Peng, Houwen, Lu, Huchuan, Li, Hui, Li, Jiakun, Li, Jianhu, Fu, Jianlong, Chen, Jie, Gao, Jie, Zhao, Jie, Tang, Jin, Li, Jing, Wu, Jingjing, Liu, Jingtuo, Wang, Jinqiao, Qi, Jingqing, Zhang, Jingyue, Tsotsos, John K., Lee, John Hyuk, van de Weijer, Joost, Kittler, Josef, Lee, Jun Ha, Zhuang, Junfei, Zhang, Kangkai, wang, Kangkang, Dai, Kenan, Chen, Lei, Liu, Lei, Guo, Leida, Zhang, Li, Wang, Liang, Wang, Liangliang, Zhang, Lichao, Wang, Lijun, Zhou, Lijun, Zheng, Linyu, Rout, Litu, Van Gool, Luc, Bertinetto, Luca, Danelljan, Martin, Dunnhofer, Matteo, Ni, Meng, Kim, Min Young, Tang, Ming, Yang, Ming-Hsuan, Paluru, Naveen, Martine, Niki, Xu, Pengfei, Zhang, Pengfei, Zheng, Pengkun, Zhang, Pengyu, Torr, Philip H. S., Wang, Qi Zhang Qiang, Gua, Qing, Timofte, Radu, Gorthi, Rama Krishna, Everson, Richard, Han, Ruize, Zhang, Ruohan, You, Shan, Zhao, Shao-Chuan, Zhao, Shengwei, Li, Shihu, Li, Shikun, Ge, Shiming, Bai, Shuai, Guan, Shuosen, Xing, Tengfei, Xu, Tianyang, Yang, Tianyu, Zhang, Ting, Vojir, Tomas, Feng, Wei, Hu, Weiming, Wang, Weizhao, Tang, Wenjie, Zeng, Wenjun, Liu, Wenyu, Chen, Xi, Qiu, Xi, Bai, Xiang, Wu, Xiao-Jun, Yang, Xiaoyun, Chen, Xier, Li, Xin, Sun, Xing, Chen, Xingyu, Tian, Xinmei, Tang, Xu, Zhu, Xue-Feng, Huang, Yan, Chen, Yanan, Lian, Yanchao, Gu, Yang, Liu, Yang, Chen, Yanjie, Zhang, Yi, Xu, Yinda, Wang, Yingming, Li, Yingping, Zhou, Yu, Dong, Yuan, Xu, Yufei, Zhang, Yunhua, Li, Yunkun, Luo, Zeyu Wang Zhao, Zhang, Zhaoliang, Feng, Zhen-Hua, He, Zhenyu, Song, Zhichao, Chen, Zhihao, Zhang, Zhipeng, Wu, Zhirong, Xiong, Zhiwei, Huang, Zhongjian, Teng, Zhu, and Ni, Zihan
- Abstract
The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, (iii) VOT-LT2019 focused on long-term tracking namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support both standard short-term, long-term tracking and tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website(1)., Funding Agencies|Slovenian research agencySlovenian Research Agency - Slovenia [J2-8175, P2-0214, P2-0094]; Czech Science Foundation Project GACR [P103/12/G084]; MURI project - MoD/DstlMURI; EPSRCEngineering & Physical Sciences Research Council (EPSRC) [EP/N019415/1]; WASP; VR (ELLIIT, LAST, and NCNN); SSF (SymbiCloud); AIT Strategic Research Programme; Faculty of Computer Science, University of Ljubljana, Slovenia
- Published
- 2019
- Full Text
- View/download PDF
38. Synthetic Data Generation for End-to-End Thermal Infrared Tracking
- Author
-
Zhang, Lichao, Gonzalez-Garcia, Abel, van de Weijer, Joost, Danelljan, Martin, Khan, Fahad, Zhang, Lichao, Gonzalez-Garcia, Abel, van de Weijer, Joost, Danelljan, Martin, and Khan, Fahad
- Abstract
The use of both off-the-shelf and end-to-end trained deep networks has significantly improved the performance of visual tracking on RGB videos. However, the lack of large labeled datasets hampers the usage of convolutional neural networks for tracking in thermal infrared (TIR) images. Therefore, most state-of-the-art methods on tracking for TIR data are still based on handcrafted features. To address this problem, we propose to use image-to-image translation models. These models allow us to translate the abundantly available labeled RGB data to synthetic TIR data. We explore both the usage of paired and unpaired image translation models for this purpose. These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking. To the best of our knowledge, we are the first to train end-to-end features for TIR tracking. We perform extensive experiments on the VOT-TIR2017 dataset. We show that a network trained on a large dataset of synthetic TIR data obtains better performance than one trained on the available real TIR data. Combining both data sources leads to further improvement. In addition, when we combine the network with motion features, we outperform the state of the art with a relative gain of over 10%, clearly showing the efficiency of using synthetic data to train end-to-end TIR trackers., Funding Agencies|CIIISTERA Project M2CR of the Spanish Ministry [PCIN-2015-251, TIN2016-79717-R]; ACCIO Agency; CERCA Programme/Generalitat de Catalunya; CENIIT [18.14]; VR Starting Grant [2016-05543]
- Published
- 2019
- Full Text
- View/download PDF
39. SDIT: Scalable and Diverse Cross-domain Image Translation
- Author
-
Wang, Yaxing, primary, Gonzalez-Garcia, Abel, additional, van de Weijer, Joost, additional, and Herranz, Luis, additional
- Published
- 2019
- Full Text
- View/download PDF
40. Learning the Model Update for Siamese Trackers
- Author
-
Zhang, Lichao, primary, Gonzalez-Garcia, Abel, additional, Weijer, Joost Van De, additional, Danelljan, Martin, additional, and Khan, Fahad Shahbaz, additional
- Published
- 2019
- Full Text
- View/download PDF
41. Multi-Modal Fusion for End-to-End RGB-T Tracking
- Author
-
Zhang, Lichao, primary, Danelljan, Martin, additional, Gonzalez-Garcia, Abel, additional, van de Weijer, Joost, additional, and Shahbaz Khan, Fahad, additional
- Published
- 2019
- Full Text
- View/download PDF
42. Temporal Coherence for Active Learning in Videos
- Author
-
Zolfaghari Bengar, Javad, primary, Gonzalez-Garcia, Abel, additional, Villalonga, Gabriel, additional, Raducanu, Bogdan, additional, Habibi Aghdam, Hamed, additional, Mozerov, Mikhail, additional, Lopez, Antonio M., additional, and van de Weijer, Joost, additional
- Published
- 2019
- Full Text
- View/download PDF
43. Active Learning for Deep Detection Neural Networks
- Author
-
Aghdam, Hamed H., primary, Gonzalez-Garcia, Abel, additional, Lopez, Antonio, additional, and Weijer, Joost, additional
- Published
- 2019
- Full Text
- View/download PDF
44. The Seventh Visual Object Tracking VOT2019 Challenge Results
- Author
-
Kristan, Matej, primary, Matas, Jiri, additional, Leonardis, Ales, additional, Felsberg, Michael, additional, Pflugfelder, Roman, additional, Kamarainen, Joni-Kristian, additional, Cehovin Zajc, Luka, additional, Drbohlav, Ondrej, additional, Lukezic, Alan, additional, Berg, Amanda, additional, Eldesokey, Abdelrahman, additional, Kapyla, Jani, additional, Fernandez, Gustavo, additional, Gonzalez-Garcia, Abel, additional, Memarmoghadam, Alireza, additional, Lu, Andong, additional, He, Anfeng, additional, Varfolomieiev, Anton, additional, Chan, Antoni, additional, Tripathi, Ardhendu Shekhar, additional, Smeulders, Arnold, additional, Pedasingu, Bala Suraj, additional, Chen, Bao Xin, additional, Zhang, Baopeng, additional, Wu, Baoyuan, additional, Li, Bi, additional, He, Bin, additional, Yan, Bin, additional, Bai, Bing, additional, Li, Bing, additional, Li, Bo, additional, Kim, Byeong Hak, additional, Ma, Chao, additional, Fang, Chen, additional, Qian, Chen, additional, Chen, Cheng, additional, Li, Chenglong, additional, Zhang, Chengquan, additional, Tsai, Chi-Yi, additional, Luo, Chong, additional, Micheloni, Christian, additional, Zhang, Chunhui, additional, Tao, Dacheng, additional, Gupta, Deepak, additional, Song, Dejia, additional, Wang, Dong, additional, Gavves, Efstratios, additional, Yi, Eunu, additional, Khan, Fahad Shahbaz, additional, Zhang, Fangyi, additional, Wang, Fei, additional, Zhao, Fei, additional, Ath, George De, additional, Bhat, Goutam, additional, Chen, Guangqi, additional, Wang, Guangting, additional, Li, Guoxuan, additional, Cevikalp, Hakan, additional, Du, Hao, additional, Zhao, Haojie, additional, Saribas, Hasan, additional, Jung, Ho Min, additional, Bai, Hongliang, additional, Yu, Hongyuan, additional, Peng, Houwen, additional, Lu, Huchuan, additional, Li, Hui, additional, Li, Jiakun, additional, Li, Jianhua, additional, Fu, Jianlong, additional, Chen, Jie, additional, Gao, Jie, additional, Zhao, Jie, additional, Tang, Jin, additional, Li, Jing, additional, Wu, Jingjing, additional, Liu, Jingtuo, additional, Wang, Jinqiao, additional, Qi, Jinqing, additional, Zhang, Jinyue, additional, Tsotsos, John K., additional, Lee, Jong Hyuk, additional, Weijer, Joost van de, additional, Kittler, Josef, additional, Lee, Jun Ha, additional, Zhuang, Junfei, additional, Zhang, Kangkai, additional, Wang, Kangkang, additional, Dai, Kenan, additional, Chen, Lei, additional, Liu, Lei, additional, Guo, Leida, additional, Zhang, Li, additional, Wang, Liang, additional, Wang, Liangliang, additional, Zhang, Lichao, additional, Wang, Lijun, additional, Zhou, Lijun, additional, Zheng, Linyu, additional, Rout, Litu, additional, Van Gool, Luc, additional, Bertinetto, Luca, additional, Danelljan, Martin, additional, Dunnhofer, Matteo, additional, Ni, Meng, additional, Kim, Min Young, additional, Tang, Ming, additional, Yang, Ming-Hsuan, additional, Paluru, Naveen, additional, Martinel, Niki, additional, Xu, Pengfei, additional, Zhang, Pengfei, additional, Zheng, Pengkun, additional, Zhang, Pengyu, additional, Torr, Philip H.S., additional, Wang, Qi Zhang Qiang, additional, Guo, Qing, additional, Timofte, Radu, additional, Gorthi, Rama Krishna, additional, Everson, Richard, additional, Han, Ruize, additional, Zhang, Ruohan, additional, You, Shan, additional, Zhao, Shao-Chuan, additional, Zhao, Shengwei, additional, Li, Shihu, additional, Li, Shikun, additional, Ge, Shiming, additional, Bai, Shuai, additional, Guan, Shuosen, additional, Xing, Tengfei, additional, Xu, Tianyang, additional, Yang, 
Tianyu, additional, Zhang, Ting, additional, Vojir, Tomas, additional, Feng, Wei, additional, Hu, Weiming, additional, Wang, Weizhao, additional, Tang, Wenjie, additional, Zeng, Wenjun, additional, Liu, Wenyu, additional, Chen, Xi, additional, Qiu, Xi, additional, Bai, Xiang, additional, Wu, Xiao-Jun, additional, Yang, Xiaoyun, additional, Chen, Xier, additional, Li, Xin, additional, Sun, Xing, additional, Chen, Xingyu, additional, Tian, Xinmei, additional, Tang, Xu, additional, Zhu, Xue-Feng, additional, Huang, Yan, additional, Chen, Yanan, additional, Lian, Yanchao, additional, Gu, Yang, additional, Liu, Yang, additional, Chen, Yanjie, additional, Zhang, Yi, additional, Xu, Yinda, additional, Wang, Yingming, additional, Li, Yingping, additional, Zhou, Yu, additional, Dong, Yuan, additional, Xu, Yufei, additional, Zhang, Yunhua, additional, Li, Yunkun, additional, Luo, Zeyu Wang Zhao, additional, Zhang, Zhaoliang, additional, Feng, Zhen-Hua, additional, He, Zhenyu, additional, Song, Zhichao, additional, Chen, Zhihao, additional, Zhang, Zhipeng, additional, Wu, Zhirong, additional, Xiong, Zhiwei, additional, Huang, Zhongjian, additional, Teng, Zhu, additional, and Ni, Zihan, additional
- Published
- 2019
- Full Text
- View/download PDF
45. Synthetic Data Generation for End-to-End Thermal Infrared Tracking
- Author
-
Zhang, Lichao, primary, Gonzalez-Garcia, Abel, additional, van de Weijer, Joost, additional, Danelljan, Martin, additional, and Khan, Fahad Shahbaz, additional
- Published
- 2019
- Full Text
- View/download PDF
46. Objects as Context for Detecting Their Semantic Parts
- Author
-
Gonzalez-Garcia, Abel, primary, Modolo, Davide, additional, and Ferrari, Vittorio, additional
- Published
- 2018
- Full Text
- View/download PDF
47. Do Semantic Parts Emerge in Convolutional Neural Networks?
- Author
-
Gonzalez-Garcia, Abel, primary, Modolo, Davide, additional, and Ferrari, Vittorio, additional
- Published
- 2017
- Full Text
- View/download PDF
48. An active search strategy for efficient object class detection
- Author
-
Gonzalez-Garcia, Abel, primary, Vezhnevets, Alexander, additional, and Ferrari, Vittorio, additional
- Published
- 2015
- Full Text
- View/download PDF
49. An active search strategy for efficient object class detection.
- Author
-
Gonzalez-Garcia, Abel, Vezhnevets, Alexander, and Ferrari, Vittorio
- Published
- 2015
- Full Text
- View/download PDF
50. Transferring and learning representations for image generation and translation
- Author
-
Wang, Yaxing, Weijer, Joost van de, Herranz Arribas, Luis, Gonzalez Garcia, Abel, and Universitat Autònoma de Barcelona. Departament de Ciències de la Computació
- Subjects
Image generation ,Technologies ,Computer vision ,Artificial intelligence - Abstract
Image generation is arguably one of the most attractive, compelling, and challenging tasks in computer vision. Among the methods that perform image generation, generative adversarial networks (GANs) play a key role. The most common image generation models based on GANs can be divided into two main approaches. The first, simply called image generation, takes random noise as input and synthesizes an image that follows the same distribution as the images in the training set. The second, called image-to-image translation, aims to map an image from a source domain to one that is indistinguishable from those in the target domain. Image-to-image translation methods can further be divided into paired and unpaired approaches, depending on whether they require paired data or not. In this thesis, we aim to address some challenges of both image generation and image-to-image translation.
GANs rely heavily on access to vast quantities of data and fail to generate realistic images from random noise when applied to domains with few images. To address this problem, we transfer knowledge from a model trained on a large dataset (source domain) to one learned on limited data (target domain). We find that both GANs and conditional GANs can benefit from models trained on large datasets. Our experiments show that transferring the discriminator is more important than transferring the generator, and that using both results in the best performance. However, this method suffers from overfitting, since we update all parameters to adapt to the target data. We therefore propose a novel architecture tailored to knowledge transfer to very small target domains. Our approach effectively explores which part of the latent space is most related to the target domain. Additionally, the proposed method is able to transfer knowledge from multiple pretrained GANs.
Although image-to-image translation has achieved outstanding performance, it still faces several problems. First, for translation between complex domains (such as translations between different modalities), image-to-image translation methods require paired data. We show that when only some of the pairwise translations have been seen during training, we can infer the remaining unseen translations (where training pairs are not available). We propose a new approach in which we align multiple encoders and decoders in such a way that the desired translation can be obtained by simply cascading the source encoder and the target decoder, even when they have not interacted during the training stage. Second, we address the issue of bias in image-to-image translation. Biased datasets unavoidably contain undesired changes, which arise because the target dataset has a particular underlying visual distribution; we use carefully designed semantic constraints, which enforce the preservation of desired image properties, to reduce the effects of this bias. Finally, current approaches fail to generate diverse outputs or to perform scalable image-to-image translation within a single model. To alleviate this problem, we propose a scalable and diverse image-to-image translation method: random noise controls the diversity, and scalability is achieved by conditioning on the domain label.
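The abstract above describes transferring knowledge to a small target domain by initializing both the generator and the discriminator from a GAN pretrained on a large source domain and then fine-tuning on the limited target data. The following is a minimal, hypothetical PyTorch sketch of that idea only; the `Generator` and `Discriminator` modules, the loss, and all hyperparameters are illustrative assumptions and not the thesis's actual implementation.

```python
# Hypothetical sketch of GAN knowledge transfer: initialize the target-domain
# generator AND discriminator from a source-domain GAN, then fine-tune both on
# the small target dataset. Modules and hyperparameters are illustrative only.
import copy
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=128, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, img_dim), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, img_dim=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1),
        )
    def forward(self, x):
        return self.net(x)

# Stand-ins for models pretrained on a large source domain (in practice,
# their weights would be loaded from a checkpoint).
source_G, source_D = Generator(), Discriminator()

# Transfer: copy BOTH generator and discriminator weights into the target
# models; the abstract reports that transferring the discriminator matters
# most and that transferring both works best.
target_G = copy.deepcopy(source_G)
target_D = copy.deepcopy(source_D)

# Fine-tune on the limited target-domain data with a standard GAN loss.
opt_G = torch.optim.Adam(target_G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(target_D.parameters(), lr=1e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def finetune_step(real_images, z_dim=128):
    """One adversarial update on a batch of (flattened) target-domain images."""
    b = real_images.size(0)
    # Discriminator update: separate real target images from generated ones.
    z = torch.randn(b, z_dim)
    fake = target_G(z).detach()
    d_loss = bce(target_D(real_images), torch.ones(b, 1)) + \
             bce(target_D(fake), torch.zeros(b, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # Generator update: fool the transferred discriminator.
    z = torch.randn(b, z_dim)
    g_loss = bce(target_D(target_G(z)), torch.ones(b, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()

# Example usage with a dummy batch standing in for the small target dataset.
dummy_batch = torch.randn(8, 64 * 64 * 3)
print(finetune_step(dummy_batch))
```

As the abstract notes, naively fine-tuning all parameters in this way tends to overfit very small target domains, which motivates the mining-based architecture proposed later in the thesis.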
- Published
- 2020