Author: "Hospedales, Timothy M." / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. Amortised Invariance Learning for Contrastive Self-Supervision

Author: Chavhan, Ruchika, Stuehmer, Jan, Heggan, Calum, Yaghoobi, Mehrdad, and Hospedales, Timothy M
Abstract: Contrastive self-supervised learning methods famously produce high quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong inductive biases. However these may or may not be helpful, depending on if they match the invariance requirements of downstream tasks or not. This has led to several attempts to learn task-specific invariances during pre-training, however, these methods are highly compute intensive and tedious to train. We introduce the notion of amortized invariance learning for contrastive self supervision. In the pre-training stage, we parameterize the feature extractor by differentiable invariance hyper-parameters that control the invariances encoded by the representation. Then, for any downstream task, both linear readout and task-specific invariance requirements can be efficiently and effectively learned by gradient-descent. We evaluate the notion of amortized invariances for contrastive learning over two different modalities: vision and audio, on two widely-used contrastive learning methods in vision: SimCLR and MoCo-v2 with popular architectures like ResNets and Vision Transformers, and SimCLR with ResNet-18 for audio. We show that our amortized features provide a reliable way to learn diverse downstream tasks with different invariance requirements, while using a single feature and avoiding task-specific pre-training. This provides an exciting perspective that opens up new horizons in the field of general purpose representation learning.
Published: 2023

2. MEDFAIR: Benchmarking Fairness for Medical Imaging

Author: Zong, Yongshuo, Yang, Yongxin, and Hospedales, Timothy M
Abstract: A multitude of work has shown that machine learning-based medical diagnosis systems can be biased against certain subgroups of people. This has motivated a growing number of bias mitigation algorithms that aim to address fairness issues in machine learning. However, it is difficult to compare their effectiveness in medical imaging for two reasons. First, there is little consensus on the criteria to assess fairness. Second, existing bias mitigation algorithms are developed under different settings, e.g., datasets, model selection strategies, backbones, and fairness metrics, making a direct comparison and evaluation based on existing results impossible. In this work, we introduce MEDFAIR, a framework to benchmark the fairness of machine learning models for medical imaging. MEDFAIR covers eleven algorithms from various categories, ten datasets from different imaging modalities, and three model selection criteria. Through extensive experiments, we find that the under-studied issue of model selection criterion can have a significant impact on fairness outcomes; while in contrast, state-of-the-art bias mitigation algorithms do not significantly improve fairness outcomes over empirical risk minimization (ERM) in both in-distribution and out-of-distribution settings. We evaluate fairness from various perspectives and make recommendations for different medical application scenarios that require different ethical principles. Our framework provides a reproducible and easy-to-use entry point for the development and evaluation of future bias mitigation algorithms in deep learning. Code is available at https://github.com/ys-zong/MEDFAIR.
Published: 2023

3. Domain Generalisation via Domain Adaptation: An Adversarial Fourier Amplitude Approach

Author: Kim, Minyoung, Li, Da, and Hospedales, Timothy M
Abstract: We tackle the domain generalisation (DG) problem by posing it as a domain adaptation (DA) task where we adversarially synthesise the worst-case `target' domain and adapt a model to that worst-case domain, thereby improving the model’s robustness. To synthesise data that is challenging yet semantics-preserving, we generate Fourier amplitude images and combine them with source domain phase images, exploiting the widely believed conjecture from signal processing that amplitude spectra mainly determines image style, while phase data mainly captures image semantics. To synthesise a worst-case domain for adaptation, we train the classifier and the amplitude generator adversarially. Specifically, we exploit the maximum classifier discrepancy (MCD) principle from DA that relates the target domain performance to the discrepancy of classifiers in the model hypothesis space. By Bayesian hypothesis modeling, we express the model hypothesis space effectively as a posterior distribution over classifiers given the source domains, making adversarial MCD minimisation feasible. On the DomainBed benchmark including the large-scale DomainNet dataset, the proposed approach yields significantly improved domain generalisation performance over the state-of-the-art.
Published: 2023

4. ChiroDiff: Modelling chirographic data with Diffusion Models

Author: Das, Ayan, Yang, Yongxin, Hospedales, Timothy M, Xiang, Tao, and Song, Yi-Zhe
Abstract: Generative modelling over continuous-time geometric constructs, a.k.a chirographic data such as handwriting, sketches, drawings etc., have been accomplished through autoregressive distributions. Such strictly-ordered discrete factorization however falls short of capturing key properties of chirographic data -- it fails to build holistic understanding of the temporal concept due to one-way visibility (causality). Consequently, temporal data has been modelled as discrete token sequences of fixed sampling rate instead of capturing the true underlying concept. In this paper, we introduce a powerful model-class namely Denoising\ Diffusion\ Probabilistic\ Models or DDPMs for chirographic data that specifically addresses these flaws. Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate up to a good extent. Moreover, we show that many important downstream utilities (e.g. conditional sampling, creative mixing) can be flexibly implemented using ChiroDiff. We further show some unique use-cases like stochastic vectorization, de-noising/healing, abstraction are also possible with this model-class. We perform quantitative and qualitative evaluation of our framework on relevant datasets and found it to be better or on par with competing approaches.
Published: 2023

5. Self-Supervised Representation Learning: Introduction, advances, and challenges

Author: Ericsson, Linus, Gouk, Henry, Loy, Chen Change, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Applied Mathematics, Signal Processing, Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), Electrical and Electronic Engineering, Machine Learning (cs.LG)
Abstract: Self-supervised representation learning (SSRL) methods aim to provide powerful, deep feature learning without the requirement of large annotated data sets, thus alleviating the annotation bottleneck—one of the main barriers to the practical deployment of deep learning today. These techniques have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pretraining alternatives across a variety of data modalities, including image, video, sound, text, and graphs. This article introduces this vibrant area, including key concepts, the four main families of approaches and associated state-of-the-art techniques, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and computational cost. Finally, we survey major open challenges in the field, that provide fertile ground for future work.
Published: 2022
Full Text: View/download PDF

6. Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics

Author: Iurada, Leonardo, Bucci, Silvia, Hospedales, Timothy M., and Tommasi, Tatiana
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning-based recognition systems are deployed at scale for several real-world applications that inevitably involve our social life. Although being of great support when making complex decisions, they might capture spurious data correlations and leverage sensitive attributes (e.g. age, gender, ethnicity). How to factor out this information while keeping a high prediction performance is a task with still several open questions, many of which are shared with those of the domain adaptation and generalization literature which focuses on avoiding visual domain biases. In this work, we propose an in-depth study of the relationship between cross-domain learning (CD) and model fairness by introducing a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks. After having highlighted the limits of the current evaluation metrics, we introduce a new Harmonic Fairness (HF) score to assess jointly how fair and accurate every model is with respect to a reference baseline. Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter. Overall, our work paves the way for a more systematic analysis of fairness problems in computer vision. Code available at: https://github.com/iurada/fairness_crossdomain
Published: 2023
Full Text: View/download PDF

7. Accelerating Self-Supervised Learning via Efficient Training Strategies

Author: Kocyigit, Mustafa Taha, Hospedales, Timothy M, and Bilen, Hakan
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently the focus of the computer vision community has shifted from expensive supervised learning towards self-supervised learning of visual representations. While the performance gap between supervised and self-supervised has been narrowing, the time for training self-supervised deep networks remains an order of magnitude larger than its supervised counterparts, which hinders progress, imposes carbon cost, and limits societal benefits to institutions with substantial resources. Motivated by these issues, this paper investigates reducing the training time of recent self-supervised methods by various model-agnostic strategies that have not been used for this problem. In particular, we study three strategies: an extendable cyclic learning rate schedule, a matching progressive augmentation magnitude and image resolutions schedule, and a hard positive mining strategy based on augmentation difficulty. We show that all three methods combined lead up to 2.7 times speed-up in the training time of several self-supervised methods while retaining comparable performance to the standard self-supervised learning setting.
Published: 2022
Full Text: View/download PDF

8. MetaAudio: A Few-Shot Audio Classification Benchmark

Author: Heggan, Calum, Budgett, Sam, Hospedales, Timothy M, and Yaghoobi Vaighan, Mehrdad
Abstract: Currently available benchmarks for few-shot learning (machine learning with few training examples) are limited in the domains they cover, primarily focusing on image classification. This work aims to alleviate this reliance on image-based benchmarks by offering the first comprehensive, public and fully reproducible audio based alternative, covering a variety of sound domains and experimental settings. We compare the few-shot classification performance of a variety of techniques on seven audio datasets (spanning environmental sounds to human-speech). Extending this, we carry out in-depth analyses of joint training (where all datasets are used during training) and cross-dataset adaptation protocols, establishing the possibility of a generalised audio few-shot classification algorithm. Our experimentation shows gradient-based meta-learning methods such as MAML and Meta-Curvature consistently outperform both metric and baseline methods. We also demonstrate that the joint training routine helps overall generalisation for the environmental sound databases included, as well as being a somewhat-effective method of tackling the cross-dataset/domain setting.
Published: 2022
Full Text: View/download PDF

9. Fisher SAM: Information Geometry and Sharpness Aware Minimisation

Author: Kim, Minyoung, Li, Da, Hu, Shell Xu, Hospedales, Timothy M, Chaudhuri, Kamalika, Jegelka, Stefanie, Song, Le, Szepesvari, Csaba, Niu, Gang, and Sabato, Sivan
Abstract: Recent sharpness-aware minimisation (SAM) is known to find flat minima which is beneficial for better generalisation with improved robustness. SAM essentially modifies the loss function by reporting the maximum loss value within the small neighborhood around the current iterate. However, it uses the Euclidean ball to define the neighborhood, which can be inaccurate since loss functions for neural networks are typically defined over probability distributions (e.g., class predictive probabilities), rendering the parameter space non Euclidean. In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM’s Euclidean balls with ellipsoids induced by the Fisher information. Our approach, dubbed Fisher SAM, defines more accurate neighborhood structures that conform to the intrinsic metric of the underlying statistical manifold. For instance, SAM may probe the worst-case loss value at either a too nearby or inappropriately distant point due to the ignorance of the parameter space geometry, which is avoided by our Fisher SAM. Another recent Adaptive SAM approach stretches/shrinks the Euclidean ball in accordance with the scale of the parameter magnitudes. This might be dangerous, potentially destroying the neighborhood structure. We demonstrate improved performance of the proposed Fisher SAM on several benchmark datasets/tasks.
Published: 2022

10. Vision-based system identification and 3D keypoint discovery using dynamics constraints

Author: Jaques, Miguel, Asenov, Martin, Burke, Michael, Hospedales, Timothy M, Firoozi, Roya, Mehr, Negar, Yel, Esen, Antonova, Rika, Bohg, Jeannette, Schwager, Mac, and Kochendorfer, Mykel
Subjects: ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
Abstract: This paper introduces V-SysId, a novel method that enables simultaneous keypoint discovery, 3D system identification, and extrinsic camera calibration from an unlabeled video taken from a static camera, using only the family of equations of motion of the object of interest as weak supervision. V-SysId takes keypoint trajectory proposals and alternates between maximum likelihood parameter estimation and extrinsic camera calibration, before applying a suitable selection criterion to identify the track of interest. This is then used to train a keypoint tracking model using supervised learning. Results on a range of settings (robotics, physics, physiology) highlight the utility of this approach.
Published: 2022

11. Visual Representation Learning over Latent Domains

Author: Deecke, Lucas, Hospedales, Timothy M, and Bilen, Hakan
Subjects: transfer learning, latent domains, computer vision
Abstract: A fundamental shortcoming of deep neural networks is their specialization to a single task and domain. While multi-domain learning enables the learning of compact models that span multiple visual domains, these rely on the presence of domain labels, in turn requiring laborious curation of datasets. This paper proposes a less explored, but highly realistic new setting called latent domain learning: learning over data from different domains, without access to domain annotations. Experiments show that this setting is particularly challenging for standard models and existing multi-domain approaches, calling for new customized solutions: a sparse adaptation strategy is formulated which adaptively accounts for latent domains in data, and significantly enhances learning in such settings. Our method can be paired seamlessly with existing models, and boosts performance in conceptually related tasks, e.g. empirical fairness problems and long-tailed recognition.
Published: 2022

12. SketchODE: Learning neural sketch representation in continuous time

Author: Das, Ayan, Yang, Yongxin, Hospedales, Timothy M, Xiang, Tao, and Song, Yi-Zhe
Subjects: Free-form, Sketch, Chirography, Neural ODE
Abstract: Learning meaningful representations for chirographic drawing data such as sketches, handwriting, and flowcharts is a gateway for understanding and emulating human creative expression. Despite being inherently continuous-time data, existing works have treated these as discrete-time sequences, disregarding their true nature. In this work, we model such data as continuous-time functions and learn compact representations by virtue of Neural Ordinary Differential Equations. To this end, we introduce the first continuous-time Seq2Seq model and demonstrate some remarkable properties that set it apart from traditional discrete-time analogues. We also provide solutions for some practical challenges for such models, including introducing a family of parameterized ODE dynamics & continuous-time data augmentation particularly suitable for the task. Our models are validated on several datasets including VectorMNIST, DiDi and Quick, Draw!.
Published: 2022

13. Searching for Robustness: Loss Learning for Noisy Classification Tasks

Author: Gao, Boyan, Gouk, Henry, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: We present a "learning to learn" approach for automatically constructing white-box classification loss functions that are robust to label noise in the training data. We parameterize a flexible family of loss functions using Taylor polynomials, and apply evolutionary strategies to search for noise-robust losses in this space. To learn re-usable loss functions that can apply to new tasks, our fitness function scores their performance in aggregate across a range of training dataset and architecture combinations. The resulting white-box loss provides a simple and fast "plug-and-play" module that enables effective noise-robust learning in diverse downstream tasks, without requiring a special training procedure or network architecture. The efficacy of our method is demonstrated on a variety of datasets with both synthetic and real label noise, where we compare favourably to previous work.
Published: 2022
Full Text: View/download PDF

14. Meta Mirror Descent: Optimiser Learning for Fast Convergence

Author: Gao, Boyan, Gouk, Henry, Lee, Hae Beom, and Hospedales, Timothy M.
Subjects: Computer Science::Machine Learning, FOS: Computer and information sciences, Computer Science - Machine Learning, Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control, Machine Learning (cs.LG)
Abstract: Optimisers are an essential component for training machine learning models, and their design influences learning speed and generalisation. Several studies have attempted to learn more effective gradient-descent optimisers via solving a bi-level optimisation problem where generalisation error is minimised with respect to optimiser parameters. However, most existing optimiser learning methods are intuitively motivated, without clear theoretical support. We take a different perspective starting from mirror descent rather than gradient descent, and meta-learning the corresponding Bregman divergence. Within this paradigm, we formalise a novel meta-learning objective of minimising the regret bound of learning. The resulting framework, termed Meta Mirror Descent (MetaMD), learns to accelerate optimisation speed. Unlike many meta-learned optimisers, it also supports convergence and generalisation guarantees and uniquely does so without requiring validation data. We evaluate our framework on a variety of tasks and architectures in terms of convergence rate and generalisation error and demonstrate strong performance.
Published: 2022
Full Text: View/download PDF

15. Fisher SAM: Information Geometry and Sharpness Aware Minimisation

Author: Kim, Minyoung, Li, Da, Hu, Shell Xu, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
Abstract: Recent sharpness-aware minimisation (SAM) is known to find flat minima which is beneficial for better generalisation with improved robustness. SAM essentially modifies the loss function by reporting the maximum loss value within the small neighborhood around the current iterate. However, it uses the Euclidean ball to define the neighborhood, which can be inaccurate since loss functions for neural networks are typically defined over probability distributions (e.g., class predictive probabilities), rendering the parameter space non Euclidean. In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM's Euclidean balls with ellipsoids induced by the Fisher information. Our approach, dubbed Fisher SAM, defines more accurate neighborhood structures that conform to the intrinsic metric of the underlying statistical manifold. For instance, SAM may probe the worst-case loss value at either a too nearby or inappropriately distant point due to the ignorance of the parameter space geometry, which is avoided by our Fisher SAM. Another recent Adaptive SAM approach stretches/shrinks the Euclidean ball in accordance with the scale of the parameter magnitudes. This might be dangerous, potentially destroying the neighborhood structure. We demonstrate improved performance of the proposed Fisher SAM on several benchmark datasets/tasks.
Published: 2022
Full Text: View/download PDF

16. EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization

Author: Bohdal, Ondrej, Yang, Yongxin, and Hospedales, Timothy M
Abstract: Gradient-based meta-learning and hyperparameter optimization have seen significant progress recently, enabling practical end-to-end training of neural networks together with many hyperparameters. Nevertheless, existing approaches are relatively expensive as they need to compute second-order derivatives and store a longer computational graph. This cost prevents scaling them to larger network architectures. We present EvoGrad, a new approach to meta-learning that draws upon evolutionary techniques to more efficiently compute hypergradients. EvoGrad estimates hypergradient with respect to hyperparameters without calculating second-order gradients, or storing a longer computational graph, leading to significant improvements in efficiency. We evaluate EvoGrad on two substantial recent meta-learning applications, namely cross-domain few-shot learning with feature-wise transformations and noisy label learning with MetaWeightNet. The results show that EvoGrad significantly improves efficiency and enables scaling meta-learning to bigger CNN architectures such as from ResNet18 to ResNet34.
Published: 2021

17. Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks

Author: Ericsson, Linus, Gouk, Henry, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised learning is a powerful paradigm for representation learning on unlabelled images. A wealth of effective new methods based on instance matching rely on data-augmentation to drive learning, and these have reached a rough agreement on an augmentation scheme that optimises popular recognition benchmarks. However, there is strong reason to suspect that different tasks in computer vision require features to encode different (in)variances, and therefore likely require different augmentation strategies. In this paper, we measure the invariances learned by contrastive methods and confirm that they do learn invariance to the augmentations used and further show that this invariance largely transfers to related real-world changes in pose and lighting. We show that learned invariances strongly affect downstream task performance and confirm that different downstream tasks benefit from polar opposite (in)variances, leading to performance loss when the standard augmentation strategy is used. Finally, we demonstrate that a simple fusion of representations with complementary invariances ensures wide transferability to all the diverse downstream tasks considered., Code available at https://github.com/linusericsson/ssl-invariances
Published: 2021

18. Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

Author: Zhao, Chenyang, Hospedales, Timothy M, Balasubramanian, Vineeth N., and Tsang, Ivor
Subjects: deep reinforcement learning, domain randomisation, mutual learning
Abstract: In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain-shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL termed P2PDRL, where multiple workers are each assigned to a different environment, and exchange knowledge through mutual regularisation based on Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at testing.
Published: 2021

19. Towards Unsupervised Sketch-based Image Retrieval

Author: Hu, Conghui, Yang, Yongxin, Li, Yunpeng, Hospedales, Timothy M., and Song, Yi-Zhe
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: The practical value of existing supervised sketch-based image retrieval (SBIR) algorithms is largely limited by the requirement for intensive data collection and labeling. In this paper, we present the first attempt at unsupervised SBIR to remove the labeling cost (both category annotations and sketch-photo pairings) that is conventionally needed for training. Existing single-domain unsupervised representation learning methods perform poorly in this application, due to the unique cross-domain (sketch and photo) nature of the problem. We therefore introduce a novel framework that simultaneously performs sketch-photo domain alignment and semantic-aware representation learning. Technically this is underpinned by introducing joint distribution optimal transport (JDOT) to align data from different domains, which we extend with trainable cluster prototypes and feature memory banks to further improve scalability and efficacy. Extensive experiments show that our framework achieves excellent performance in the new unsupervised setting, and performs comparably or better than state-of-the-art in the zero-shot setting.
Published: 2021

20. Distance-Based Regularisation of Deep Networks for Fine-Tuning

Author: Gouk, Henry, Hospedales, Timothy M, and Pontil, Massimiliano
Abstract: We investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space.
Published: 2021

21. FedH2L: Federated Learning with Model and Statistical Heterogeneity

Author: Li, Yiying, Zhou, Wei, Wang, Huaimin, Mi, Haibo, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
Abstract: Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. Mainstream FL approaches require each participant to share a common network architecture and further assume that data are are sampled IID across participants. However, in real-world deployments participants may require heterogeneous network architectures; and the data distribution is almost certainly non-uniform across participants. To address these issues we introduce FedH2L, which is agnostic to both the model architecture and robust to different data distributions across participants. In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner. This makes it extremely bandwidth efficient, model agnostic, and crucially produces models capable of performing well on the whole data distribution when learning from heterogeneous silos.
Published: 2021

22. Tensor Composition Net for Visual Relationship Prediction

Author: Qiang, Yuting, Yang, Yongxin, Zhang, Xueting, Guo, Yanwen, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel Tensor Composition Net (TCN) to predict visual relationships in images. Visual Relationship Prediction (VRP) provides a more challenging test of image understanding than conventional image tagging and is difficult to learn due to a large label-space and incomplete annotation. The key idea of our TCN is to exploit the low-rank property of the visual relationship tensor, so as to leverage correlations within and across objects and relations and make a structured prediction of all visual relationships in an image. To show the effectiveness of our model, we first empirically compare our model with Multi-Label Image Classification (MLIC) methods, eXtreme Multi-label Classification (XMC) methods, and VRD methods. We then show that thanks to our tensor (de)composition layer, our model can predict visual relationships which have not been seen in the training dataset. We finally show our TCN's image-level visual relationship prediction provides a simple and efficient mechanism for relation-based image-retrieval even compared with VRD methods.
Published: 2020

23. Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Author: Zhou, Wei, Li, Yiying, Yang, Yongxin, Wan, Huaimin, and Hospedales, Timothy M
Abstract: Off-Policy Actor-Critic (OffP-AC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a flexible and augmented meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to existing meta-learning algorithms, meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning benefits to a variety of continuous control tasks when combined with contemporary OffP-AC methods DDPG, TD3 and SAC.
Published: 2020

24. BézierSketch: A Generative Model for Scalable Vector Sketches

Author: Das, Ayan, Yang, Yongxin, Hospedales, Timothy M, Xiang, Tao, and Song, Yi-Zhe
Subjects: Bézier curve, Sketch generation, Scalable graphics, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided breakthrough by sequentially generating sketches as a sequence of waypoints. However this leads to low-resolution image generation, and failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke to its best fit Bézier curve. This enables us to treat sketches as short sequences of paramaterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.
Published: 2020
Full Text: View/download PDF

25. Online Meta-Learning for Multi-Source and Semi-Supervised Domain Adaptation

Author: Li, Da and Hospedales, Timothy M
Subjects: meta-learning, domain adaptation
Abstract: Domain adaptation (DA) is the topical problem of adapting models from labelled source datasets so that they perform well on target datasets where only unlabelled or partially labelled data is available. Many methods have been proposed to address this problem through different ways to minimise the domain shift between source and target datasets. In this paper we take an orthogonal perspective and propose a framework to further enhance performance by meta-learning the initial conditions of existing DA algorithms. This is challenging compared to the more widely considered setting of few-shot meta-learning, due to the length of the computation graph involved. Therefore we propose an online shortest-path meta-learning framework that is both computationally tractable and practically effective for improving DA performance. We present variants for both multi-source unsupervised domain adaptation (MSDA), and semi-supervised domain adaptation (SSDA). Importantly, our approach is agnostic to the base adaptation algorithm, and can be applied to improve many techniques. Experimentally, we demon-strate improvements on classic (DANN) and recent (MCD and MME) techniques for MSDA and SSDA, and ultimately achieve state of the art results on several DA benchmarks including the largest scale DomainNet.
Published: 2020
Full Text: View/download PDF

26. Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

Author: Bhunia, Ayan Kumar, Yang, Yongxin, Hospedales, Timothy M., Xiang, Tao, and Song, Yi-Zhe
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Computer Science - Computer Vision and Pattern Recognition
Abstract: Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning-based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets., IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020 [Oral Presentation] Code: https://github.com/AyanKumarBhunia/on-the-fly-FGSBIR
Published: 2020

27. Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights

Author: Ericsson, Linus, Gouk, Henry, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, ComputingMethodologies_PATTERNRECOGNITION, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: In the absence of large labelled datasets, self-supervised learning techniques can boost performance by learning useful representations from unlabelled data, which is often more readily available. However, there is often a domain shift between the unlabelled collection and the downstream target problem data. We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy by prioritising the most useful instances. Additionally, we show that the training time can be reduced by discarding unnecessary datapoints. Our method, BetaDataWeighter is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon. We compare to related instance weighting schemes, both hand-designed heuristics and meta-learning, as well as conventional self-supervised learning. BetaDataWeighter achieves both the highest average accuracy and rank across datasets, and on STL-10 it prunes up to 78% of unlabelled images without significant loss in accuracy, corresponding to over 50% reduction in training time.
Published: 2020
Full Text: View/download PDF

28. Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Author: Zhou, Wei, Li, Yiying, Yang, Yongxin, Wang, Huaimin, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process; and compared to existing meta-learning algorithms, meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in avariety of continuous control environments when combined with contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC., Comment: NeurIPS 2020
Published: 2020
Full Text: View/download PDF

29. Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation

Author: Kossaifi, Jean Toisoul, Antoine Bulat, Adrian Panagakis, Yannis Hospedales, Timothy M. Pantic, Maja
Abstract: Training deep neural networks with spatio-temporal (i.e., 3D) or multidimensional convolutions of higher-order is computationally challenging due to millions of unknown parameters across dozens of layers. To alleviate this, one approach is to apply low-rank tensor decompositions to convolution kernels in order to compress the network and reduce its number of parameters. Alternatively, new convolutional blocks, such as MobileNet, can be directly designed for efficiency. In this paper, we unify these two approaches by proposing a tensor factorization framework for efficient multidimensional (separable) convolutions of higher-order. Interestingly, the proposed framework enables a novel higher-order transduction, allowing to train a network on a given domain (e.g., 2D images or N-dimensional data in general) and using transduction to generalize to higher-order data such as videos (or (N+K)-dimensional data in general), capturing for instance temporal dynamics while preserving the learnt spatial information. We apply the proposed methodology, coined CP-Higher-Order Convolution (HO-CPConv), to spatio-temporal facial emotion analysis. Most existing facial affect models focus on static imagery and discard all temporal information. This is due to the above-mentioned burden of training 3D convolutional nets and the lack of large bodies of video data annotated by experts. We address both issues with our proposed framework. Initial training is first done on static imagery before using transduction to generalize to the temporal domain. We demonstrate superior performance on three challenging large scale affect estimation datasets, AffectNet, SEWA, and AFEW-VA.
Published: 2020

30. Distance-Based Regularisation of Deep Networks for Fine-Tuning

Author: Gouk, Henry, Hospedales, Timothy M., and Pontil, Massimiliano
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: We investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space.
Published: 2020
Full Text: View/download PDF

31. Deep clustering with concrete k-means

Author: Gao, Boyan, Yang, Yongxin, Gouk, Henry, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: We address the problem of simultaneously learning a k-means clustering and deep feature representation from unlabelled data, which is of interest due to the potential of deep k-means to outperform traditional two-step feature extraction and shallow-clustering strategies. We achieve this by developing a gradient-estimator for the non-differentiable k-means objective via the Gumbel-Softmax reparameterisation trick. In contrast to previous attempts at deep clustering, our concrete k-means model can be optimised with respect to the canonical k-means objective and is easily trained end-to-end without resorting to alternating optimisation. We demonstrate the efficacy of our method on standard clustering benchmarks.
Published: 2019

32. Hypernetwork Knowledge Graph Embeddings

Author: Balazevic, Ivana, Allen, Carl, and Hospedales, Timothy M.
Abstract: Knowledge graphs are graphical representations of large databases of facts, which typically suffer from incompleteness. Inferring missing relations (links) between entities (nodes) is the task of link prediction. A recent state-of-the-art approach to link prediction, ConvE, implements a convolutional neural network to extract features from concatenated subject and relation vectors. Whilst results are impressive, the method is unintuitive and poorly understood. We propose a hypernetwork architecture that generates simplified relation-specific convolutional filters that (i) outperforms ConvE and all previous approaches across standard datasets; and (ii) can be framed as tensor factorization and thus set within a well established family of factorization models for link prediction. We thus demonstrate that convolution simply offers a convenient computational means of introducing sparsity and parameter tying to find an effective trade-off between non-linear expressiveness and the number of parameters to learn.
Published: 2019
Full Text: View/download PDF

33. Frustratingly Easy Person Re-Identification: Generalizing Person Re-ID in Practice

Author: Jia, Jieru, Ruan, Qiuqi, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Contemporary person re-identification (\reid) methods usually require access to data from the deployment camera network during training in order to perform well. This is because contemporary \reid{} models trained on one dataset do not generalise to other camera networks due to the domain-shift between datasets. This requirement is often the bottleneck for deploying \reid{} systems in practical security or commercial applications, as it may be impossible to collect this data in advance or prohibitively costly to annotate it. This paper alleviates this issue by proposing a simple baseline for domain generalizable~(DG) person re-identification. That is, to learn a \reid{} model from a set of source domains that is suitable for application to unseen datasets out-of-the-box, without any model updating. Specifically, we observe that the domain discrepancy in \reid{} is due to style and content variance across datasets and demonstrate appropriate Instance and Feature Normalization alleviates much of the resulting domain-shift in Deep \reid{} models. Instance Normalization~(IN) in early layers filters out style statistic variations and Feature Normalization~(FN) in deep layers is able to further eliminate disparity in content statistics. Compared to contemporary alternatives, this approach is extremely simple to implement, while being faster to train and test, thus making it an extremely valuable baseline for implementing \reid{} in practice. With a few lines of code, it increases the rank 1 \reid{} accuracy by {11.8\%, 33.2\%, 12.8\% and 8.5\%} on the VIPeR, PRID, GRID, and i-LIDS benchmarks respectively. Source codes are available at \url{https://github.com/BJTUJia/person_reID_DualNorm}., 14 pages,2 figures
Published: 2019

34. Feature-Critic Networks for Heterogeneous Domain Generalization

Author: Li, Yiying, Yang, Yongxin, Zhou, Wei, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: The well known domain shift issue causes model performance to degrade when deployed to a new target domain with different statistics to training. Domain adaptation techniques alleviate this, but need some instances from the target domain to drive adaptation. Domain generalisation is the recently topical problem of learning a model that generalises to unseen domains out of the box, and various approaches aim to train a domain-invariant feature extractor, typically by adding some manually designed losses. In this work, we propose a learning to learn approach, where the auxiliary loss that helps generalisation is itself learned. Beyond conventional domain generalisation, we consider a more challenging setting of heterogeneous domain generalisation, where the unseen domains do not share label space with the seen ones, and the goal is to train a feature representation that is useful off-the-shelf for novel data and novel categories. Experimental evaluation demonstrates that our method outperforms state-of-the-art solutions in both settings., Presented at ICML 2019
Published: 2019

35. Measuring the Transferability of Adversarial Examples

Author: Petrov, Deyan and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Statistics - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML), Cryptography and Security (cs.CR), Machine Learning (cs.LG)
Abstract: Adversarial examples are of wide concern due to their impact on the reliability of contemporary machine learning systems. Effective adversarial examples are mostly found via white-box attacks. However, in some cases they can be transferred across models, thus enabling them to attack black-box models. In this work we evaluate the transferability of three adversarial attacks - the Fast Gradient Sign Method, the Basic Iterative Method, and the Carlini & Wagner method, across two classes of models - the VGG class(using VGG16, VGG19 and an ensemble of VGG16 and VGG19), and the Inception class(Inception V3, Xception, Inception Resnet V2, and an ensemble of the three). We also outline the problems with the assessment of transferability in the current body of research and attempt to amend them by picking specific "strong" parameters for the attacks, and by using a L-Infinity clipping technique and the SSIM metric for the final evaluation of the attack transferability.
Published: 2019
Full Text: View/download PDF

36. Investigating Generalisation in Continuous Deep Reinforcement Learning

Author: Zhao, Chenyang, Sigaud, Olivier, Stulp, Freek, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Deep Reinforcement Learning has shown great success in a variety of control tasks. However, it is unclear how close we are to the vision of putting Deep RL into practice to solve real world problems. In particular, common practice in the field is to train policies on largely deterministic simulators and to evaluate algorithms through training performance alone, without a train/test distinction to ensure models generalise and are not overfitted. Moreover, it is not standard practice to check for generalisation under domain shift, although robustness to such system change between training and testing would be necessary for real-world Deep RL control, for example, in robotics. In this paper we study these issues by first characterising the sources of uncertainty that provide generalisation challenges in Deep RL. We then provide a new benchmark and thorough empirical evaluation of generalisation challenges for state of the art Deep RL methods. In particular, we show that, if generalisation is the goal, then common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice. Finally, we evaluate several techniques for improving generalisation and draw conclusions about the most robust techniques to date.
Published: 2019
Full Text: View/download PDF

37. RelationNet2: Deep Comparison Columns for Few-Shot Learning

Author: Zhang, Xueting, Qiang, Yuting, Sung, Flood, Yang, Yongxin, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot deep learning is a topical challenge area for scaling visual recognition to open ended growth of unseen new classes with limited labeled examples. A promising approach is based on metric learning, which trains a deep embedding to support image similarity matching. Our insight is that effective general purpose matching requires non-linear comparison of features at multiple abstraction levels. We thus propose a new deep comparison network comprised of embedding and relation modules that learn multiple non-linear distance metrics based on different levels of features simultaneously. Furthermore, to reduce over-fitting and enable the use of deeper embeddings, we represent images as distributions rather than vectors via learning parameterized Gaussian noise regularization. The resulting network achieves excellent performance on both miniImageNet and tieredImageNet., 10 pages, 5 figures, Published in IJCNN 2020
Published: 2018

38. Deep Factorised Inverse-Sketching

Author: Pang, Kaiyue, Li, Da, Song, Jifei, Song, Yi-Zhe, Xiang, Tao, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Modelling human free-hand sketches has become topical recently, driven by practical applications such as fine-grained sketch based image retrieval (FG-SBIR). Sketches are clearly related to photo edge-maps, but a human free-hand sketch of a photo is not simply a clean rendering of that photo's edge map. Instead there is a fundamental process of abstraction and iconic rendering, where overall geometry is warped and salient details are selectively included. In this paper we study this sketching process and attempt to invert it. We model this inversion by translating iconic free-hand sketches to contours that resemble more geometrically realistic projections of object boundaries, and separately factorise out the salient added details. This factorised re-representation makes it easier to match a free-hand sketch to a photo instance of an object. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep FG-SBIR model is then formulated to accommodate complementary discriminative detail from each factorised sketch for better matching with the corresponding photo. Our method is evaluated both qualitatively and quantitatively to demonstrate its superiority over a number of state-of-the-art alternatives for style transfer and FG-SBIR., Accepted to ECCV 2018
Published: 2018

39. Universal Perceptual Grouping

Author: Li, Ke, Pang, Kaiyue, Song, Jifei, Song, Yi-Zhe, Xiang, Tao, Hospedales, Timothy M., and Zhang, Honggang
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work we aim to develop a universal sketch grouper. That is, a grouper that can be applied to sketches of any category in any domain to group constituent strokes/segments into semantically meaningful object parts. The first obstacle to this goal is the lack of large-scale datasets with grouping annotation. To overcome this, we contribute the largest sketch perceptual grouping (SPG) dataset to date, consisting of 20,000 unique sketches evenly distributed over 25 object categories. Furthermore, we propose a novel deep universal perceptual grouping model. The model is learned with both generative and discriminative losses. The generative losses improve the generalisation ability of the model to unseen object categories and datasets. The discriminative losses include a local grouping loss and a novel global grouping loss to enforce global grouping consistency. We show that the proposed model significantly outperforms the state-of-the-art groupers. Further, we show that our grouper is useful for a number of sketch analysis tasks including sketch synthesis and fine-grained sketch-based image retrieval (FG-SBIR)., Accepted ECCV 2018
Published: 2018

40. Deep Neural Decision Trees

Author: Yang, Yongxin, Morillo, Irene Garcia, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
Abstract: Deep neural networks have been proven powerful at processing perceptual data, such as images and audio. However for tabular data, tree-based models are more popular. A nice property of tree-based models is their natural interpretability. In this work, we present Deep Neural Decision Trees (DNDT) -- tree models realised by neural networks. A DNDT is intrinsically interpretable, as it is a tree. Yet as it is also a neural network (NN), it can be easily implemented in NN toolkits, and trained with gradient descent rather than greedy splitting. We evaluate DNDT on several tabular datasets, verify its efficacy, and investigate similarities and differences between DNDT and vanilla decision trees. Interestingly, DNDT self-prunes at both split and feature-level., Comment: presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
Published: 2018
Full Text: View/download PDF

41. The Devil is in the Middle: Exploiting Mid-level Representations for Cross-Domain Instance Matching

Author: Yu, Qian, Chang, Xiaobin, Song, Yi-Zhe, Xiang, Tao, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Many vision problems require matching images of object instances across different domains. These include fine-grained sketch-based image retrieval (FG-SBIR) and Person Re-identification (person ReID). Existing approaches attempt to learn a joint embedding space where images from different domains can be directly compared. In most cases, this space is defined by the output of the final layer of a deep neural network (DNN), which primarily contains features of a high semantic level. In this paper, we argue that both high and mid-level features are relevant for cross-domain instance matching (CDIM). Importantly, mid-level features already exist in earlier layers of the DNN. They just need to be extracted, represented, and fused properly with the final layer. Based on this simple but powerful idea, we propose a unified framework for CDIM. Instantiating our framework for FG-SBIR and ReID, we show that our simple models can easily beat the state-of-the-art models, which are often equipped with much more elaborate architectures., Reference updated
Published: 2017

42. Deep Matching Autoencoders

Author: Mukherjee, Tanmoy, Yamada, Makoto, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Statistics - Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (stat.ML)
Abstract: Increasingly many real world tasks involve data in multiple modalities or views. This has motivated the development of many effective algorithms for learning a common latent space to relate multiple domains. However, most existing cross-view learning algorithms assume access to paired data for training. Their applicability is thus limited as the paired data assumption is often violated in practice: many tasks have only a small subset of data available with pairing annotation, or even no paired data at all. In this paper we introduce Deep Matching Autoencoders (DMAE), which learn a common latent space and pairing from unpaired multi-modal data. Specifically we formulate this as a cross-domain representation learning and object matching problem. We simultaneously optimise parameters of representation learning auto-encoders and the pairing of unpaired multi-modal data. This framework elegantly spans the full regime from fully supervised, semi-supervised, and unsupervised (no paired data) multi-modal learning. We show promising results in image captioning, and on a new task that is uniquely enabled by our methodology: unsupervised classifier learning., 10 pages
Published: 2017

43. Weakly Supervised Image Annotation and Segmentation with Objects and Attributes

Author: Shi, Zhiyuan, Yang, Yongxin, Hospedales, Timothy M., and Xiang, Tao
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr. Given weak image-level annotations of objects and attributes without locations or associations between them, our model aims to learn the appearance of object and attribute classes as well as their association on each object instance. Once learned, given an image, our model can be deployed to tackle a number of vision problems in a joint and coherent manner, including recognising objects in the scene (automatic object annotation), describing objects using their attributes (attribute prediction and association), and localising and delineating the objects (object detection and semantic segmentation). This is achieved by developing a novel Weakly Supervised Markov Random Field Stacked Indian Buffet Process (WS-MRF-SIBP) that models objects and attributes as latent factors and explicitly captures their correlations within and across superpixels. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model significantly outperforms weakly supervised alternatives and is often comparable with existing strongly supervised models on a variety of tasks including semantic segmentation, automatic image annotation and retrieval based on object-attribute associations., Comment: Accepted in IEEE Transaction on Pattern Analysis and Machine Intelligence
Published: 2017
Full Text: View/download PDF

44. Actor-Critic Sequence Training for Image Captioning

Author: Zhang, Li, Sung, Flood, Liu, Feng, Xiang, Tao, Gong, Shaogang, Yang, Yongxin, and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating natural language descriptions of images is an important capability for a robot or other visual-intelligence driven AI agent that may need to communicate with human users about what it is seeing. Such image captioning methods are typically trained by maximising the likelihood of ground-truth annotated caption given the image. While simple and easy to implement, this approach does not directly maximise the language quality metrics we care about such as CIDEr. In this paper we investigate training image captioning methods based on actor-critic reinforcement learning in order to directly optimise non-differentiable quality metrics of interest. By formulating a per-token advantage and value computation strategy in this novel reinforcement learning based captioning model, we show that it is possible to achieve the state of the art performance on the widely used MSCOCO benchmark.
Published: 2017
Full Text: View/download PDF

45. Unifying Multi-Domain Multi-Task Learning: Tensor and Neural Network Perspectives

Author: Yang, Yongxin and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Machine Learning (cs.LG)
Abstract: Multi-domain learning aims to benefit from simultaneously learning across several different but related domains. In this chapter, we propose a single framework that unifies multi-domain learning (MDL) and the related but better studied area of multi-task learning (MTL). By exploiting the concept of a \emph{semantic descriptor} we show how our framework encompasses various classic and recent MDL/MTL algorithms as special cases with different semantic descriptor encodings. As a second contribution, we present a higher order generalisation of this framework, capable of simultaneous multi-task-multi-domain learning. This generalisation has two mathematically equivalent views in multi-linear algebra and gated neural networks respectively. Moreover, by exploiting the semantic descriptor, it provides neural networks the capability of zero-shot learning (ZSL), where a classifier is generated for an unseen class without any training data; as well as zero-shot domain adaptation (ZSDA), where a model is generated for an unseen domain without any training data. In practice, this framework provides a powerful yet easy to implement method that can be flexibly applied to MTL, MDL, ZSL and ZSDA., Invited book chapter
Published: 2016

46. Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation

Author: Xu, Xun, Hospedales, Timothy M., and Gong, Shaogang
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: Zero-Shot Learning (ZSL) promises to scale visual recognition by bypassing the conventional model training requirement of annotated examples for every category. This is achieved by establishing a mapping connecting low-level features and a semantic description of the label space, referred as visual-semantic mapping, on auxiliary data. Reusing the learned mapping to project target videos into an embedding space thus allows novel-classes to be recognised by nearest neighbour inference. However, existing ZSL methods suffer from auxiliary-target domain shift intrinsically induced by assuming the same mapping for the disjoint auxiliary and target classes. This compromises the generalisation accuracy of ZSL recognition on the target data. In this work, we improve the ability of ZSL to generalise across this domain shift in both model- and data-centric ways by formulating a visual-semantic mapping with better generalisation properties and a dynamic data re-weighting method to prioritise auxiliary data that are relevant to the target classes. Specifically: (1) We introduce a multi-task visual-semantic mapping to improve generalisation by constraining the semantic mapping parameters to lie on a low-dimensional manifold, (2) We explore prioritised data augmentation by expanding the pool of auxiliary data with additional instances weighted by relevance to the target domain. The proposed new model is applied to the challenging zero-shot action recognition problem to demonstrate its advantages over existing ZSL models., Comment: Published in ECCV 2016
Published: 2016
Full Text: View/download PDF

47. Trace Norm Regularised Deep Multi-Task Learning

Author: Yang, Yongxin and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Machine Learning (cs.LG)
Abstract: We propose a framework for training multiple neural networks simultaneously. The parameters from all models are regularised by the tensor trace norm, so that each neural network is encouraged to reuse others' parameters if possible -- this is the main motivation behind multi-task learning. In contrast to many deep multi-task learning models, we do not predefine a parameter sharing strategy by specifying which layers have tied parameters. Instead, our framework considers sharing for all shareable layers, and the sharing strategy is learned in a data-driven way., Comment: Submission to Workshop track - ICLR 2017
Published: 2016
Full Text: View/download PDF

48. Weakly Supervised Learning of Objects, Attributes and their Associations

Author: Shi, Zhiyuan, Yang, Yongxin, Hospedales, Timothy M., and Xiang, Tao
Subjects: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
Abstract: When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes and their associations. Conventional methods require strong annotation of object and attribute locations, making them less scalable. In this paper, we model object-attribute associations from weakly labelled images, such as those widely available on media sharing sites (e.g. Flickr), where only image-level labels (either object or attributes) are given, without their locations and associations. This is achieved by introducing a novel weakly supervised non-parametric Bayesian model. Once learned, given a new image, our model can describe the image, including objects, attributes and their associations, as well as their locations and segmentation. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model performs at par with strongly supervised models on tasks such as image description and retrieval based on object-attribute associations., 14 pages, Accepted to ECCV 2014
Published: 2015

49. Transductive Multi-class and Multi-label Zero-shot Learning

Author: Fu, Yanwei, Yang, Yongxin, Hospedales, Timothy M., Xiang, Tao, and Gong, Shaogang
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Machine Learning (cs.LG)
Abstract: Recently, zero-shot learning (ZSL) has received increasing interest. The key idea underpinning existing ZSL approaches is to exploit knowledge transfer via an intermediate-level semantic representation which is assumed to be shared between the auxiliary and target datasets, and is used to bridge between these domains for knowledge transfer. The semantic representation used in existing approaches varies from visual attributes to semantic word vectors and semantic relatedness. However, the overall pipeline is similar: a projection mapping low-level features to the semantic representation is learned from the auxiliary dataset by either classification or regression models and applied directly to map each instance into the same semantic representation space where a zero-shot classifier is used to recognise the unseen target class instances with a single known 'prototype' of each target class. In this paper we discuss two related lines of work improving the conventional approach: exploiting transductive learning ZSL, and generalising ZSL to the multi-label case., 4 pages, 4 figures, ECCV 2014 Workshop on Parts and Attributes
Published: 2015

50. A Unified Perspective on Multi-Domain and Multi-Task Learning

Author: Yang, Yongxin and Hospedales, Timothy M.
Subjects: FOS: Computer and information sciences, Computer Science - Learning, Statistics - Machine Learning, Computer Science - Neural and Evolutionary Computing, Machine Learning (stat.ML), Neural and Evolutionary Computing (cs.NE), Machine Learning (cs.LG)
Abstract: In this paper, we provide a new neural-network based perspective on multi-task learning (MTL) and multi-domain learning (MDL). By introducing the concept of a semantic descriptor, this framework unifies MDL and MTL as well as encompassing various classic and recent MTL/MDL algorithms by interpreting them as different ways of constructing semantic descriptors. Our interpretation provides an alternative pipeline for zero-shot learning (ZSL), where a model for a novel class can be constructed without training data. Moreover, it leads to a new and practically relevant problem setting of zero-shot domain adaptation (ZSDA), which is the analogous to ZSL but for novel domains: A model for an unseen domain can be generated by its semantic descriptor. Experiments across this range of problems demonstrate that our framework outperforms a variety of alternatives., 9 pages, Accepted to ICLR 2015 Conference Track
Published: 2014

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

Publisher

50 results on '"Hospedales, Timothy M."'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources