28 results for "Hospedales, Timothy M."
Search Results
2. When and where to transfer for Bayesian network parameter learning
- Author
- Zhou, Yun, Hospedales, Timothy M., and Fenton, Norman
- Published
- 2016
- Full Text
- View/download PDF
3. Free-hand sketch recognition by multi-kernel feature learning
- Author
- Li, Yi, Hospedales, Timothy M., Song, Yi-Zhe, and Gong, Shaogang
- Published
- 2015
- Full Text
- View/download PDF
4. Sketch-a-Net: A Deep Neural Network that Beats Humans
- Author
- Yu, Qian, Yang, Yongxin, Liu, Feng, Song, Yi-Zhe, Xiang, Tao, and Hospedales, Timothy M.
- Published
- 2017
- Full Text
- View/download PDF
5. Free-Hand Sketch Synthesis with Deformable Stroke Models
- Author
- Li, Yi, Song, Yi-Zhe, Hospedales, Timothy M., and Gong, Shaogang
- Published
- 2017
- Full Text
- View/download PDF
6. Structure inference for Bayesian multisensory scene understanding
- Author
- Hospedales, Timothy M. and Vijayakumar, Sethu
- Subjects
- Image processing -- Analysis, Bayesian statistical decision theory -- Models
- Abstract
We investigate a solution to the problem of multisensor scene understanding by formulating it in the framework of Bayesian model selection and structure inference. Humans robustly associate multimodal data as appropriate, but previous modeling work has focused largely on optimal fusion, leaving segregation unaccounted for and unexploited by machine perception systems. We illustrate a unifying Bayesian solution to multisensory perception and tracking, which accounts for both integration and segregation by explicit probabilistic reasoning about data association in a temporal context. Such an explicit inference of multimodal data association is also of intrinsic interest for higher level understanding of multisensory data. We illustrate this by using a probabilistic implementation of data association in a multiparty audiovisual scenario, where unsupervised learning and structure inference is used to automatically segment, associate, and track individual subjects in audiovisual sequences. Indeed, the structure-inference-based framework introduced in this work provides the theoretical foundation needed to satisfactorily explain many confounding results in human psychophysics experiments involving multimodal cue integration and association.
- Published
- 2008
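The model-selection idea in this abstract can be made concrete with a toy calculation. Below is a minimal numpy sketch (not the paper's model) that scores two structures for a pair of audio/visual observations: one shared latent source versus two independent sources. All priors and variances are invented for illustration.

```python
import numpy as np

def gauss(x, var):
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

def common_cause_posterior(xa, xv, sa2, sv2, sp2=100.0, prior=0.5):
    """P(shared cause | audio obs xa, visual obs xv); all densities Gaussian."""
    # Structure 1: one latent source s ~ N(0, sp2) observed twice with noise sa2, sv2.
    det = sa2 * sv2 + sa2 * sp2 + sv2 * sp2
    quad = (sp2 * (xa - xv) ** 2 + sv2 * xa**2 + sa2 * xv**2) / det
    like_common = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(det))
    # Structure 2: two independent latent sources, one per modality.
    like_indep = gauss(xa, sa2 + sp2) * gauss(xv, sv2 + sp2)
    return like_common * prior / (like_common * prior + like_indep * (1 - prior))

print(common_cause_posterior(xa=1.0, xv=1.3, sa2=4.0, sv2=1.0))   # close cues -> fuse
print(common_cause_posterior(xa=-8.0, xv=9.0, sa2=4.0, sv2=1.0))  # far apart -> segregate
```

When the two cues are nearby the shared-cause posterior dominates (integration); when they are far apart it collapses (segregation), which is the data-association behaviour the abstract describes.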
7. Self-Supervised Representation Learning: Introduction, advances, and challenges.
- Author
- Ericsson, Linus, Gouk, Henry, Loy, Chen Change, and Hospedales, Timothy M.
- Abstract
Self-supervised representation learning (SSRL) methods aim to provide powerful, deep feature learning without the requirement of large annotated data sets, thus alleviating the annotation bottleneck—one of the main barriers to the practical deployment of deep learning today. These techniques have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pretraining alternatives across a variety of data modalities, including image, video, sound, text, and graphs. This article introduces this vibrant area, including key concepts, the four main families of approaches and associated state-of-the-art techniques, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and computational cost. Finally, we survey major open challenges in the field that provide fertile ground for future work.
- Published
- 2022
- Full Text
- View/download PDF
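As a concrete taste of one family of approaches the article surveys, here is a minimal numpy sketch of a contrastive objective (InfoNCE-style). The batch size, dimensionality, and temperature are arbitrary choices for illustration, not values from the article.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N samples."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))         # pull positives, push negatives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
print(info_nce(z, z + 0.01 * rng.normal(size=z.shape)))  # near-identical views: low loss
print(info_nce(z, rng.normal(size=z.shape)))             # unrelated views: high loss
```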
8. Visual Domain Adaptation in the Deep Learning Era.
- Author
- Csurka, Gabriela, Hospedales, Timothy M., Salzmann, Mathieu, and Tommasi, Tatiana
- Published
- 2022
- Full Text
- View/download PDF
9. On Learning Semantic Representations for Large-Scale Abstract Sketches.
- Author
- Xu, Peng, Huang, Yongye, Yuan, Tongtong, Xiang, Tao, Hospedales, Timothy M., Song, Yi-Zhe, and Wang, Liang
- Subjects
- VIDEO games, SPEECH perception, BINARY codes, FEATURE extraction, TASK analysis
- Abstract
In this paper, we focus on learning semantic representations for large-scale highly abstract sketches produced by practical sketch-based applications, rather than the excessively well-drawn sketches obtained by crowd-sourcing. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning the sketch-oriented semantic representations in two practical settings, i.e., hashing retrieval and zero-shot recognition on million-scale highly abstract sketches produced by practical online interactions. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) We propose a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to further accommodate both the abstract and messy traits of sketches. (ii) We propose a deep embedding model for sketch zero-shot recognition, via collecting a large-scale edge-map dataset and proposing to extract a set of semantic vectors from edge-maps as the semantic knowledge for sketch zero-shot domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale abstract sketches produced by the global online game QuickDraw and outperform state-of-the-art competitors.
- Published
- 2021
- Full Text
- View/download PDF
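The dual-branch idea described above can be sketched schematically in PyTorch: one branch encodes the rasterised sketch (static pattern), the other encodes the stroke sequence (temporal pattern), and the two codes are fused. Layer sizes and the (dx, dy, pen) stroke format are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DualBranchSketchNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # raster branch
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))
        self.rnn = nn.GRU(input_size=3, hidden_size=feat_dim,
                          batch_first=True)            # stroke branch: (dx, dy, pen)
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, raster, strokes):
        static = self.cnn(raster)                      # (B, feat_dim)
        _, h = self.rnn(strokes)                       # h: (1, B, feat_dim)
        return self.fuse(torch.cat([static, h[0]], dim=1))

net = DualBranchSketchNet()
out = net(torch.zeros(2, 1, 64, 64), torch.zeros(2, 50, 3))
print(out.shape)  # torch.Size([2, 128])
```

The fused code can then feed a hashing or zero-shot embedding head, as in the two settings the abstract describes.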
10. Fine-Grained Instance-Level Sketch-Based Video Retrieval.
- Author
- Xu, Peng, Liu, Kun, Xiang, Tao, Hospedales, Timothy M., Ma, Zhanyu, Guo, Jun, and Song, Yi-Zhe
- Subjects
- IMAGE retrieval, VIDEOS, MOTION detectors, STREAMING video & television
- Abstract
Existing sketch-analysis work studies sketches depicting static objects or scenes. In this work, we propose a novel cross-modal retrieval problem of fine-grained instance-level sketch-based video retrieval (FG-SBVR), where a sketch sequence is used as a query to retrieve a specific target video instance. Compared with sketch-based still image retrieval, and coarse-grained category-level video retrieval, this is more challenging as both visual appearance and motion need to be simultaneously matched at a fine-grained level. We contribute the first FG-SBVR dataset with rich annotations. We then introduce a novel multi-stream multi-modality deep network to perform FG-SBVR under both strongly and weakly supervised settings. The key component of the network is a relation module, designed to prevent model overfitting given scarce training data. We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.
- Published
- 2021
- Full Text
- View/download PDF
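A relation module of the general kind the abstract mentions can be sketched as a small MLP that scores a (sketch-query, video-candidate) feature pair, rather than relying on a raw distance; the architecture and dimensions below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1))                     # learned relation score

    def forward(self, query, candidates):
        # query: (feat_dim,), candidates: (K, feat_dim) -> (K,) match scores
        pairs = torch.cat([query.expand_as(candidates), candidates], dim=1)
        return self.score(pairs).squeeze(1)

rel = RelationModule()
scores = rel(torch.randn(256), torch.randn(10, 256))
print(scores.shape)  # torch.Size([10]); highest score = retrieved video
```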
11. Pixelor: A Competitive Sketching AI Agent. So You Think You Can Sketch?
- Author
- Bhunia, Ayan Kumar, Das, Ayan, Muhammad, Umar Riaz, Yang, Yongxin, Hospedales, Timothy M., Xiang, Tao, Gryaditskaya, Yulia, and Song, Yi-Zhe
- Subjects
- ARTIFICIAL intelligence, DRAWING, RECURRENT neural networks
- Abstract
We present the first competitive drawing agent, Pixelor, that exhibits human-level performance at a Pictionary-like sketching game, where the participant whose sketch is recognized first is the winner. Our AI agent can autonomously sketch a given visual concept and achieve a recognizable rendition as quickly as or faster than a human competitor. The key to victory is learning the optimal stroke sequencing strategies that generate the most recognizable and distinguishable strokes first. Training Pixelor is done in two steps. First, we infer the stroke order that maximizes early recognizability of human training sketches. Second, this order is used to supervise the training of a sequence-to-sequence stroke generator. Our key technical contributions are a tractable search of the exponential space of orderings using neural sorting, and an improved Seq2Seq Wasserstein (S2S-WAE) generator that uses an optimal-transport loss to accommodate the multi-modal nature of the optimal stroke distribution. Our analysis shows that Pixelor is better than the human players of the Quick, Draw! game, under both AI and human judging of early recognition. To analyze the impact of human competitors' strategies, we conducted a further human study with participants being given unlimited thinking time and training in early recognizability by feedback from an AI judge. The study shows that humans do gradually improve their strategies with training, but overall Pixelor still matches human performance. The code and the dataset are available at http://sketchx.ai/pixelor.
- Published
- 2020
- Full Text
- View/download PDF
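The stroke-ordering objective above can be illustrated with a deliberately simplified greedy stand-in: the paper uses neural sorting to search the exponential space of orderings tractably, whereas this toy just re-orders strokes so that each prefix maximises a recognizability score. The `recognize` callback is a hypothetical stand-in for a sketch classifier's confidence in the target class given a partial sketch.

```python
def greedy_early_recognizable_order(strokes, recognize):
    """Reorder strokes so each prefix is maximally recognizable."""
    remaining, ordered = list(strokes), []
    while remaining:
        # Pick the stroke whose addition best boosts recognizability right now.
        best = max(remaining, key=lambda s: recognize(ordered + [s]))
        remaining.remove(best)
        ordered.append(best)
    return ordered

# Toy demo: "strokes" carry an importance weight; the fake recognizer just sums
# the weights of the prefix, so the most informative strokes should come first.
strokes = [("tail", 0.1), ("ear", 0.3), ("head outline", 0.9), ("whisker", 0.2)]
order = greedy_early_recognizable_order(strokes, lambda p: sum(w for _, w in p))
print([name for name, _ in order])  # ['head outline', 'ear', 'whisker', 'tail']
```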
12. Sketch-a-Segmenter: Sketch-Based Photo Segmenter Generation.
- Author
- Hu, Conghui, Li, Da, Yang, Yongxin, Hospedales, Timothy M., and Song, Yi-Zhe
- Subjects
- IMAGE segmentation, PHOTOGRAPHS
- Abstract
Given pixel-level annotated data, traditional photo segmentation techniques have achieved promising results. However, these photo segmentation models can only identify objects in categories for which data annotation and training have been carried out. This limitation has inspired recent work on few-shot and zero-shot learning for image segmentation. In this article, we show the value of sketch for photo segmentation, in particular as a transferable representation to describe a concept to be segmented. We show, for the first time, that it is possible to generate a photo-segmentation model of a novel category using just a single sketch, and furthermore exploit the unique fine-grained characteristics of sketch to produce more detailed segmentation. More specifically, we propose a sketch-based photo segmentation method that takes sketch as input and synthesizes the weights required for a neural network to segment the corresponding region of a given photo. Our framework can be applied at both the category level and the instance level, and fine-grained input sketches provide more accurate segmentation in the latter. This framework generalizes across categories via sketch and thus provides an alternative to zero-shot learning when segmenting a photo from a category without annotated training data. To investigate the instance-level relationship across sketch and photo, we create the SketchySeg dataset, which contains segmentation annotations for photos corresponding to paired sketches in the Sketchy Dataset.
- Published
- 2020
- Full Text
- View/download PDF
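The weight-synthesis mechanism described above is essentially a hypernetwork, which can be sketched compactly: a sketch embedding is mapped to the parameters of a 1x1 convolution that masks the matching region in photo feature maps. All shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchToSegmenter(nn.Module):
    def __init__(self, sketch_dim=128, photo_channels=64):
        super().__init__()
        self.photo_channels = photo_channels
        # Hypernetwork: sketch embedding -> (weights + bias) of a 1x1 conv.
        self.hyper = nn.Linear(sketch_dim, photo_channels + 1)

    def forward(self, sketch_emb, photo_feats):
        # sketch_emb: (sketch_dim,), photo_feats: (B, C, H, W)
        params = self.hyper(sketch_emb)
        w = params[:self.photo_channels].view(1, self.photo_channels, 1, 1)
        b = params[self.photo_channels:]
        return torch.sigmoid(F.conv2d(photo_feats, w, b))  # (B, 1, H, W) mask

model = SketchToSegmenter()
mask = model(torch.randn(128), torch.randn(2, 64, 32, 32))
print(mask.shape)  # torch.Size([2, 1, 32, 32])
```

Because the segmenter's weights come from the sketch rather than from class-specific training, the same machinery applies to novel categories, which is the zero-shot-style generalization the abstract claims.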
13. Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool.
- Author
- Liu, Feng, Xiang, Tao, Hospedales, Timothy M., Yang, Wankou, and Sun, Changyin
- Subjects
- REINFORCEMENT learning, QUESTION answering systems, ARTIFICIAL intelligence, IMAGE color analysis, INVERSE problems
- Abstract
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct and content correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by way of exposing its belief set: the set of question-answer pairs that the VQA model would predict true for a given image. This provides a completely new window into what VQA models ‘believe’ about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.
- Published
- 2020
- Full Text
- View/download PDF
14. Toward Deep Universal Sketch Perceptual Grouper.
- Author
- Li, Ke, Pang, Kaiyue, Song, Yi-Zhe, Xiang, Tao, Hospedales, Timothy M., and Zhang, Honggang
- Subjects
- GROUPERS, DRAWING, IMAGE retrieval, TASK analysis, IMAGE segmentation
- Abstract
Human free-hand sketches provide useful data for studying human perceptual grouping, where grouping principles such as the Gestalt laws of grouping are naturally in play during both the perception and sketching stages. In this paper, we make the first attempt to develop a universal sketch perceptual grouper, that is, a grouper that can be applied to sketches of any category created with any drawing style and ability, to group constituent strokes/segments into semantically meaningful object parts. The first obstacle to achieving this goal is the lack of large-scale datasets with grouping annotation. To overcome this, we contribute the largest sketch perceptual grouping dataset to date, consisting of 20,000 unique sketches evenly distributed over 25 object categories. Furthermore, we propose a novel deep perceptual grouping model learned with both generative and discriminative losses. The generative loss improves the generalization ability of the model, while the discriminative loss guarantees both local and global grouping consistency. Extensive experiments demonstrate that the proposed grouper significantly outperforms the state-of-the-art competitors. In addition, we show that our grouper is useful for a number of sketch analysis tasks, including sketch semantic segmentation, synthesis, and fine-grained sketch-based image retrieval.
- Published
- 2019
- Full Text
- View/download PDF
15. Weakly-Supervised Image Annotation and Segmentation with Objects and Attributes.
- Author
- Shi, Zhiyuan, Yang, Yongxin, Hospedales, Timothy M., and Xiang, Tao
- Subjects
- IMAGE processing, PATTERN recognition systems, BAYESIAN analysis, ANNOTATIONS, SEMANTICS
- Abstract
We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr. Given weak image-level annotations of objects and attributes without locations or associations between them, our model aims to learn the appearance of object and attribute classes as well as their association on each object instance. Once learned, given an image, our model can be deployed to tackle a number of vision problems in a joint and coherent manner, including recognising objects in the scene (automatic object annotation), describing objects using their attributes (attribute prediction and association), and localising and delineating the objects (object detection and semantic segmentation). This is achieved by developing a novel Weakly Supervised Markov Random Field Stacked Indian Buffet Process (WS-MRF-SIBP) that models objects and attributes as latent factors and explicitly captures their correlations within and across superpixels. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model significantly outperforms weakly supervised alternatives and is often comparable with existing strongly supervised models on a variety of tasks including semantic segmentation, automatic image annotation and retrieval based on object-attribute associations.
- Published
- 2017
- Full Text
- View/download PDF
16. Discovery of Shared Semantic Spaces for Multiscene Video Query and Summarization.
- Author
- Xu, Xun, Hospedales, Timothy M., and Gong, Shaogang
- Subjects
- VIDEO recording, SEMANTICS, VIDEO surveillance, PIXELS, AUTOMATION
- Abstract
The growing rate of public space closed-circuit television (CCTV) installations has generated a need for automated methods for exploiting video surveillance data, including scene understanding, query, behavior annotation, and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of a similar layout) is not generally exploited to improve any automated surveillance tasks and reduce manual effort. Exploiting commonality and sharing any supervised annotations between different scenes is, however, challenging due to the following reason: some scenes are totally unrelated and thus any information sharing between them would be detrimental, whereas others may share only a subset of common activities and thus information sharing is only useful if it is selective. Moreover, semantically similar activities that should be modeled together and shared across scenes may have quite different pixel-level appearances in each scene. To address these issues, we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviors and further discovers which subset of activities are shared versus scene specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks, including scene activity understanding, cross-scene query-by-example, behavior classification with reduced supervised labeling requirements, and video summarization. In each case, we demonstrate how our multiscene model improves on a collection of standard single-scene models and a flat model of all scenes.
- Published
- 2017
- Full Text
- View/download PDF
17. Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels.
- Author
- Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, Xiong, Jiechao, Gong, Shaogang, Wang, Yizhou, and Yao, Yuan
- Subjects
- IMAGE recognition (Computer vision), CROWDSOURCING, OUTLIERS (Statistics), ROBUST statistics, SPARSE approximations
- Abstract
The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large amount of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning to rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations.
- Published
- 2016
- Full Text
- View/download PDF
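The joint "rank plus detect outliers" formulation can be illustrated with a small numpy sketch: global scores s and a sparse outlier vector e are fit to pairwise labels y by alternating least squares with soft-thresholding on min_{s,e} ||y - Bs - e||^2 + lam*||e||_1. This is a simplified stand-in for the paper's model, run on synthetic data.

```python
import numpy as np

def robust_rank(n_items, pairs, y, lam=1.0, iters=50):
    B = np.zeros((len(pairs), n_items))
    for k, (i, j) in enumerate(pairs):
        B[k, i], B[k, j] = 1.0, -1.0          # comparison "score of i minus score of j"
    s, e = np.zeros(n_items), np.zeros(len(y))
    for _ in range(iters):
        s = np.linalg.lstsq(B, y - e, rcond=None)[0]           # update global scores
        r = y - B @ s
        e = np.sign(r) * np.maximum(np.abs(r) - lam / 2, 0.0)  # sparse outlier update
    return s, e

pairs = [(0, 1), (1, 2), (0, 2), (2, 0)]      # last comparison contradicts the rest
y = np.array([1.0, 1.0, 1.0, 1.0])            # "(2, 0): 2 beats 0" is the outlier
s, e = robust_rank(3, pairs, y)
print(np.argsort(-s))  # global ranking, expected [0, 1, 2]
print(np.round(e, 2))  # largest entry sits on the contradictory comparison
```

The point of the toy: majority voting on the pair (0, 2) alone cannot flag the bad label, but the global fit absorbs it into the sparse term e, mirroring the abstract's argument.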
18. Transductive Multi-View Zero-Shot Learning.
- Author
- Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, and Gong, Shaogang
- Subjects
- OBJECT recognition (Computer vision), COMPUTER vision, HYPERGRAPHS, MACHINE learning, COMPUTATIONAL learning theory
- Abstract
Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem, which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
- Published
- 2015
- Full Text
- View/download PDF
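The transductive step can be illustrated with a single-view toy: labels diffuse from one prototype per class to unlabelled target instances over a similarity graph. The paper's version operates on a heterogeneous multi-view hypergraph; this numpy sketch only shows the propagation mechanism, with invented data.

```python
import numpy as np

def propagate(X, prototypes, alpha=0.9, iters=100, sigma=1.0):
    Z = np.vstack([prototypes, X])                 # seeds first, then targets
    d2 = ((Z[:, None] - Z[None]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)
    S = W / W.sum(1, keepdims=True)                # row-normalised affinities
    Y = np.zeros((len(Z), len(prototypes)))
    Y[:len(prototypes)] = np.eye(len(prototypes))  # one seed per class
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y        # diffuse, keep seeds anchored
    return F[len(prototypes):].argmax(1)           # predicted labels for targets

protos = np.array([[0.0, 0.0], [5.0, 5.0]])        # two class prototypes
X = np.array([[0.2, -0.1], [4.8, 5.1], [5.2, 4.9], [0.1, 0.3]])
print(propagate(X, protos))                        # expected [0, 1, 1, 0]
```

Propagation lets the scarce prototypes borrow evidence from the manifold structure of the unlabelled target data, which is how the method mitigates the prototype sparsity problem.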
19. Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images.
- Author
- Shi, Zhiyuan, Hospedales, Timothy M., and Xiang, Tao
- Subjects
- OBJECT recognition (Computer vision), LOCALIZATION theory, SUPERVISED learning, IMAGE processing, BAYESIAN analysis
- Abstract
We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.
- Published
- 2015
- Full Text
- View/download PDF
20. Learning Multimodal Latent Attributes.
- Author
- Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, and Gong, Shaogang
- Subjects
- COMPUTER multitasking, SOCIAL media research, OBJECT recognition (Computer vision), SOCIAL groups, LATENT functions (Social sciences), PSYCHOLOGY
- Abstract
The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multimodal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we 1) introduce the concept of a semilatent attribute space, expressing user-defined and latent attributes in a unified framework, and 2) propose a novel scalable probabilistic topic model for learning multimodal semilatent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including: multitask learning, learning with label noise, N-shot transfer learning, and importantly zero-shot learning.
- Published
- 2014
- Full Text
- View/download PDF
21. Finding Rare Classes: Active Learning with Generative and Discriminative Models.
- Author
- Hospedales, Timothy M., Gong, Shaogang, and Xiang, Tao
- Subjects
- DATA mining, MACHINE learning, EDUCATIONAL technology, PROGRAMMED instruction, ACTIVE learning, EXPERIENTIAL learning
- Abstract
Discovering rare categories and classifying new instances of them are important data mining issues in many fields, but fully supervised learning of a rare class classifier is prohibitively costly in labeling effort. There has therefore been increasing interest both in active discovery, to identify new classes quickly, and in active learning, to train classifiers with minimal supervision. These goals occur together in practice and are intrinsically related because examples of each class are required to train a classifier. Nevertheless, very few studies have tried to optimise them together, meaning that data mining for rare classes in new domains makes inefficient use of human supervision. Developing active learning algorithms to optimise both rare class discovery and classification simultaneously is challenging because discovery and classification have conflicting requirements in query criteria. In this paper, we address these issues with two contributions: a unified active learning model to jointly discover new categories and learn to classify them by adapting query criteria online; and a classifier combination algorithm that switches generative and discriminative classifiers as learning progresses. Extensive evaluation on a batch of standard UCI and vision data sets demonstrates the superiority of this approach over existing methods.
- Published
- 2013
- Full Text
- View/download PDF
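The tension the abstract names, query criteria for discovery versus classification, can be sketched as two scores whose mixing weight adapts online. The `density` and `posterior` callbacks below are hypothetical stand-ins for fitted generative and discriminative models, and the update rule is a simple illustration rather than the paper's algorithm.

```python
import numpy as np

def select_query(pool, density, posterior, w):
    """pool: (N, D) unlabelled points; w: weight on the discovery criterion."""
    novelty = -density(pool)                        # low likelihood -> maybe a new class
    probs = posterior(pool)                         # (N, C) class posteriors
    entropy = -(probs * np.log(probs + 1e-12)).sum(1)
    rank = lambda v: v.argsort().argsort()          # put both criteria on a common scale
    return int((w * rank(novelty) + (1 - w) * rank(entropy)).argmax())

def update_weight(w, queried_was_new_class, lr=0.1):
    """Shift weight toward whichever criterion just proved useful."""
    return (1 - lr) * w + lr * (1.0 if queried_was_new_class else 0.0)

rng = np.random.default_rng(1)
pool = rng.normal(size=(100, 2))
density = lambda X: np.exp(-(X ** 2).sum(1))        # toy generative model
logits = rng.normal(size=(100, 3))                  # toy, fixed class posteriors
posterior = lambda X: np.exp(logits) / np.exp(logits).sum(1, keepdims=True)

w = 0.5                                             # start balanced
print(select_query(pool, density, posterior, w))    # index of the next query
for found_new in [True, True, False, False, False]: # discovery dries up over time...
    w = update_weight(w, found_new)
print(round(w, 3))                                  # ...weight drifts toward refinement
```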
22. Implications of Noise and Neural Heterogeneity for Vestibulo-Ocular Reflex Fidelity.
- Author
- Hospedales, Timothy M., van Rossum, Mark C. W., Graham, Bruce P., and Dutia, Mayank B.
- Subjects
- NOISE, NEURONS, CELLS, ELECTROPHYSIOLOGY, NEUROSCIENCES
- Abstract
The vestibulo-ocular reflex (VOR) is characterized by a short-latency, high-fidelity eye movement response to head rotations at frequencies up to 20 Hz. Electrophysiological studies of medial vestibular nucleus (MVN) neurons, however, show that their response to sinusoidal currents above 10 to 12 Hz is highly nonlinear and distorted by aliasing for all but very small current amplitudes. How can this system function in vivo when single-cell response cannot explain its operation? Here we show that the necessary wide VOR frequency response may be achieved not by firing rate encoding of head velocity in single neurons, but in the integrated population response of asynchronously firing, intrinsically active neurons. Diffusive synaptic noise and the pacemaker-driven, intrinsic firing of MVN cells synergistically maintain asynchronous, spontaneous spiking in a population of model MVN neurons over a wide range of input signal amplitudes and frequencies. Response fidelity is further improved by a reciprocal inhibitory link between two MVN populations, mimicking the vestibular commissural system in vivo, but only if asynchrony is maintained by noise and pacemaker inputs. These results provide a previously missing explanation for the full range of VOR function and a novel account of the role of the intrinsic pacemaker conductances in MVN cells. The values of diffusive noise and pacemaker currents that give optimal response fidelity yield firing statistics similar to those in vivo, suggesting that the in vivo network is tuned to optimal performance. While theoretical studies have argued that noise and population heterogeneity can improve coding, to our knowledge this is the first evidence indicating that these parameters are indeed tuned to optimize coding fidelity in a neural control system in vivo.
- Published
- 2008
- Full Text
- View/download PDF
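The population-coding argument can be demonstrated with a toy simulation: a single noisy spiking neuron tracks a 15 Hz "head velocity" signal poorly in its spike train, while the averaged activity of an asynchronous population follows it well. Poisson firing and all parameter values here are simplifications for illustration, not the paper's conductance-based model.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T, n_neurons = 1e-3, 1.0, 500
t = np.arange(0, T, dt)
signal = 40 * np.sin(2 * np.pi * 15 * t)        # 15 Hz rate modulation (spikes/s)
base = 80.0                                      # pacemaker-like baseline rate

# Each neuron fires (Poisson, for brevity) at base + signal + private noise;
# the independent noise keeps the population asynchronous.
noise = 30 * rng.standard_normal((n_neurons, len(t)))
rates = np.clip(base + signal + noise, 0, None)
spikes = rng.random((n_neurons, len(t))) < rates * dt

pop_rate = spikes.mean(0) / dt                   # population-averaged firing rate
corr_pop = np.corrcoef(pop_rate, signal)[0, 1]
corr_one = np.corrcoef(spikes[0] / dt, signal)[0, 1]
print(f"single neuron r={corr_one:.2f}, population r={corr_pop:.2f}")
```

The pooled rate correlates strongly with the fast input while any single spike train does not, which is the mechanism the abstract proposes for wide-bandwidth VOR fidelity.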
23. Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model.
- Author
- Hospedales, Timothy M., Li, Jian, Gong, Shaogang, and Xiang, Tao
- Subjects
- SUPERVISED learning, HIDDEN Markov models, DATA modeling, ELECTRONIC surveillance, ALGORITHMS, MACHINE learning
- Abstract
One of the most interesting and desired capabilities for automated video behavior analysis is the identification of rarely occurring and subtle behaviors. This is of practical value because dangerous or illegal activities often have few or possibly only one prior example to learn from and are often subtle. Rare and subtle behavior learning is challenging for two reasons: 1) Contemporary modeling approaches require more data and supervision than may be available, and 2) the most interesting and potentially critical rare behaviors are often visually subtle—occurring among more obvious typical behaviors or being defined by only small spatio-temporal deviations from typical behaviors. In this paper, we introduce a novel weakly supervised joint topic model which addresses these issues. Specifically, we introduce a multiclass topic model with partially shared latent structure and associated learning and inference algorithms. These contributions will permit modeling of behaviors from as few as one example, even without localization by the user and when occurring in clutter, and subsequent classification and localization of such behaviors online and in real time. We extensively validate our approach on two standard public-space data sets, where it clearly outperforms a batch of contemporary alternatives.
- Published
- 2011
- Full Text
- View/download PDF
24. Deep Learning for Free-Hand Sketch: A Survey.
- Author
- Xu, Peng, Hospedales, Timothy M., Yin, Q., Song, Yi-Zhe, Xiang, Tao, and Wang, Liang
- Abstract
Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community.
- Published
- 2023
- Full Text
- View/download PDF
25. Uncertainty-Aware Source-Free Domain Adaptive Semantic Segmentation.
- Author
- Lu, Z., Li, Da, Song, Yi-Zhe, Xiang, Tao, and Hospedales, Timothy M.
- Abstract
Source-Free Domain Adaptation (SFDA) is becoming topical to address the challenge of distribution shift between training and deployment data, while also relaxing the requirement of source data availability during target domain adaptation. In this paper, we focus on SFDA for semantic segmentation, in which pseudo-labeling-based target domain self-training is a common solution. However, pseudo labels generated by the source models are particularly unreliable on the target domain data due to the domain shift issue. Therefore, we propose to use a Bayesian Neural Network (BNN) to improve the target self-training by better estimating and exploiting pseudo-label uncertainty. With the uncertainty estimation of BNNs, we introduce two novel self-training based components: Uncertainty-aware Online Teacher-Student Learning (UOTSL) and Uncertainty-aware FeatureMix (UFM). Extensive experiments on two popular benchmarks, GTA 5 → Cityscapes and SYNTHIA → Cityscapes, show the superiority of our proposed method, with mIoU gains of 3.6% and 5.7% over the state-of-the-art, respectively.
- Published
- 2023
- Full Text
- View/download PDF
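The general mechanism, gating pseudo-labels by model uncertainty, can be sketched with MC dropout as a cheap stand-in for a full BNN: several stochastic forward passes give a predictive distribution, and high-entropy pixels are masked out of self-training. The model, class count, and threshold below are illustrative assumptions, not the paper's UOTSL/UFM components.

```python
import torch

@torch.no_grad()
def uncertain_pseudo_labels(model, images, passes=8, max_entropy=0.5):
    model.train()                                   # keep dropout stochastic
    probs = torch.stack([model(images).softmax(1) for _ in range(passes)])
    mean_p = probs.mean(0)                          # (B, C, H, W) predictive mean
    entropy = -(mean_p * (mean_p + 1e-12).log()).sum(1)   # per-pixel uncertainty
    labels = mean_p.argmax(1)                       # (B, H, W) pseudo labels
    mask = entropy < max_entropy                    # trust only confident pixels
    return labels, mask

# Toy demo with a dummy segmentation net containing dropout:
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1),
                          torch.nn.Dropout2d(0.5),
                          torch.nn.Conv2d(8, 19, 1))   # 19 classes, Cityscapes-style
labels, mask = uncertain_pseudo_labels(net, torch.randn(2, 3, 32, 32))
print(labels.shape, mask.float().mean().item())   # fraction of trusted pixels
# In self-training, the cross-entropy loss would then be computed only where
# mask is True, so unreliable pseudo labels do not corrupt adaptation.
```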
26. Toward Fine-Grained Sketch-Based 3D Shape Retrieval.
- Author
- Qi, A., Gryaditskaya, Yulia, Song, J., Yang, Y., Qi, Y., Hospedales, Timothy M., Xiang, Tao, and Song, Yi-Zhe
- Abstract
In this paper we study, for the first time, the problem of fine-grained sketch-based 3D shape retrieval. We advocate the use of sketches as a fine-grained input modality to retrieve 3D shapes at the instance level; e.g., given a sketch of a chair, we set out to retrieve a specific chair from a gallery of all chairs. Fine-grained sketch-based 3D shape retrieval (FG-SBSR) has not been possible until now due to a lack of datasets that exhibit one-to-one sketch-3D correspondences. The first key contribution of this paper is two new datasets, consisting of a total of 4,680 sketch-3D pairings from two object categories. Even with the datasets, FG-SBSR is still highly challenging because (i) the inherent domain gap between 2D sketch and 3D shape is large, and (ii) retrieval needs to be conducted at the instance level instead of the coarse category-level matching in traditional SBSR. Thus, the second contribution of the paper is the first cross-modal deep embedding model for FG-SBSR, which specifically tackles the unique challenges presented by this new problem. Core to the deep embedding model is a novel cross-modal view attention module which automatically computes the optimal combination of 2D projections of a 3D shape given a query sketch.
- Published
- 2021
- Full Text
- View/download PDF
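The view-attention idea can be sketched as a tiny attention layer: a query-sketch embedding attends over the embeddings of a shape's rendered 2D projections to produce a sketch-conditioned shape descriptor. Dimensions and the single-head dot-product form are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class ViewAttention(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, sketch_emb, view_embs):
        # sketch_emb: (B, dim); view_embs: (B, V, dim) for V rendered views
        attn = torch.einsum('bd,bvd->bv', self.q(sketch_emb), self.k(view_embs))
        attn = attn.softmax(dim=1)                       # weight per 2D projection
        return (attn.unsqueeze(-1) * view_embs).sum(1)   # (B, dim) shape descriptor

va = ViewAttention()
shape_code = va(torch.randn(4, 256), torch.randn(4, 12, 256))
print(shape_code.shape)  # torch.Size([4, 256]); compare to the sketch via cosine
```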
27. Frankenstein: Learning Deep Face Representations Using Small Data.
- Author
- Hu, Guosheng, Peng, Xiaojiang, Yang, Yongxin, Hospedales, Timothy M., and Verbeek, J.
- Abstract
Deep convolutional neural networks have recently proven extremely effective for difficult face recognition problems in uncontrolled settings. To train such networks, very large training sets are needed, with millions of labeled images. For some applications, such as near-infrared (NIR) face recognition, such large training data sets are not publicly available and are difficult to collect. In this paper, we propose a method to generate very large training data sets of synthetic images by compositing real face images in a given data set. We show that this method enables learning models from as few as 10,000 training images that perform on par with models trained from 500,000 images. Using our approach, we also obtain state-of-the-art results on the CASIA NIR-VIS2.0 heterogeneous face recognition data set.
- Published
- 2018
- Full Text
- View/download PDF
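The compositing idea can be illustrated with a toy numpy routine that swaps a horizontal band between two aligned face images using a feathered seam, multiplying the number of distinct training appearances. Real part geometry, alignment, and colour matching are assumed handled elsewhere; this is not the paper's pipeline.

```python
import numpy as np

def composite(face_a, face_b, top, bottom, feather=8):
    """Paste rows [top:bottom] of face_b into face_a with a soft seam."""
    h = face_a.shape[0]
    alpha = np.zeros(h)
    alpha[top:bottom] = 1.0
    ramp = np.linspace(0, 1, feather)                     # soft transition weights
    alpha[max(top - feather, 0):top] = ramp[:top - max(top - feather, 0)]
    alpha[bottom:min(bottom + feather, h)] = ramp[::-1][:min(bottom + feather, h) - bottom]
    # Per-row blend, broadcast over width and colour channels.
    return alpha[:, None, None] * face_b + (1 - alpha)[:, None, None] * face_a

a = np.zeros((64, 64, 3))
b = np.ones((64, 64, 3))
c = composite(a, b, top=20, bottom=40)
print(c[30, 0, 0], c[10, 0, 0])  # 1.0 inside the swapped band, 0.0 outside it
```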
28. Synergistic Instance-Level Subspace Alignment for Fine-Grained Sketch-Based Image Retrieval.
- Author
- Li, Ke, Pang, Kaiyue, Song, Yi-Zhe, Hospedales, Timothy M., Xiang, Tao, and Zhang, Honggang
- Abstract
We study the problem of fine-grained sketch-based image retrieval. By performing instance-level (rather than category-level) retrieval, it embodies a timely and practical application, particularly with the ubiquitous availability of touchscreens. Three factors contribute to the challenging nature of the problem: 1) free-hand sketches are inherently abstract and iconic, making visual comparisons with photos difficult; 2) sketches and photos are in two different visual domains, i.e., black and white lines versus color pixels; and 3) fine-grained distinctions are especially challenging when executed across domain and abstraction-level. To address these challenges, we propose to bridge the image-sketch gap both at the high level via parts and attributes, as well as at the low level by introducing a new domain alignment method. More specifically, first, we contribute a data set with 304 photos and 912 sketches, where each sketch and image is annotated with its semantic parts and associated part-level attributes. With the help of this data set, second, we investigate how strongly supervised deformable part-based models can be learned that subsequently enable automatic detection of part-level attributes, and provide pose-aligned sketch-image comparisons. To reduce the sketch-image gap when comparing low-level features, third, we also propose a novel method for instance-level domain-alignment that exploits both subspace and instance-level cues to better align the domains. Finally, fourth, these are combined in a matching framework integrating aligned low-level features, mid-level geometric structure, and high-level semantic attributes. Extensive experiments conducted on our new data set demonstrate the effectiveness of the proposed method.
- Published
- 2017
- Full Text
- View/download PDF
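The entry builds on subspace alignment; the plain (non-synergistic) version of that family is easy to sketch in numpy: the sketch-domain PCA basis is rotated onto the photo-domain basis before nearest-neighbour matching. Feature dimensions and the synthetic "domain shift" below are illustrative, and the paper's instance-level cues are omitted.

```python
import numpy as np

def subspace_align(Xs, Xt, d=10):
    """Xs: sketch features (Ns, D); Xt: photo features (Nt, D)."""
    Ps = np.linalg.svd(Xs - Xs.mean(0), full_matrices=False)[2][:d].T  # (D, d)
    Pt = np.linalg.svd(Xt - Xt.mean(0), full_matrices=False)[2][:d].T
    M = Ps.T @ Pt                           # rotate sketch basis onto photo basis
    Zs = (Xs - Xs.mean(0)) @ Ps @ M         # aligned sketch features (Ns, d)
    Zt = (Xt - Xt.mean(0)) @ Pt             # photo features in their own basis
    return Zs, Zt

rng = np.random.default_rng(0)
photos = rng.normal(size=(100, 64))
sketches = photos @ rng.normal(size=(64, 64)) * 0.1   # a crude "domain shift"
Zs, Zt = subspace_align(sketches, photos, d=10)
print(Zs.shape, Zt.shape)  # (100, 10) (100, 10); now match with cosine/NN
```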