Author: "Hospedales, Timothy M." / Journal: ieee transactions on pattern analysis & machine intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hospedales, Timothy M."' showing total 8 results

1. Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool.

Author: Liu, Feng, Xiang, Tao, Hospedales, Timothy M., Yang, Wankou, and Sun, Changyin
Subjects: REINFORCEMENT learning, QUESTION answering systems, ARTIFICIAL intelligence, IMAGE color analysis, INVERSE problems
Abstract: In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI, is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct and content correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding, and a more challenging alternative to VQA because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by way of exposing its belief set: the set of question-answer pairs that the VQA model would predict true for a given image. This provides a completely new window into what VQA models ‘believe’ about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

2. Weakly-Supervised Image Annotation and Segmentation with Objects and Attributes.

Author: Shi, Zhiyuan, Yang, Yongxin, Hospedales, Timothy M., and Xiang, Tao
Subjects: IMAGE processing, PATTERN recognition systems, BAYESIAN analysis, ANNOTATIONS, SEMANTICS
Abstract: We propose to model complex visual scenes using a non-parametric Bayesian model learned from weakly labelled images abundant on media sharing sites such as Flickr. Given weak image-level annotations of objects and attributes without locations or associations between them, our model aims to learn the appearance of object and attribute classes as well as their association on each object instance. Once learned, given an image, our model can be deployed to tackle a number of vision problems in a joint and coherent manner, including recognising objects in the scene (automatic object annotation), describing objects using their attributes (attribute prediction and association), and localising and delineating the objects (object detection and semantic segmentation). This is achieved by developing a novel Weakly Supervised Markov Random Field Stacked Indian Buffet Process (WS-MRF-SIBP) that models objects and attributes as latent factors and explicitly captures their correlations within and across superpixels. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model significantly outperforms weakly supervised alternatives and is often comparable with existing strongly supervised models on a variety of tasks including semantic segmentation, automatic image annotation and retrieval based on object-attribute associations. [ABSTRACT FROM PUBLISHER]
Published: 2017
Full Text: View/download PDF

3. Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels.

Author: Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, Xiong, Jiechao, Gong, Shaogang, Wang, Yizhou, and Yao, Yuan
Subjects: *IMAGE recognition (Computer vision), *CROWDSOURCING, *OUTLIERS (Statistics), *ROBUST statistics, *SPARSE approximations
Abstract: The problem of estimating subjective visual properties from image and video has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune the annotation outliers/errors. They thus require a large amount of pairwise labels to be collected. More importantly as a local outlier detection method, majority voting is ineffective in identifying outliers that can cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning to rank problem, tackling both the outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning to rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

4. Transductive Multi-View Zero-Shot Learning.

Author: Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, and Gong, Shaogang
Subjects: *OBJECT recognition (Computer vision), *COMPUTER vision, *HYPERGRAPHS, *MACHINE learning, *COMPUTATIONAL learning theory
Abstract: Most existing zero-shot learning approaches exploit transfer learning via an intermediate semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

5. Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images.

Author: Shi, Zhiyuan, Hospedales, Timothy M., and Xiang, Tao
Subjects: *OBJECT recognition (Computer vision), *LOCALIZATION theory, *SUPERVISED learning, *IMAGE processing, *BAYESIAN analysis
Abstract: We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and “push out” objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

6. Learning Multimodal Latent Attributes.

Author: Fu, Yanwei, Hospedales, Timothy M., Xiang, Tao, and Gong, Shaogang
Subjects: *COMPUTER multitasking, *SOCIAL media research, *OBJECT recognition (Computer vision), *SOCIAL groups, *LATENT functions (Social sciences), *PSYCHOLOGY
Abstract: The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multimodal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we 1) introduce a concept of semilatent attribute space, expressing user-defined and latent attributes in a unified framework, and 2) propose a novel scalable probabilistic topic model for learning multimodal semilatent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including: multitask learning, learning with label noise, N-shot transfer learning, and importantly zero-shot learning. [ABSTRACT FROM PUBLISHER]
Published: 2014
Full Text: View/download PDF

7. Structure Inference for Bayesian Multisensory Scene Understanding.

Author: Hospedales, Timothy M. and Vijayakumar, Sethu
Subjects: *SIGNAL processing, *SENSOR networks, *MULTISENSOR data fusion, *BAYESIAN analysis, *INTEGRATION (Theory of knowledge), *PSYCHOPHYSIOLOGY
Abstract: We investigate a solution to the problem of multisensor scene understanding by formulating it in the framework of Bayesian model selection and structure inference. Humans robustly associate multimodal data as appropriate, but previous modeling work has focused largely on optimal fusion, leaving segregation unaccounted for and unexploited by machine perception systems. We illustrate a unifying Bayesian solution to multisensory perception and tracking, which accounts for both integration and segregation by explicit probabilistic reasoning about data association in a temporal context. Such an explicit inference of multimodal data association is also of intrinsic interest for higher level understanding of multisensory data. We illustrate this by using a probabilistic implementation of data association in a multiparty audiovisual scenario, where unsupervised learning and structure inference is used to automatically segment, associate, and track individual subjects in audiovisual sequences. Indeed, the structure-inference-based framework introduced in this work provides the theoretical foundation needed to satisfactorily explain many confounding results in human psychophysics experiments involving multimodal cue integration and association. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

8. Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model.

Author: Hospedales, Timothy M., Li, Jian, Gong, Shaogang, and Xiang, Tao
Subjects: *SUPERVISED learning, *HIDDEN Markov models, *DATA modeling, *ELECTRONIC surveillance, *ALGORITHMS, *MACHINE learning
Abstract: One of the most interesting and desired capabilities for automated video behavior analysis is the identification of rarely occurring and subtle behaviors. This is of practical value because dangerous or illegal activities often have few or possibly only one prior example to learn from and are often subtle. Rare and subtle behavior learning is challenging for two reasons: 1) Contemporary modeling approaches require more data and supervision than may be available and 2) the most interesting and potentially critical rare behaviors are often visually subtle—occurring among more obvious typical behaviors or being defined by only small spatio-temporal deviations from typical behaviors. In this paper, we introduce a novel weakly supervised joint topic model which addresses these issues. Specifically, we introduce a multiclass topic model with partially shared latent structure and associated learning and inference algorithms. These contributions will permit modeling of behaviors from as few as one example, even without localization by the user and when occurring in clutter, and subsequent classification and localization of such behaviors online and in real time. We extensively validate our approach on two standard public-space data sets, where it clearly outperforms a batch of contemporary alternatives. [ABSTRACT FROM PUBLISHER]
Published: 2011
Full Text: View/download PDF
