38 results on '"Philip Torr"'
Search Results
2. Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation.
- Author
-
Hang Li 0010, Chengzhi Shen, Philip Torr 0001, Volker Tresp, and Jindong Gu
- Published
- 2024
3. RoDLA: Benchmarking the Robustness of Document Layout Analysis Models.
- Author
-
Yufan Chen, Jiaming Zhang 0001, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr 0001, and Rainer Stiefelhagen
- Published
- 2024
4. FedMedICL: Towards Holistic Evaluation of Distribution Shifts in Federated Medical Imaging.
- Author
-
Kumail Alhamoud, Yasir Ghunaim, Motasem Alfarra, Thomas Hartvigsen, Philip Torr 0001, Bernard Ghanem, Adel Bibi, and Marzyeh Ghassemi
- Published
- 2024
5. HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits.
- Author
-
Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie, Philip Torr 0001, João F. Henriques, and Jakob N. Foerster
- Published
- 2024
6. Prompting a Pretrained Transformer Can Be a Universal Approximator.
- Author
-
Aleksandar Petrov, Philip Torr 0001, and Adel Bibi
- Published
- 2024
7. Extracting Training Data From Document-Based VQA Models.
- Author
-
Francesco Pinto, Nathalie Rauschmayr, Florian Tramèr, Philip Torr 0001, and Federico Tombari
- Published
- 2024
8. Efficient Error Certification for Physics-Informed Neural Networks.
- Author
-
Francisco Eiras, Adel Bibi, Rudy Bunel, Krishnamurthy Dj Dvijotham, Philip Torr 0001, and M. Pawan Kumar
- Published
- 2024
9. Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation.
- Author
-
Yibo Yang, Xiaojie Li, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Adel Bibi, Philip Torr 0001, and Bernard Ghanem
- Published
- 2024
10. Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators.
- Author
-
Jianhao Yuan, Francesco Pinto, Adam Davies, and Philip Torr 0001
- Published
- 2024
11. Porf: Pose residual field for accurate Neural surface Reconstruction.
- Author
-
Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, and Philip Torr 0001
- Published
- 2024
12. When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations.
- Author
-
Aleksandar Petrov, Philip Torr 0001, and Adel Bibi
- Published
- 2024
13. Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images.
- Author
-
Kuofeng Gao, Yang Bai, Jindong Gu, Shu-Tao Xia, Philip Torr 0001, Zhifeng Li 0001, and Wei Liu 0005
- Published
- 2024
14. Select to Perfect: Imitating desired behavior from large multi-agent data.
- Author
-
Tim Franzmeyer, Edith Elkind, Philip Torr 0001, Jakob Nicolaus Foerster, and João F. Henriques
- Published
- 2024
15. Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation.
- Author
-
Wenxuan Zhang, Youssef Mohamed, Bernard Ghanem, Philip Torr 0001, Adel Bibi, and Mohamed Elhoseiny
- Published
- 2024
16. Real-Fake: Effective Training Data Synthesis Through Distribution Matching.
- Author
-
Jianhao Yuan, Jie Zhang, Shuyang Sun, Philip Torr 0001, and Bo Zhao
- Published
- 2024
17. Influencer Backdoor Attack on Semantic Segmentation.
- Author
-
Haoheng Lan, Jindong Gu, Philip Torr 0001, and Hengshuang Zhao
- Published
- 2024
18. Illusory Attacks: Information-theoretic detectability matters in adversarial attacks.
- Author
-
Tim Franzmeyer, Stephen Marcus McAleer, João F. Henriques, Jakob Nicolaus Foerster, Philip Torr 0001, Adel Bibi, and Christian Schröder de Witt
- Published
- 2024
19. An Image Is Worth 1000 Lies: Transferability of Adversarial Images across Prompts on Vision-Language Models.
- Author
-
Haochen Luo, Jindong Gu, Fengyuan Liu, and Philip Torr 0001
- Published
- 2024
20. Linear complexity self-attention with 3rd order polynomials
- Author
-
Francesca Babiloni, Ioannis Marras, Jiankang Deng, Filippos Kokkinos, Matteo Maggioni, Grigorios Chrysos, Philip Torr, and Stefanos Zafeiriou
- Subjects
Computational Theory and Mathematics ,Artificial Intelligence ,Applied Mathematics ,Computer Vision and Pattern Recognition ,Software - Abstract
Self-attention mechanisms and non-local blocks have become crucial building blocks for state-of-the-art neural architectures thanks to their unparalleled ability to capture long-range dependencies in the input. However, their cost is quadratic in the number of spatial positions, which makes their use impractical in many real-world applications. In this work, we analyze these methods through a polynomial lens, and we show that self-attention can be seen as a special case of a 3rd order polynomial. Within this polynomial framework, we are able to design polynomial operators capable of accessing the same data pattern as non-local and self-attention blocks while reducing the complexity from quadratic to linear. As a result, we propose two modules (Poly-NL and Poly-SA) that can be used as “drop-in” replacements for more-complex non-local and self-attention layers in state-of-the-art CNNs and ViT architectures. Our modules can achieve comparable, if not better, performance across a wide range of computer vision tasks while keeping a complexity equivalent to a standard linear layer.
- Published
- 2023
21. Open World Entity Segmentation
- Author
-
Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Philip Torr, Zhe Lin, and Jiaya Jia
- Subjects
Computational Theory and Mathematics ,Artificial Intelligence ,Applied Mathematics ,Computer Vision and Pattern Recognition ,Software - Abstract
We introduce a new image segmentation task, called Entity Segmentation (ES), which aims to segment all visual entities (objects and stuff) in an image without predicting their semantic labels. By removing the need for class label prediction, models trained for this task can focus more on improving segmentation quality. It has many practical applications such as image manipulation and editing, where the quality of segmentation masks is crucial but class labels are less important. We conduct the first-ever study to investigate the feasibility of a convolutional center-based representation to segment things and stuff in a unified manner, and show that such a representation fits exceptionally well in the context of ES. More specifically, we propose a CondInst-like fully-convolutional architecture with two novel modules specifically designed to exploit the class-agnostic and non-overlapping requirements of ES. Experiments show that the models designed and trained for ES significantly outperform popular class-specific panoptic segmentation models in terms of segmentation quality. Moreover, an ES model can be easily trained on a combination of multiple datasets without the need to resolve label conflicts in dataset merging, and the model trained for ES on one or more datasets can generalize very well to other test datasets of unseen domains. The code has been released at https://github.com/dvlab-research/Entity
- Published
- 2022
22. Deeply Explain CNN via Hierarchical Decomposition
- Author
-
Ming-Ming Cheng, Peng-Tao Jiang, Ling-Hao Han, Liang Wang, and Philip Torr
- Subjects
FOS: Computer and information sciences ,Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Computer Vision and Pattern Recognition ,Software - Abstract
In computer vision, some attribution methods for explaining CNNs attempt to study how the intermediate features affect network prediction. However, they usually ignore the feature hierarchies among the intermediate features. This paper introduces a hierarchical decomposition framework to explain CNN’s decision-making process in a top-down manner. Specifically, we propose a gradient-based activation propagation (gAP) module that can decompose any intermediate CNN decision to its lower layers and find the supporting features. Then we utilize the gAP module to iteratively decompose the network decision to the supporting evidence from different CNN layers. The proposed framework can generate a deep hierarchy of strongly associated supporting evidence for the network decision, which provides insight into the decision-making process. Moreover, gAP is effort-free for understanding CNN-based models without network architecture modification and extra training processes. Experiments show the effectiveness of the proposed method. The data and source code will be publicly available at https://mmcheng.net/hdecomp/.
- Published
- 2022
23. Vision transformer with progressive sampling
- Author
-
Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, and Dahua Lin
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Transformers with powerful global relation modeling abilities have been introduced to fundamental computer vision tasks recently. As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens. However, such naive tokenization could destroy object structures, assign grids to uninteresting regions such as background, and introduce interference signals. To mitigate the above issues, in this paper, we propose an iterative and progressive sampling strategy to locate discriminative regions. At each iteration, embeddings of the current sampling step are fed into a transformer encoder layer, and a group of sampling offsets is predicted to update the sampling locations for the next step. The progressive sampling is differentiable. When combined with the Vision Transformer, the obtained PS-ViT network can adaptively learn where to look. The proposed PS-ViT is both effective and efficient. When trained from scratch on ImageNet, PS-ViT performs 3.8% higher than the vanilla ViT in terms of top-1 accuracy with about 4× fewer parameters and 10× fewer FLOPs. Code is available at https://github.com/yuexy/PS-ViT. Comment: Accepted to ICCV 2021
- Published
- 2022
24. Collaborative Quantization Embeddings for Intra-subject Prostate MR Image Registration
- Author
-
Ziyi Shen, Qianye Yang, Yuming Shen, Francesco Giganti, Vasilis Stavrinides, Richard Fan, Caroline Moore, Mirabela Rusu, Geoffrey Sonn, Philip Torr, Dean Barratt, and Yipeng Hu
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Image and Video Processing (eess.IV) ,Computer Science - Computer Vision and Pattern Recognition ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Image registration is useful for quantifying morphological changes in longitudinal MR images from prostate cancer patients. This paper describes a development in improving the learning-based registration algorithms, for this challenging clinical application often with highly variable yet limited training data. First, we report that the latent space can be clustered into a much lower dimensional space than that commonly found as bottleneck features at the deep layer of a trained registration network. Based on this observation, we propose a hierarchical quantization method, discretizing the learned feature vectors using a jointly-trained dictionary with a constrained size, in order to improve the generalisation of the registration networks. Furthermore, a novel collaborative dictionary is independently optimised to incorporate additional prior information, such as the segmentation of the gland or other regions of interest, in the latent quantized space. Based on 216 real clinical images from 86 prostate cancer patients, we show the efficacy of both the designed components. Improved registration accuracy was obtained with statistical significance, in terms of both Dice on gland and target registration error on corresponding landmarks, the latter of which achieved 5.46 mm, an improvement of 28.7% from the baseline without quantization. Experimental results also show that the difference in performance was indeed minimised between training and testing data. Comment: preprint version, accepted for MICCAI 2022 (25th International Conference on Medical Image Computing and Computer Assisted Intervention)
- Published
- 2022
25. PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
- Author
-
Zitong Yu, Yuming Shen, Jingang Shi, Hengshuang Zhao, Philip Torr, and Guoying Zhao
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Remote photoplethysmography (rPPG), which aims at measuring heart activities and physiological signals from facial video without any contact, has great potential in many applications (e.g., remote healthcare and affective computing). Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields, which neglect the long-range spatio-temporal perception and interaction for rPPG modeling. In this paper, we propose the PhysFormer, an end-to-end video transformer based architecture, to adaptively aggregate both local and global spatio-temporal features for rPPG representation enhancement. As key modules in PhysFormer, the temporal difference transformers first enhance the quasi-periodic rPPG features with temporal difference guided global attention, and then refine the local spatio-temporal representation against interference. Furthermore, we also propose label distribution learning and a curriculum learning inspired dynamic constraint in the frequency domain, which provide elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive experiments are performed on four benchmark datasets to show our superior performance on both intra- and cross-dataset testing. One highlight is that, unlike most transformer networks, which need pretraining on large-scale datasets, the proposed PhysFormer can be easily trained from scratch on rPPG datasets, which makes it promising as a novel transformer baseline for the rPPG community. The code will be released at https://github.com/ZitongYu/PhysFormer. Accepted by CVPR 2022
- Published
- 2021
26. Large-scale Unsupervised Semantic Segmentation
- Author
-
Shanghua Gao, Zhong-Yu Li, Ming-Hsuan Yang, Ming-Ming Cheng, Junwei Han, and Philip Torr
- Subjects
FOS: Computer and information sciences ,Computational Theory and Mathematics ,Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Applied Mathematics ,Computer Science - Computer Vision and Pattern Recognition ,Computer Vision and Pattern Recognition ,Software - Abstract
Empowered by large datasets, e.g., ImageNet, unsupervised learning on large-scale data has enabled significant advances for classification tasks. However, whether large-scale unsupervised semantic segmentation can be achieved remains unknown. There are two major challenges: i) we need a large-scale benchmark for assessing algorithms; ii) we need to develop methods to simultaneously learn category and shape representation in an unsupervised manner. In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress. Building on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 50k high-quality semantic segmentation annotations for evaluation. Our benchmark has a high data diversity and a clear task objective. We also present a simple yet effective method that works surprisingly well for LUSS. In addition, we benchmark related un/weakly/fully supervised methods accordingly, identifying the challenges and possible directions of LUSS. The benchmark and source code are publicly available at https://github.com/LUSSeg.
- Published
- 2021
27. TransMix: Attend to Mix for Vision Transformers
- Author
-
Jie-Neng Chen, Shuyang Sun, Ju He, Philip Torr, Alan Yuille, and Song Bai
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However, previous mixup-based methods have an underlying prior knowledge that the linearly interpolated ratio of targets should be kept the same as the ratio proposed in input interpolation. This may lead to a strange phenomenon: sometimes there is no valid object in the mixed image due to the random process in augmentation, but there is still a response in the label space. To bridge this gap between the input and label spaces, we propose TransMix, which mixes labels based on the attention maps of Vision Transformers. The confidence of the label will be larger if the corresponding input image is weighted higher by the attention map. TransMix is embarrassingly simple and can be implemented in just a few lines of code without introducing any extra parameters and FLOPs to ViT-based models. Experimental results show that our method can consistently improve various ViT-based models at scale on ImageNet classification. After being pre-trained with TransMix on ImageNet, the ViT-based models also demonstrate better transferability to semantic segmentation, object detection and instance segmentation. TransMix also proves to be more robust when evaluated on 4 different benchmarks. Code will be made publicly available at https://github.com/Beckschen/TransMix.
- Published
- 2021
28. Diagnosing and Preventing Instabilities in Recurrent Video Processing
- Author
-
Thomas Tanay, Aivar Sootla, Matteo Maggioni, Puneet K. Dokania, Philip Torr, Ales Leonardis, and Gregory Slabaugh
- Subjects
FOS: Computer and information sciences ,Computational Theory and Mathematics ,Artificial Intelligence ,Computer Vision and Pattern Recognition (cs.CV) ,Applied Mathematics ,Computer Science - Computer Vision and Pattern Recognition ,Computer Vision and Pattern Recognition ,Software - Abstract
Recurrent models are a popular choice for video enhancement tasks such as video denoising or super-resolution. In this work, we focus on their stability as dynamical systems and show that they tend to fail catastrophically at inference time on long video sequences. To address this issue, we (1) introduce a diagnostic tool which produces input sequences optimized to trigger instabilities and that can be interpreted as visualizations of temporal receptive fields, and (2) propose two approaches to enforce the stability of a model during training: constraining the spectral norm or constraining the stable rank of its convolutional layers. We then introduce Stable Rank Normalization for Convolutional layers (SRN-C), a new algorithm that enforces these constraints. Our experimental results suggest that SRN-C successfully enforces stability in recurrent video processing models without a significant performance loss.
- Published
- 2020
29. Computer Vision - ECCV 2008 : 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part IV
- Author
-
David Forsyth, Philip Torr, and Andrew Zisserman
- Subjects
- Data mining, Computer vision, Image processing—Digital techniques, Computer graphics, Pattern recognition systems, Digital humanities
- Abstract
Welcome to the 2008 European Conference on Computer Vision. These proceedings are the result of a great deal of hard work by many people. To produce them, a total of 871 papers were reviewed. Forty were selected for oral presentation and 203 were selected for poster presentation, yielding acceptance rates of 4.6% for oral, 23.3% for poster, and 27.9% in total. We applied three principles. First, since we had a strong group of Area Chairs, the final decisions to accept or reject a paper rested with the Area Chair, who would be informed by reviews and could act only in consensus with another Area Chair. Second, we felt that authors were entitled to a summary that explained how the Area Chair reached a decision for a paper. Third, we were very careful to avoid conflicts of interest. Each paper was assigned to an Area Chair by the Program Chairs, and each Area Chair received a pool of about 25 papers. The Area Chairs then identified and ranked appropriate reviewers for each paper in their pool, and a constrained optimization allocated three reviewers to each paper. We are very proud that every paper received at least three reviews. At this point, authors were able to respond to reviews. The Area Chairs then needed to reach a decision. We used a series of procedures to ensure careful review and to avoid conflicts of interest. Program Chairs did not submit papers. The Area Chairs were divided into three groups so that no Area Chair in the group was in conflict with any paper assigned to any Area Chair in the group.
- Published
- 2008
30. Effect of three densities of Tetranychus urticae (Acari: Tetranychidae) on the growth of rose plants
- Author
-
Fabián Rojas Sánchez, Philip Torres Ramírez, Daniel Rodríguez Caicedo, and Fernando Cantor Rincón
- Subjects
Phenology ,production ,densities ,Tetranychus urticae ,rose plants ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Ecology ,QH540-549.5 - Abstract
The phytophagous mite Tetranychus urticae is one of the main pests of rose crops. This mite causes a loss of flower quality, reduced plant growth, and reduced bud production. However, the effect of different densities of this phytophage on the growth parameters of greenhouse-grown rose plants has not been reported in Colombia. This article therefore summarizes an experiment with a completely randomized design, using three densities of T. urticae: zero (0), five (5), and fifteen (15) mites per rose leaf, with ten replicates of each. Every 15 days for six months, the dry weight of leaves and stems was recorded, along with stem height, the number of leaves, and the number of individuals of each developmental stage of T. urticae per plant, in each treatment. An initial density of 15 mites per leaf was found to produce an increase in biomass, so that the plant's energy is invested in leaf production rather than in flower production, which is the purpose of this crop.
- Published
- 2011
31. Efficiency of Trichoderma harzianum and microbial preparations against pathogens in crops
- Author
-
Philip Torres R. and Omar Guerrero G.
- Subjects
Parasitism ,antibiosis ,Trichoderma harzianum ,R. solani and S. sclerotiorum ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Ecology ,QH540-549.5 - Abstract
To evaluate the effect of Trichoderma harzianum, E.M. (Effective Microorganisms), and AGROPLUX on the development of the fungi Rhizoctonia solani and Sclerotinia sclerotiorum, this study was carried out under laboratory conditions, and the effects on the mycelial growth of the pathogens were determined in Petri dishes with PDA. The results indicate that T. harzianum showed parasitic action against R. solani and S. sclerotiorum; with E.M. and AGROPLUX, by contrast, antibiosis was observed in the form of a halo around the glass ring containing these products, which prevented the growth of the pathogens.
- Published
- 2008
32. An Analysis of Convex Relaxations for MAP Estimation of Discrete MRFs.
- Author
-
Kumar, M. Pawan, Kolmogorov, Vladimir, and Torr, Philip H. S.
- Subjects
*MARKOV random fields , *APPROXIMATION theory , *ALGORITHMS , *LINEAR programming , *QUADRATIC programming - Abstract
The problem of obtaining the maximum a posteriori estimate of a general discrete Markov random field (i.e., a Markov random field defined using a discrete set of labels) is known to be NP-hard. However, due to its central importance in many applications, several approximation algorithms have been proposed in the literature. In this paper, we present an analysis of three such algorithms based on convex relaxations: (i) LP-S: the linear programming (LP) relaxation proposed by Schlesinger (1976) for a special case and independently in Chekuri et al. (2001), Koster et al. (1998), and Wainwright et al. (2005) for the general case; (ii) QP-RL: the quadratic programming (QP) relaxation of Ravikumar and Lafferty (2006); and (iii) SOCP-MS: the second order cone programming (SOCP) relaxation first proposed by Muramatsu and Suzuki (2003) for two label problems and later extended by Kumar et al. (2006) for a general label set. We show that the SOCP-MS and the QP-RL relaxations are equivalent. Furthermore, we prove that despite the flexibility in the form of the constraints/objective function offered by QP and SOCP, the LP-S relaxation strictly dominates (i.e., provides a better approximation than) QP-RL and SOCP-MS. We generalize these results by defining a large class of SOCP (and equivalent QP) relaxations which is dominated by the LP-S relaxation. Based on these results we propose some novel SOCP relaxations which define constraints using random variables that form cycles or cliques in the graphical model representation of the random field. Using some examples we show that the new SOCP relaxations strictly dominate the previous approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2009
33. Temporal Surface Tracking Using Mesh Evolution
- Author
-
Radu Horaud, Kiran Varanasi, Andrei Zaharescu, Edmond Boyer, Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), and David Forsyth and Philip Torr and Andrew Zisserman
- Subjects
Surface (mathematics) ,Matching (graph theory) ,Geodesic ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,02 engineering and technology ,T-vertices ,Displacement (vector) ,[INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR] ,Visual hull ,Vertex (geometry) ,Displacement field ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Polygon mesh ,Artificial intelligence ,Laplacian smoothing ,business ,Laplace operator ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In this paper, we address the problem of surface tracking in multiple camera environments and over time sequences. In order to fully track a surface undergoing significant deformations, we cast the problem as a mesh evolution over time. Such an evolution is driven by 3D displacement fields estimated between meshes recovered independently at different time frames. Geometric and photometric information is used to identify a robust set of matching vertices. This provides a sparse displacement field that is densified over the mesh by Laplacian diffusion. In contrast to existing approaches that evolve meshes, we do not assume a known model or a fixed topology. The contribution is a novel mesh evolution based framework that makes it possible to fully track, over long sequences, an unknown surface undergoing deformations, including topological changes. Results on very challenging and publicly available image based 3D mesh sequences demonstrate the ability of our framework to efficiently recover surface motions.
- Published
- 2008
34. Hamming embedding and weak geometric consistency for large scale image search
- Author
-
Hervé Jégou, Cordelia Schmid, Matthijs Douze, Learning and recognition in vision (LEAR), Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble (GRAVIR - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), ANR RAFFUT, ANR GAIA, GRAVIT, David Forsyth and Philip Torr and Andrew Zisserman, ANR-07-RIAM-0007,RAFFUT,Repérage Automatique de Fichiers Frauduleux sur sites UGC (User Generated Content)(2007), ANR-07-BLAN-0328,GAIA,Géométrie Algorithmique Informationnelle et Applications(2007), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), and Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Scale (ratio) ,Matching (graph theory) ,business.industry ,Nearest neighbor search ,Geometric transformation ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Binary number ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Inverted index ,[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR] ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Visual Word ,Artificial intelligence ,Representation (mathematics) ,business ,Mathematics - Abstract
This paper improves recent methods for large scale image search. State-of-the-art methods build on the bag-of-features image representation. We first analyze bag-of-features in the framework of approximate nearest neighbor search. This shows the sub-optimality of such a representation for matching descriptors and leads us to derive a more precise representation based on 1) Hamming embedding (HE) and 2) weak geometric consistency constraints (WGC). HE provides binary signatures that refine the matching based on visual words. WGC filters matching descriptors that are not consistent in terms of angle and scale. HE and WGC are integrated within the inverted file and are efficiently exploited for all images, even in the case of very large datasets. Experiments performed on a dataset of one million images show a significant improvement due to the binary signature and the weak geometric consistency constraints, as well as their efficiency. Estimation of the full geometric transformation, i.e., a re-ranking step on a short list of images, is complementary to our weak geometric consistency constraints and further improves the accuracy.
- Published
- 2008
35. General Imaging Geometry for Central Catadioptric Cameras
- Author
-
Joao P. Barreto, Peter Sturm, Interpretation and Modelling of Images and Videos (PERCEPTION), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Institute of Systems and Robotics (ISR-UC), Universidade de Coimbra [Coimbra], and David Forsyth and Philip Torr and Andrew Zisserman
- Subjects
business.industry ,Epipolar geometry ,Orthographic projection ,Bilinear interpolation ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,020207 software engineering ,Geometry ,02 engineering and technology ,Catadioptric system ,0202 electrical engineering, electronic engineering, information engineering ,Perspective camera ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Special case ,Omnidirectional antenna ,Fundamental matrix (computer vision) ,business ,Mathematics
- Abstract
Catadioptric cameras are a popular type of omnidirectional imaging system. Their imaging and multi-view geometry have been extensively studied; epipolar geometry, for instance, is geometrically speaking well understood. However, the existence of a bilinear matching constraint and an associated fundamental matrix has so far only been shown for the special case of para-catadioptric cameras (consisting of a paraboloidal mirror and an orthographic camera). The main goal of this work is to obtain such results for all central catadioptric cameras. Our main result is to show the existence of a general 15x15 fundamental matrix. This is based on and completed by a number of other results, e.g. the formulation of general catadioptric projection matrices and plane homographies.
- Published
- 2008
36. Improving People Search Using Query Expansions: How Friends Help To Find People
- Author
-
Thomas Mensink, Jakob Verbeek, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), David Forsyth and Philip Torr and Andrew Zisserman, and ANR-07-MDCO-0010,R2I,Recherche Interactive d'Images(2007)
- Subjects
Information retrieval ,Result set ,Computer science ,Process (computing) ,020207 software engineering ,02 engineering and technology ,Mixture model ,Query expansion ,Discriminative model ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Face (geometry) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Set (psychology)
- Abstract
In this paper we are interested in finding images of people on the web, and more specifically within large databases of captioned news images. It has recently been shown that visual analysis of the faces in images returned by a text-based query over captions can significantly improve search results. The underlying idea is that although the initial text-based result is imperfect, the queried person will appear relatively frequently in it compared to other people, so we can search for a large group of highly similar faces. The performance of such methods depends strongly on this assumption: for people whose face appears in less than about 40% of the initial text-based result, the performance may be very poor. The contribution of this paper is to improve search results by exploiting faces of other people that co-occur frequently with the queried person. We refer to this process as 'query expansion'. In the face analysis we use the query expansion to provide a query-specific relevant set of 'negative' examples which should be separated from the potentially positive examples in the text-based result set. We apply this idea to a recently proposed method which filters the initial result set using a Gaussian mixture model, and apply the same idea using a logistic discriminant model. We experimentally evaluate the methods using a set of 23 queries on a database of 15,000 captioned news stories from Yahoo! News. The results show that (i) query expansion improves both methods, (ii) our discriminative models outperform the generative ones, and (iii) our best results surpass the state-of-the-art results by 10% precision on average.
- Published
- 2008
37. Tracking 3D Object using Flexible Models
- Author
-
Lucie Masson, Frédéric Jurie, Michel Dhome, Laboratoire des sciences et matériaux pour l'électronique et d'automatique (LASMEA), Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-Centre National de la Recherche Scientifique (CNRS), Learning and recognition in vision (LEAR), Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble (GRAVIR - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), William Clocksin and Andrew Fitzgibbon and Philip Torr, and Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Inria Grenoble - Rhône-Alpes
- Subjects
business.industry ,Computer science ,Over training ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Statistical model ,02 engineering and technology ,020202 computer hardware & architecture ,Spline (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Eye tracking ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business
- Abstract
This article proposes a flexible tracker which can estimate the motion and deformations of 3D objects by considering their appearances as nonrigid surfaces. In this approach, a flexible model is built by matching features (key points) over training sequences and by learning the deformations of a spline-based model. This statistical model captures the variations in the appearance of objects caused by 3D pose variations. Visual tracking is then possible, for each new frame, by matching local features of the model according to their local appearances and by optimizing the constraints provided by the flexible model. The approach is demonstrated on real-world image sequences.
- Published
- 2005
38. Markov random fields for textures recognition with local invariant regions and their geometric relationships
- Author
-
Cordelia Schmid, Juliette Blanchet, Florence Forbes, Learning and recognition in vision (LEAR), Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble (GRAVIR - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Modelling and Inference of Complex and Structured Stochastic Systems [?-2006] (MISTIS [?-2006]), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), William Clocksin and Andrew Fitzgibbon and Philip Torr, and Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes
- Subjects
Local invariant ,Random field ,Markov chain ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Pattern recognition ,02 engineering and technology ,Texture recognition ,Mean field theory ,020204 information systems ,Computer Science::Computer Vision and Pattern Recognition ,Parametric model ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Probabilistic framework ,Hidden Markov model ,Mathematics
- Abstract
This paper describes a new probabilistic framework for recognizing textures in images. Images are described by local affine-invariant descriptors and their spatial relationships. We introduce a statistical parametric model of the dependence between descriptors. We use Hidden Markov Models (HMM) and estimate the parameters with a recent technique based on the mean field principle. Preliminary results for texture recognition are promising and outperform existing techniques.
- Published
- 2005