Author: "Xerox Company" / Topic: artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Xerox Company"' showing total 42 results



1. Activity representation with motion hierarchies

Author: Adrien Gaidon, Zaid Harchaoui, Cordelia Schmid, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Microsoft Research - Inria Joint Centre (MSR - INRIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Microsoft Research Laboratory Cambridge-Microsoft Corporation [Redmond, Wash.], Computer Vision, Xerox Research Centre Europe [Meylan], Xerox Company-Xerox Company, ERC_Allegro, MSR-Inria, AXES, ANR, ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), European Project: 269980,EC:FP7:ICT,FP7-ICT-2009-6,AXES(2011), European Project: 320559,EC:FP7:ERC,ERC-2012-ADG_20120216,ALLEGRO(2013), and European Project: 216886,EC:FP7:ICT,FP7-ICT-2007-1,PASCAL2(2008)
Subjects: Context (language use), 02 engineering and technology, Video analysis, Action recognition, Activity recognition, Artificial Intelligence, Spectral clustering, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Mathematics, Binary tree, business.industry, Kernel methods, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020206 networking & telecommunications, Pattern recognition, Tree (data structure), Kernel method, Pattern recognition (psychology), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Motion decomposition, Software
Abstract: Complex activities, e.g. pole vaulting, are composed of a variable number of sub-events connected by complex spatio-temporal relations, whereas simple actions can be represented as sequences of short temporal parts. In this paper, we learn hierarchical representations of activity videos in an unsupervised manner. These hierarchies of mid-level motion components are data-driven decompositions specific to each video. We introduce a spectral divisive clustering algorithm to efficiently extract a hierarchy over a large number of tracklets (i.e. local trajectories). We use this structure to represent a video as an unordered binary tree. We model this tree using nested histograms of local motion features. We provide an efficient positive definite kernel that computes the structural and visual similarity of two hierarchical decompositions by relying on models of their parent-child relations. We present experimental results on four recent challenging benchmarks: the High Five dataset (Patron-Perez et al., High five: recognising human interactions in TV shows, 2010), the Olympics Sports dataset (Niebles et al., Modeling temporal structure of decomposable motion segments for activity classification, 2010), the Hollywood 2 dataset (Marszalek et al., Actions in context, 2009), and the HMDB dataset (Kuehne et al., HMDB: A large video database for human motion recognition, 2011). We show that per-video hierarchies provide additional information for activity recognition. Our approach improves over unstructured activity models, baselines using other motion decomposition algorithms, and the state of the art.
Published: 2014

2. Label-Embedding for Image Classification

Author: Zeynep Akata, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid, Xerox Research Centre Europe [Meylan], Xerox Company, Apprentissage de modèles à partir de données massives (Thoth ), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Inria Grenoble - Rhône-Alpes, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: FOS: Computer and information sciences, Label Embedding, Linear programming, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Text mining, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 0105 earth and related environmental sciences, Training set, Image Classification, Contextual image classification, business.industry, Applied Mathematics, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, Computational Theory and Mathematics, Embedding, Zero-Shot Learning, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Attributes, business, computer, Software
Abstract: Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of or in addition to attributes, such as class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings from zero-shot learning to regular learning with a large number of labeled examples. Comment: IEEE TPAMI preprint
Published: 2016

3. STATISTICALLY ASSISTED FLUID IMAGE REGISTRATION ALGORITHM - SAFIRA

Author: Xavier Pennec, Agatha D. Lee, Paul M. Thompson, Natasha Lepore, C. Brun, Katie L. McMahon, Margaret J. Wright, Marina Barysheva, Yi-Yu Chou, Greig I. de Zubicaray, Xerox Research Centre Europe [Meylan], Xerox Company, Laboratory of Neuro Imaging [Los Angeles] (LONI), University of California [Los Angeles] (UCLA), University of California-University of California, Analysis and Simulation of Biomedical Images (ASCLEPIOS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centre for Magnetic Resonance [Brisbanne], University of Queensland [Brisbane], Queensland Institute of Medical Research, and University of California (UC)-University of California (UC)
Subjects: education.field_of_study, Ground truth, [SDV.IB.IMA]Life Sciences [q-bio]/Bioengineering/Imaging, Covariance matrix, business.industry, Population, Image registration, Covariance, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, Article, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, [INFO.INFO-IM]Computer Science [cs]/Medical Imaging, Vector field, Computer vision, Algorithm design, Artificial intelligence, Tensor, business, education, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Algorithm, ComputingMilieux_MISCELLANEOUS, Mathematics
Abstract: In this paper, we develop and validate a new Statistically Assisted Fluid Registration Algorithm (SAFIRA) for brain images. A non-statistical version of this algorithm was first implemented in [2] and re-formulated using Lagrangian mechanics in [3]. Here we extend this algorithm to 3D: given 3D brain images from a population, vector fields and their corresponding deformation matrices are computed in a first round of registrations using the non-statistical implementation. Covariance matrices for both the deformation matrices and the vector fields are then obtained and incorporated (separately or jointly) in the regularizing (i.e., the non-conservative Lagrangian) terms, creating four versions of the algorithm. We evaluated the accuracy of each algorithm variant using the manually labeled LPBA40 dataset, which provides us with ground truth anatomical segmentations. We also compared the power of the different algorithms using tensor-based morphometry (a technique to analyze local volumetric differences in brain structure) applied to 46 3D brain scans from healthy monozygotic twins.
Published: 2018

4. BEST INDIVIDUAL TEMPLATE SELECTION FROM DEFORMATION TENSOR MINIMIZATION

Author: C. Brun, Arthur W. Toga, M. Meredith, Marina Barysheva, Margaret J. Wright, Katie L. McMahon, Yi-Yu Chou, Xavier Pennec, Paul M. Thompson, G.I. de Zubicaray, Natasha Lepore, Agatha D. Lee, Laboratory of Neuro Imaging [Los Angeles] (LONI), University of California [Los Angeles] (UCLA), University of California (UC)-University of California (UC), Xerox Research Centre Europe [Meylan], Xerox Company, Analysis and Simulation of Biomedical Images (ASCLEPIOS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centre for Magnetic Resonance [Brisbanne], University of Queensland [Brisbane], Queensland Institute of Medical Research, and University of California-University of California
Subjects: education.field_of_study, Similarity (geometry), business.industry, [SDV.IB.IMA]Life Sciences [q-bio]/Bioengineering/Imaging, Cumulative distribution function, Population, Monozygotic twin, Pattern recognition, computer.software_genre, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, Article, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Voxel, Metric (mathematics), Null distribution, [INFO.INFO-IM]Computer Science [cs]/Medical Imaging, Computer vision, Tensor, Artificial intelligence, business, education, computer, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, ComputingMilieux_MISCELLANEOUS, Mathematics
Abstract: We study the influence of the choice of template in tensor-based morphometry. Using 3D brain MR images from 10 monozygotic twin pairs, we defined a tensor-based distance in the log-Euclidean framework [1] between each image pair in the study. Relative to this metric, twin pairs were found to be closer to each other on average than random pairings, consistent with evidence that brain structure is under strong genetic control. We also computed the intraclass correlation and associated permutation p-value at each voxel for the determinant of the Jacobian matrix of the transformation. The cumulative distribution function (CDF) of the p-values was found at each voxel for each of the templates and compared to the null distribution. Surprisingly, there was very little difference between CDFs of statistics computed from analyses using different templates. As the brain with least log-Euclidean deformation cost, the mean template defined here avoids the blurring caused by creating a synthetic image from a population, and when selected from a large population, avoids bias by being geometrically centered, in a metric that is sensitive enough to anatomical similarity that it can even detect genetic affinity among anatomies.
Published: 2018

5. SMILK, linking natural language and data from the web

Author: Elena Cabrio, Molka Tounsi Dhouib, Fabien Gandon, Cédric Lopez, Catherine Faron-Zucker, Frédéric Segond, Exploration et exploitation de données textuelles (TEXTE), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), Web-Instrumented Man-Machine Interactions, Communities and Semantics (WIMMICS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: media_common.quotation_subject, Natural language processing, 02 engineering and technology, Art, Web of data, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Artificial Intelligence, Linked Data, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], 0202 electrical engineering, electronic engineering, information engineering, Ontologies, 020201 artificial intelligence & image processing, Humanities, Software, media_common
Abstract: As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to: (1) enrich knowledge bases and link data on the web, and conversely (2) use this linked data to contribute to the improvement of text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.
Published: 2018

6. Symbolic Priors for RNN-based Semantic Parsing

Author: Marc Dymetman, Chunyang Xiao, Claire Gardent, Xerox Research Centre Europe [Meylan], Xerox Company, Natural Language Processing : representations, inference and semantics (SYNALP), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Subjects: FOS: Computer and information sciences, Parsing, Computer Science - Computation and Language, Grammar, Intersection (set theory), Computer science, business.industry, media_common.quotation_subject, 020207 software engineering, 02 engineering and technology, computer.software_genre, Automaton, Recurrent neural network, 0202 electrical engineering, electronic engineering, information engineering, Logical form, 020201 artificial intelligence & image processing, S-attributed grammar, [INFO]Computer Science [cs], Artificial intelligence, business, computer, Computation and Language (cs.CL), Natural language processing, Natural language, media_common
Abstract: Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior knowledge: the well-formedness of the logical forms is modeled by a weighted context-free grammar; the likelihood that certain entities present in the input utterance are also present in the logical form is modeled by weighted finite-state automata. The grammar and automata are combined through an efficient intersection algorithm to form a soft guide (“background”) to the RNN. We test our method on an extension of the Overnight dataset and show that it not only strongly improves over an RNN baseline, but also outperforms non-RNN models based on rich sets of hand-crafted features.
Published: 2018

7. Accuracy of using natural language processing methods for identifying healthcare-associated infections

Author: Stéfan Jacques Darmoni, Frédérique Segond, Nastassia Tvardik, Ivan Kergourlay, M.-H. Metzger, André Bittar, Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), Normandie Université (NU), Equipe Traitement de l'information en Biologie Santé (TIBS - LITIS), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Xerox Research Centre Europe [Meylan], Xerox Company, and ANR-12-TECS-0006,SYNODOS,SYstème de Normalisation et d'Organisation de Données médicales textuelles pour l'Observation en Santé(2012)
Subjects: Healthcare associated infections, Adult, medicine.medical_specialty, animal structures, 020205 medical informatics, Epidemiology, computerized, Specialty, Nice, Health Informatics, 02 engineering and technology, Decision support systems, Healthcare-associated infections, computer.software_genre, Sensitivity and Specificity, Hospitals, University, 03 medical and health sciences, Clinical, 0302 clinical medicine, Intensive care, 0202 electrical engineering, electronic engineering, information engineering, medicine, Electronic Health Records, Humans, [INFO]Computer Science [cs], 030212 general & internal medicine, computer.programming_language, Cross Infection, business.industry, Digestive surgery, Medical record, Natural language processing, Medical records systems, virus diseases, University hospital, 3. Good health, Intensive Care Units, Artificial intelligence, business, computer, Algorithms
Abstract: Objective: There is growing interest in using natural language processing (NLP) for monitoring healthcare-associated infections (HAIs). A French project consortium, SYNODOS, developed an NLP solution for detecting medical events in electronic medical records for epidemiological purposes. The objective of this study was to evaluate the performance of the SYNODOS data processing chain for detecting HAIs in clinical documents. Materials and methods: Textual records were collected between October 2009 and December 2010 in three French university hospitals (Lyon, Rouen and Nice). The following medical specialties were included in the study: digestive surgery, neurosurgery, orthopedic surgery, and adult intensive-care units. Reference standard surveillance was compared with the results of automatic detection using NLP. Sensitivity on 56 HAI cases and specificity on 57 non-HAI cases were calculated. Results: The accuracy rate was 84% (n = 95/113). The overall sensitivity of automatic detection of HAIs was 83.9% (CI 95%: 71.7–92.4) and the specificity was 84.2% (CI 95%: 72.1–92.5). The sensitivity varied from one specialty to another, from 69.2% (CI 95%: 38.6–90.9) for intensive care to 93.3% (CI 95%: 68.1–99.8) for orthopedic surgery. The manual review of classification errors showed that the most frequent cause was inaccurate temporal labeling of medical events, which is an important factor for HAI detection. Conclusion: This study confirmed the feasibility of using NLP for HAI detection in hospital facilities. Automatic HAI detection algorithms could offer better surveillance standardization for hospital comparisons.
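Note: the headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of the record. The counts of 47 true positives and 48 true negatives are an assumption, inferred as the only integers consistent with the reported 83.9% sensitivity on 56 HAI cases, 84.2% specificity on 57 non-HAI cases, and 95/113 correct overall. The abstract does not state them directly.

```python
# Cross-check of the reported rates. Case totals (56, 57) and the 95/113
# accuracy come from the abstract; tp and tn are inferred, not reported.
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

hai_cases, non_hai_cases = 56, 57   # stated in the abstract
tp, tn = 47, 48                     # assumed: inferred from the reported rates

sensitivity = percent(tp, hai_cases)                       # detected HAIs / all HAIs
specificity = percent(tn, non_hai_cases)                   # correct negatives / all non-HAIs
accuracy = percent(tp + tn, hai_cases + non_hai_cases)     # all correct / all cases

print(f"sensitivity = {sensitivity:.1f}%")
print(f"specificity = {specificity:.1f}%")
print(f"accuracy    = {accuracy:.1f}% ({tp + tn}/{hai_cases + non_hai_cases})")
```

Under these assumed counts, the three rates agree with the abstract after rounding.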
Published: 2018

8. LCR-Net: Localization-Classification-Regression for Human Pose

Author: Philippe Weinzaepfel, Cordelia Schmid, Grégory Rogez, Apprentissage de modèles à partir de données massives (Thoth ), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Xerox Research Centre Europe [Meylan], Xerox Company, Amazon_gift, European Project: 320559,EC:FP7:ERC,ERC-2012-ADG_20120216,ALLEGRO(2013), Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Inria Grenoble - Rhône-Alpes, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: Contextual image classification, Computer science, business.industry, Initialization, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Pattern recognition, 02 engineering and technology, 3D pose estimation, Real image, Regression, Articulated body pose estimation, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML), Pose
Abstract: We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict the 2D and 3D pose of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our architecture, named LCR-Net, contains 3 main components: 1) the pose proposal generator that suggests potential poses at different locations in the image; 2) a classifier that scores the different pose proposals; and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard non-maximum suppression algorithm. Our approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment. Moreover, it shows promising results on real images for both single and multi-person subsets of the MPII 2D pose benchmark.
Published: 2017

9. DeepMatching: Hierarchical Deformable Dense Matching

Author: Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid, Jerome Revaud, Xerox Research Centre Europe [Meylan], Xerox Company, Apprentissage de modèles à partir de données massives (Thoth ), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), ERC_Allegro, MSR-Inria, AXES, ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), European Project: 320559,EC:FP7:ERC,ERC-2012-ADG_20120216,ALLEGRO(2013), European Project: 269980,EC:FP7:ICT,FP7-ICT-2009-6,AXES(2011), Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Inria Grenoble - Rhône-Alpes, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: FOS: Computer and information sciences, deep convolutional neural networks (CNN), Computer science, Computer Vision and Pattern Recognition (cs.CV), Computation, Computer Science - Computer Vision and Pattern Recognition, Optical flow, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, optical flow, Artificial Intelligence, Robustness (computer science), Motion estimation, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Blossom algorithm, business.industry, Detector, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Robotics, Non-rigid matching, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Affine transformation, Artificial intelligence, business, Software
Abstract: We introduce a novel matching algorithm, called DeepMatching, to compute dense correspondences between images. DeepMatching relies on a hierarchical, multi-layer, correlational architecture designed for matching images and was inspired by deep convolutional approaches. The proposed matching algorithm can handle non-rigid deformations and repetitive textures and efficiently determines dense correspondences in the presence of significant changes between images. We evaluate the performance of DeepMatching, in comparison with state-of-the-art matching algorithms, on the Mikolajczyk (Mikolajczyk et al., A comparison of affine region detectors, 2005), the MPI-Sintel (Butler et al., A naturalistic open source movie for optical flow evaluation, 2012) and the Kitti (Geiger et al., Vision meets robotics: The KITTI dataset, 2013) datasets. DeepMatching outperforms the state-of-the-art algorithms and shows excellent results in particular for repetitive textures. We also apply DeepMatching to the computation of optical flow, called DeepFlow, by integrating it in the large displacement optical flow (LDOF) approach of Brox and Malik (Large displacement optical flow: descriptor matching in variational motion estimation, 2011). Additional robustness to large displacements and complex motion is obtained thanks to our matching approach. DeepFlow obtains competitive performance on public benchmarks for optical flow estimation.
Published: 2016

10. Orthogonality regularizer for question answering

Author: Marc Dymetman, Guillaume Bouchard, Chunyang Xiao, Claire Gardent, Xerox Research Centre Europe [Meylan], Xerox Company, University College of London [London] (UCL), Natural Language Processing : representations, inference and semantics (SYNALP), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: Orthogonality (programming), business.industry, Computer science, 02 engineering and technology, Space (commercial competition), Knowledge base, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), Question answering, Object type, Embedding, [INFO]Computer Science [cs], 020201 artificial intelligence & image processing, Artificial intelligence, business, Lying
Abstract: Learning embeddings of words and knowledge base elements is a promising approach for open domain question answering. Based on the remark that relations and entities are distinct object types lying in the same embedding space, we analyze the benefit of adding a regularizer that encourages the embeddings of entities to be orthogonal to those of relations. The main motivation comes from the observation that modifying the embeddings using prior knowledge often helps performance. The experiments show that incorporating the regularizer yields better results on a challenging question answering benchmark.
Published: 2016
Full Text: View/download PDF
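
Illustrative note on item 10: the abstract describes a regularizer that pushes entity embeddings to be orthogonal to relation embeddings but does not give its exact form. The sketch below assumes one plausible form, the sum of squared dot products over all entity/relation pairs. The function name, the squared-dot-product penalty and the toy vectors are hypothetical and are not taken from the paper.

```python
def orthogonality_penalty(entity_vecs, relation_vecs):
    """Sum of squared dot products between every entity and relation vector.

    Assumed form of the penalty: it is zero exactly when each entity
    vector is orthogonal to each relation vector.
    """
    total = 0.0
    for e in entity_vecs:
        for r in relation_vecs:
            dot = sum(ei * ri for ei, ri in zip(e, r))
            total += dot * dot
    return total


# Toy 2-D embeddings (made up for illustration).
entities = [[1.0, 0.0], [0.5, 0.5]]
relations = [[0.0, 1.0]]
penalty = orthogonality_penalty(entities, relations)
```

In training, a term like this would typically be added to the main loss with a weighting coefficient, so that the optimizer trades off task accuracy against orthogonality.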

11. Aggregating Local Image Descriptors into Compact Codes

Author: Herve Jegou, Cordelia Schmid, Jorge Sanchez, Patrick Pérez, Florent Perronnin, Matthijs Douze, Multimedia content-based indexing (TEXMEX), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Service Expérimentation et Développement (SED [Grenoble]), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Technicolor R & I [Cesson Sévigné], Technicolor, QUAERO, 
ANR-07-BLAN-0328,GAIA,Géométrie Algorithmique Informationnelle et Applications(2007), European Project: 216529,EC:FP7:ICT,FP7-ICT-2007-1,PINVIEW(2008), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, and SED [Grenoble]
Subjects: Fisher kernel, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, image retrieval, image search, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Image retrieval, Mathematics, Multi-core processor, business.industry, Applied Mathematics, Dimensionality reduction, Search engine indexing, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Byte, 020207 software engineering, Pattern recognition, Visualization, Computational Theory and Mathematics, Kernel (image processing), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software, indexing
Abstract: This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100 million image dataset takes about 250 ms on one processor core.
Published: 2012
Full Text: View/download PDF

12. Local Convolutional Features with Unsupervised Training for Image Retrieval

Author: Julien Mairal, Cordelia Schmid, Matthijs Douze, Mattis Paulin, Florent Perronnin, Zaid Harchaoui, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, ERC_Allegro, ANR_Macaron, CNRS_Mastodons_Titan, Xerox Research Center Europe, ANR-14-CE23-0003,MACARON,Apprentissage statistique à grande échelle et applications(2014), and European Project: 320559,EC:FP7:ERC,ERC-2012-ADG_20120216,ALLEGRO(2013)
Subjects: Computer science, business.industry, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-invariant feature transform, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, Automatic image annotation, Kernel (image processing), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Visual Word, Artificial intelligence, business, computer, Image retrieval
Abstract: Patch-level descriptors underlie several important computer vision tasks, such as stereo-matching or content-based image retrieval. We introduce a deep convolutional architecture that yields patch-level descriptors, as an alternative to the popular SIFT descriptor for image retrieval. The proposed family of descriptors, called Patch-CKN, adapts the recently introduced Convolutional Kernel Network (CKN), an unsupervised framework to learn convolutional architectures. We present a comparison framework to benchmark current deep convolutional approaches along with Patch-CKN for both patch and image retrieval, including our novel "RomePatches" dataset. Patch-CKN descriptors yield competitive results compared to supervised CNN alternatives on patch and image retrieval.
Published: 2015
Full Text: View/download PDF

13. Explicit embeddings for nearest neighbor search with Mercer kernels

Author: Hervé Jégou, Rémi Gribonval, Patrick Pérez, Florent Perronnin, Anthony Bourrier, GIPSA - Vision and Brain Signal Processing (GIPSA-VIBS), Département Images et Signal (GIPSA-DIS), Grenoble Images Parole Signal Automatique (GIPSA-lab), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Grenoble Images Parole Signal Automatique (GIPSA-lab), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio (PANAMA), Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale 
supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Télécom Bretagne-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Technicolor [Cesson Sévigné], Technicolor, Xerox Research Centre Europe [Meylan], Xerox Company, Creating and exploiting explicit links between multimedia fragments (LinkMedia), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-MEDIA ET INTERACTIONS (IRISA-D6), Université Stendhal - Grenoble 3-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Stendhal - Grenoble 3-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Grenoble Images Parole Signal Automatique (GIPSA-lab), Université Stendhal - Grenoble 3-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université 
Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Stendhal - Grenoble 3-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE (IRISA-D5), CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National de Recherche en Informatique et en Automatique (Inria)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-CentraleSupélec-Télécom Bretagne-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-École normale supérieure - Rennes (ENS Rennes)-Université de Bretagne Sud (UBS)-Centre National de la Recherche Scientifique (CNRS)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)
Subjects: Statistics and Probability, Explicit embeddings, Nearest neighbor search, Machine learning, computer.software_genre, Margin (machine learning), Encoding (memory), Mathematics, Euclidean space, business.industry, Applied Mathematics, Nearest Neighbor search, Condensed Matter Physics, Euclidean distance, Modeling and Simulation, Kernel (statistics), [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], Embedding, Mercer kernels, Geometry and Topology, Computer Vision and Pattern Recognition, Artificial intelligence, Fixed-radius near neighbors, business, computer, Algorithm
Abstract: Many approximate nearest neighbor search algorithms operate under memory constraints, by computing short signatures for database vectors while roughly keeping the neighborhoods for the distance of interest. Encoding procedures designed for the Euclidean distance have attracted much attention in the last decade. In the case where the distance of interest is based on a Mercer kernel, we propose a simple, yet effective two-step encoding scheme: first, compute an explicit embedding to map the initial space into a Euclidean space; second, apply an encoding step designed to work with the Euclidean distance. Comparing this simple baseline with existing methods relying on implicit encoding, we demonstrate better search recall for similar code sizes with the chi-square kernel in databases comprised of visual descriptors, outperforming competing state-of-the-art techniques by a large margin.
Published: 2015
Full Text: View/download PDF
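
Illustrative note on item 13: the two-step scheme in the abstract (an explicit embedding into a Euclidean space, followed by an encoder designed for Euclidean distance) can be sketched with simpler stand-ins. The sketch below swaps in the Hellinger kernel, whose exact explicit embedding is the element-wise square root, in place of the chi-square kernel used in the paper, and uses uniform scalar quantization in place of a real Euclidean encoder. All function names and the choice of 16 levels are assumptions made for illustration.

```python
import math


def hellinger_kernel(x, y):
    """Hellinger kernel for non-negative vectors: sum_i sqrt(x_i * y_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(x, y))


def explicit_embedding(x):
    """Step 1: map x so that a plain dot product reproduces the kernel."""
    return [math.sqrt(v) for v in x]


def quantize(vec, levels=16):
    """Step 2 (stand-in): uniform scalar quantization of values in [0, 1]."""
    return [min(levels - 1, int(v * levels)) for v in vec]


def encode(x):
    """Full pipeline: explicit embedding, then Euclidean-style encoding."""
    return quantize(explicit_embedding(x))


x = [0.25, 0.0]
y = [1.0, 0.0]
# Dot product in the embedded space, to compare against the kernel value.
dot = sum(a * b for a, b in zip(explicit_embedding(x), explicit_embedding(y)))
code = encode([0.25, 1.0])
```

The point of the design is that once the kernel has been turned into an ordinary inner product, any compact encoder built for Euclidean vectors can be reused unchanged.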

14. Revisiting the Fisher vector for fine-grained classification

Author: Naila Murray, Philippe-Henri Gosselin, Hervé Jégou, Florent Perronnin, Multimedia content-based indexing (TEXMEX), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Multimedia Indexation and Data Integration (MIDI), Equipes Traitement de l'Information et Systèmes (ETIS - UMR 8051), Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA)-Centre National de la Recherche Scientifique (CNRS)-CY Cergy Paris Université (CY)-Ecole Nationale Supérieure de l'Electronique et de ses Applications (ENSEA)-Centre National de la Recherche Scientifique (CNRS)-CY Cergy Paris Université (CY), Xerox Research Centre Europe [Meylan], Xerox Company, Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences 
Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, and Gosselin, Philippe-Henri
Subjects: Contextual image classification, Computer science, business.industry, Fisher kernel, Search engine indexing, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Pattern recognition, Fisher vector, 02 engineering and technology, Machine learning, computer.software_genre, [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Artificial Intelligence, Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, computer, Software
Abstract: Highlights: Winning method of the fine-grained image classification challenge 2013. Late combination of two indexing and classification strategies. Good practices for fine-grained image classification. Key features: descriptor filtering, spatial coordinate coding, active learning. This paper describes the joint submission of Inria and Xerox to the FGCOMP'2013 challenge. Although the proposed system follows most of the standard Fisher classification pipeline, we describe a few key features and good practices that significantly improve the accuracy when specifically considering fine-grained classification tasks. In particular, we consider the late fusion of two systems both based on Fisher vectors, but for which we make drastically different design choices that make them very complementary. Moreover, we propose a simple yet effective filtering strategy, which significantly boosts the performance for several class domains.
Published: 2014

15. Transformation Pursuit for Image Classification

Author: Zaid Harchaoui, Cordelia Schmid, Jerome Revaud, Florent Perronnin, Mattis Paulin, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, ERC_Allegro, AXES, ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011), ANR-12-CORD-0016,FIRE-ID,Reconnaissance à grain fin dans les grandes bases de données d'images(2012), European Project: 320559,EC:FP7:ERC,ERC-2012-ADG_20120216,ALLEGRO(2013), and European Project: 269980,EC:FP7:ICT,FP7-ICT-2009-6,AXES(2011)
Subjects: Training set, Contextual image classification, Selection (relational algebra), business.industry, Training time, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Machine learning, computer.software_genre, Image (mathematics), Transformation (function), Simple (abstract algebra), Artificial intelligence, Noise (video), business, computer, Mathematics
Abstract: A simple approach to learning invariances in image classification consists in augmenting the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting a compact subset is challenging. Indeed, not all transformations are equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm, Image Transformation Pursuit (ITP), for the automatic selection of a compact set of transformations. ITP works in a greedy fashion, by selecting at each iteration the transformation that yields the highest accuracy gain. ITP also allows efficient exploration of complex transformations that combine basic transformations. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. Using Fisher Vector representations, we achieve an improvement from 28.2% to 45.2% in top-1 accuracy on CUB, and an improvement from 70.1% to 74.9% in top-5 accuracy on ImageNet. We also show significant improvements for deep convnet features: from 47.3% to 55.4% on CUB and from 77.9% to 81.4% on ImageNet.
Published: 2014
Full Text: View/download PDF
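
Illustrative note on item 15: the greedy selection described in the abstract (at each iteration, add the transformation with the highest accuracy gain) can be sketched as a generic forward-selection loop. This is a minimal sketch, not the authors' implementation: the function names, the stopping rule when no candidate helps, and the toy scoring function with made-up integer accuracy points are all assumptions.

```python
def greedy_select(candidates, score, budget):
    """Greedily pick up to `budget` items that most improve `score`."""
    selected = []
    best = score(selected)
    for _ in range(budget):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Evaluate each remaining candidate when added to the current set.
        scored = [(score(selected + [c]), c) for c in remaining]
        new_best, choice = max(scored, key=lambda pair: pair[0])
        if new_best <= best:
            break  # no remaining candidate improves the score
        selected.append(choice)
        best = new_best
    return selected


# Hypothetical accuracy gains (in points) for three toy transformations.
TOY_GAINS = {"flip": 5, "crop": 3, "noise": 0}


def toy_score(transformations):
    """Stand-in for validation accuracy after training with these transformations."""
    return 50 + sum(TOY_GAINS[t] for t in transformations)


chosen = greedy_select(["noise", "crop", "flip"], toy_score, budget=3)
```

In a real setting `score` would retrain or update a classifier on the augmented training set and return its validation accuracy, which is the expensive step that makes a compact selection worthwhile.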

16. Instance classification with prototype selection

Author: Hervé Jégou, Florent Perronnin, Josip Krapac, Teddy Furon, Multimedia content-based indexing (TEXMEX), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Xerox Research Centre Europe [Meylan], Xerox Company, Xerox Research Center Europe, ANR-12-CORD-0016,FIRE-ID,Reconnaissance à grain fin dans les grandes bases de données d'images(2012), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), and Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique
Subjects: Contextual image classification, business.industry, Computer science, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Feature selection, Fisher vector, Classification scheme, Pattern recognition, Machine learning, computer.software_genre, Logos Bible Software, Hamming embedding, ComputingMethodologies_PATTERNRECOGNITION, Voting, Artificial intelligence, business, computer, Selection (genetic algorithm), media_common
Abstract: We address the problem of instance classification: our goal is to annotate images with tags corresponding to object classes which exhibit small intra-class variations such as logos, products or landmarks. We propose a novel algorithm for the selection of class-specific prototypes which are used in a voting-based classification scheme. We show significant improvements over two state-of-the-art methods, namely the Fisher vector and Hamming Embedding, on two challenging datasets of logos and vehicles.
Published: 2014
Full Text: View/download PDF

17. Translation project adaptation for MT-enhanced computer assisted translation

Author: Nicola Bertoldi, Mauro Cettolo, Marcello Federico, Christophe Servan, Loïc Barrault, Holger Schwenk, Fondazione Bruno Kessler [Trento, Italy] (FBK), Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM), Xerox Research Centre Europe [Meylan], Xerox Company, and European Project: 287688,EC:FP7:ICT,FP7-ICT-2011-7,MATECAT(2011)
Subjects: Linguistics and Language, Computer science, 02 engineering and technology, computer.software_genre, Language and Linguistics, German, 030507 speech-language pathology & audiology, 03 medical and health sciences, Translation project, Artificial Intelligence, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Language industry, business.industry, Information technology, Transfer-based machine translation, language.human_language, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, language, Computer-assisted translation, 020201 artificial intelligence & image processing, Artificial intelligence, Computational linguistics, 0305 other medical science, business, computer, Software, Natural language processing
Abstract: The effective integration of MT technology into computer-assisted translation tools is a challenging topic both for academic research and the translation industry. In particular, professional translators consider the ability of MT systems to adapt to their feedback to be crucial. In this paper, we propose an adaptation scheme to tune a statistical MT system to a translation project using small amounts of post-edited texts, like those generated by a single user in even just one day of work. The same scheme can be applied on a larger scale in order to focus general purpose models towards the specific domain of interest. We assess our method on two domains, namely information technology and legal, and four translation directions, from English to French, Italian, Spanish and German. The main outcome is that our adaptation strategy can be very effective provided that the seed data used for adaptation is "close enough" to the remaining text to be translated; otherwise, MT quality neither improves nor worsens, thus showing the robustness of our method.
Published: 2014
Full Text: View/download PDF

18. Image Classification with the Fisher Vector: Theory and Practice

Author: Thomas Mensink, Jakob Verbeek, Jorge Sanchez, Florent Perronnin, Facultad de Matemática, Astronomía y Física [Cordoba] (FaMAF), Universidad Nacional de Córdoba [Argentina], Xerox Research Centre Europe [Meylan], Xerox Company, Intelligent Systems Lab. (ISLA), University of Amsterdam [Amsterdam] (UvA), Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
Subjects: IMAGE CLASSIFICATION, Image classification, Fisher kernel, LARGE-SCALE CLASSIFICATION, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Set (abstract data type), [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Artificial Intelligence, Encoding (memory), Product quantization, 0202 electrical engineering, electronic engineering, information engineering, FISHER VECTOR, Large-scale classification, Representation (mathematics), Otras Ciencias de la Computación e Información, Fisher vector, Mathematics, Contextual image classification, business.industry, 020207 software engineering, Pattern recognition, BAG-OF-VISUAL WORDS, Mixture model, Bag-of-words model in computer vision, Computer Science::Computer Vision and Pattern Recognition, Ciencias de la Computación e Información, Pattern recognition (psychology), 020201 artificial intelligence & image processing, Bag-of-Visual words, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software, CIENCIAS NATURALES Y EXACTAS
Abstract: A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call Fisher vector, has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique. Affiliation: Sanchez, Jorge Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Córdoba. Centro de Investigación y Estudios de Matemática de Córdoba(p); Argentina. Affiliation: Perronnin, Florent. Xerox Research Centre Europe; France. Affiliation: Mensink, Thomas. University of Amsterdam. Intelligent Systems Lab Amsterdam; Netherlands. Affiliation: Verbeek, Jakob. LEAR Team, INRIA Grenoble; France.
Published: 2013
Full Text: View/download PDF

19. Distance-Based Image Classification: Generalizing to new classes at near-zero cost

Author: Jakob Verbeek, Thomas Mensink, Gabriela Csurka, Florent Perronnin, Intelligent Systems Lab. (ISLA), University of Amsterdam [Amsterdam] (UvA), Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
Subjects: Computer science, Generalization, 02 engineering and technology, Machine learning, computer.software_genre, 030218 nuclear medicine & medical imaging, k-nearest neighbors algorithm, Pattern Recognition, Automated, 03 medical and health sciences, 0302 clinical medicine, Artificial Intelligence, Image Interpretation, Computer-Assisted, 0202 electrical engineering, electronic engineering, information engineering, Computer Simulation, Image retrieval, Training set, Contextual image classification, business.industry, Applied Mathematics, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, Models, Theoretical, Image Enhancement, Support vector machine, Computational Theory and Mathematics, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Transfer of learning, business, Classifier (UML), computer, Software, Algorithms, Distance based
Abstract: International audience; We study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers, and introduce a new metric learning approach for the latter. We also introduce an extension of the NCM classifier to allow for richer class representations. Experiments on the ImageNet 2010 challenge dataset, which contains over 10^6 training images of 1,000 classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs which obtain current state-of-the-art performance. Experimentally we study the generalization performance to classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art, while being orders of magnitude faster. Furthermore, we show how a zero-shot class prior based on the ImageNet hierarchy can improve performance when few training images are available.
Published: 2013
Full Text: View/download PDF
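
Illustrative note: the nearest class mean (NCM) idea named in the abstract above can be sketched in a few lines. This is only a toy sketch under stated assumptions, not the authors' method: it uses plain Euclidean distance instead of the learned metric the abstract describes, and the class name `NearestClassMean`, its methods, and the sample data are all hypothetical. It shows why adding a class is nearly free: only one running mean has to be updated.

```python
from math import dist


class NearestClassMean:
    """Toy NCM classifier. Assumption: plain Euclidean distance, no learned metric."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension running sum of samples
        self.counts = {}  # label -> number of samples seen for that label

    def add(self, label, x):
        # Adding a sample (or a brand-new class) touches one running mean only;
        # nothing is retrained for the other classes.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def mean(self, label):
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, x):
        # Assign x to the class whose mean is closest.
        return min(self.sums, key=lambda label: dist(x, self.mean(label)))


ncm = NearestClassMean()
for label, x in [("cat", [0.0, 0.0]), ("cat", [0.0, 2.0]), ("dog", [4.0, 4.0])]:
    ncm.add(label, x)

before = ncm.predict([9.0, 9.0])   # only "cat" and "dog" are known here
ncm.add("bird", [10.0, 10.0])      # new class added with a single update
after = ncm.predict([9.0, 9.0])
```

In this toy data the query point is first assigned to the closest existing class and, once the new class has a mean, to that class instead.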

20. Label-Embedding for Attribute-Based Classification

Author: Zaid Harchaoui, Cordelia Schmid, Zeynep Akata, Florent Perronnin, Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), and Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)
Subjects: Training set, Contextual image classification, business.industry, Computer science, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Embedding, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: International audience; Attributes are an intermediate representation, which enables parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function which measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. The label embedding framework offers other advantages such as the ability to leverage alternative sources of information in addition to attributes (e.g. class hierarchies) or to transition smoothly from zero-shot learning to learning with large quantities of data.
Published: 2013
Full Text: View/download PDF

21. Large Scale Metric Learning for Distance-Based Image Classification on Open Ended Data Sets

Author: Mensink, T., Verbeek, J., Perronnin, F., Csurka, G., Farinella, G.M., Battiato, S., Cipolla, R, Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Farinella, Giovanni Maria and Battiato, Sebastiano and Cipolla, Roberto, Amsterdam Machine Learning lab (IVI, FNWI), and Intelligent Sensory Information Systems (IVI, FNWI)
Subjects: Contextual image classification, business.industry, 05 social sciences, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, Metric (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, 0509 other social sciences, 050904 information & library sciences, business, Image retrieval, Classifier (UML), computer, Mathematics, Distance based
Abstract: International audience; Many real-life large-scale datasets are open-ended and dynamic: new images are continuously added to existing classes, new classes appear over time, and the semantics of existing classes might evolve too. Therefore, we study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers. Since the performance of distance-based classifiers heavily depends on the distance function used, we cast the problem into one of learning a low-rank metric, which is shared across all classes. For the NCM classifier we introduce a new metric learning approach, and we also introduce an extension to allow for richer class representations. Experiments on the ImageNet 2010 challenge dataset, which contains over one million training images of a thousand classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs which obtain current state-of-the-art performance. Experimentally we study the generalization performance to classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art, while being orders of magnitude faster.
Published: 2013
Full Text: View/download PDF

22. Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Author: Jakob Verbeek, Gabriela Csurka, Thomas Mensink, Florent Perronnin, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, and Andrew Fitzgibbon and Svetlana Lazebnik and Pietro Perona and Yoichi Sato and Cordelia Schmid
Subjects: Training set, Contextual image classification, Generalization, business.industry, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Pattern recognition, 02 engineering and technology, Machine learning, computer.software_genre, Support vector machine, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Transfer of learning, business, Reference class, Classifier (UML), computer, Mathematics
Abstract: International audience; We are interested in large-scale image classification and especially in the setting where images corresponding to new or existing classes are continuously added to the training set. Our goal is to devise classifiers which can incorporate such images and classes on-the-fly at (near) zero cost. We cast this problem into one of learning a metric which is shared across all classes and explore k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers. We learn metrics on the ImageNet 2010 challenge data set, which contains more than 1.2M training images of 1K classes. Surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier, and has comparable performance to linear SVMs. We also study the generalization performance, among others by using the learned metric on the ImageNet-10K dataset, and we obtain competitive performance. Finally, we explore zero-shot classification, and show how the zero-shot model can be combined very effectively with small training datasets.
Published: 2012
Full Text: View/download PDF

23. Towards good practice in large-scale learning for image classification

Author: Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid, Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), and Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)
Subjects: Early stopping, Contextual image classification, business.industry, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, 02 engineering and technology, 010501 environmental sciences, Machine learning, computer.software_genre, 01 natural sciences, Regularization (mathematics), Ranking (information retrieval), Support vector machine, Stochastic gradient descent, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Stochastic optimization, Artificial intelligence, business, computer, 0105 earth and related environmental sciences, Mathematics
Abstract: International audience; We propose a benchmark of several objective functions for large-scale image classification: we compare the one-vs-rest, multiclass, ranking and weighted average ranking SVMs. Using stochastic gradient descent optimization, we can scale the learning to millions of images and thousands of classes. Our experimental evaluation shows that ranking based algorithms do not outperform a one-vs-rest strategy and that the gap between the different algorithms reduces in case of high-dimensional data. We also show that for one-vs-rest, learning through cross-validation the optimal degree of imbalance between the positive and the negative samples can have a significant impact. Furthermore, early stopping can be used as an effective regularization strategy when training with stochastic gradient algorithms. Following these "good practices", we were able to improve the state-of-the-art on a large subset of 10K classes and 9M images of ImageNet from 16.7% accuracy to 19.1%.
Published: 2012
Full Text: View/download PDF

24. Tree-structured CRF models for interactive image labeling

Author: Jakob Verbeek, Thomas Mensink, Gabriela Csurka, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Computer science, 02 engineering and technology, Documentation, computer.software_genre, Machine learning, Sensitivity and Specificity, Pattern Recognition, Automated, User-Computer Interface, Artificial Intelligence, Image Interpretation, Computer-Assisted, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), Structured prediction, Contextual image classification, Standard test image, business.industry, Applied Mathematics, Cognitive neuroscience of visual object recognition, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Reproducibility of Results, 020206 networking & telecommunications, Image Enhancement, Image labeling, Radiology Information Systems, Computational Theory and Mathematics, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Data mining, Artificial intelligence, business, computer, Software, Algorithms
Abstract: International audience; We propose structured prediction models for image labeling that explicitly take into account dependencies among image labels. In our tree structured models, image labels are nodes, and edges encode dependency relations. To allow for more complex dependencies, we combine labels in a single node, and use mixtures of trees. Our models are more expressive than independent predictors, and lead to more accurate label predictions. The gain becomes more significant in an interactive scenario where a user provides the value of some of the image labels at test time. Such an interactive scenario offers an interesting trade-off between label accuracy and manual labeling effort. The structured models are used to decide which labels should be set by the user, and transfer the user input to more accurate predictions on other image labels. We also apply our models to attribute-based image classification, where attribute predictions of a test image are mapped to class probabilities by means of a given attribute-class mapping. Experimental results on three publicly available benchmark data sets show that in all scenarios our structured models lead to more accurate predictions, and leverage user input much more effectively than state-of-the-art independent models.
Published: 2012
Full Text: View/download PDF

25. Learning Multiple Tasks with Boosted Decision Trees

Author: Boris Chidlovskii, Rémi Gilleron, Fabien Torre, Jean Baptiste Faddoul, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, Modeling Tree Structures, Machine Learning, and Information Extraction (MOSTRARE), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), and Lecture Note in Computer Science
Subjects: Incremental decision tree, Information Gain, Computer science, business.industry, Decision tree learning, Decision Trees, ID3 algorithm, Decision tree, Multi-task learning, Machine learning, computer.software_genre, Boosting, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Information Fuzzy Networks, Alternating decision tree, Artificial intelligence, Decision stump, business, Multi-Task Learning, computer
Abstract: International audience; We address the problem of multi-task learning with no label correspondence among tasks. Learning multiple related tasks simultaneously, by exploiting their shared knowledge, can improve the predictive performance on every task. We develop the multi-task Adaboost environment with Multi-Task Decision Trees as weak classifiers. We first adapt the well known decision tree learning to the multi-task setting. We revise the information gain rule for learning decision trees in the multi-task setting. We use this feature to develop a novel criterion for learning Multi-Task Decision Trees. The criterion guides the tree construction by learning the decision rules from data of different tasks, and representing different degrees of task relatedness. We then modify MT-Adaboost to combine Multi-task Decision Trees as weak learners. We experimentally validate the advantage of the new technique; we report results of experiments conducted on several multi-task datasets, including the Enron email set and Spam Filtering collection.
Published: 2012
Full Text: View/download PDF

26. Face recognition from caption-based supervision

Author: Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, Cordelia Schmid, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Face retrieval, Computer science, Metric learning, 02 engineering and technology, Facial recognition system, Set (abstract data type), Query expansion, Constrained clustering, Discriminative model, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Face recognition, business.industry, Weakly supervised learning, 020207 software engineering, Pattern recognition, [INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR], Face (geometry), Metric (mathematics), Pattern recognition (psychology), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: In this paper, we present methods for face recognition using a collection of images with captions. We consider two tasks: retrieving all faces of a particular person in a data set, and establishing the correct association between the names in the captions and the faces in the images. This is challenging because of the very large appearance variation in the images, as well as the potential mismatch between images and their captions. For both tasks, we compare generative and discriminative probabilistic models, as well as methods that maximize subgraph densities in similarity graphs. We extend them by considering different metric learning techniques to obtain appropriate face representations that reduce intra person variability and increase inter person separation. For the retrieval task, we also study the benefit of query expansion. To evaluate performance, we use a new fully labeled data set of 31147 faces which extends the recent Labeled Faces in the Wild data set. We present extensive experimental results which show that metric learning significantly improves the performance of all approaches on both tasks.
Published: 2012
Full Text: View/download PDF

27. Learning structured prediction models for interactive image labeling

Author: Thomas Mensink, Jakob Verbeek, Gabriela Csurka, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Contextual image classification, Standard test image, business.industry, Computer science, 02 engineering and technology, computer.software_genre, Machine learning, [INFO.INFO-GR]Computer Science [cs]/Graphics [cs.GR], 030218 nuclear medicine & medical imaging, 03 medical and health sciences, Image labeling, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, Structured prediction, business, computer
Abstract: International audience; We propose structured models for image labeling that take into account the dependencies among the image labels explicitly. These models are more expressive than independent label predictors, and lead to more accurate predictions. While the improvement is modest for fully-automatic image labeling, the gain is significant in an interactive scenario where a user provides the value of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labeling effort. The structured models are used to decide which labels should be set by the user, and transfer the user input to more accurate predictions on other image labels. We also apply our models to attribute-based image classification, where attribute predictions of a test image are mapped to class probabilities by means of a given attribute-class mapping. In this case the structured models are built at the attribute level. We also consider an interactive system where the system asks a user to set some of the attribute values in order to maximally improve class prediction performance. Experimental results on three publicly available benchmark data sets show that in all scenarios our structured models lead to more accurate predictions, and leverage user input much more effectively than state-of-the-art independent models.
Published: 2011
Full Text: View/download PDF

28. Specifying and computing preferred plans

Author: Sheila A. McIlraith, Christian Fritz, Meghyn Bienvenu, Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Distributed and heterogeneous data and knowledge (LEO), Université Paris-Sud - Paris 11 (UP11)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Palo Alto Research Center (PARC), Xerox Company, Department of Computer Science [University of Toronto] (DCS), and University of Toronto
Subjects: Linguistics and Language, Theoretical computer science, Knowledge representation and reasoning, Computer science, Semantics (computer science), 0102 computer and information sciences, 02 engineering and technology, Plan (drawing), 01 natural sciences, Language and Linguistics, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Artificial Intelligence, Preferences, 0202 electrical engineering, electronic engineering, information engineering, Automated reasoning, ComputingMilieux_MISCELLANEOUS, computer.programming_language, Planning with preferences, business.industry, Planner, Preference, 010201 computation theory & mathematics, Knowledge representation, Bounded function, 020201 artificial intelligence & image processing, Artificial intelligence, Situation calculus, business, computer
Abstract: In this paper, we address the problem of specifying and computing preferred plans using rich, qualitative, user preferences. We propose a logical language for specifying preferences over the evolution of states and actions associated with a plan. We provide a semantics for our first-order preference language in the situation calculus, and prove that progression of our preference formulae preserves this semantics. This leads to the development of PPlan, a bounded best-first search planner that computes preferred plans. Our preference language is amenable to integration with many existing planners, and beyond planning, can be used to support a diversity of dynamical reasoning tasks that employ preferences.
Published: 2011
Full Text: View/download PDF

29. On data fusion in information retrieval using different aggregation operators

Author: Julien Ah-Pine, Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Multicriteria decision, fuzzy results merging, Computer Networks and Communications, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Context (language use), metasearch, 02 engineering and technology, computer.software_genre, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Search engine, Operator (computer programming), Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, unsupervised rank aggregation, data fusion, Information retrieval, Rank (computer programming), Sensor fusion, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], aggregation operator, 020201 artificial intelligence & image processing, Data mining, Metasearch engine, computer, Software
Abstract: International audience; This paper is concerned with the problem of unsupervised rank aggregation in the context of metasearch in information retrieval. In such tasks, we are given many partially ordered lists of retrieved items provided by many search engines and we want to define a way of aggregating those lists in order to find a consensus. One classical approach consists in aggregating, for each retrieved item, the scores given by the different search engines. Then, we use the resulting distribution of aggregated scores to infer a consensus ordered list. In this paper we investigate whether aggregation operators defined in the fields of multi-sensor fusion and multicriteria decision making are of interest for metasearch problems or not. Moreover, another purpose of this paper is to introduce a new aggregation operator, its foundations and its properties. We finally test all these aggregation operators for metasearch tasks using the Letor 2.0 dataset. Our results show that among the studied aggregation functions, the ones which are more compensatory outperform the baseline methods CombSUM and CombMNZ.
Published: 2011
Full Text: View/download PDF
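
Illustrative note: the two baselines named in the abstract above, CombSUM and CombMNZ, are simple enough to sketch. CombSUM sums an item's scores over the engines that returned it; CombMNZ multiplies that sum by the number of engines that returned the item. The sketch below assumes the scores are already normalized to a common range; the function names and the three example runs are hypothetical and are not taken from the paper or from the Letor data.

```python
def comb_sum(runs):
    """CombSUM: sum each item's scores over all runs that retrieved it."""
    totals = {}
    for run in runs:
        for item, score in run.items():
            totals[item] = totals.get(item, 0.0) + score
    return totals


def comb_mnz(runs):
    """CombMNZ: CombSUM score times the number of runs that retrieved the item."""
    totals = comb_sum(runs)
    hits = {item: sum(item in run for run in runs) for item in totals}
    return {item: totals[item] * hits[item] for item in totals}


def ranking(scores):
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical, already-normalized scores from three search engines.
runs = [
    {"a": 1.0, "b": 0.25},
    {"b": 0.5, "c": 0.5},
    {"b": 0.125, "c": 0.25},
]

sum_rank = ranking(comb_sum(runs))
mnz_rank = ranking(comb_mnz(runs))
```

On this toy input the two rules disagree: CombSUM favors the item with one very high score, while CombMNZ promotes the item that every engine retrieved.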

30. Boosting Multi-Task Weak Learners with Applications to Textual and Social Data

Author: Fabien Torre, Boris Chidlovskii, Rémi Gilleron, Jean Baptiste Faddoul, Xerox Research Centre Europe [Meylan], Xerox Company, Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS), Modeling Tree Structures, Machine Learning, and Information Extraction (MOSTRARE), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Groupe de Recherche en Apprentissage Automatique (GRAppA - LIFL), and Université de Lille, Sciences et Technologies-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)
Subjects: Boosting (machine learning), business.industry, Computer science, Decision trees, Decision tree, Multi-task learning, Textual and social data, Machine learning, computer.software_genre, Boosting, Support vector machine, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Decision stump, Artificial intelligence, AdaBoost, business, Classifier (UML), computer
Abstract: International audience; Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends the Adaboost algorithm to the multi-task setting, using a multi-task decision stump as its multi-task weak classifier. This makes it possible to learn different dependencies between tasks for different regions of the learning space. Thus, we relax the conventional hypothesis that tasks behave similarly in the whole learning space. Moreover, MT-Adaboost can learn multiple tasks without imposing the constraint of sharing the same label set and/or examples between tasks. A theoretical analysis is derived from the analysis of the original Adaboost. Experiments for multiple tasks over large scale textual data sets with social context (Enron and Tobacco) give rise to very promising results.
Published: 2010

31. Improving the Fisher Kernel for Large-Scale Image Classification

Author: Florent Perronnin, Thomas Mensink, Jorge Sanchez, Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), and Kostas Daniilidis and Petros Maragos and Nikos Paragios
Subjects: Contextual image classification, business.industry, Fisher kernel, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-invariant feature transform, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Fisher vector, Pattern recognition, 02 engineering and technology, Pascal (programming language), Mixture model, Machine learning, computer.software_genre, ComputingMethodologies_PATTERNRECOGNITION, Discriminative model, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, computer.programming_language, Mathematics
Abstract: International audience; The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
Published: 2010
Full Text: View/download PDF

32. Image Annotation with TagProp on the MIRFLICKR set

Author: Matthieu Guillaumin, Cordelia Schmid, Jakob Verbeek, Thomas Mensink, Learning and recognition in vision (LEAR), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Information retrieval, Computer science, business.industry, Image annotation, Search engine indexing, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], 020207 software engineering, Pattern recognition, 02 engineering and technology, Support vector machine, Set (abstract data type), ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.3: Information Search and Retrieval, Annotation, Automatic image annotation, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Relevance (information retrieval), Artificial intelligence, business, Image retrieval, ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.1: Content Analysis and Indexing
Abstract: International audience; Image annotation is an important computer vision problem where the goal is to determine the relevance of annotation terms for images. Image annotation has two main applications: (i) proposing a list of relevant terms to users that want to assign indexing terms to images, and (ii) supporting keyword based search for images without indexing terms, using the relevance estimates to rank images. In this paper we present TagProp, a weighted nearest neighbour model that predicts the term relevance of images by taking a weighted sum of the annotations of the visually most similar images in an annotated training set. TagProp can use a collection of distance measures capturing different aspects of image content, such as local shape descriptors, and global colour histograms. It automatically finds the optimal combination of distances to define the visual neighbours of images that are most useful for annotation prediction. TagProp compensates for the varying frequencies of annotation terms using a term-specific sigmoid to scale the weighted nearest neighbour tag predictions. We evaluate different variants of TagProp with experiments on the MIR Flickr set, and compare with an approach that learns a separate SVM classifier for each annotation term. We also consider using Flickr tags to train our models, both as additional features and as training labels. We find the SVMs to work better when learning from the manual annotations, but TagProp to work better when learning from the Flickr tags. We also find that using the Flickr tags as a feature can significantly improve the performance of SVMs learned from manual annotations.
Published: 2010
Full Text: View/download PDF
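
The prediction step described in the abstract above (a weighted sum of neighbour annotations passed through a term-specific sigmoid) can be sketched roughly as follows. This is only an illustration: the exponential distance weighting, the function names and the `alpha`/`beta` parameters are assumptions made for the example, not details confirmed by the abstract, and in the paper such parameters would be learned rather than fixed.

```python
import math

def neighbour_weights(distances):
    """Assumed weighting: softmax over negative distances, so closer
    neighbours receive larger weights and all weights sum to one."""
    exps = [math.exp(-d) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]

def tag_relevance(distances, neighbour_has_tag, alpha=1.0, beta=0.0):
    """Weighted vote of the neighbours that carry the tag, rescaled by a
    per-term sigmoid (alpha and beta are hypothetical per-term parameters)."""
    weights = neighbour_weights(distances)
    vote = sum(w for w, has in zip(weights, neighbour_has_tag) if has)
    return 1.0 / (1.0 + math.exp(-(alpha * vote + beta)))

# Two neighbours at distances 0.1 and 2.0; only one of them carries the tag.
near_tagged = tag_relevance([0.1, 2.0], [True, False])
far_tagged = tag_relevance([0.1, 2.0], [False, True])
print(near_tagged > far_tagged)
```

The comparison at the end shows the intended behaviour: a tag carried by a visually close neighbour yields a higher relevance estimate than the same tag carried by a distant one.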

33. Modeling images as mixtures of reference images

Author: Yan Liu, Florent Perronnin, Xerox Research Centre Europe [Meylan], Xerox Company, Extraction de Caractéristiques et Identification (imagine), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), and Université de Lyon-Université Lumière - Lyon 2 (UL2)
Subjects: Kullback–Leibler divergence, Contextual image classification, business.industry, Approximation algorithm, Pattern recognition, 02 engineering and technology, Mixture model, 01 natural sciences, 010104 statistics & probability, symbols.namesake, Kernel (image processing), Computer Science::Computer Vision and Pattern Recognition, Convex optimization, 0202 electrical engineering, electronic engineering, information engineering, symbols, [INFO]Computer Science [cs], 020201 artificial intelligence & image processing, Convex combination, Artificial intelligence, 0101 mathematics, business, Gaussian process, Mathematics
Abstract: International audience; A state-of-the-art approach to measure the similarity of two images is to model each image by a continuous distribution, generally a Gaussian mixture model (GMM), and to compute a probabilistic similarity between the GMMs. One limitation of traditional measures such as the Kullback-Leibler (KL) divergence and the Probability Product Kernel (PPK) is that they measure a global match of distributions. This paper introduces a novel image representation. We propose to approximate an image, modeled by a GMM, as a convex combination of K reference image GMMs, and then to describe the image as the K-dimensional vector of mixture weights. The computed weights encode a similarity that favors local matches (i.e. matches of individual Gaussians) and is therefore fundamentally different from the KL or PPK. Although the computation of the mixture weights is a convex optimization problem, its direct optimization is difficult. We propose two approximate optimization algorithms: the first one based on traditional sampling methods, the second one based on a variational bound approximation of the true objective function. We apply this novel representation to the image categorization problem and compare its performance to traditional kernel-based methods. We demonstrate on the PASCAL VOC 2007 dataset a consistent increase in classification accuracy.
Published: 2009
Full Text: View/download PDF

34. Virtual screening with support vector machines and structure kernels

Author: Pierre Mahé, Jean-Philippe Vert, Xerox Research Centre Europe [Meylan], Xerox Company, Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Bioinformatique (CBIO), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Vert, Jean-Philippe, Mines Paris - PSL (École nationale supérieure des mines de Paris), Institut Curie [Paris]-MINES ParisTech - École nationale supérieure des mines de Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)
Subjects: FOS: Computer and information sciences, Graph kernel, Databases, Factual, Computer science, Drug Evaluation, Preclinical, Ligands, computer.software_genre, Quantitative Biology - Quantitative Methods, 01 natural sciences, Machine Learning (cs.LG), Polynomial kernel, Drug Discovery, [CHIM.CHEM] Chemical Sciences/Cheminformatics, MESH: Ligands, Quantitative Methods (q-bio.QM), [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM], 0303 health sciences, [SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], General Medicine, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Computer Science Applications, Kernel method, Pharmaceutical Preparations, Kernel (statistics), Radial basis function kernel, MESH: Drug Evaluation, Preclinical, Data mining, Tree kernel, Algorithms, [CHIM.CHEM]Chemical Sciences/Cheminformatics, MESH: Pharmaceutical Preparations, MESH: Algorithms, Machine learning, 03 medical and health sciences, MESH: Computer Simulation, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Artificial Intelligence, Least squares support vector machine, MESH: Artificial Intelligence, Computer Simulation, 030304 developmental biology, business.industry, Organic Chemistry, [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], MESH: Databases, Factual, 0104 chemical sciences, Support vector machine, Computer Science - Learning, 010404 medicinal & biomolecular chemistry, FOS: Biological sciences, Artificial intelligence, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], business, computer
Abstract: International audience; Support vector machines and kernel methods have recently gained considerable attention in chemoinformatics. They offer generally good performance for problems of supervised classification or regression, and provide a flexible and computationally efficient framework to include relevant information and prior knowledge about the data and problems to be handled. In particular, with kernel methods molecules do not need to be represented and stored explicitly as vectors or fingerprints, but only to be compared to each other through a comparison function technically called a kernel. While classical kernels can be used to compare vector or fingerprint representations of molecules, completely new kernels were developed in the recent years to directly compare the 2D or 3D structures of molecules, without the need for an explicit vectorization step through the extraction of molecular descriptors. While still in their infancy, these approaches have already demonstrated their relevance on several toxicity prediction and structure-activity relationship problems.
Published: 2009

35. Towards a methodology for named entities annotation

Author: Karën Fort, Adeline Nazarenko, Maud Ehrmann, Fort, Karën, Laboratoire d'Informatique de Paris-Nord (LIPN), Université Sorbonne Paris Cité (USPC)-Institut Galilée-Université Paris 13 (UP13)-Centre National de la Recherche Scientifique (CNRS), Xerox Research Centre Europe [Meylan], Xerox Company, and Quaero
Subjects: Scheme (programming language), Information retrieval, Computer science, business.industry, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, computer.software_genre, Task (project management), [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Entity linking, Annotation, Named-entity recognition, Ask price, Artificial intelligence, business, computer, Natural language processing, computer.programming_language
Abstract: Poster; International audience; Today, the named entity recognition task is considered as fundamental, but it involves some specific difficulties in terms of annotation. Those issues led us to ask the fundamental question of what the annotators should annotate and, even more important, for which purpose. We thus identify the applications using named entity recognition and, according to the real needs of those applications, we propose to semantically define the elements to annotate. Finally, we put forward a number of methodological recommendations to ensure a coherent and reliable annotation scheme.
Published: 2009
Full Text: View/download PDF

36. A similarity measure between unordered vector sets with application to image categorization

Author: Yan Liu, Florent Perronnin, Extraction de Caractéristiques et Identification (imagine), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), Xerox Research Centre Europe [Meylan], and Xerox Company
Subjects: Contextual image classification, business.industry, Feature extraction, Probabilistic logic, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, Similarity measure, Mixture model, 030507 speech-language pathology & audiology, 03 medical and health sciences, symbols.namesake, ComputingMethodologies_PATTERNRECOGNITION, Histogram, Computer Science::Computer Vision and Pattern Recognition, 0202 electrical engineering, electronic engineering, information engineering, Maximum a posteriori estimation, symbols, 020201 artificial intelligence & image processing, [INFO]Computer Science [cs], Artificial intelligence, 0305 other medical science, business, Gaussian process, Mathematics
Abstract: International audience; We present a novel approach to compute the similarity between two unordered variable-sized vector sets. To solve this problem, several authors have proposed to model each vector set with a Gaussian mixture model (GMM) and to compute a probabilistic measure of similarity between the GMMs. The main contribution of this paper is to model each vector set with a GMM adapted from a common “universal” GMM using the maximum a posteriori (MAP) criterion. The advantages of this approach are twofold. MAP provides a more accurate estimate of the GMM parameters compared to standard maximum likelihood estimation (MLE) in the challenging case where the cardinality of the vector set is small. Moreover, there is a correspondence between the Gaussians of two GMMs adapted from a common distribution and one can take advantage of this fact to compute efficiently the probabilistic similarity. This work is applied to the image categorization problem: images are modeled as bags of low-level features and classification is performed using a kernel classifier based on the proposed similarity measure. Experimental results on the PASCAL VOC 2006 and VOC 2007 databases show the excellent performance of our approach.
Published: 2008
Full Text: View/download PDF

37. A new registration method based on Log-Euclidean Tensor metrics and its application to genetic studies

Author: Xavier Pennec, Agatha D. Lee, C. Brun, Natasha Lepore, G.I. de Zubicaray, Arthur W. Toga, Yi-Yu Chou, Paul M. Thompson, Margaret J. Wright, Katie L. McMahon, Marina Barysheva, Xerox Research Centre Europe [Meylan], Xerox Company, Laboratory of Neuro Imaging [Los Angeles] (LONI), University of California [Los Angeles] (UCLA), University of California (UC)-University of California (UC), Analysis and Simulation of Biomedical Images (ASCLEPIOS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centre for Magnetic Resonance [Brisbanne], University of Queensland [Brisbane], Queensland Institute of Medical Research, and University of California-University of California
Subjects: Drug trial, [SDV.IB.IMA]Life Sciences [q-bio]/Bioengineering/Imaging, business.industry, Physics::Medical Physics, Image registration, Pattern recognition, [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation, Article, symbols.namesake, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Group differences, Deformation tensor, Jacobian matrix and determinant, Euclidean geometry, [INFO.INFO-IM]Computer Science [cs]/Medical Imaging, symbols, Statistical analysis, Computer vision, Artificial intelligence, Diffeomorphism, business, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, ComputingMilieux_MISCELLANEOUS, Mathematics
Abstract: In structural brain MRI, group differences or changes in brain structures can be detected using Tensor-Based Morphometry (TBM). This method consists of two steps: (1) a non-linear registration step, that aligns all of the images to a common template, and (2) a subsequent statistical analysis. The numerous registration methods that have recently been developed differ in their detection sensitivity when used for TBM, and detection power is paramount in epidemiological studies or drug trials. We therefore developed a new fluid registration method that computes the mappings and performs statistics on them in a consistent way, providing a bridge between TBM registration and statistics. We used the Log-Euclidean framework to define a new regularizer that is a fluid extension of the Riemannian elasticity, which assures diffeomorphic transformations. This regularizer constrains the symmetrized Jacobian matrix, also called the deformation (or strain) tensor. We applied our method to an MRI dataset from 40 fraternal and identical twins, to reveal voxelwise measures of average volumetric differences in brain structure for subjects with different degrees of genetic resemblance.
Published: 2008
Full Text: View/download PDF

38. XTM: A Robust Temporal Text Processor

Author: Caroline Hagège, Xavier Tannier, Xerox Research Centre Europe [Meylan], Xerox Company, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), and Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Subjects: Computer science, 02 engineering and technology, computer.software_genre, 0202 electrical engineering, electronic engineering, information engineering, Semantic memory, Dimension (data warehouse), relations temporelles, Temporal information, traitement automatique de la langue, 060201 languages & linguistics, Information retrieval, business.industry, Information processor, 06 humanities and the arts, Noun phrase, extraction d'information, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, TAL, Research centre, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], 0602 languages and literature, temps, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing
Abstract: International audience; We present in this paper the work that has been developed at Xerox Research Centre Europe to build a robust temporal text processor. The aim of this processor is to extract events described in texts and to link them, when possible, to a temporal anchor. Another goal is to be able to establish temporal ordering between the events expressed in texts. One of the originalities of this work is that the temporal processor is coupled with a syntactic-semantic analyzer. The temporal module thus takes advantage of syntactic and semantic information extracted from text and, at the same time, syntactic and semantic processing benefits from the temporal processing performed. As a result, analysis and management of temporal information is combined with other kinds of syntactic and semantic information, making possible a more refined text understanding processor that takes into account the temporal dimension.
Published: 2008

39. XRCE-T: XIP Temporal Module for TempEval Campaign

Author: Xavier Tannier, Caroline Hagège, Xerox Research Centre Europe [Meylan], Xerox Company, Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), and Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11)
Subjects: Computer science, business.industry, expressions temporelles, computer.software_genre, Syntax, Traitement automatique des langues, Extraction d'information, Complement (complexity), [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, TAL, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], temps, Computer vision, Artificial intelligence, business, computer, Natural language processing
Abstract: http://www.limsi.fr/Individu/xtannier/Publications/Tannier_TempEval07_abstract.html; International audience; We present the system we used for the TempEval competition. This system relies on a deep syntactic analyzer that has been extended for the treatment of temporal expressions. So, together with the temporal treatment needed for TempEval purposes, further syntactico-semantic information is also calculated, thus making temporal processing a complement for a better general purpose text understanding system.
Published: 2007

40. Hierarchical Part-Based Visual Object Categorization

Author: Bill Triggs, Guillaume Bouchard, Xerox Research Centre Europe [Meylan], Xerox Company, Learning and recognition in vision (LEAR), Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble (GRAVIR - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Cordelia Schmid and Stefano Soatto and Carlo Tomasi, and Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Inria Grenoble - Rhône-Alpes
Subjects: Contextual image classification, business.industry, probability, Probabilistic logic, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Pattern recognition, 02 engineering and technology, Solid modeling, Spatial relation, Generative model, Categorization, Robustness (computer science), 020204 information systems, Expectation–maximization algorithm, 0202 electrical engineering, electronic engineering, information engineering, computational geometry, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Mathematics, image classification
Abstract: International audience; We propose a generative model that codes the geometry and appearance of generic visual object categories as a loose hierarchy of parts, with probabilistic spatial relations linking parts to subparts, soft assignment of subparts to parts, and scale invariant keypoint based local features at the lowest level of the hierarchy. The method is designed to efficiently handle categories containing hundreds of redundant local features, such as those returned by current key-point detectors. This robustness allows it to outperform constellation style models, despite their stronger spatial models. The model is initialized by robust bottom-up voting over location-scale pyramids, and optimized by expectation-maximization. Training is rapid, and objects do not need to be marked in the training images. Experiments on several popular datasets show the method's ability to capture complex natural object classes.
Published: 2005
Full Text: View/download PDF

41. A Formal Study of a Visual Language for the Visualization of Document Type Definition

Author: Jean-Yves Vion-Dury, Emmanuel Pietriga, Xerox Research Centre Europe [Meylan], Xerox Company, Tools for Electronic Documents, Research and applications (OPERA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), and IEEE
Subjects: Structure (mathematical logic), Syntax (programming languages), Programming language, business.industry, Computer science, [INFO.INFO-WB]Computer Science [cs]/Web, [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS], 020207 software engineering, 02 engineering and technology, Document type definition, Mathematical proof, computer.software_genre, Visualization, Visual language, 020204 information systems, Formal language, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], business, computer, Natural language processing, Visual programming language
Abstract: This formal study proposes a transformational approach to the definition of general purpose visual languages based on hierarchical structures, addressing more specifically DTD visualization as its application area. We show that such visual languages can be constructed through progressive refinement of a syntax based on nested/juxtaposed rectangles. Several transformation stages, which can all be formally characterized, produce a high quality visual representation which expresses the fundamental properties of the original structure. Moreover, this approach opens some perspectives in proving visual properties through standard mathematical tools such as inductive proofs, thus establishing some practical links between visual language theory and classical language theory.
Published: 2001
Full Text: View/download PDF

42. A RISK ASSESSMENT SYSTEM WITH AUTOMATIC EXTRACTION OF EVENT TYPES

Author: Philippe Capet, Thomas Delavallade, Stavroula Voyatzi, Takuya Nakamura, Ágnes Sándor, Cedric Tarsitano, THALES Land & Joint Systems, THALES, Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Xerox Research Centre Europe [Meylan], Xerox Company, Mercier-Laurent, Eunikka and Leake, David, Nakamura, Takuya, Mercier-Laurent, Eunikka and Leake, David, THALES [France], and Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)
Subjects: Lexique-Grammaire, Computer science, Event (computing), business.industry, 02 engineering and technology, computer.software_genre, Machine learning, Fuzzy logic, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Information extraction, [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL], 020204 information systems, Dependency grammar, 0202 electrical engineering, electronic engineering, information engineering, Early warning system, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, Data mining, Risk assessment, business, computer, Natural language
Abstract: International audience; In this article we describe the joint effort of experts in linguistics, information extraction and risk assessment to integrate EventSpotter, an automatic event extraction engine, into ADAC, an automated early warning system. By detecting as early as possible weak signals of emerging risks, ADAC provides a dynamic synthetic picture of situations involving risk. The ADAC system calculates risk on the basis of fuzzy logic rules operated on a template graph whose leaves are event types. EventSpotter is based on a general purpose natural language dependency parser, XIP, enhanced with domain-specific lexical resources (Lexicon-Grammar). Its role is to automatically feed the leaves with input data.
