24 results for "Luc Van Gool"
Search Results
2. Saliency Prediction with Active Semantic Segmentation
- Author
-
Juan Xu, Gemma Roig, Qi Zhao, Ming Jiang, Xavier Boix, and Luc Van Gool
- Subjects
Computer science, Scale-space segmentation, Segmentation, Computer vision, Artificial intelligence
- Abstract
Ming Jiang, Xavier Boix, Juan Xu, and Qi Zhao: Department of Electrical and Computer Engineering, National University of Singapore, Singapore. Gemma Roig: CBMM, LCSL, Massachusetts Institute of Technology and Istituto Italiano di Tecnologia, Cambridge, MA. Xavier Boix, Gemma Roig, and Luc Van Gool: Computer Vision Laboratory, ETH Zurich, Switzerland.
- Published
- 2015
3. Learning to Rank Histograms for Object Retrieval
- Author
-
Yuhua Chen, Matthieu Guillaumin, Danfeng Qin, and Luc Van Gool
- Subjects
Computer science, Histogram, Learning to rank, Pattern recognition, Artificial intelligence, Object (computer science)
- Published
- 2014
4. Frankenhorse: Automatic Completion of Articulating Objects from Image-based Reconstruction
- Author
-
Nikolay Kobyshev, Alex Mansfield, William S. C. Chang, Hayko Riemenschneider, and Luc Van Gool
- Subjects
Theoretical computer science, Matching (graph theory), Computer science, Image processing and computer vision, Object (computer science), Pipeline (software), Set (abstract data type), Structure from motion, Computer vision, Polygon mesh, Noise (video), Artificial intelligence, Focus (optics), Computer graphics
- Abstract
Reconstruction of scene geometry and semantics are important problems in vision, and are increasingly brought together. The state of the art in Structure from Motion and Multi-View Stereo (SfM+MVS) can already create accurate, dense reconstructions of scenes. Systems such as CMPMVS [2] are freely available and produce impressive results automatically. However, when assumptions break down or there is insufficient data, the reconstruction suffers from noise, extraneous geometry, and holes. We propose to solve these problems by introducing prior knowledge. We focus on the difficult class of articulating objects, such as people and animals. Prior modelling of these classes is difficult due to the articulation and large intra-class variation. We propose an automatic method for completion which relies neither on a prior model of the deformation nor on training data captured under controlled conditions. Instead, given far-from-perfect reconstructions, we simultaneously complete each using the well-reconstructed parts of the others. This is enabled by the data-driven piecewise-rigid 3D model alignment method of Chang and Zwicker [1]. This method estimates local coordinate frames on the meshes and proposes correspondences by matching local descriptors. Each correspondence determines a rigid alignment, which is used as a label in a graph-labelling problem to determine a piecewise-rigid alignment that brings the meshes into correspondence while penalising stretched edges. Our main contributions are as follows.
We present a novel, fully automatic method for the completion of noisy real SfM+MVS reconstructions which (1) exploits a set of noisy reconstructions of objects of the class, rather than relying on a large clean training set which is expensive to collect, (2) handles the articulation structure in the class of objects, allowing larger holes to be filled and with greater accuracy than a generic smoothness prior and (3) is exemplar-based, allowing details to be maintained that may be smoothed out in related learning-based approaches. Our method takes as its input sets of images of scenes each containing an object of a specific class. For each input image set, initially yielding an incomplete and cluttered reconstruction of the whole scene, the output is a completed model of the object, created using the other reconstructions. Our method consists of a pipeline of several stages, visualised in Figure 1. In the first stage, each scene is reconstructed using a SfM+MVS pipeline [2]. We then segment the objects from the scene by combining object detections in the images. In the third stage, we align each of
- Published
- 2014
5. Metric Learning from Poses for Temporal Clustering of Human Motion
- Author
-
Adolfo Lopez-Mendez, Luc Van Gool, Juergen Gall, Josep R. Casas, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo
- Subjects
Temporal clustering, Sequence, Computer science, Semantic interpretation, Image and video signal processing (UPC subject area), Machine learning, Human motion, Task (project management), Action (philosophy), Metric (mathematics), Image processing -- Digital techniques, Artificial intelligence, Semantic information
- Abstract
Temporal clustering of human motion into semantically meaningful behaviors is a challenging task. While unsupervised methods do well to some extent, the obtained clusters often lack a semantic interpretation. In this paper, we propose to learn what makes a sequence of human poses different from others such that it should be annotated as an action. To this end, we formulate the problem as weakly supervised temporal clustering for an unknown number of clusters. Weak supervision is attained by learning a metric from the implicit semantic distances derived from already annotated databases. Such a metric contains some low-level semantic information that can be used to effectively segment a human motion sequence into distinct actions or behaviors. The main advantage of our approach is that metrics can be successfully used across datasets, making our method a compelling alternative to unsupervised methods. Experiments on publicly available mocap datasets show the effectiveness of our approach.
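The weak-supervision idea above can be sketched in a toy form: from an annotated pose database, learn per-dimension weights so that pose dimensions which separate actions get stretched and noisy ones shrink. The diagonal-metric simplification and all names are illustrative; the paper learns a more general metric.

```python
import numpy as np

def learn_diagonal_metric(poses, labels):
    """Toy version of the weakly supervised metric: weight each pose
    dimension by the ratio of its overall variance to its average
    within-action variance, so discriminative dimensions dominate.
    Illustrative sketch, not the paper's formulation."""
    poses, labels = np.asarray(poses, float), np.asarray(labels)
    overall = poses.var(axis=0)
    within = np.mean([poses[labels == c].var(axis=0)
                      for c in np.unique(labels)], axis=0)
    return overall / (within + 1e-9)      # per-dimension weights

def metric_distance(w, a, b):
    # Weighted Euclidean distance between two pose vectors.
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(np.sqrt(np.sum(w * d * d)))

# Dimension 0 separates the two annotated actions; dimension 1 is noise.
poses = [[0.0, 5.0], [0.1, -5.0], [10.0, 5.0], [10.1, -5.0]]
labels = [0, 0, 1, 1]
w = learn_diagonal_metric(poses, labels)
```

The learned weights can then drive any distance-based temporal clustering of a new, unannotated motion sequence, which is the cross-dataset transfer the abstract describes.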
- Published
- 2012
6. Sparsity Potentials for Detecting Objects with the Hough Transform
- Author
-
Luc Van Gool, Juergen Gall, Nima Sedaghat Alvar, and Nima Razavi
- Subjects
Statistical assumption, Inference, Pattern recognition, Pascal (programming language), Object (computer science), Measure (mathematics), Object detection, Hough transform, Discriminative model, Computer vision, Artificial intelligence, Mathematics
- Abstract
Hough transform-based object detectors divide an object into a number of patches and combine them using a shape model. For efficient combination of patches into the shape model, the individual patches are assumed to be independent of one another. Although this independence assumption is key for fast inference, it requires the individual patches to have a high discriminative power in predicting the class and location of objects. In this paper, we argue that the sparsity of the appearance of a patch in its neighborhood can be a very powerful measure for increasing the discriminative power of a local patch, and we incorporate it as a sparsity potential for object detection. Further, we show that this potential should depend on the appearance of the patch, so as to adapt to the statistics of the neighborhood specific to the type of appearance (e.g. texture or structure) it represents. We have evaluated our method on challenging datasets, including the PASCAL VOC 2007 dataset, and show that using the proposed sparsity potential results in a substantial improvement in detection accuracy.
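As a rough illustration of the sparsity idea, one can score how rare a patch's appearance is among its spatial neighbours: rare patches are more discriminative voters. The following sketch uses a mean-distance proxy, not the paper's exact formulation.

```python
import numpy as np

def sparsity_potential(patch, neighborhood_patches):
    """Score how 'sparse' (rare) a patch's appearance is among its
    neighbours, via the mean normalised distance to them. This is an
    illustrative proxy for the sparsity potential described above."""
    p = patch.ravel().astype(float)
    dists = [np.linalg.norm(p - q.ravel().astype(float))
             for q in neighborhood_patches]
    return float(np.mean(dists)) / (p.size ** 0.5)

# A repetitive (texture-like) neighbourhood yields a low potential,
# a distinctive patch in a varied neighbourhood a higher one.
rng = np.random.default_rng(0)
flat = [np.zeros((8, 8)) for _ in range(10)]           # uniform neighbourhood
varied = [rng.normal(size=(8, 8)) for _ in range(10)]  # varied neighbourhood
patch = np.zeros((8, 8))
print(sparsity_potential(patch, flat) < sparsity_potential(patch, varied))  # True
```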
- Published
- 2012
7. A Training-free Classification Framework for Textures, Writers, and Materials
- Author
-
Radu Timofte and Luc Van Gool
- Subjects
World Wide Web, Computer science, Training
- Published
- 2012
8. On-line Hough Forests
- Author
-
Peter M. Roth, Horst Bischof, Luc Van Gool, Samuel Schulter, and Christian Leistner
- Subjects
Tree (data structure), Offset (computer science), Computer science, Video tracking, Decision tree, Upper and lower bounds, Algorithm, Object detection, Hough transform, Random forest
- Abstract
Recently, Gall & Lempitsky [6] and Okada [9] introduced Hough Forests (HF), which have emerged as a powerful tool in object detection, tracking, and several other vision applications. HFs are based on the generalized Hough transform [2] and are ensembles of randomized decision trees, consisting of both classification and regression nodes, which are trained recursively. Densely sampled patches of the target object {Pi = (Ai, yi, di)} represent the training data, where Ai is the appearance, yi the label, and di a vector pointing to the center of the object. Each node tries to find an optimal splitting function by either optimizing the information gain for classification nodes or the variance of the offset vectors di for regression nodes. This yields quite clean leaf nodes with respect to both appearance and offset. However, HFs are typically trained in off-line mode, meaning they assume access to the entire training set at once. This limits their application in situations where the data arrives sequentially, e.g., in object tracking, in incremental learning, or in large-scale learning. For all of these applications, on-line methods can inherently perform better. Thus, in this paper we propose an on-line learning scheme for Hough forests, which extends their usage to further applications, such as the tracking of arbitrary target instances or large-scale learning of visual classifiers. Growing such a tree in an on-line fashion is a difficult task, as errors in the hard splitting rules cannot be corrected easily further down the tree. While Godec et al. [8] circumvent the recursive on-line update of classification trees by randomly growing the trees to their full size and only updating the leaf node statistics, we integrate the ideas from [5, 10], which follow a tree-growing principle. The basic idea there is to start with a tree consisting of only one node, which is the root node and the only leaf at that time.
Each node collects the data falling into it and decides on its own, based on a certain splitting criterion, whether to split this node or to further update the statistics. Although the splitting criteria in [5, 10] have strong theoretical support, we will show in the experiments that it even suffices to only count the number n of samples Pi that a node has already incorporated and split when n > γ, where γ is a predefined threshold. An overview of this procedure is given in Figure 1. This splitting criterion requires finding reasonable splitting functions with only a small subset of the data, which does not necessarily have to be a disadvantage when building random forests. As stated in Breiman [4], the upper bound for the generalization error of random forests can be optimized with a high strength of the individual trees but also a low correlation between them. To this end, we derive a new but simple splitting procedure for off-line HFs based on subsampling the input space at the node level, which can further decrease the correlation between the trees. That is, each node in a tree randomly samples a predefined number γ of data samples uniformly over all available data at the current node, which is then used for finding a good splitting function. In the first experiment, we demonstrate on three object detection data sets that both our on-line formulation and our subsample splitting scheme can reach performance similar to classical Hough forests and can even outperform them; see Figures 2(a)&(b). Additionally, during training both proposed methods are orders of magnitude faster than the original approach (Figure 2(c)). In the second part of the experiments, we demonstrate the power of our method on visual object tracking. In particular, our focus lies on tracking objects of a priori unknown classes, as class-specific tracking with off-line forests has already been demonstrated before [7].
We present results on seven tracking data sets and show that our on-line HFs can outperform state-of-the-art tracking-by-detection methods. Figure 1: While labeled samples arrive on-line, each tree propagates the sample to the corresponding leaf node, which decides whether to split the current leaf or to update its statistics.
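The counting-based splitting rule described above (split a leaf once it has incorporated more than γ samples) can be sketched as follows. The class/offset statistics and the actual split-quality measures of Hough forests are omitted, and all names and the random median split are illustrative.

```python
import random

class OnlineNode:
    """Minimal sketch of on-line tree growing: a leaf accumulates
    incoming samples and splits once it has seen more than `gamma`
    of them. Not the paper's implementation."""

    def __init__(self, gamma=20, depth=0, max_depth=5):
        self.gamma, self.depth, self.max_depth = gamma, depth, max_depth
        self.samples = []                 # (feature_vector, label) pairs
        self.left = self.right = None
        self.split_dim = self.split_thr = None

    def insert(self, x, y):
        if self.left is not None:         # already split: route downwards
            child = self.left if x[self.split_dim] < self.split_thr else self.right
            child.insert(x, y)
            return
        self.samples.append((x, y))
        if len(self.samples) > self.gamma and self.depth < self.max_depth:
            self._split()

    def _split(self):
        # Pick a random dimension and use the median value as threshold.
        self.split_dim = random.randrange(len(self.samples[0][0]))
        vals = sorted(x[self.split_dim] for x, _ in self.samples)
        self.split_thr = vals[len(vals) // 2]
        self.left = OnlineNode(self.gamma, self.depth + 1, self.max_depth)
        self.right = OnlineNode(self.gamma, self.depth + 1, self.max_depth)
        for x, y in self.samples:         # redistribute collected samples
            self.insert(x, y)
        self.samples = []
```

An on-line forest would maintain several such trees and insert each arriving patch into all of them, exactly as samples arrive sequentially during tracking.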
- Published
- 2011
9. Transforming Image Completion
- Author
-
Carsten Rother, Pushmeet Kohli, Alex Mansfield, Toby Sharp, Luc Van Gool, and Mukta Prasad
- Subjects
Brightness, Exploit, Computer science, Key (cryptography), Pattern recognition, Artificial intelligence, Scale (map), Rotation (mathematics), Task (project management), Image (mathematics)
- Abstract
Image completion is an important photo-editing task which involves synthetically filling a hole in the image such that the image still appears natural. State-of-the-art image completion methods work by searching for patches in the image that fit well in the hole region. Our key insight is that image patches remain natural under a variety of transformations (such as scale, rotation and brightness change), and it is important to exploit this. We propose and investigate the use of different optimisation methods to search for the best patches and their respective transformations for producing consistent, improved completions. Experiments on a number of challenging problem instances demonstrate that our methods outperform state-of-the-art techniques.
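The core search, in a heavily simplified form: for each candidate source patch, try a small set of transformations and keep the combination that best matches the known pixels around the hole. Only brightness shifts are shown here for brevity; the paper also searches over scale and rotation, and the function name and SSD cost are illustrative.

```python
import numpy as np

def best_transformed_patch(source_patches, target, brightness_shifts=(-20, 0, 20)):
    """Return the (patch, brightness_shift, cost) minimising the sum of
    squared differences to the target region. Sketch of the patch search
    under transformations described above."""
    best = (None, None, np.inf)
    for p in source_patches:
        for b in brightness_shifts:
            cand = p.astype(float) + b        # apply the transformation
            cost = float(np.sum((cand - target) ** 2))
            if cost < best[2]:
                best = (p, b, cost)
    return best

# A patch 20 grey levels darker than the target becomes an exact match
# once the +20 brightness shift is applied.
target = np.full((4, 4), 100.0)
patches = [np.full((4, 4), 80.0), np.full((4, 4), 50.0)]
p, b, cost = best_transformed_patch(patches, target)   # b == 20, cost == 0.0
```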
- Published
- 2011
10. Efficient 3D object detection using multiple pose-specific classifiers
- Author
-
Michael Villamizar, Helmut Grabner, Francesc Moreno-Noguer, Luc Van Gool, Alberto Sanfeliu, Juan Andrade-Cetto, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Institut de Robòtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents
- Subjects
Speedup, Computer science, Computer vision, Estimator, Software engineering, Image and video signal processing (UPC subject area), 3D pose estimation, Object detection, Feature sharing, Artificial intelligence, Detection rate, Classifier (UML), Pattern recognition: computer vision (INSPEC)
- Abstract
Presented at the 22nd BMVC, held at the University of Dundee (Scotland) in September 2011. We propose an efficient method for object localization and 3D pose estimation. A two-step approach is used. In the first step, a pose estimator is evaluated on the input images in order to estimate potential object locations and poses. These candidates are then validated, in the second step, by the corresponding pose-specific classifier. The result is a detection approach that avoids the inherent and expensive cost of testing the complete set of specific classifiers over the entire image. A further speedup is achieved by feature sharing: features are computed only once and are then used for evaluating the pose estimator and all specific classifiers. The proposed method has been validated on two public datasets for the problem of detecting cars under several views. The results show that the proposed approach yields high detection rates while maintaining efficiency. This work was supported by the Spanish Ministry of Science and Innovation under projects RobTaskCoop (DPI2010-17112), PAU (DPI2008-06022), and MIPRCV (Consolider-Ingenio 2010 CSD2007-00018), and the EU CEEDS project FP7-ICT-2009-5-95682.
- Published
- 2011
11. Sparse Representation Based Projections
- Author
-
Radu Timofte, Luc Van Gool, Jesse Hoey, Stephen McKenna, and Emanuele Trucco
- Subjects
Exploit, Dimensionality reduction, Pattern recognition, Sparse approximation, Embedding, Point of departure, Artificial intelligence, Sparse representation, Intuition, Mathematics
- Abstract
In dimensionality reduction, most methods aim at preserving one or a few properties of the original space in the resulting embedding. As our results show, preserving the sparse representation of the signals from the original space in the (lower-dimensional) projected space is beneficial for several benchmarks (faces, traffic signs, and handwritten digits). The intuition behind this is that taking a sparse representation of the different samples as the point of departure highlights the important correlations among the samples, which one then wants to exploit to arrive at the final, effective low-dimensional embedding. We explicitly adapt the LPP and LLE techniques to work with the sparse representation criterion and compare to the original methods on the referenced databases, for both the unsupervised and supervised cases. The improved results corroborate the usefulness of the proposed sparse-representation-based linear and non-linear projections. Timofte R., Van Gool L., ''Sparse representation based projections'', Proceedings of the 22nd British Machine Vision Conference (BMVC 2011), pp. 61.1-61.12, August 29 - September 2, 2011, Dundee, Scotland.
- Published
- 2011
12. Does Human Action Recognition Benefit from Pose Estimation?
- Author
-
Juergen Gall, Angela Yao, Luc Van Gool, and Gabriele Fanelli
- Subjects
Computer science, Action recognition, Computer vision, Artificial intelligence, 3D pose estimation, Pose, Classifier (UML), Articulated body pose estimation
- Abstract
Early works on human action recognition focused on tracking and classifying articulated body motions. Such methods required accurate localisation of body parts, which is a difficult task, particularly under realistic imaging conditions. As such, recent trends have shifted towards the use of more abstract, low-level appearance features such as spatio-temporal interest points. Motivated by the recent progress in pose estimation, we feel that pose-based action recognition systems warrant a second look. In this paper, we address the question of whether pose estimation is useful for action recognition or whether it is better to train a classifier only on low-level appearance features drawn from video data. We compare pose-based, appearance-based, and combined pose and appearance features for action recognition in a home-monitoring scenario. Our experiments show that pose-based features outperform low-level appearance features, even when heavily corrupted by noise, suggesting that pose estimation is beneficial for the action recognition task.
- Published
- 2011
13. Temporal Relations in Videos for Unsupervised Activity Analysis
- Author
-
Helmut Grabner, Luc Van Gool, and Fabian Nater
- Subjects
Structure (mathematical logic), Computer science, Repertoire, Video sequence, Pattern recognition, Artificial intelligence, Data mining, Sequence
- Abstract
Observing the different video sequences in Fig. 1, one sees that the increments between frames are quite small compared to the changes throughout the whole sequence. For instance, the behavior of a tracked person (2nd row) is composed of a certain repertoire of activities, with transitions in between that are typically short in comparison. This can also be observed at larger scales, such as day-night or seasonal changes (3rd and 4th row), and already suggests a hierarchical structure.
- Published
- 2011
14. Object and Action Classification with Latent Variables
- Author
-
Hakan Bilen, Vinay P. Namboodiri, Luc Van Gool, Jesse Hoey, Stephen McKenna, and Emanuele Trucco
- Subjects
Probabilistic latent semantic analysis, Pattern recognition, Latent variable, Parameter space, Machine learning, Latent class model, Discriminative model, Action recognition, Artificial intelligence, Classifier (UML), Mathematics
- Abstract
In this paper we propose a generic framework to incorporate unobserved auxiliary information for classifying objects and actions. This framework allows us to explicitly account for localisation and alignment of representations for generic object and action classes as latent variables. We approach this problem in the discriminative setting, as learning a max-margin classifier that infers the class label along with the latent variables. Through this paper we make the following contributions: (a) we provide a method for incorporating latent variables into object and action classification; (b) we specifically account for the presence of an explicit class-related subregion which can include foreground and/or background; (c) we explore a way to learn a better classifier by iterative expansion of the latent parameter space. We demonstrate the performance of our approach by rigorous experimental evaluation on a number of standard object and action recognition datasets. Bilen H., Namboodiri V.P., Van Gool L., ''Object and action classification with latent variables'', Proceedings of the 22nd British Machine Vision Conference (BMVC 2011), August 29 - September 2, 2011, Dundee, Scotland (best paper award).
- Published
- 2011
15. Automatic annotation of unique locations from video and text
- Author
-
Sien Moens, Chris Engels, Koen Deschacht, Luc Van Gool, Jan Hendrik Becker, and Tinne Tuytelaars
- Subjects
Scheme (programming language), Topic model, Similarity (geometry), Computer science, Television series, Pattern recognition, Latent Dirichlet allocation, Annotation, Segmentation, Data mining, Artificial intelligence, Single episode
- Abstract
Given a video and associated text, we propose an automatic annotation scheme in which we employ a latent topic model to generate topic distributions from weighted text and then modify these distributions based on visual similarity. We apply this scheme to location annotation of a television series for which transcripts are available. The topic distributions allow us to avoid explicit classification, which is useful in cases where the exact number of locations is unknown. Moreover, many locations are unique to a single episode, making it impossible to obtain representative training data for a supervised approach. Our method first segments the episode into scenes by fusing cues from both images and text. We then assign location-oriented weights to the text and generate topic distributions for each scene using Latent Dirichlet Allocation. Finally, we update the topic distributions using the distributions of visually similar scenes. We formulate our visual similarity between scenes as an Earth Mover’s Distance problem. We quantitatively validate our multi-modal approach to segmentation and qualitatively evaluate the resulting location annotations. Our results demonstrate that we are able to generate accurate annotations, even for locations only seen in a single episode.
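The final update step (pulling each scene's topic distribution towards those of visually similar scenes) can be sketched as follows. The topic distributions stand in for the LDA output, and the given similarity matrix stands in for the Earth Mover's Distance computation; the blending rule and all values are illustrative.

```python
import numpy as np

# Hypothetical per-scene topic distributions (one row per scene),
# e.g. as produced by LDA on the location-weighted transcripts.
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3]])

# Hypothetical visual similarity between scenes; the paper derives
# this from an Earth Mover's Distance formulation.
sim = np.array([[1.0, 0.1, 0.9],
                [0.1, 1.0, 0.2],
                [0.9, 0.2, 1.0]])

def refine_topics(theta, sim, alpha=0.5):
    """Blend each scene's topic distribution with a similarity-weighted
    average of the other scenes' distributions (illustrative rule)."""
    W = sim.copy()
    np.fill_diagonal(W, 0.0)                    # exclude the scene itself
    W = W / W.sum(axis=1, keepdims=True)        # normalise the weights
    blended = (1 - alpha) * theta + alpha * (W @ theta)
    return blended / blended.sum(axis=1, keepdims=True)

refined = refine_topics(theta, sim)  # scenes 0 and 2 move closer together
```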
- Published
- 2010
16. On-line Adaption of Class-specific Codebooks for Instance Tracking
- Author
-
Luc Van Gool, Nima Razavi, and Juergen Gall
- Subjects
Set (abstract data type), Matching (graph theory), Computer science, Line (geometry), Probabilistic logic, Codebook, Class (philosophy), Computer vision, Artificial intelligence, Object (computer science), Image (mathematics)
- Abstract
In this work, we demonstrate that an off-line trained class-specific detector can be transformed into an instance-specific detector on-the-fly. To this end, we make use of a codebook-based detector [1] that is trained on an object class. Codebooks model the spatial distribution and appearance of object parts. When matching an image against a codebook, a certain set of codebook entries is activated to cast probabilistic votes for the object. For a given object hypothesis, one can collect the entries that voted for the object. In our case, these entries can be regarded as a signature for the target of interest. Since a change of pose and appearance can lead to an activation of very different codebook entries, we learn the statistics for the target and the background over time, i.e. we learn on-line the probability of each part in the codebook belonging to the target. By taking the target-specific statistics into account for voting, the target can be distinguished from other instances in the background, yielding a higher detection confidence for the target; see Fig. 1. A class-specific codebook as in [1, 2, 3, 4, 5] is trained off-line to identify any instance of the class in any image. It models the probability of the patches belonging to the object class, p(c=1|L), and the local spatial distribution of the patches with respect to the object center, p(x|c=1,L). For detection, patches are sampled from an image and matched against the codebook, i.e. each patch P(y) sampled from image location y ends at a leaf L(y). The probability for an instance of the class centered at the location x is then given by
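The on-line statistic described above (the probability of a codebook entry belonging to the target) can be sketched as a running average per leaf. The update rule, initialisation, and names here are illustrative, not the paper's exact formulation.

```python
class CodebookEntry:
    """Per-leaf on-line statistic: the running probability that patches
    reaching this codebook entry come from the tracked instance rather
    than the background (simple running average, Laplace-smoothed)."""

    def __init__(self):
        self.target_hits = 1.0   # Laplace-style initialisation
        self.total_hits = 2.0

    def update(self, came_from_target):
        self.total_hits += 1.0
        if came_from_target:
            self.target_hits += 1.0

    @property
    def p_target(self):
        return self.target_hits / self.total_hits

def weighted_vote(class_vote_weight, entry):
    # Scale the class-specific Hough vote by the instance-specific
    # statistic, so entries that consistently fire on the target
    # dominate the voting for the tracked instance.
    return class_vote_weight * entry.p_target
```

During tracking, each detection assigns its activated entries a target/background label, and the re-weighted votes then favour the tracked instance over other class members.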
- Published
- 2010
17. PRISM: PRincipled Implicit Shape Model
- Author
-
Alain Lehmann, Luc Van Gool, and Bastian Leibe
- Subjects
Implicit Shape Model, Heuristic, Kernel density estimation, Probabilistic logic, Statistical model, Generalised Hough transform, Artificial intelligence, Mixture model, Algorithm, Object detection, Mathematics
- Abstract
This paper addresses the problem of object detection by means of the Generalised Hough transform paradigm. The Implicit Shape Model (ISM) is a well-known approach based on this idea. It made this paradigm popular and has been adopted many times. Although the algorithm exhibits robust detection performance, its description, i.e. its probabilistic model, involves arguments which are unsatisfactory from a probabilistic standpoint. We propose a framework which overcomes these problems and gives a sound justification to the voting procedure. Furthermore, our framework allows for a formal understanding of the heuristic of soft-matching commonly used in visual vocabulary systems. We show that it is sufficient to use soft-matching during learning only and to perform fast nearest neighbour matching at recognition time (where speed is of prime importance). Our implementation is based on Gaussian Mixture Models (instead of kernel density estimators as with ISM) which lead to a fast gradient-based object detector.
- Published
- 2009
18. Segmentation-Based Urban Traffic Scene Understanding
- Author
-
Helmut Grabner, Luc Van Gool, Tobias Mueller, and Andreas Ess
- Subjects
Contextual image classification, Computer science, Pedestrian crossing, Object detection, Depth map, Video tracking, Segmentation, Computer vision, Artificial intelligence, AdaBoost, Smoothing
- Abstract
Recognizing the traffic scene in front of a car is an important asset for autonomous driving, as well as for safety systems. While GPS-based maps abound and have reached an incredible level of accuracy, they can still profit from additional, image-based information. Especially in urban scenarios, GPS reception can be shaky, or the map might not contain the latest detours due to construction, demonstrations, etc. Furthermore, such maps are static and cannot account for other dynamic traffic agents, such as cars or pedestrians. In this paper, we therefore propose an image-based system that is able to recognize both the road type (straight, left/right curve, crossing, ...) as well as a set of often-encountered objects (car, pedestrian, pedestrian crossing). The obtained information could then be fused with existing maps and either assist the driver directly (e.g., a pedestrian crossing is ahead: slow down) or help in improving object tracking (e.g., where are possible entrance points for pedestrians or cars?). Starting from a video sequence obtained from a car driving through urban areas, we employ a two-stage architecture termed Segmentation-Based Urban Traffic Scene Understanding (SUTSU) that first builds an intermediate representation of the image based on a patch-wise image classification. The patch-wise segmentation is inspired by recent work [3, 4, 5] and assigns class probabilities to every 8×8 image patch. As a feature set, we use the coefficients of the Walsh-Hadamard transform (a decomposition of the image into square waves) and, if available, additional information from the depth map. These are then used in a one-versus-all training using AdaBoost for feature selection, where we choose 13 texture classes that we found to be representative of typical urban scenes. This yields a meta-representation of the scene that is more suitable for further processing, Fig. 1 (b,c).
In recent publications, such a segmentation was used for a variety of purposes, such as the improvement of object detection [1, 5], the analysis of occlusion boundaries, or 3D reconstruction. In this paper, we investigate the use of a segmentation for urban scene analysis. We infer another set of features from the segmentation's probability maps, analyzing repetitivity, curvature, and rough structure. This set is then again used with a one-versus-all training to infer both the type of road segment ahead and the additional presence of pedestrians, cars, or pedestrian crossings. A Hidden Markov Model is used for temporally smoothing the result. SUTSU is tested on two challenging sequences, spanning over 50 minutes of video of driving through Zurich. The experiments show that while a state-of-the-art scene classifier [2] can keep global classes, such as road types, similarly well apart, a manually crafted feature set based on a segmentation clearly outperforms it on object classes. Example images are shown in Fig. 2. The main contribution of this paper is the application of recent research efforts in scene categorization to doing vision "in the wild", driving through urban scenarios. We furthermore show the advantage of a segmentation-based approach over a global descriptor, as the intermediate representation can easily be adapted to other underlying image data (e.g. dusk, rain, ...) without having to change the high-level classifier.
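The Walsh-Hadamard feature extraction described above can be sketched for a single 8×8 patch. The Sylvester construction of the Hadamard matrix is standard; the normalisation is illustrative.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of the n-by-n Hadamard matrix
    (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def wh_features(patch8x8):
    """Walsh-Hadamard coefficients of an 8x8 patch: the projection of
    the patch onto 2-D square-wave basis functions, as used for the
    patch-wise texture classification described above."""
    H = hadamard(8)
    return (H @ patch8x8 @ H.T) / 8.0

# A constant patch projects entirely onto the DC (all-ones) basis
# function: only the [0, 0] coefficient is non-zero.
coeffs = wh_features(np.ones((8, 8)))
```

Each 8×8 patch of the input image would be mapped to such a coefficient vector (optionally augmented with depth-map features) before AdaBoost feature selection.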
- Published
- 2009
19. Exemplar-based Action Recognition in Video
- Author
-
Tinne Tuytelaars, Luc Van Gool, Geert Willems, and Jan Hendrik Becker
- Subjects
Brute-force search, Pattern recognition, Domain (software engineering), Action (philosophy), Discriminative model, Spatial reference system, Minimum bounding box, Sliding window protocol, Computer vision, Visual word, Artificial intelligence, Mathematics
- Abstract
In this work, we present a method for action localization and recognition using an exemplar-based approach. It starts from local, dense, yet scale-invariant spatio-temporal features. The most discriminative visual words are selected and used to cast bounding box hypotheses, which are then verified and further grouped into the final detections. To the best of our knowledge, we are the first to extend the exemplar-based approach using local features into the spatio-temporal domain. This allows us to avoid the problems that typically plague sliding window-based approaches, in particular the exhaustive search over spatial coordinates, time, and spatial as well as temporal scales. We report state-of-the-art results on challenging datasets, extracted from real movies, for both classification and localization.
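The abstract mentions selecting the most discriminative visual words before casting hypotheses. The paper's actual selection criterion is not given here; one common stand-in is a Laplace-smoothed likelihood ratio between action and background word frequencies, sketched below (function name and smoothing are assumptions):

```python
def discriminative_words(pos_counts, neg_counts, top_k=3, eps=1.0):
    """Rank visual words by a Laplace-smoothed likelihood ratio
    P(word | action) / P(word | background).

    pos_counts / neg_counts: dicts mapping word id -> occurrence count
    in action exemplars and in background footage, respectively.
    """
    n_pos = sum(pos_counts.values())
    n_neg = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)

    def score(w):
        p = (pos_counts.get(w, 0) + eps) / (n_pos + eps * len(vocab))
        q = (neg_counts.get(w, 0) + eps) / (n_neg + eps * len(vocab))
        return p / q

    return sorted(vocab, key=score, reverse=True)[:top_k]
```

Words that occur often in action exemplars but rarely in background clips rank first; the smoothing keeps unseen words from producing degenerate ratios.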
- Published
- 2009
20. Hough Transform-based Mouth Localization for Audio-Visual Speech Recognition
- Author
-
Gabriele Fanelli, Luc Van Gool, and Jürgen Gall
- Subjects
genetic structures ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Probabilistic logic ,Audio-visual speech recognition ,Hough transform ,law.invention ,ComputingMethodologies_PATTERNRECOGNITION ,stomatognathic system ,law ,Computer vision ,Artificial intelligence ,Invariant (mathematics) ,business ,Sensory cue ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
We present a novel method for mouth localization in the context of multimodal speech recognition, where audio and visual cues are fused to improve the speech recognition accuracy. While facial feature points like mouth corners or lip contours are commonly used to estimate at least scale, position, and orientation of the mouth, we propose a Hough transform-based method. Instead of relying on a predefined sparse subset of mouth features, it casts probabilistic votes for the mouth center from several patches in the neighborhood and accumulates the votes in a Hough image. This makes the localization more robust, as it does not rely on the detection of a single feature. In addition, we exploit the different shape properties of eyes and mouth in order to localize the mouth more efficiently. Using the rotation-invariant representation of the iris, scale and orientation can be efficiently inferred from the localized eye positions. The superior accuracy of our method and quantitative improvements for audio-visual speech recognition over monomodal approaches are demonstrated on two datasets.
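The core of the voting scheme described above, patches casting probabilistic votes for the mouth center into a Hough image, can be sketched as follows (the vote offsets and probabilities would come from the trained patch model, which is not reproduced here):

```python
import numpy as np

def hough_vote_centre(shape, votes):
    """Accumulate probabilistic votes for an object centre in a Hough image.

    shape: (height, width) of the accumulator.
    votes: iterable of ((x, y), (dx, dy), p) -- a patch at (x, y) votes
    with probability p for the centre at (x + dx, y + dy).
    Returns the Hough image and the (x, y) location of its maximum.
    """
    h, w = shape
    hough = np.zeros((h, w))
    for (x, y), (dx, dy), p in votes:
        cx, cy = int(x + dx), int(y + dy)
        if 0 <= cx < w and 0 <= cy < h:   # ignore votes outside the image
            hough[cy, cx] += p
    peak = np.unravel_index(np.argmax(hough), hough.shape)
    return hough, (peak[1], peak[0])
```

Because many patches vote independently, a few wrong votes only add diffuse background mass while the true center accumulates a sharp peak, which is the robustness the abstract refers to.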
- Published
- 2009
21. An Efficient Shared Multi-Class Detection Cascade
- Author
-
Esther Koller-Meier, Luc Van Gool, and Philipp Zehnder
- Subjects
Cascade ,business.industry ,Detector ,Training phase ,Object detector ,Pattern recognition ,Artificial intelligence ,AdaBoost ,Detection rate ,business ,Classifier (UML) ,Haar wavelet ,Mathematics - Abstract
We propose a novel multi-class object detector that optimizes the detection costs while retaining a desired detection rate. The detector uses a cascade that unites the handling of similar object classes while separating off classes at appropriate levels of the cascade. No prior knowledge about the relationship between classes is needed, as the classifier structure is automatically determined during the training phase. The detection nodes in the cascade use Haar wavelet features and Gentle AdaBoost; however, the approach is not dependent on the specific features used and can easily be extended to other cases. Experiments are presented for several numbers of object classes, and the approach is compared to other classification schemes. The results demonstrate a large efficiency gain that is particularly prominent for larger numbers of classes. The complexity of the training also scales well with the number of classes.
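The shared-then-split cascade structure described above can be pictured as a tree of rejection tests. The following sketch shows only the evaluation side with a hypothetical node layout; the automatic structure learning and the Haar/AdaBoost stage classifiers are not modeled:

```python
def cascade_classify(node, x):
    """Walk a shared multi-class detection cascade (illustrative structure).

    Each node holds a 'score' function and a rejection 'thresh'; a leaf
    carries the 'classes' it still represents, while an inner node splits
    the shared classes among its 'children'. A window is discarded as
    background as soon as any score on its path falls below threshold,
    so most windows are rejected after very few shared tests.
    """
    if node['score'](x) < node['thresh']:
        return None                      # early rejection: background
    if 'classes' in node:
        return node['classes']           # surviving class subset
    out = []
    for child in node['children']:
        result = cascade_classify(child, x)
        if result:
            out.extend(result)
    return out or None
```

Sharing the early stages between similar classes is what yields the efficiency gain the abstract reports: the cheap shared tests are paid once per window instead of once per class.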
- Published
- 2008
22. Generalised Linear Pose Estimation
- Author
-
Alexander Neubeck, Luc Van Gool, and Andreas Ess
- Subjects
Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Usability ,3D pose estimation ,Motion (physics) ,Linear algorithm ,Application domain ,Robot ,Computer vision ,Artificial intelligence ,Special case ,business ,Pose - Abstract
This paper investigates several aspects of 3D-2D camera pose estimation, aimed at robot navigation in poorly textured scenes. The major contribution is a fast, linear algorithm for the general case with six or more points. We show how to specialise this to work with only four or five points, which is of utmost importance in a hypothesize-and-test framework. Our formulation allows for an easy inclusion of lines, as well as the handling of other camera geometries, such as stereo rigs. We also treat the special case of planar motion, a valid restriction for most indoor environments. We conclude the paper with extensive simulated tests and a real test case, which substantiate the algorithm’s usability for our application domain.
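As a point of reference for the six-or-more-point linear case, the classical Direct Linear Transform recovers a 3×4 projection matrix from 3D-2D correspondences via SVD. This is a standard baseline, not the paper's generalised formulation (which also handles lines, stereo rigs, and the minimal four/five-point cases):

```python
import numpy as np

def dlt_projection(X, x):
    """Direct Linear Transform: estimate a 3x4 projection matrix from
    n >= 6 world points X (n, 3) and their image projections x (n, 2).

    Each correspondence gives two linear equations in the 12 entries of P
    (from the cross product x_img x (P X_h) = 0); the solution is the
    right singular vector of A with the smallest singular value.
    """
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        Xh = [Xw, Yw, Zw, 1.0]
        A.append([0, 0, 0, 0] + [-c for c in Xh] + [v * c for c in Xh])
        A.append(Xh + [0, 0, 0, 0] + [-u * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```

The recovered matrix is determined only up to scale, which is why correctness is best checked through the reprojection of the input points rather than by comparing matrix entries directly.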
- Published
- 2007
23. Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions
- Author
-
Tinne Tuytelaars and Luc Van Gool
- Subjects
Alternative methods ,Robustness (computer science) ,business.industry ,Image database ,Hessian affine region detector ,Stereo matching ,Computer vision ,Artificial intelligence ,Invariant (mathematics) ,business ,Mathematics - Abstract
‘Invariant regions’ are image patches that automatically deform with changing viewpoint so as to keep covering identical physical parts of a scene. Such regions are then described by a set of invariant features, which makes it relatively easy to match them between views and under changing illumination. In previous work, we have presented invariant regions that are based on a combination of corners and edges. The application discussed then was image database retrieval. Here, an alternative method for extracting (affinely) invariant regions is given that does not depend on the presence of edges or corners in the image but is purely intensity-based. Also, we demonstrate the use of such regions for another application, namely wide baseline stereo matching. In fact, the goal is to build an opportunistic system that exploits several types of invariant regions as it sees fit. This yields more correspondences and a system that can deal with a wider range of images. To increase the robustness of the system even further, two semi-local constraints on combinations of region correspondences are derived (one geometric, the other photometric). They allow testing the consistency of correspondences and hence rejecting falsely matched regions.
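The matching-and-filtering pipeline described above can be illustrated in two steps: mutual nearest-neighbour matching of region descriptors, followed by a consistency check over combinations of correspondences. The filter below uses a simple median-displacement test, which only holds for near-translational view changes; the paper's actual semi-local constraints are affine-geometric and photometric, so this is a deliberately simplified stand-in:

```python
import numpy as np

def mutual_nn_matches(desc1, desc2):
    """Match invariant-region descriptors by mutual nearest neighbour:
    keep (i, j) only if j is i's closest match and vice versa."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn12 = d.argmin(axis=1)
    nn21 = d.argmin(axis=0)
    return [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]

def filter_consistent(matches, pts1, pts2, tol=5.0):
    """Toy semi-local consistency check: keep a correspondence only if
    its displacement agrees with the median displacement of all matches
    within tol pixels (assumes roughly translational motion)."""
    disp = np.array([pts2[j] - pts1[i] for i, j in matches])
    med = np.median(disp, axis=0)
    keep = np.linalg.norm(disp - med, axis=1) <= tol
    return [m for m, k in zip(matches, keep) if k]
```

The two-stage shape mirrors the abstract: appearance alone proposes correspondences, and constraints over combinations of them reject the false matches.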
- Published
- 2000
24. Shape-from-copies
- Author
-
André Oosterlinck, Marc Van Diest, Luc Van Gool, and Theo Moons
- Subjects
Computer science - Published
- 1993