Author: "Lepetit, Vincent" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lepetit, Vincent"' showing total 813 results

51. Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Author: Armagan, Anil, Garcia-Hernando, Guillermo, Baek, Seungryul, Hampali, Shreyas, Rad, Mahdi, Zhang, Zhaohui, Xie, Shipeng, Chen, MingXiu, Zhang, Boshen, Xiong, Fu, Xiao, Yang, Cao, Zhiguo, Yuan, Junsong, Ren, Pengfei, Huang, Weiting, Sun, Haifeng, Hrúz, Marek, Kanis, Jakub, Krňoul, Zdeněk, Wan, Qingfu, Li, Shile, Yang, Linlin, Lee, Dongheui, Yao, Angela, Zhou, Weiguo, Mei, Sijia, Liu, Yunhui, Spurr, Adrian, Iqbal, Umar, Molchanov, Pavlo, Weinzaepfel, Philippe, Brégier, Romain, Rogez, Grégory, Lepetit, Vincent, and Kim, Tae-Kyun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and different HPE methods/backbones.
Comment: European Conference on Computer Vision (ECCV), 2020
Published: 2020

52. Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields

Author: Ramamonjisoa, Michael, Du, Yuming, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current methods for depth map prediction from monocular images tend to predict smooth, poorly localized contours for the occlusion boundaries in the input image. This is unfortunate as occlusion boundaries are important cues to recognize objects, and as we show, may lead to a way to discover new objects from scene reconstruction. To improve predicted depth maps, recent methods rely on various forms of filtering or predict an additive residual depth map to refine a first estimate. We instead learn to predict, given a depth map predicted by some reconstruction method, a 2D displacement field able to re-sample pixels around the occlusion boundaries into sharper reconstructions. Our method can be applied to the output of any depth estimation method, in an end-to-end trainable fashion. For evaluation, we manually annotated the occlusion boundaries in all the images in the test split of the popular NYUv2-Depth dataset. We show that our approach improves the localization of occlusion boundaries for all state-of-the-art monocular depth estimation methods that we could evaluate, without degrading the depth accuracy for the rest of the images.
Comment: Accepted to CVPR 2020
Published: 2020

53. General 3D Room Layout from a Single View by Render-and-Compare

Author: Stekovic, Sinisa, Hampali, Shreyas, Rad, Mahdi, Sarkar, Sayan Deb, Fraundorfer, Friedrich, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel method to reconstruct the 3D layout of a room (walls, floors, ceilings) from a single perspective view in challenging conditions, by contrast with previous single-view methods restricted to cuboid-shaped layouts. This input view can consist of a color image only, but considering a depth map results in a more accurate reconstruction. Our approach is formalized as solving a constrained discrete optimization problem to find the set of 3D polygons that constitute the layout. In order to deal with occlusions between components of the layout, which is a problem ignored by previous works, we introduce an analysis-by-synthesis method to iteratively refine the 3D layout estimate. As no dataset was available to evaluate our method quantitatively, we created one together with several appropriate metrics. Our dataset consists of 293 images from ScanNet, which we annotated with precise 3D layouts. It offers three times more samples than the popular NYUv2 303 benchmark, and a much larger variety of layouts.
Published: 2020

54. AssemblyNet: A large ensemble of CNNs for 3D Whole Brain MRI Segmentation

Author: Coupé, Pierrick, Mansencal, Boris, Clément, Michaël, Giraud, Rémi, de Senneville, Baudouin Denis, Ta, Vinh-Thong, Lepetit, Vincent, and Manjon, José V.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Whole brain segmentation using deep learning (DL) is a very challenging task since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a single convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two "assemblies" of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and unseen problems, and of reaching a consensus quickly. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an "amendment" procedure made by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. During our validation, AssemblyNet showed competitive performance compared to state-of-the-art methods such as U-Net, joint label fusion and SLANT. Moreover, we investigated the scan-rescan consistency and the robustness to disease effects of our method. These experiments demonstrated the reliability of AssemblyNet. Finally, we showed the interest of using semi-supervised learning to improve the performance of our method.
Comment: arXiv admin note: substantial text overlap with arXiv:1906.01862
Published: 2019

55. Smart Hypothesis Generation for Efficient and Robust Room Layout Estimation

Author: Hirzer, Martin, Roth, Peter M., and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a novel method to efficiently estimate the spatial layout of a room from a single monocular RGB image. As existing approaches based on low-level feature extraction, followed by a vanishing point estimation are very slow and often unreliable in realistic scenarios, we build on semantic segmentation of the input image. To obtain better segmentations, we introduce a robust, accurate and very efficient hypothesize-and-test scheme. The key idea is to use three segmentation hypotheses, each based on a different number of visible walls. For each hypothesis, we predict the image locations of the room corners and select the hypothesis for which the layout estimated from the room corners is consistent with the segmentation. We demonstrate the efficiency and robustness of our method on three challenging benchmark datasets, where we significantly outperform the state-of-the-art.
Comment: Accepted: Winter Conference on Applications of Computer Vision (WACV) 2020
Published: 2019

56. LU-Net: An Efficient Network for 3D LiDAR Point Cloud Semantic Segmentation Based on End-to-End-Learned 3D Features and U-Net

Author: Biasutti, Pierre, Lepetit, Vincent, Aujol, Jean-François, Brédif, Mathieu, and Bugeau, Aurélie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose LU-Net (for LiDAR U-Net), a new method for the semantic segmentation of a 3D LiDAR point cloud. Instead of applying some global 3D segmentation method such as PointNet, we propose an end-to-end architecture for LiDAR point cloud semantic segmentation that efficiently solves the problem as an image processing problem. We first extract high-level 3D features for each point given its 3D neighbors. Then, these features are projected into a 2D multichannel range-image by considering the topology of the sensor. Thanks to these learned features and this projection, we can finally perform the segmentation using a simple U-Net segmentation network, which performs very well while being very efficient. In this way, we can exploit both the 3D nature of the data and the specificity of the LiDAR sensor. This approach outperforms the state-of-the-art by a large margin on the KITTI dataset, as our experiments show. Moreover, this approach operates at 24fps on a single GPU. This is above the acquisition rate of common LiDAR sensors, which makes it suitable for real-time applications.
Comment: 9 pages, 9 figures. arXiv admin note: substantial text overlap with arXiv:1905.08748
Published: 2019

57. CorNet: Generic 3D Corners for 6D Pose Estimation of New Objects without Retraining

Author: Pitteri, Giorgia, Ilic, Slobodan, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel approach to the detection and 3D pose estimation of objects in color images. Its main contribution is that it requires neither a training phase nor training data for new objects, while state-of-the-art methods typically require hours of training time and hundreds of registered training images. Instead, our method relies only on the objects' geometries. Our method focuses on objects with prominent corners, which covers a large number of industrial objects. We first learn to detect object corners of various shapes in images and also to predict their 3D poses, by using training images of a small set of objects. To detect a new object in a given image, we first identify its corners from its CAD model; we also detect the corners visible in the image and predict their 3D poses. We then introduce a RANSAC-like algorithm that robustly and efficiently detects and estimates the object's 3D pose by matching its corners on the CAD model with their detected counterparts in the image. Because we also estimate the 3D poses of the corners in the image, detecting only 1 or 2 corners is sufficient to estimate the pose of the object, which makes the approach robust to occlusions. Finally, we rely on a check that exploits the full 3D geometry of the objects, in case multiple objects have the same corner spatial arrangement. The advantages of our approach make it particularly attractive for industrial contexts, and we demonstrate our approach on the challenging T-LESS dataset.
Published: 2019

58. On Object Symmetries and 6D Pose Estimation from Images

Author: Pitteri, Giorgia, Ramamonjisoa, Michaël, Ilic, Slobodan, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Objects with symmetries are common in our daily life and in industrial contexts, but are often ignored in the recent literature on 6D pose estimation from images. In this paper, we study in an analytical way the link between the symmetries of a 3D object and its appearance in images. We explain why symmetrical objects can be a challenge when training machine learning algorithms that aim at estimating their 6D pose from images. We propose an efficient and simple solution that relies on the normalization of the pose rotation. Our approach is general and can be used with any 6D pose estimation algorithm. Moreover, our method is also beneficial for objects that are 'almost symmetrical', i.e. objects for which only a detail breaks the symmetry. We validate our approach within a Faster-RCNN framework on a synthetic dataset made with objects from the T-Less dataset, which exhibit various types of symmetries, as well as real sequences from T-Less.
Comment: International Conference on 3D Vision
Published: 2019

59. Location Field Descriptors: Single Image 3D Model Retrieval in the Wild

Author: Grabner, Alexander, Roth, Peter M., and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present Location Field Descriptors, a novel approach for single image 3D model retrieval in the wild. In contrast to previous methods that directly map 3D models and RGB images to an embedding space, we establish a common low-level representation in the form of location fields from which we compute pose invariant 3D shape descriptors. Location fields encode correspondences between 2D pixels and 3D surface coordinates and, thus, explicitly capture 3D shape and 3D pose information without appearance variations which are irrelevant for the task. This early fusion of 3D models and RGB images results in three main advantages: First, the bottleneck location field prediction acts as a regularizer during training. Second, major parts of the system benefit from training on a virtually infinite amount of synthetic data. Finally, the predicted location fields are visually interpretable and unblackbox the system. We evaluate our proposed approach on three challenging real-world datasets (Pix3D, Comp, and Stanford) with different object categories and significantly outperform the state-of-the-art by up to 20% absolute in multiple 3D retrieval metrics.
Comment: Accepted to International Conference on 3D Vision (3DV) 2019 (Oral)
Published: 2019

60. GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild

Author: Grabner, Alexander, Roth, Peter M., and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a joint 3D pose and focal length estimation approach for object categories in the wild. In contrast to previous methods that predict 3D poses independently of the focal length or assume a constant focal length, we explicitly estimate and integrate the focal length into the 3D pose estimation. For this purpose, we combine deep learning techniques and geometric algorithms in a two-stage approach: First, we estimate an initial focal length and establish 2D-3D correspondences from a single RGB image using a deep network. Second, we recover 3D poses and refine the focal length by minimizing the reprojection error of the predicted correspondences. In this way, we exploit the geometric prior given by the focal length for 3D pose estimation. This results in two advantages: First, we achieve significantly improved 3D translation and 3D pose accuracy compared to existing methods. Second, our approach finds a geometric consensus between the individual projection parameters, which is required for precise 2D-3D alignment. We evaluate our proposed approach on three challenging real-world datasets (Pix3D, Comp, and Stanford) with different object categories and significantly outperform the state-of-the-art by up to 20% absolute in multiple different metrics.
Comment: Accepted to International Conference on Computer Vision (ICCV) 2019
Published: 2019

61. Sparse-to-Dense Hypercolumn Matching for Long-Term Visual Localization

Author: Germain, Hugo, Bourmaud, Guillaume, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a novel approach to feature point matching, suitable for robust and accurate outdoor visual localization in long-term scenarios. Given a query image, we first match it against a database of registered reference images, using recent retrieval techniques. This gives us a first estimate of the camera pose. To refine this estimate, like previous approaches, we match 2D points across the query image and the retrieved reference image. This step, however, is prone to fail as it is still very difficult to detect and match sparse feature points across images captured in potentially very different conditions. Our key contribution is to show that we need to extract sparse feature points only in the retrieved reference image: We then search for the corresponding 2D locations in the query image exhaustively. This search can be performed efficiently using convolutional operations, and robustly by using hypercolumn descriptors, i.e. image features computed for retrieval. We refer to this method as Sparse-to-Dense Hypercolumn Matching. Because we know the 3D locations of the sparse feature points in the reference images thanks to an offline reconstruction stage, it is then possible to accurately estimate the camera pose from these matches. Our experiments show that this method allows us to outperform the state-of-the-art on several challenging outdoor datasets.
Published: 2019

62. HOnnotate: A method for 3D Annotation of Hand and Object Poses

Author: Hampali, Shreyas, Rad, Mahdi, Oberweger, Markus, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Our motivation is the current lack of annotated real images for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames simultaneously. This method allows us to automatically annotate each frame with accurate estimates of the poses, despite large mutual occlusions. With this method, we created HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and object. This dataset is currently made of 77,558 frames, 68 sequences, 10 persons, and 10 objects. Using our dataset, we develop a single RGB image-based method to predict the hand pose when interacting with objects under severe occlusions and show it generalizes to objects not seen in the dataset.
Comment: Accepted to CVPR 2020
Published: 2019

63. AssemblyNet: A Novel Deep Decision-Making Process for Whole Brain MRI Segmentation

Author: Coupé, Pierrick, Mansencal, Boris, Clément, Michaël, Giraud, Rémi, de Senneville, Baudouin Denis, Ta, Vinh-Thong, Lepetit, Vincent, and Manjon, José V.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
Abstract: Whole brain segmentation using deep learning (DL) is a very challenging task since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a global convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two "assemblies" of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and reaching a consensus quickly. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an "amendment" procedure made by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. When using the same 45 training images, AssemblyNet outperforms global U-Net by 28% in terms of the Dice metric, patch-based joint label fusion by 15% and SLANT-27 by 10%. Finally, AssemblyNet demonstrates high capacity to deal with limited training data to achieve whole brain segmentation in practical training and testing times.
Published: 2019

64. SharpNet: Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation

Author: Ramamonjisoa, Michaël and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce SharpNet, a method that predicts an accurate depth map for an input color image, with particular attention to the reconstruction of occluding contours: Occluding contours are an important cue for object recognition, and for realistic integration of virtual objects in Augmented Reality, but they are also notoriously difficult to reconstruct accurately. For example, they are a challenge for stereo-based reconstruction methods, as points around an occluding contour are visible in only one image. Inspired by recent methods that introduce normal estimation to improve depth prediction, we introduce a novel term that constrains depth and occluding contours predictions. Since ground truth depth is difficult to obtain with pixel-perfect accuracy along occluding contours, we use synthetic images for training, followed by fine-tuning on real data. We demonstrate our approach on the challenging NYUv2-Depth dataset, and show that our method outperforms the state-of-the-art along occluding contours, while performing on par with the best recent methods for the rest of the images. Its accuracy along the occluding contours is actually better than the 'ground truth' acquired by a depth camera based on structured light. We show this by introducing a new benchmark based on NYUv2-Depth for evaluating occluding contours in monocular reconstruction, which is our second contribution.
Comment: Accepted at ICCV "3D Reconstruction in the Wild" workshop
Published: 2019

65. Casting Geometric Constraints in Semantic Segmentation as Semi-Supervised Learning

Author: Stekovic, Sinisa, Fraundorfer, Friedrich, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We propose a simple yet effective method to learn to segment new indoor scenes from video frames: State-of-the-art methods trained on one dataset, even as large as the SUNRGB-D dataset, can perform poorly when applied to images that are not part of the dataset, because of the dataset bias, a common phenomenon in computer vision. To make semantic segmentation more useful in practice, one can exploit geometric constraints. Our main contribution is to show that these constraints can be cast conveniently as semi-supervised terms, which enforce the fact that the same class should be predicted for the projections of the same 3D location in different images. This is interesting as we can exploit general existing techniques developed for semi-supervised learning to efficiently incorporate the constraints. We show that this approach can efficiently and accurately learn to segment target sequences of ScanNet and our own target sequences using only annotations from SUNRGB-D, and geometric relations between the video frames of target sequences.
Comment: To be presented at WACV 2020
Published: 2019

66. Speed Invariant Time Surface for Learning to Detect Corner Points with Event-Based Cameras

Author: Manderscheid, Jacques, Sironi, Amos, Bourdis, Nicolas, Migliore, Davide, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a learning approach to corner detection for event-based cameras that is stable even under fast and abrupt motions. Event-based cameras offer high temporal resolution, power efficiency, and high dynamic range. However, the properties of event-based data are very different compared to standard intensity images, and simple extensions of corner detection methods designed for these images do not perform well on event-based data. We first introduce an efficient way to compute a time surface that is invariant to the speed of the objects. We then show that we can train a Random Forest to recognize events generated by a moving corner from our time surface. Random Forests are also extremely efficient, and therefore a good choice to deal with the high capture frequency of event-based cameras; our implementation processes up to 1.6Mev/s on a single CPU. Thanks to our time surface formulation and this learning approach, our method is significantly more robust to abrupt changes of direction of the corners compared to previous ones. Our method also naturally assigns a confidence score for the corners, which can be useful for postprocessing. Moreover, we introduce a high-resolution dataset suitable for quantitative evaluation and comparison of corner detection methods for event-based cameras. We call our approach SILC, for Speed Invariant Learned Corners, and compare it to the state-of-the-art with extensive experiments, showing better performance.
Comment: 8 pages, 7 figures, accepted at CVPR 2019
Published: 2019

67. Generalized Feedback Loop for Joint Hand-Object Pose Estimation

Author: Oberweger, Markus, Wohlhart, Paul, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an approach to estimating the 3D pose of a hand, possibly handling an object, given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep Networks, optimized using training data. This approach can be generalized to a hand interacting with an object. Therefore, we jointly estimate the 3D pose of the hand and the 3D pose of the object. Our approach performs on par with state-of-the-art methods for 3D hand pose estimation, and outperforms state-of-the-art methods for joint hand-object pose estimation when using depth images only. Also, our approach is efficient as our implementation runs in real-time on a single GPU.
Comment: arXiv admin note: substantial text overlap with arXiv:1609.09698
Published: 2019

68. Simultaneous completion and spatiotemporal grouping of corrupted motion tracks

Author: Agudo, Antonio, Lepetit, Vincent, and Moreno-Noguer, Francesc
Published: 2022

69. Deep Learning: Basics and Convolutional Neural Networks (CNNs)

Author: Vakalopoulou, Maria, Christodoulidis, Stergios, Burgos, Ninon, Colliot, Olivier, and Lepetit, Vincent
Published: 2023

70. S4-Net: Geometry-Consistent Semi-Supervised Semantic Segmentation

Author: Stekovic, Sinisa, Fraundorfer, Friedrich, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We show that it is possible to learn semantic segmentation from very limited amounts of manual annotations, by enforcing geometric 3D constraints between multiple views. More exactly, image locations corresponding to the same physical 3D point should all have the same label. We show that introducing such constraints during learning is very effective, even when no manual label is available for a 3D point, and can be done simply by employing techniques from 'general' semi-supervised learning to the context of semantic segmentation. To demonstrate this idea, we use RGB-D image sequences of rigid scenes, for a 4-class segmentation problem derived from the ScanNet dataset. Starting from RGB-D sequences with a few annotated frames, we show that we can incorporate RGB-D sequences without any manual annotations to improve the performance, which makes our approach very convenient. Furthermore, we demonstrate our approach for semantic segmentation of objects on the LabelFusion dataset, where we show that one manually labeled image in a scene is sufficient for high performance on the whole scene.
Comment: 8 pages, 5 figures
Published: 2018

71. Improving Nighttime Retrieval-Based Localization

Author: Germain, Hugo, Bourmaud, Guillaume, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Outdoor visual localization is a crucial component of many computer vision systems. We propose an approach to localization from images that is designed to explicitly handle the strong variations in appearance happening between daytime and nighttime. As revealed by recent long-term localization benchmarks, both traditional feature-based and retrieval-based approaches still struggle to handle such changes. Our novel localization method combines a state-of-the-art image retrieval architecture with condition-specific sub-networks allowing the computation of global image descriptors that are explicitly dependent on the capturing conditions. We show that our approach improves localization by a factor of almost 300% compared to the popular VLAD-based methods on nighttime localization.
Published: 2018

72. HANDS18: Methods, Techniques and Applications for Hand Observation

Author: Oikonomidis, Iason, Garcia-Hernando, Guillermo, Yao, Angela, Argyros, Antonis, Lepetit, Vincent, and Kim, Tae-Kyun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This report outlines the proceedings of the Fourth International Workshop on Observing and Understanding Hands in Action (HANDS 2018). The fourth instantiation of this workshop attracted significant interest from both academia and the industry. The program of the workshop included regular papers that are published as the workshop's proceedings, extended abstracts, invited posters, and invited talks. Topics of the submitted works and invited talks and posters included novel methods for hand pose estimation from RGB, depth, or skeletal data, datasets for special cases and real-world applications, and techniques for hand motion re-targeting and hand gesture recognition. The invited speakers are leaders in their respective areas of specialization, coming from both industry and academia. The main conclusions that can be drawn are the turn of the community towards RGB data and the maturation of some methods and techniques, which in turn has led to increasing interest for real-world applications.
Comment: 11 pages, 1 figure, Discussion of the HANDS 2018 workshop held in conjunction with ECCV 2018
Published: 2018

73. Domain Transfer for 3D Pose Estimation from Color Images without Manual Annotations

Author: Rad, Mahdi, Oberweger, Markus, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a novel learning method for 3D pose estimation from color images. While acquiring annotations for color images is a difficult task, our approach circumvents this problem by learning a mapping from paired color and depth images captured with an RGB-D camera. We jointly learn the pose from synthetic depth images that are easy to generate, and learn to align these synthetic depth images with the real depth images. We show our approach for the task of 3D hand pose estimation and 3D object pose estimation, both from color images only. Our method achieves performances comparable to state-of-the-art methods on popular benchmark datasets, without requiring any annotations for the color images.
Comment: ACCV 2018 (oral)
Published: 2018

74. A Summary of the 4th International Workshop on Recovering 6D Object Pose

Author: Hodan, Tomas, Kouskouridas, Rigas, Kim, Tae-Kyun, Tombari, Federico, Bekris, Kostas, Drost, Bertram, Groueix, Thibault, Walas, Krzysztof, Lepetit, Vincent, Leonardis, Ales, Steger, Carsten, Michel, Frank, Sahin, Caner, Rother, Carsten, and Matas, Jiri
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: This document summarizes the 4th International Workshop on Recovering 6D Object Pose which was organized in conjunction with ECCV 2018 in Munich. The workshop featured four invited talks, oral and poster presentations of accepted workshop papers, and an introduction of the BOP benchmark for 6D object pose estimation. The workshop was attended by 100+ people working on relevant topics in both academia and industry who shared up-to-date advances and discussed open problems., Comment: In: Computer Vision - ECCV 2018 Workshops - Munich, Germany, September 8-9 and 14, 2018, Proceedings
Published: 2018
Full Text: View/download PDF

75. Geometry-Aware Network for Non-Rigid Shape Prediction from a Single View

Author: Pumarola, Albert, Agudo, Antonio, Porzi, Lorenzo, Sanfeliu, Alberto, Lepetit, Vincent, and Moreno-Noguer, Francesc
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a method for predicting the 3D shape of a deformable surface from a single view. By contrast with previous approaches, we do not need a pre-registered template of the surface, and our method is robust to the lack of texture and partial occlusions. At the core of our approach is a geometry-aware deep architecture that tackles the problem as usually done in analytic solutions: first perform 2D detection of the mesh and then estimate a 3D shape that is geometrically consistent with the image. We train this architecture in an end-to-end manner using a large dataset of synthetic renderings of shapes under different levels of deformation, material properties, textures and lighting conditions. We evaluate our approach on a test split of this dataset and available real benchmarks, consistently improving state-of-the-art solutions with a significantly lower computational time., Comment: Accepted at CVPR 2018
Published: 2018

76. Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation

Author: Oberweger, Markus, Rad, Mahdi, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a novel method for robust and accurate 3D object pose estimation from a single color image under large occlusions. Following recent approaches, we first predict the 2D projections of 3D points related to the target object and then compute the 3D pose from these correspondences using a geometric method. Unfortunately, as the results of our experiments show, predicting these 2D projections using a regular CNN or a Convolutional Pose Machine is highly sensitive to partial occlusions, even when these methods are trained with partially occluded examples. Our solution is to predict heatmaps from multiple small patches independently and to accumulate the results to obtain accurate and robust predictions. Training subsequently becomes challenging because patches with similar appearances but different positions on the object correspond to different heatmaps. However, we provide a simple yet effective solution to deal with such ambiguities. We show that our approach outperforms existing methods on two challenging datasets: The Occluded LineMOD dataset and the YCB-Video dataset, both exhibiting cluttered scenes with highly occluded objects. Project website: https://www.tugraz.at/institute/icg/research/team-lepetit/research-projects/robust-object-pose-estimation/
Published: 2018

77. 3D Pose Estimation and 3D Model Retrieval for Objects in the Wild

Author: Grabner, Alexander, Roth, Peter M., and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild., Comment: Accepted to Conference on Computer Vision and Pattern Recognition (CVPR) 2018
Published: 2018

78. Feature Mapping for Learning Fast and Accurate 3D Pose Inference from Synthetic Images

Author: Rad, Mahdi, Oberweger, Markus, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a simple and efficient method for exploiting synthetic images when training a Deep Network to predict a 3D pose from an image. The ability of using synthetic images for training a Deep Network is extremely valuable as it is easy to create a virtually infinite training set made of such images, while capturing and annotating real images can be very cumbersome. However, synthetic images do not resemble real images exactly, and using them for training can result in suboptimal performance. It was recently shown that for exemplar-based approaches, it is possible to learn a mapping from the exemplar representations of real images to the exemplar representations of synthetic images. In this paper, we show that this approach is more general, and that a network can also be applied after the mapping to infer a 3D pose: At run time, given a real image of the target object, we first compute the features for the image, map them to the feature space of synthetic images, and finally use the resulting features as input to another network which predicts the 3D pose. Since this network can be trained very effectively by using synthetic images, it performs very well in practice, and inference is faster and more accurate than with an exemplar-based approach. We demonstrate our approach on the LINEMOD dataset for 3D object pose estimation from color images, and the NYU dataset for 3D hand pose estimation from depth maps. We show that it allows us to outperform the state-of-the-art on both datasets., Comment: CVPR 2018
Published: 2017

79. Learning to Find Good Correspondences

Author: Yi, Kwang Moo, Trulls, Eduard, Ono, Yuki, Lepetit, Vincent, Salzmann, Mathieu, and Fua, Pascal
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer perceptron operating on pixel coordinates rather than directly on the image, and is thus simple and small. We introduce a novel normalization technique, called Context Normalization, which allows us to process each data point separately while imbuing it with global information, and also makes the network invariant to the order of the correspondences. Our experiments on multiple challenging datasets demonstrate that our method is able to drastically improve the state of the art with little training data., Comment: CVPR 2018 (Oral)
Published: 2017

80. HandSeg: An Automatically Labeled Dataset for Hand Segmentation from Depth Images

Author: Bojja, Abhishake Kumar, Mueller, Franziska, Malireddi, Sri Raghu, Oberweger, Markus, Lepetit, Vincent, Theobalt, Christian, Yi, Kwang Moo, and Tagliasacchi, Andrea
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an automatic method for generating high-quality annotations for depth-based hand segmentation, and introduce a large-scale hand segmentation dataset. Existing datasets are typically limited to a single hand. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. This lowers the cost and complexity of creating high-quality datasets, and makes it easy to expand the dataset in the future. We further show that existing datasets, even with data augmentation, are not sufficient to train a hand segmentation algorithm that can distinguish two hands. Source and datasets will be made publicly available.
Published: 2017

81. Going Further with Point Pair Features

Author: Hinterstoisser, Stefan, Lepetit, Vincent, Rajkumar, Naresh, and Konolige, Kurt
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Point Pair Features (PPFs) are widely used to detect 3D objects in point clouds; however, they are prone to fail in the presence of sensor noise and background clutter. We introduce novel sampling and voting schemes that significantly reduce the influence of clutter and sensor noise. Our experiments show that with our improvements, PPFs become competitive against state-of-the-art methods, as they outperform them on several objects from challenging benchmarks, at a low computational cost., Comment: Corrected post-print of manuscript accepted to the European Conference on Computer Vision (ECCV) 2016; https://link.springer.com/chapter/10.1007/978-3-319-46487-9_51
Published: 2017
Full Text: View/download PDF

82. On Pre-Trained Image Features and Synthetic Images for Deep Learning

Author: Hinterstoisser, Stefan, Lepetit, Vincent, Wohlhart, Paul, and Konolige, Kurt
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep Learning methods usually require huge amounts of training data to perform at their full potential, and often require expensive manual labeling. Using synthetic images is therefore very attractive to train object detectors, as the labeling comes for free, and several approaches have been proposed to combine synthetic and real images for training. In this paper, we show that a simple trick is sufficient to train very effectively modern object detectors with synthetic images only: We freeze the layers responsible for feature extraction to generic layers pre-trained on real images, and train only the remaining layers with plain OpenGL rendering. Our experiments with very recent deep architectures for object recognition (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet and Resnet) show this simple approach performs surprisingly well.
Published: 2017

83. ALCN: Meta-Learning for Contrast Normalization Applied to Robust 3D Pose Estimation

Author: Rad, Mahdi, Roth, Peter M., and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To be robust to illumination changes when detecting objects in images, the current trend is to train a Deep Network with training images captured under many different lighting conditions. Unfortunately, creating such a training set is very cumbersome, or sometimes even impossible, for some applications such as 3D pose estimation of specific objects, which is the application we focus on in this paper. We therefore propose a novel illumination normalization method that lets us learn to detect objects and estimate their 3D pose under challenging illumination conditions from very few training samples. Our key insight is that normalization parameters should adapt to the input image. In particular, we realized this via a Convolutional Neural Network trained to predict the parameters of a generalization of the Difference-of-Gaussians method. We show that our method significantly outperforms standard normalization methods and demonstrate it on two challenging 3D detection and pose estimation problems., Comment: BMVC' 17
Published: 2017

84. DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation

Author: Oberweger, Markus and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: DeepPrior is a simple approach based on Deep Learning that predicts the joint 3D locations of a hand given a depth map. Since its publication in early 2015, it has been outperformed by several impressive works. Here we show that with simple improvements (adding ResNet layers, data augmentation, and better initial hand localization) we achieve performance better than or similar to that of more sophisticated recent methods on the three main benchmarks (NYU, ICVL, MSRA) while keeping the simplicity of the original method. Our new implementation is available at https://github.com/moberweger/deep-prior-pp., Comment: To appear in ICCV Workshops 2017
Published: 2017

85. BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

Author: Rad, Mahdi and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a novel method for 3D object detection and pose estimation from color images only. We first use segmentation to detect the objects of interest in 2D even in presence of partial occlusions and cluttered background. By contrast with recent patch-based methods, we rely on a "holistic" approach: We apply to the detected objects a Convolutional Neural Network (CNN) trained to predict their 3D poses in the form of 2D projections of the corners of their 3D bounding boxes. This, however, is not sufficient for handling objects from the recent T-LESS dataset: These objects exhibit an axis of rotational symmetry, and the similarity of two images of such an object under two different poses makes training the CNN challenging. We solve this problem by restricting the range of poses used for training, and by introducing a classifier to identify the range of a pose at run-time before estimating it. We also use an optional additional step that refines the predicted poses. We improve the state-of-the-art on the LINEMOD dataset from 73.7% to 89.3% of correctly registered RGB frames. We are also the first to report results on the Occlusion dataset using color images only. We obtain 54% of frames passing the Pose 6D criterion on average on several sequences of the T-LESS dataset, compared to the 67% of the state-of-the-art on the same sequences which uses both color and depth. The full approach is also scalable, as a single network can be trained for multiple objects simultaneously., Comment: ICCV 2017
Published: 2017

86. Monocular LSD-SLAM Integration within AR System

Author: Höll, Markus and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Software Engineering
Abstract: In this paper, we cover the process of integrating the Large-Scale Direct Simultaneous Localization and Mapping (LSD-SLAM) algorithm into our existing AR stereo engine, developed for our modified "Augmented Reality Oculus Rift". With that, we are able to track one of our real-world cameras, which are mounted on the Rift, within a completely unknown environment. This makes it possible to achieve a constant and full augmentation, synchronizing our 3D movement (x, y, z) in both worlds, the real world and the virtual world. The development of the basic AR setup using the Oculus Rift DK1 and two fisheye cameras is fully documented in our previous paper. After an introduction to image-based registration, we detail the LSD-SLAM algorithm and document the code implementing our integration. The AR stereo engine with Oculus Rift support can be accessed via the Git repository https://github.com/MaXvanHeLL/ARift.git and the modified LSD-SLAM project used for the integration is available at https://github.com/MaXvanHeLL/LSD-SLAM.git.
Published: 2017
Full Text: View/download PDF

87. Image Descriptors and Similarity Measures

Author: Lepetit, Vincent and Ikeuchi, Katsushi, editor
Published: 2021
Full Text: View/download PDF

88. 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings

Author: Pitteri, Giorgia, Bugeau, Aurélie, Ilic, Slobodan, Lepetit, Vincent, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ishikawa, Hiroshi, editor, Liu, Cheng-Lin, editor, Pajdla, Tomas, editor, and Shi, Jianbo, editor
Published: 2021
Full Text: View/download PDF

89. S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching

Author: Germain, Hugo, Bourmaud, Guillaume, Lepetit, Vincent, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Vedaldi, Andrea, editor, Bischof, Horst, editor, Brox, Thomas, editor, and Frahm, Jan-Michael, editor
Published: 2020
Full Text: View/download PDF

90. Training a Feedback Loop for Hand Pose Estimation

Author: Oberweger, Markus, Wohlhart, Paul, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an entirely data-driven approach to estimating the 3D pose of a hand given a depth image. We show that we can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose by using a feedback loop. The components of this feedback loop are also Deep Networks, optimized using training data. They remove the need for fitting a 3D model to the input data, which requires both a carefully designed fitting function and algorithm. We show that our approach outperforms state-of-the-art methods, and is efficient as our implementation runs at over 400 fps on a single GPU., Comment: Presented at ICCV 2015 (oral)
Published: 2016

91. Fine Hand Segmentation using Convolutional Neural Networks

Author: Vodopivec, Tadej, Lepetit, Vincent, and Peer, Peter
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a method for extracting very accurate masks of hands in egocentric views. Our method is based on a novel Deep Learning architecture: In contrast with current Deep Learning methods, we do not use upscaling layers applied to a low-dimensional representation of the input image. Instead, we extract features with convolutional layers and map them directly to a segmentation mask with a fully connected layer. We show that this approach, when applied in a multi-scale fashion, is both accurate and efficient enough for real-time. We demonstrate it on a new dataset made of images captured in various environments, from the outdoors to offices.
Published: 2016

92. Hashmod: A Hashing Method for Scalable 3D Object Detection

Author: Kehl, Wadim, Tombari, Federico, Navab, Nassir, Ilic, Slobodan, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a scalable method for detecting objects and estimating their 3D poses in RGB-D data. To this end, we rely on an efficient representation of object views and employ hashing techniques to match these views against the input frame in a scalable way. While a similar approach already exists for 2D detection, we show how to extend it to estimate the 3D pose of the detected objects. In particular, we explore different hashing strategies and identify the one which is more suitable to our problem. We show empirically that the complexity of our method is sublinear with the number of objects and we enable detection and pose estimation of many 3D objects with high accuracy while outperforming the state-of-the-art in terms of runtime., Comment: BMVC 2015
Published: 2016

93. Structured Prediction of 3D Human Pose with Deep Neural Networks

Author: Tekin, Bugra, Katircioglu, Isinsu, Salzmann, Mathieu, Lepetit, Vincent, and Fua, Pascal
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Most recent approaches to monocular 3D pose estimation rely on Deep Learning. They either train a Convolutional Neural Network to directly regress from image to 3D pose, which ignores the dependencies between human joints, or model these dependencies via a max-margin structured learning framework, which involves a high computational cost at inference time. In this paper, we introduce a Deep Learning regression architecture for structured prediction of 3D human pose from monocular images that relies on an overcomplete auto-encoder to learn a high-dimensional latent pose representation and account for joint dependencies. We demonstrate that our approach outperforms state-of-the-art ones both in terms of structure preservation and prediction accuracy.
Published: 2016

94. Efficiently Creating 3D Training Data for Fine Hand Pose Estimation

Author: Oberweger, Markus, Riegler, Gernot, Wohlhart, Paul, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
Abstract: While many recent hand pose estimation methods critically rely on a training set of labelled frames, the creation of such a dataset is a challenging task that has been overlooked so far. As a result, existing datasets are limited to a few sequences and individuals, with limited accuracy, and this prevents these methods from delivering their full potential. We propose a semi-automated method for efficiently and accurately labeling each frame of a hand depth video with the corresponding 3D locations of the joints: The user is asked to provide only an estimate of the 2D reprojections of the visible joints in some reference frames, which are automatically selected to minimize the labeling work by efficiently optimizing a sub-modular loss function. We then exploit spatial, temporal, and appearance constraints to retrieve the full 3D poses of the hand over the complete sequence. We show that this data can be used to train a recent state-of-the-art hand pose estimation method, leading to increased accuracy. The code and dataset can be found on our website https://cvarlab.icg.tugraz.at/projects/hand_detection/, Comment: added link to source https://github.com/moberweger/semi-auto-anno. Appears in Proc. of CVPR 2016
Published: 2016

95. Augmented Reality Oculus Rift

Author: Höll, Markus, Heran, Nikolaus, and Lepetit, Vincent
Subjects: Computer Science - Graphics
Abstract: This paper covers the whole process of developing an Augmented Reality Stereoscopic Render Engine for the Oculus Rift. To capture the real world in the form of a camera stream, two cameras with fish-eye lenses had to be installed on the Oculus Rift DK1 hardware. The idea was inspired by Steptoe. After the introduction, a theoretical part covers the elements most necessary to achieve an AR system for the Oculus Rift, followed by the implementation part, where the code of the AR Stereo Engine is explained in more detail. A short conclusion section shows some results and reflects on some experiences, and the final chapter discusses future work. The project can be accessed via the Git repository https://github.com/MaXvanHeLL/ARift.git.
Published: 2016

96. LIFT: Learned Invariant Feature Transform

Author: Yi, Kwang Moo, Trulls, Eduard, Lepetit, Vincent, and Fua, Pascal
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need of retraining., Comment: Accepted to ECCV 2016 (spotlight)
Published: 2016

97. MonteBoxFinder: Detecting and Filtering Primitives to Fit a Noisy Point Cloud

Author: Ramamonjisoa, Michaël, primary, Stekovic, Sinisa, additional, and Lepetit, Vincent, additional
Published: 2022
Full Text: View/download PDF

98. Direct Prediction of 3D Body Poses from Motion Compensated Sequences

Author: Tekin, Bugra, Rozantsev, Artem, Lepetit, Vincent, and Fua, Pascal
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Previous approaches typically compute candidate poses in individual frames and then link them in a post-processing step to resolve ambiguities. By contrast, we directly regress from a spatio-temporal volume of bounding boxes to a 3D pose in the central frame. We further show that, for this approach to achieve its full potential, it is essential to compensate for the motion in consecutive frames so that the subject remains centered. This then allows us to effectively overcome ambiguities and improve upon the state-of-the-art by a large margin on the Human3.6m, HumanEva, and KTH Multiview Football 3D human pose estimation benchmarks., Comment: Published in CVPR 2016. supersedes arXiv:1504.08200
Published: 2015

99. Learning to Assign Orientations to Feature Points

Author: Yi, Kwang Moo, Verdie, Yannick, Fua, Pascal, and Lepetit, Vincent
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We show how to train a Convolutional Neural Network to assign a canonical orientation to feature points given an image patch centered on the feature point. Our method improves feature point matching upon the state-of-the-art and can be used in conjunction with any existing rotation-sensitive descriptors. To avoid the tedious and almost impossible task of finding a target orientation to learn, we propose to use Siamese networks which implicitly find the optimal orientations during training. We also propose a new type of activation function for Neural Networks that generalizes the popular ReLU, maxout, and PReLU activation functions. This novel activation performs better for our task. We validate the effectiveness of our method extensively with four existing datasets, including two non-planar datasets, as well as our own dataset. We show that we outperform the state-of-the-art without the need of retraining for each dataset., Comment: Accepted as Oral presentation in Computer Vision and Pattern Recognition, 2016
Published: 2015

100. Predicting People's 3D Poses from Short Sequences

Author: Tekin, Bugra, Sun, Xiaolu, Wang, Xinchao, Lepetit, Vincent, and Fua, Pascal
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Instead of computing candidate poses in individual frames and then linking them, as is often done, we regress directly from a spatio-temporal block of frames to a 3D pose in the central one. We will demonstrate that this approach allows us to effectively overcome ambiguities and to improve upon the state-of-the-art on challenging sequences., Comment: superseded by arXiv:1511.06692
Published: 2015
