189 results for "Gim Hee Lee"
Search Results
102. Motion Estimation for Self-Driving Cars with a Generalized Camera.
- Author
-
Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
- Published
- 2013
- Full Text
- View/download PDF
103. Toward automated driving in cities using close-to-market sensors: An overview of the V-Charge Project.
- Author
-
Paul Timothy Furgale, Ulrich Schwesinger, Martin Rufli, Wojciech Derendarz, Hugo Grimmett, Peter Mühlfellner, Stefan Wonneberger, Julian Timpner, Stephan Rottmann, Bo Li, Bastian Schmidt, Thien-Nghia Nguyen, Elena Cardarelli, Stefano Cattani, Stefan Bruning, Sven Horstmann, Martin Stellmacher, Holger Mielenz, Kevin Köser, Markus Beermann, Christian Häne, Lionel Heng, Gim Hee Lee, Friedrich Fraundorfer, René Iser, Rudolph Triebel, Ingmar Posner, Paul Newman, Lars C. Wolf, Marc Pollefeys, Stefan Brosig, Jan Effertz, Cédric Pradalier, and Roland Siegwart
- Published
- 2013
- Full Text
- View/download PDF
104. Minimal Solutions for Pose Estimation of a Multi-Camera System.
- Author
-
Gim Hee Lee, Bo Li, Marc Pollefeys, and Friedrich Fraundorfer
- Published
- 2013
- Full Text
- View/download PDF
105. Vision-based autonomous mapping and exploration using a quadrotor MAV.
- Author
-
Friedrich Fraundorfer, Lionel Heng, Dominik Honegger, Gim Hee Lee, Lorenz Meier, Petri Tanskanen, and Marc Pollefeys
- Published
- 2012
- Full Text
- View/download PDF
106. RS-SLAM: RANSAC sampling for visual FastSLAM.
- Author
-
Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
- Published
- 2011
- Full Text
- View/download PDF
107. Real-time photo-realistic 3D mapping for micro aerial vehicles.
- Author
-
Lionel Heng, Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
- Published
- 2011
- Full Text
- View/download PDF
108. MAV visual SLAM with plane constraint.
- Author
-
Gim Hee Lee, Friedrich Fraundorfer, and Marc Pollefeys
- Published
- 2011
- Full Text
- View/download PDF
109. A benchmarking tool for MAV visual pose estimation.
- Author
-
Gim Hee Lee, Markus Achtelik, Friedrich Fraundorfer, Marc Pollefeys, and Roland Siegwart
- Published
- 2010
- Full Text
- View/download PDF
110. Self-Calibration and Visual SLAM with a Multi-Camera System on a Micro Aerial Vehicle.
- Author
-
Lionel Heng, Gim Hee Lee, and Marc Pollefeys
- Published
- 2014
- Full Text
- View/download PDF
111. PDR: Progressive Depth Regularization for Monocular 3D Object Detection
- Author
-
Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Min-Jian Zhao, and Gim Hee Lee
- Subjects
Media Technology, Electrical and Electronic Engineering
- Published
- 2023
112. Cascaded Refinement Network for Point Cloud Completion With Self-Supervision
- Author
-
Marcelo H. Ang, Xiaogang Wang, and Gim Hee Lee
- Subjects
FOS: Computer and information sciences, Ground truth, Computer Science - Artificial Intelligence, Computer science, Applied Mathematics, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Supervised learning, Computer Science - Computer Vision and Pattern Recognition, Point cloud, Task (project management), Kernel (linear algebra), Artificial Intelligence (cs.AI), Computational Theory and Mathematics, Artificial Intelligence, Feature (computer vision), Point (geometry), Computer Vision and Pattern Recognition, Data mining, Artificial intelligence, Software
- Abstract
Point clouds are often sparse and incomplete, which imposes difficulties for real-world applications. Existing shape completion methods tend to generate rough shapes without fine-grained details. Considering this, we introduce a two-branch network for shape completion. The first branch is a cascaded shape completion sub-network to synthesize complete objects, where we propose to use the partial input together with the coarse output to preserve the object details during the dense point reconstruction. The second branch is an auto-encoder to reconstruct the original partial input. The two branches share the same feature extractor to learn an accurate global feature for shape completion. Furthermore, we propose two strategies to enable the training of our network when ground truth data are not available. This is to mitigate the dependence of existing approaches on large amounts of ground truth training data that are often difficult to obtain in real-world applications. Additionally, our proposed strategies are also able to improve the reconstruction quality for fully supervised learning. We verify our approach in self-supervised, semi-supervised and fully supervised settings with superior performances. Quantitative and qualitative results on different datasets demonstrate that our method achieves more realistic outputs than state-of-the-art approaches on the point cloud completion task. Comment: Accepted by PAMI. Extended version of the following paper: Cascaded Refinement Network for Point Cloud Completion. CVPR 2020. arXiv link: arXiv:2004.03327
- Published
- 2021
113. SFly: Swarm of micro flying robots.
- Author
-
Markus Achtelik, Michael Achtelik, Yorick Brunet, Margarita Chli, Savvas A. Chatzichristofis, Jean-Dominique Decotignie, Klaus-Michael Doth, Friedrich Fraundorfer, Laurent Kneip, Daniel Gurdan, Lionel Heng, Elias B. Kosmatopoulos, Lefteris Doitsidis, Gim Hee Lee, Simon Lynen, Agostino Martinelli, Lorenz Meier, Marc Pollefeys, Damien Piguet, Alessandro Renzaglia, Davide Scaramuzza, Roland Siegwart, Jan Stumpf, Petri Tanskanen, Chiara Troiani, and Stephan Weiss
- Published
- 2012
- Full Text
- View/download PDF
114. Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
- Author
-
Gim Hee Lee and Jiahao Lin
- Subjects
FOS: Computer and information sciences, Plane (geometry), Computer science, Pipeline (computing), Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 3D pose estimation, Sweep line algorithm, Consistency (database systems), Code (cryptography), Benchmark (computing), Computer vision, Artificial intelligence, Pose
- Abstract
Existing approaches for multi-view multi-person 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views and solve for the 3D pose estimation for each person. Establishing cross-view correspondences is challenging in multi-person scenes, and incorrect correspondences will lead to sub-optimal performance for the multi-stage pipeline. In this work, we present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot. Specifically, we propose to perform depth regression for each joint of each 2D pose in a target camera view. Cross-view consistency constraints are implicitly enforced by multiple reference camera views via the plane sweep algorithm to facilitate accurate depth regression. We adopt a coarse-to-fine scheme to first regress the person-level depth followed by a per-person joint-level relative depth estimation. 3D poses are obtained from a simple back-projection given the estimated depths. We evaluate our approach on benchmark datasets where it outperforms previous state-of-the-arts while being remarkably efficient. Our code is available at https://github.com/jiahaoLjh/PlaneSweepPose., Comment: 10 pages, 5 figures. Accepted in CVPR 2021
- Published
- 2021
- Full Text
- View/download PDF
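The back-projection step mentioned at the end of the abstract for entry 114 is easy to make concrete. Below is a minimal numpy sketch that lifts 2D joint detections to 3D camera coordinates given per-joint depths; the depth regression network itself is not reproduced, and the function name and intrinsics are illustrative assumptions.

```python
import numpy as np

def back_project_joints(joints_2d, depths, K):
    """Lift 2D joint detections to 3D camera coordinates given per-joint depths.

    joints_2d: (J, 2) pixel coordinates of the joints in the target view.
    depths:    (J,) regressed depth of each joint along the camera z-axis.
    K:         (3, 3) camera intrinsic matrix.
    Returns (J, 3) joint positions in the camera frame.
    """
    J = joints_2d.shape[0]
    # Homogeneous pixel coordinates (J, 3).
    pix_h = np.concatenate([joints_2d, np.ones((J, 1))], axis=1)
    # Normalized camera rays, then scale each ray by its regressed depth.
    rays = pix_h @ np.linalg.inv(K).T
    return rays * depths[:, None]

# Example with illustrative intrinsics: two joints at 2.5 m and 2.6 m depth.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
joints = np.array([[700.0, 400.0], [650.0, 380.0]])
print(back_project_joints(joints, np.array([2.5, 2.6]), K))
```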
115. City-scale Scene Change Detection using Point Clouds
- Author
-
Gim Hee Lee and Zi Jian Yew
- Subjects
FOS: Computer and information sciences, Computer science, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Point cloud, Thresholding, Automation, Tree traversal, Robustness (computer science), GNSS applications, Computer vision, Artificial intelligence, Change detection
- Abstract
We propose a method for detecting structural changes in a city using images captured from vehicular mounted cameras over traversals at two different times. We first generate 3D point clouds for each traversal from the images and approximate GNSS/INS readings using Structure-from-Motion (SfM). A direct comparison of the two point clouds for change detection is not ideal due to inaccurate geo-location information and possible drifts in the SfM. To circumvent this problem, we propose a deep learning-based non-rigid registration on the point clouds which allows us to compare the point clouds for structural change detection in the scene. Furthermore, we introduce a dual thresholding check and post-processing step to enhance the robustness of our method. We collect two datasets for the evaluation of our approach. Experiments show that our method is able to detect scene changes effectively, even in the presence of viewpoint and illumination differences., Comment: 8 pages, 10 figures. To be presented at ICRA2021
- Published
- 2021
- Full Text
- View/download PDF
116. Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping
- Author
-
Gim Hee Lee and Jiahao Lin
- Subjects
FOS: Computer and information sciences, Spatial contextual awareness, Source code, Computer science, Computer Vision and Pattern Recognition (cs.CV), Graph partition, Computer Science - Computer Vision and Pattern Recognition, Context (language use), Pattern recognition, Spectral clustering, Graph (abstract data type), Artificial intelligence, Spatial analysis, Pose
- Abstract
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: (1) keypoint detection and (2) grouping of the detected keypoints to form person instances. Current grouping approaches rely on learned embedding from only visual features that completely ignore the spatial configuration of human poses. In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN). More specifically, we design a Geometry-aware Association GNN that utilizes spatial information of the keypoints and learns local affinity from the global context. The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association. Spectral clustering is used to partition the graph for the formation of the pose instances. Experimental results on two benchmark datasets show that our proposed method outperforms existing appearance-only grouping frameworks, which shows the effectiveness of utilizing spatial context for robust grouping. Source code is available at: https://github.com/jiahaoLjh/PoseGrouping., Comment: 7 pages, 4 figures. Accepted in ICRA 2021
- Published
- 2021
- Full Text
- View/download PDF
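Entry 116 fuses a geometry-based affinity with an appearance-based affinity and partitions the resulting graph with spectral clustering. The sketch below illustrates only that final grouping step, using a simple weighted fusion and scikit-learn's precomputed-affinity spectral clustering; the fusion weight and the GNN that produces the affinities are assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def group_keypoints(affinity_geo, affinity_app, n_persons, w=0.5):
    """Partition detected keypoints into person instances.

    affinity_geo, affinity_app: (N, N) symmetric affinities in [0, 1].
    n_persons: number of clusters (person instances) to form.
    w: fusion weight between the two affinities (illustrative choice).
    Returns an (N,) array of person labels.
    """
    fused = w * affinity_geo + (1.0 - w) * affinity_app
    fused = 0.5 * (fused + fused.T)      # enforce symmetry
    np.fill_diagonal(fused, 1.0)
    labels = SpectralClustering(n_clusters=n_persons,
                                affinity="precomputed",
                                assign_labels="discretize",
                                random_state=0).fit_predict(fused)
    return labels
```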
117. Novel Class Discovery in Semantic Segmentation
- Author
-
Yuyang Zhao, Zhun Zhong, Nicu Sebe, and Gim Hee Lee
- Subjects
FOS: Computer and information sciences, grouping and shape analysis, Segmentation, Self- & semi- & meta- transfer/low-shot/long-tail learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS), which aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In contrast to existing approaches that look at novel class discovery in image classification, we focus on the more challenging semantic segmentation. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image, which increases the difficulty in using the unlabeled data. To tackle this new setting, we leverage the labeled base data and a saliency model to coarsely cluster novel classes for model training in our basic framework. Additionally, we propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels, further improving the model performance on the novel classes. Our EUMS utilizes an entropy ranking technique and a dynamic reassignment to distill clean labels, thereby making full use of the noisy data via self-supervised learning. We build the NCDSS benchmark on the PASCAL-5$^i$ dataset and COCO-20$^i$ dataset. Extensive experiments demonstrate the feasibility of the basic framework (achieving an average mIoU of 49.81% on PASCAL-5$^i$) and the effectiveness of EUMS framework (outperforming the basic framework by 9.28% mIoU on PASCAL-5$^i$)., Comment: CVPR 2022
- Published
- 2021
- Full Text
- View/download PDF
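Entry 117's EUMS framework distills clean pseudo-labels with an entropy ranking. A minimal sketch of such a ranking, assuming per-sample softmax probabilities over the novel classes; the split ratio is illustrative and the paper's dynamic reassignment step is simplified away.

```python
import numpy as np

def entropy_split(probs, clean_ratio=0.5):
    """Split pseudo-labelled samples into clean and noisy sets by entropy.

    probs: (N, C) softmax probabilities for N samples over C novel classes.
    clean_ratio: fraction of lowest-entropy samples treated as clean.
    Returns (clean_idx, noisy_idx) index arrays.
    """
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)   # (N,)
    order = np.argsort(entropy)                               # low entropy first
    n_clean = int(clean_ratio * len(order))
    return order[:n_clean], order[n_clean:]
```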
118. Source-Free Open Compound Domain Adaptation in Semantic Segmentation
- Author
-
Yuyang Zhao, Zhun Zhong, Zhiming Luo, Gim Hee Lee, and Nicu Sebe
- Subjects
FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Media Technology, open compound domain adaptation, Semantic segmentation, source-free domain adaptation, Electrical and Electronic Engineering
- Abstract
In this work, we introduce a new concept, named source-free open compound domain adaptation (SF-OCDA), and study it in semantic segmentation. SF-OCDA is more challenging than the traditional domain adaptation but it is more practical. It jointly considers (1) the issues of data privacy and data storage and (2) the scenario of multiple target domains and unseen open domains. In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model. The model is evaluated on the samples from the target and unseen open domains. To solve this problem, we present an effective framework by separating the training process into two stages: (1) pre-training a generalized source model and (2) adapting a target model with self-supervised learning. In our framework, we propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles at the feature level, which can benefit the training of both stages. First, CPSS can significantly improve the generalization ability of the source model, providing more accurate pseudo-labels for the latter stage. Second, CPSS can reduce the influence of noisy pseudo-labels and also avoid the model overfitting to the target domain during self-supervised learning, consistently boosting the performance on the target and open domains. Experiments demonstrate that our method produces state-of-the-art results on the C-Driving dataset. Furthermore, our model also achieves the leading performance on Cityscapes for domain generalization.
- Published
- 2021
- Full Text
- View/download PDF
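Entry 118's Cross-Patch Style Swap (CPSS) diversifies samples with different patch styles at the feature level. The paper's exact operation is not reproduced here; as a generic illustration of feature-level style swapping, the sketch below exchanges per-channel mean and standard deviation between two feature patches (an AdaIN-style transfer), which is one common way to realize such a swap.

```python
import torch

def swap_patch_style(feat_a, feat_b, eps=1e-5):
    """Re-style one feature patch with another patch's channel statistics.

    feat_a, feat_b: (C, H, W) feature patches.
    Returns feat_a normalized and re-scaled with feat_b's per-channel mean and
    standard deviation (an illustrative stand-in, not the paper's exact CPSS).
    """
    mu_a = feat_a.mean(dim=(1, 2), keepdim=True)
    std_a = feat_a.std(dim=(1, 2), keepdim=True) + eps
    mu_b = feat_b.mean(dim=(1, 2), keepdim=True)
    std_b = feat_b.std(dim=(1, 2), keepdim=True) + eps
    return (feat_a - mu_a) / std_a * std_b + mu_b
```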
119. From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
- Author
-
Gim Hee Lee and Chen Li
- Subjects
FOS: Computer and information sciences, Generalization, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Overfitting, Machine learning, Synthetic data, Field (computer science), Domain (software engineering), Consistency (database systems), Margin (machine learning), Artificial intelligence, Pose
- Abstract
Animal pose estimation is an important field that has received increasing attention in the recent years. The main challenge for this task is the lack of labeled data. Existing works circumvent this problem with pseudo labels generated from data of other easily accessible domains such as synthetic data. However, these pseudo labels are noisy even with consistency check or confidence-based filtering due to the domain shift in the data. To solve this problem, we design a multi-scale domain adaptation module (MDAM) to reduce the domain gap between the synthetic and real data. We further introduce an online coarse-to-fine pseudo label updating strategy. Specifically, we propose a self-distillation module in an inner coarse-update loop and a mean-teacher in an outer fine-update loop to generate new pseudo labels that gradually replace the old ones. Consequently, our model is able to learn from the old pseudo labels at the early stage, and gradually switch to the new pseudo labels to prevent overfitting in the later stage. We evaluate our approach on the TigDog and VisDA 2019 datasets, where we outperform existing approaches by a large margin. We also demonstrate the generalization ability of our model by testing extensively on both unseen domains and unseen animal categories. Our code is available at the project website., Comment: CVPR2021
- Published
- 2021
- Full Text
- View/download PDF
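Entry 119's outer fine-update loop relies on a mean teacher to refresh pseudo-labels. The core of any mean-teacher scheme is the exponential-moving-average (EMA) update of the teacher weights; a minimal PyTorch sketch follows, with the momentum value chosen for illustration only.

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    """EMA update: teacher <- momentum * teacher + (1 - momentum) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)   # e.g. BatchNorm running statistics
```

The teacher is never updated by gradient descent; it only tracks the student, which is what makes its pseudo-labels smoother over training.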
120. Point Cloud Completion by Learning Shape Priors
- Author
-
Marcelo H. Ang, Gim Hee Lee, and Xiaogang Wang
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistical distance, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature vector, Image and Video Processing (eess.IV), Point cloud, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing, Machine Learning (cs.LG), Feature (computer vision), Kernel (statistics), Prior probability, FOS: Electrical engineering, electronic engineering, information engineering, Point (geometry), Algorithm, Reproducing kernel Hilbert space
- Abstract
In view of the difficulty in reconstructing object details in point cloud completion, we propose a shape prior learning method for object completion. The shape priors include geometric information in both complete and the partial point clouds. We design a feature alignment strategy to learn the shape prior from complete points, and a coarse to fine strategy to incorporate partial prior in the fine stage. To learn the complete objects prior, we first train a point cloud auto-encoder to extract the latent embeddings from complete points. Then we learn a mapping to transfer the point features from partial points to that of the complete points by optimizing feature alignment losses. The feature alignment losses consist of a L2 distance and an adversarial loss obtained by Maximum Mean Discrepancy Generative Adversarial Network (MMD-GAN). The L2 distance optimizes the partial features towards the complete ones in the feature space, and MMD-GAN decreases the statistical distance of two point features in a Reproducing Kernel Hilbert Space. We achieve state-of-the-art performances on the point cloud completion task. Our code is available at https://github.com/xiaogangw/point-cloud-completion-shape-prior., IROS 2020
- Published
- 2020
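Entry 120 aligns features of partial inputs to those of complete shapes with an L2 loss and an MMD-GAN adversarial loss. As a hedged illustration of the statistical-distance component, here is a standard (biased) Gaussian-kernel Maximum Mean Discrepancy between two feature batches; the GAN critic and learned kernel of MMD-GAN are not shown, and the bandwidth is an illustrative choice.

```python
import torch

def gaussian_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate between feature batches x (N, D) and y (M, D)."""
    def kernel(a, b):
        # Pairwise squared distances followed by a Gaussian (RBF) kernel.
        d2 = torch.cdist(a, b, p=2.0).pow(2)
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```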
121. Few-shot 3D Point Cloud Semantic Segmentation
- Author
-
Tat-Seng Chua, Gim Hee Lee, and Na Zhao
- Subjects
FOS: Computer and information sciences, I.2.10, Computer science, Computer Vision and Pattern Recognition (cs.CV), I.4.6, Point cloud, Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, Solid modeling, Semantics, Data modeling, Discriminative model, Benchmark (computing), Segmentation, Artificial intelligence, Feature learning
- Abstract
Many existing approaches for 3D point cloud semantic segmentation are fully supervised. These fully supervised approaches heavily rely on large amounts of labeled training data that are difficult to obtain and cannot segment new classes after training. To mitigate these limitations, we propose a novel attention-aware multi-prototype transductive few-shot point cloud semantic segmentation method to segment new classes given a few labeled examples. Specifically, each class is represented by multiple prototypes to model the complex data distribution of labeled points. Subsequently, we employ a transductive label propagation method to exploit the affinities between labeled multi-prototypes and unlabeled points, and among the unlabeled points. Furthermore, we design an attention-aware multi-level feature learning network to learn the discriminative features that capture the geometric dependencies and semantic correlations between points. Our proposed method shows significant and consistent improvements compared to baselines in different few-shot point cloud semantic segmentation settings (i.e., 2/3-way 1/5-shot) on two benchmark datasets. Our code is available at https://github.com/Na-Z/attMPTI., CVPR 2021
- Published
- 2020
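Entry 121 propagates labels from labeled multi-prototypes to unlabeled points over an affinity graph. The sketch below shows the classic normalized label-propagation iteration (Zhou et al.-style) as a generic stand-in for the paper's transductive step; the affinity construction and prototype computation are assumed to be given.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.99, n_iter=50):
    """Propagate labels over a graph of prototypes and unlabeled points.

    W: (N, N) symmetric non-negative affinity matrix.
    Y: (N, C) initial label matrix (one-hot rows for prototypes, zeros otherwise).
    alpha: propagation strength in (0, 1).
    Returns (N, C) soft label scores; argmax gives the predicted class.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y
    return F
```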
122. RPM-Net: Robust Point Matching Using Learned Features
- Author
-
Gim Hee Lee and Zi Jian Yew
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Point cloud, Initialization, Iterative closest point, Point set registration, Maxima and minima, Robustness (computer science), Artificial intelligence, Algorithm, Rigid transformation
- Abstract
Iterative Closest Point (ICP) solves the rigid point cloud registration problem iteratively in two steps: (1) make hard assignments of spatially closest point correspondences, and then (2) find the least-squares rigid transformation. The hard assignments of closest point correspondences based on spatial distances are sensitive to the initial rigid transformation and noisy/outlier points, which often cause ICP to converge to wrong local minima. In this paper, we propose the RPM-Net -- a less sensitive to initialization and more robust deep learning-based approach for rigid point cloud registration. To this end, our network uses the differentiable Sinkhorn layer and annealing to get soft assignments of point correspondences from hybrid features learned from both spatial coordinates and local geometry. To further improve registration performance, we introduce a secondary network to predict optimal annealing parameters. Unlike some existing methods, our RPM-Net handles missing correspondences and point clouds with partial visibility. Experimental results show that our RPM-Net achieves state-of-the-art performance compared to existing non-deep learning and recent deep learning methods. Our source code is available at the project website https://github.com/yewzijian/RPMNet ., Comment: 10 pages, 4 figures. To appear in CVPR2020
- Published
- 2020
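Entry 122's RPM-Net obtains soft point correspondences with a differentiable Sinkhorn layer and annealing. The sketch below is a generic log-space Sinkhorn normalization of a similarity matrix with a temperature parameter standing in for the annealing schedule; the slack row/column used for partial visibility and the learned hybrid features are omitted, and the exact parameterization is an assumption.

```python
import torch

def sinkhorn_soft_assignment(scores, temperature=0.1, n_iters=10):
    """Turn a similarity matrix into a (near) doubly-stochastic soft assignment.

    scores: (N, M) feature similarities between source and target points.
    temperature: annealing parameter; smaller values sharpen the assignment.
    """
    log_alpha = scores / temperature
    for _ in range(n_iters):
        # Alternate row and column normalization in log space for stability.
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)
    return log_alpha.exp()
```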
123. Robust 6D Object Pose Estimation by Learning RGB-D Features
- Author
-
Marcelo H. Ang, Meng Tian, Gim Hee Lee, and Liang Pan
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Object (computer science), Code (cryptography), Computer vision, Artificial intelligence, Rotation (mathematics), Pose
- Abstract
Accurate 6D object pose estimation is fundamental to robotic manipulation and grasping. Previous methods follow a local optimization approach which minimizes the distance between closest point pairs to handle the rotation ambiguity of symmetric objects. In this work, we propose a novel discrete-continuous formulation for rotation regression to resolve this local-optimum problem. We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction. Additionally, the object location is detected by aggregating point-wise vectors pointing to the 3D center. Experiments on two benchmarks: LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches. Our code is available at https://github.com/mentian/object-posenet., Comment: Accepted at ICRA 2020
- Published
- 2020
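Entry 123 detects the object location by aggregating point-wise vectors that point to the 3D center. A minimal numpy sketch of that voting step, assuming the per-point offsets have already been predicted by the network; the rotation-anchor branch is not shown, and a robust aggregation (e.g. clustering the votes) could replace the simple mean.

```python
import numpy as np

def aggregate_center_votes(points, offsets):
    """Estimate the object center from per-point votes.

    points:  (N, 3) observed 3D points on the object.
    offsets: (N, 3) predicted vectors from each point towards the center.
    Returns the (3,) estimated object center.
    """
    votes = points + offsets
    return votes.mean(axis=0)
```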
124. Cascaded Refinement Network for Point Cloud Completion
- Author
-
Gim Hee Lee, Marcelo H. Ang, and Xiaogang Wang
- Subjects
FOS: Computer and information sciences, Ground truth, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, Point cloud, Task (project management), Point distribution model, Point (geometry), Artificial intelligence, Algorithm
- Abstract
Point clouds are often sparse and incomplete. Existing shape completion methods are incapable of generating details of objects or learning the complex point distributions. To this end, we propose a cascaded refinement network together with a coarse-to-fine strategy to synthesize the detailed object shapes. Considering the local details of partial input with the global shape information together, we can preserve the existing details in the incomplete point set and generate the missing parts with high fidelity. We also design a patch discriminator that guarantees every local area has the same pattern with the ground truth to learn the complicated point distribution. Quantitative and qualitative experiments on different datasets show that our method achieves superior results compared to existing state-of-the-art approaches on the 3D point cloud completion task. Our source code is available at https://github.com/xiaogangw/cascaded-point-completion.git., Comment: CVPR2020
- Published
- 2020
- Full Text
- View/download PDF
125. Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-view Geometry
- Author
-
Pengfei Guo, He Chen, Gim Hee Lee, Gregory S. Chirikjian, and Pengfei Li
- Subjects
Computer science, Epipolar geometry, 3D pose estimation, Robustness (computer science), Euclidean geometry, Maximum a posteriori estimation, Computer vision, Artificial intelligence, Correspondence problem, Pose
- Abstract
Epipolar constraints are at the core of feature matching and depth estimation in current multi-person multi-camera 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances mainly due to two sources of ambiguity. The first is the mismatch of human joints resulting from the simple cues provided by the Euclidean distances between joints and epipolar lines. The second is the lack of robustness from the naive formulation of the problem as a least squares minimization. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. Our method consists of two key components: a graph model for fast cross-view matching, and a maximum a posteriori (MAP) estimator for the reconstruction of the 3D human poses. We demonstrate the effectiveness and superiority of our proposed method on four benchmark datasets. Our code is available at: https://github.com/HeCraneChen/3D-Crowd-Pose-Estimation-Based-on-MVG.
- Published
- 2020
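Entry 125 builds on the distance between detected joints and epipolar lines as a cross-view matching cue. That cue is standard multi-view geometry; a minimal numpy sketch follows, assuming the fundamental matrix between the two views is known (the paper's graph model and MAP estimator are not reproduced).

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Distance from a joint in view 2 to the epipolar line of its candidate
    match in view 1.

    x1, x2: (2,) pixel coordinates in view 1 and view 2.
    F: (3, 3) fundamental matrix satisfying x2^T F x1 = 0 for true matches.
    """
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    line = F @ p1                      # epipolar line in view 2: ax + by + c = 0
    return abs(p2 @ line) / np.hypot(line[0], line[1])
```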
126. Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation
- Author
-
Gim Hee Lee, Marcelo H. Ang, and Meng Tian
- Subjects
Computer science, Prior probability, Embedding, Object model, RGB color model, Pattern recognition, Artificial intelligence, Object (computer science), Autoencoder, Categorical variable, Image (mathematics)
- Abstract
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image. To handle the intra-class shape variation, we propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior. Additionally, our network infers the dense correspondences between the depth observation of the object instance and the reconstructed 3D model to jointly estimate the 6D object pose and size. We design an autoencoder that trains on a collection of object models and compute the mean latent embedding for each category to learn the categorical shape priors. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach significantly outperforms the state of the art. Our code is available at https://github.com/mentian/object-deformnet.
- Published
- 2020
127. Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants
- Author
-
Bo Li, Evgeniy V. Martyushev, and Gim Hee Lee
- Subjects
Polynomial, Robustness (computer science), Coordinate system, Invariant (mathematics), Rigid body, Translation (geometry), Pose, Algorithm, Rotation (mathematics), Mathematics
- Abstract
The SE(3) invariants of a pose include its rotation angle and screw translation. In this paper, we present a comprehensive study of the relative pose estimation problem for a calibrated camera constrained by known SE(3) invariants, which involves 5 minimal problems in total. These problems reduce the minimal number of point pairs for relative pose estimation and improve the estimation efficiency and robustness. The SE(3) invariant constraints can come from extra sensor measurements or a motion assumption. Unlike conventional relative pose estimation with extra constraints, no extrinsic calibration is required to transform the constraints to the camera frame. This advantage comes from the invariance of SE(3) invariants across different coordinate systems on a rigid body and makes the solvers more convenient and flexible in practical applications. In addition to the concept of relative pose estimation constrained by SE(3) invariants, we also present a comprehensive study of existing polynomial formulations for relative pose estimation and discover their relationship. Different formulations are carefully chosen for each proposed problem to achieve the best efficiency. Experiments on synthetic and real data show performance improvement compared to conventional relative pose estimation methods. Our source code is available at: http://github.com/prclibo/relative_pose.
- Published
- 2020
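Entry 127 constrains relative pose by its SE(3) invariants: the rotation angle and the screw translation, i.e. the component of the translation along the rotation axis. Computing these invariants from a given pose is straightforward; a minimal numpy sketch under the standard definitions (the minimal solvers themselves are not reproduced):

```python
import numpy as np

def se3_invariants(R, t):
    """Rotation angle and screw translation of a rigid motion (R, t).

    R: (3, 3) rotation matrix, t: (3,) translation vector.
    Returns (theta, d): rotation angle in radians and the screw translation,
    i.e. the component of t along the rotation axis.
    """
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Rotation axis from the skew-symmetric part of R (valid away from 0 and pi).
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]])
    norm = np.linalg.norm(axis)
    if norm < 1e-12:
        # Degenerate case (pure translation): the screw axis is along t.
        return theta, float(np.linalg.norm(t))
    axis = axis / norm
    return theta, float(axis @ t)
```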
128. SESS: Self-Ensembling Semi-Supervised 3D Object Detection
- Author
-
Tat-Seng Chua, Gim Hee Lee, and Na Zhao
- Subjects
FOS: Computer and information sciences, Contextual image classification, Computer science, Computer Vision and Pattern Recognition (cs.CV), Supervised learning, Point cloud, Computer Science - Computer Vision and Pattern Recognition, Machine learning, Object detection, Task analysis, RGB color model, Artificial intelligence
- Abstract
The performance of existing point cloud-based 3D object detection methods heavily relies on large-scale high-quality 3D annotations. However, such annotations are often tedious and expensive to collect. Semi-supervised learning is a good alternative to mitigate the data annotation issue, but has remained largely unexplored in 3D object detection. Inspired by the recent success of self-ensembling technique in semi-supervised image classification task, we propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data. Furthermore, we propose three consistency losses to enforce the consistency between two sets of predicted 3D object proposals, to facilitate the learning of structure and semantic invariances of objects. Extensive experiments conducted on SUN RGB-D and ScanNet datasets demonstrate the effectiveness of SESS in both inductive and transductive semi-supervised 3D object detection. Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data. Our code is available at https://github.com/Na-Z/sess., CVPR 2020 Oral
- Published
- 2019
129. PointAtrousGraph: Deep Hierarchical Encoder-Decoder with Point Atrous Convolution for Unorganized 3D Points
- Author
-
Gim Hee Lee, Chee-Meng Chew, and Liang Pan
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Point cloud, Computer Science - Computer Vision and Pattern Recognition, Image (mathematics), Convolution, Upsampling, Point (geometry), Spatial analysis, Algorithm
- Abstract
Motivated by the success of encoding multi-scale contextual information for image analysis, we propose our PointAtrousGraph (PAG) - a deep permutation-invariant hierarchical encoder-decoder for efficiently exploiting multi-scale edge features in point clouds. Our PAG is constructed by several novel modules, such as Point Atrous Convolution (PAC), Edge-preserved Pooling (EP) and Edge-preserved Unpooling (EU). Similar to atrous convolution, our PAC can effectively enlarge receptive fields of filters and thus densely learn multi-scale point features. Following the idea of non-overlapping max-pooling operations, we propose our EP to preserve critical edge features during subsampling. Correspondingly, our EU modules gradually recover spatial information for edge features. In addition, we introduce chained skip subsampling/upsampling modules that directly propagate edge features to the final stage. Particularly, our proposed auxiliary loss functions can further improve our performance. Experimental results show that our PAG outperforms previous state-of-the-art methods on various 3D semantic perception applications. Comment: 11 pages, 10 figures
- Published
- 2019
130. Learning Low-Rank Images for Robust All-Day Feature Matching
- Author
-
Marcelo H. Ang, Mengdan Feng, and Gim Hee Lee
- Subjects
Matching (statistics), Computer science, Rank (computer programming), Feature extraction, Image (mathematics), Feature (computer vision), Key (cryptography), Contrast (vision), Computer vision, Artificial intelligence, Feature detection (computer vision)
- Abstract
Image-based localization plays an important role in today's autonomous driving technologies. However, in large scale outdoor environments, challenging conditions, e.g., lighting changes or different weather, heavily affect image appearance and quality. As a key component of feature-based visual localization, image feature detection and matching deteriorate severely and cause worse localization performance. In this paper, we propose a novel method for robust image feature matching under drastically changing outdoor environments. In contrast to existing approaches which try to learn robust feature descriptors, we train a deep network that outputs the low-rank representations of the images where the undesired variations on the images are removed, and perform feature extraction and matching on the learned low-rank space. We demonstrate that our learned low-rank images largely improve the performance of image feature matching under varying conditions over a long period of time.
- Published
- 2019
131. PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation
- Author
-
Gim Hee Lee, Tat-Seng Chua, and Na Zhao
- Subjects
FOS: Computer and information sciences, Theoretical computer science, Artificial neural network, Computer science, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Point cloud, Context (language use), Permutation, Pattern recognition (psychology), Segmentation, Artificial intelligence, Encoder
- Abstract
In this paper, we present the PS^2-Net -- a locally and globally aware deep learning framework for semantic segmentation on 3D scene-level point clouds. In order to deeply incorporate local structures and global context to support 3D scene segmentation, our network is built on four repeatedly stacked encoders, where each encoder has two basic components: EdgeConv that captures local structures and NetVLAD that models global context. Different from existing state-of-the-art methods for point-based scene semantic segmentation that either violate or do not achieve permutation invariance, our PS^2-Net is designed to be permutation invariant, which is an essential property of any deep network used to process unordered point clouds. We further provide theoretical proof to guarantee the permutation invariance property of our network. We perform extensive experiments on two large-scale 3D indoor scene datasets and demonstrate that our PS^2-Net is able to achieve state-of-the-art performances as compared to existing approaches.
- Published
- 2019
- Full Text
- View/download PDF
132. Baseline Desensitizing In Translation Averaging
- Author
-
Bingbing Zhuang, Gim Hee Lee, and Loong-Fah Cheong
- Subjects
FOS: Computer and information sciences, Linear programming, Computer science, Epipolar geometry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Normalization (image processing), Bilinear interpolation, Nonlinear system, Outlier, Artificial intelligence, Coordinate descent, Algorithm
- Abstract
Many existing translation averaging algorithms are either sensitive to disparate camera baselines and have to rely on extensive preprocessing to improve the observed Epipolar Geometry graph, or if they are robust against disparate camera baselines, require complicated optimization to minimize the highly nonlinear angular error objective. In this paper, we carefully design a simple yet effective bilinear objective function, introducing a variable to perform the requisite normalization. The objective function enjoys the baseline-insensitive property of the angular error and yet is amenable to simple and efficient optimization by block coordinate descent, with good empirical performance. A rotation-assisted Iterative Reweighted Least Squares scheme is further put forth to help deal with outliers. We also contribute towards a better understanding of the behavior of two recent convex algorithms, LUD and Shapefit/kick, clarifying the underlying subtle difference that leads to the performance gap. Finally, we demonstrate that our algorithm achieves overall superior accuracies on benchmark datasets compared to state-of-the-art methods, and is also several times faster. Comment: 8 pages
- Published
- 2019
- Full Text
- View/download PDF
133. Generating Multiple Hypotheses for 3D Human Pose Estimation with Mixture Density Network
- Author
-
Gim Hee Lee and Chen Li
- Subjects
FOS: Computer and information sciences, Computer science, Generalization, Gaussian, Deep learning, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, Inverse problem, Face (geometry), Mixture distribution, Artificial intelligence, Pose
- Abstract
3D human pose estimation from a monocular image or 2D joints is an ill-posed problem because of depth ambiguity and occluded joints. We argue that 3D human pose estimation from a monocular input is an inverse problem where multiple feasible solutions can exist. In this paper, we propose a novel approach to generate multiple feasible hypotheses of the 3D pose from 2D joints. In contrast to existing deep learning approaches which minimize a mean square error based on an unimodal Gaussian distribution, our method is able to generate multiple feasible hypotheses of 3D pose based on a multimodal mixture density network. Our experiments show that the 3D poses estimated by our approach from an input of 2D joints are consistent in 2D reprojections, which supports our argument that multiple solutions exist for the 2D-to-3D inverse problem. Furthermore, we show state-of-the-art performance on the Human3.6M dataset in both best hypothesis and multi-view settings, and we demonstrate the generalization capacity of our model by testing on the MPII and MPI-INF-3DHP datasets. Our code is available at the project website. Comment: CVPR 2019
- Published
- 2019
- Full Text
- View/download PDF
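Entry 133 replaces the usual unimodal regression loss with a mixture density network over 3D poses. Below is a minimal sketch of the negative log-likelihood for a Gaussian mixture with isotropic components, assuming the mixture parameters come from the network's output heads; dimensions and names are illustrative. At test time, the component means would serve as the multiple pose hypotheses.

```python
import math
import torch

def mdn_nll(pi_logits, mu, sigma, target):
    """Negative log-likelihood of a target pose under a Gaussian mixture.

    pi_logits: (B, K) unnormalized mixture weights.
    mu:        (B, K, D) component means (D = 3 * num_joints for a 3D pose).
    sigma:     (B, K) isotropic standard deviations (positive).
    target:    (B, D) ground-truth pose.
    """
    B, _, D = mu.shape
    log_pi = torch.log_softmax(pi_logits, dim=1)                       # (B, K)
    diff = target.unsqueeze(1) - mu                                    # (B, K, D)
    log_prob = (-0.5 * (diff ** 2).sum(dim=2) / sigma ** 2
                - D * torch.log(sigma)
                - 0.5 * D * math.log(2.0 * math.pi))                   # (B, K)
    return -torch.logsumexp(log_pi + log_prob, dim=1).mean()
```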
134. 2D3D-MatchNet: Learning to Match Keypoints Across 2D Image and 3D Point Cloud
- Author
-
Marcelo H. Ang, Mengdan Feng, Sixing Hu, and Gim Hee Lee
- Subjects
FOS: Computer and information sciences, Ground truth, Computer science, Computer Vision and Pattern Recognition (cs.CV), Point cloud, Computer Science - Computer Vision and Pattern Recognition, Image (mathematics), Computer vision, Artificial intelligence, Pose
- Abstract
Large-scale point cloud generated from 3D sensors is more accurate than its image-based counterpart. However, it is seldom used in visual pose estimation due to the difficulty in obtaining 2D-3D image to point cloud correspondences. In this paper, we propose the 2D3D-MatchNet - an end-to-end deep network architecture to jointly learn the descriptors for 2D and 3D keypoint from image and point cloud, respectively. As a result, we are able to directly match and establish 2D-3D correspondences from the query image and 3D point cloud reference map for visual pose estimation. We create our Oxford 2D-3D Patches dataset from the Oxford Robotcar dataset with the ground truth camera poses and 2D-3D image to point cloud correspondences for training and testing the deep network. Experimental results verify the feasibility of our approach.
- Published
- 2019
- Full Text
- View/download PDF
135. Towards Precise Vehicle-Free Point Cloud Mapping: An On-vehicle System with Deep Vehicle Detection and Tracking
- Author
-
Gim Hee Lee, Mengdan Feng, Marcelo H. Ang, and Sixing Hu
- Subjects
Computer science, Point cloud, Tracking, Object detection, Lidar, Vehicle detection, Global Positioning System, RGB color model, Computer vision, Artificial intelligence
- Abstract
While 3D LiDAR has become a common practice for more and more autonomous driving systems, precise 3D mapping and robust localization is of great importance. However, current 3D map is always noisy and unreliable due to the existence of moving objects, leading to worse localization. In this paper, we propose a general vehicle-free point cloud mapping framework for better on-vehicle localization. For each laser scan, vehicle points are detected, tracked and then removed. Simultaneously, 3D map is reconstructed by registering each vehicle-free laser scan to global coordinate based on GPS/INS data. Instead of direct 3D object detection from point cloud, we first detect vehicles from RGB images using the proposed YVDN. In case of false or missing detection, which may result in the existence of vehicles in the map, we propose the K-Frames forward-backward object tracking algorithm to link detection from neighborhood images. Laser scan points falling into the detected bounding boxes are then removed. We conduct our experiments on the Oxford RobotCar Dataset and show the qualitative results to validate the feasibility of our vehicle-free 3D mapping system. Besides, our vehicle-free mapping system can be generalized to any autonomous driving system equipped with LiDAR, camera and/or GPS.
- Published
- 2018
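Entry 135 removes laser points that fall inside image-space vehicle detections before registering each scan into the map. A minimal numpy sketch of that filtering step, assuming known LiDAR-to-camera extrinsics and intrinsics and 2D boxes from a detector; the paper's YVDN detector and K-Frames tracking are not reproduced.

```python
import numpy as np

def remove_vehicle_points(points, boxes, K, T_cam_lidar):
    """Drop LiDAR points whose image projection falls inside a detection box.

    points: (N, 3) LiDAR points.  boxes: (M, 4) boxes as (x1, y1, x2, y2).
    K: (3, 3) camera intrinsics.  T_cam_lidar: (4, 4) LiDAR-to-camera transform.
    Returns the (N', 3) points kept for mapping.
    """
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.0                  # only points visible to the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.maximum(uv[:, 2:3], 1e-9)
    in_any_box = np.zeros(points.shape[0], dtype=bool)
    for x1, y1, x2, y2 in boxes:
        in_any_box |= ((uv[:, 0] >= x1) & (uv[:, 0] <= x2) &
                       (uv[:, 1] >= y1) & (uv[:, 1] <= y2))
    keep = ~(in_front & in_any_box)
    return points[keep]
```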
136. Object Detection and Motion Planning for Automated Welding of Tubular Joints
- Author
-
Chee-Meng Chew, Gim Hee Lee, Syeda Mariam Ahmed, Chee Khiang Pang, and Yan Zhi Tan
- Subjects
FOS: Computer and information sciences, Engineering, Welding, Solid modeling, Collision, Object detection, Robot welding, Computer Science - Robotics, Trajectory, Robot, Computer vision, Motion planning, Artificial intelligence, Robotics (cs.RO)
- Abstract
Automatic welding of tubular TKY joints is an important and challenging task for the marine and offshore industry. In this paper, a framework for tubular joint detection and motion planning is proposed. The pose of the real tubular joint is detected using RGB-D sensors, which is used to obtain a real-to-virtual mapping for positioning the workpiece in a virtual environment. For motion planning, a Bi-directional Transition based Rapidly exploring Random Tree (BiTRRT) algorithm is used to generate trajectories for reaching the desired goals. The complete framework is verified with experiments, and the results show that the robot welding torch is able to transit without collision to desired goals which are close to the tubular joint.
- Published
- 2018
137. Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System
- Author
-
Sixing Hu, Peidong Liu, Ye Chuan Yeo, Andreas Geiger, Benjamin Choi, Gim Hee Lee, Lionel Heng, Rang Nguyen, Benson Kuan, Torsten Sattler, Marc Pollefeys, Marcel Geppert, and Zhaopeng Cui
- Subjects
FOS: Computer and information sciences, Computer science, Deep learning, Real-time computing, Multi camera, Computer Science - Robotics, Perception, Artificial intelligence, Robotics (cs.RO), Envelope (motion)
- Abstract
Project AutoVision aims to develop localization and 3D scene perception capabilities for a self-driving vehicle. Such capabilities will enable autonomous navigation in urban and rural environments, in day and night, and with cameras as the only exteroceptive sensors. The sensor suite employs many cameras for both 360-degree coverage and accurate multi-view stereo; the use of low-cost cameras keeps the cost of this sensor suite to a minimum. In addition, the project seeks to extend the operating envelope to include GNSS-less conditions which are typical for environments with tall buildings, foliage, and tunnels. Emphasis is placed on leveraging multi-view geometry and deep learning to enable the vehicle to localize and perceive in 3D space. This paper presents an overview of the project, and describes the sensor suite and current progress in the areas of calibration, localization, and perception.
- Published
- 2018
138. Convolutional Sequence to Sequence Model for Human Dynamics
- Author
-
Zhen Zhang, Gim Hee Lee, Wee Sun Lee, and Chen Li
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Convolutional neural network, Motion capture, Convolutional code, Artificial intelligence, Hidden Markov model, Encoder, Algorithm, Decoding methods
- Abstract
Human motion modeling is a classic problem in computer vision and graphics. Challenges in modeling human motion include high dimensional prediction as well as extremely complicated dynamics. We present a novel approach to human motion modeling based on convolutional neural networks (CNN). The hierarchical structure of CNN makes it capable of capturing both spatial and temporal correlations effectively. In our proposed approach, a convolutional long-term encoder is used to encode the whole given motion sequence into a long-term hidden variable, which is used with a decoder to predict the remainder of the sequence. The decoder itself also has an encoder-decoder structure, in which the short-term encoder encodes a shorter sequence to a short-term hidden variable, and the spatial decoder maps the long- and short-term hidden variables to motion predictions. By using such a model, we are able to capture both invariant and dynamic information of human motion, which results in more accurate predictions. Experiments show that our algorithm outperforms the state-of-the-art methods on the Human3.6M and CMU Motion Capture datasets. Our code is available at the project website. Comment: CVPR 2018
- Published
- 2018
139. SO-Net: Self-Organizing Network for Point Cloud Analysis
- Author
-
Ben M. Chen, Gim Hee Lee, and Jiaxin Li
- Subjects
FOS: Computer and information sciences, Self-organizing map, Computer science, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Feature vector, Feature extraction, Point cloud, Computer Science - Computer Vision and Pattern Recognition, Self-organizing network, k-nearest neighbors algorithm, Segmentation, Graphical model, Data mining, Artificial intelligence
- Abstract
This paper presents SO-Net, a permutation invariant architecture for deep learning with orderless point clouds. The SO-Net models the spatial distribution of point cloud by building a Self-Organizing Map (SOM). Based on the SOM, SO-Net performs hierarchical feature extraction on individual points and SOM nodes, and ultimately represents the input point cloud by a single feature vector. The receptive field of the network can be systematically adjusted by conducting point-to-node k nearest neighbor search. In recognition tasks such as point cloud reconstruction, classification, object part segmentation and shape retrieval, our proposed network demonstrates performance that is similar with or better than state-of-the-art approaches. In addition, the training speed is significantly faster than existing point cloud recognition networks because of the parallelizability and simplicity of the proposed architecture. Our code is available at the project website. https://github.com/lijx10/SO-Net, 17 pages, CVPR 2018
- Published
- 2018
140. PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition
- Author
-
Gim Hee Lee and Mikaela Angelina Uy
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Point cloud, Computer Science - Computer Vision and Pattern Recognition, Inference, Discriminative model, Task analysis, Leverage (statistics), Artificial intelligence, Data mining
- Abstract
Unlike its image based counterpart, point cloud based retrieval for place recognition has remained as an unexplored and unsolved problem. This is largely due to the difficulty in extracting local feature descriptors from a point cloud that can subsequently be encoded into a global descriptor for the retrieval task. In this paper, we propose the PointNetVLAD where we leverage on the recent success of deep networks to solve point cloud based retrieval for place recognition. Specifically, our PointNetVLAD is a combination/modification of the existing PointNet and NetVLAD, which allows end-to-end training and inference to extract the global descriptor from a given 3D point cloud. Furthermore, we propose the "lazy triplet and quadruplet" loss functions that can achieve more discriminative and generalizable global descriptors to tackle the retrieval task. We create benchmark datasets for point cloud based retrieval for place recognition, and the experimental results on these datasets show the feasibility of our PointNetVLAD. Our code and the link for the benchmark dataset downloads are available in our project website. http://github.com/mikacuy/pointnetvlad/, Comment: CVPR 2018, 11 pages, 10 figures
- Published
- 2018
- Full Text
- View/download PDF
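Entry 140 trains its global descriptor with "lazy triplet and quadruplet" losses. The sketch below is a hedged reading of a lazy-triplet-style loss: instead of summing over all pairs, it keeps only the closest positive and the hardest negative; the paper's exact formulation and distance choice may differ in detail.

```python
import torch

def lazy_triplet_loss(anchor, positives, negatives, margin=0.5):
    """Lazy-triplet-style loss on global descriptors (hedged reading).

    anchor:    (D,) descriptor of the query submap.
    positives: (P, D) descriptors of structurally similar submaps.
    negatives: (Q, D) descriptors of dissimilar submaps.
    """
    d_pos = ((positives - anchor) ** 2).sum(dim=1)   # squared distances to positives
    d_neg = ((negatives - anchor) ** 2).sum(dim=1)   # squared distances to negatives
    # Hardest negative is the closest one; hinge on the margin.
    loss = margin + d_pos.min() - d_neg.min()
    return torch.clamp(loss, min=0.0)
```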
141. Minimal solutions for the multi-camera pose estimation problem
- Author
-
Marc Pollefeys, Bo Li, Friedrich Fraundorfer, Gim Hee Lee, and John Hollerbach
- Subjects
Photogrammetry and Image Analysis, Polynomial, Applied Mathematics, Mechanical Engineering, minimal solutions, RANSAC, 3D pose estimation, localization, Nonlinear system, Singular value, Non-perspective pose estimation, multi-camera system, Artificial Intelligence, Modeling and Simulation, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, Plücker, Pose, Software, Rigid transformation, Mathematics
- Abstract
In this paper, we propose a novel formulation to solve the pose estimation problem of a calibrated multi-camera system. The non-central rays that pass through the 3D world points and multi-camera system are elegantly represented as Plücker lines. This allows us to solve for the depth of the points along the Plücker lines with a minimal set of three-point correspondences. We show that the minimal solution for the depth of the points along the Plücker lines is an eight-degree polynomial that gives up to eight real solutions. The coordinates of the 3D world points in the multi-camera frame are computed from the known depths. Consequently, the pose of the multi-camera system, i.e. the rigid transformation between the world and multi-camera frames can be obtained from absolute orientation. We also derive a closed-form minimal solution for the absolute orientation. This removes the need for the computationally expensive singular value decompositions during the evaluations of the possible solutions for the depths. We identify the correct solution and do robust estimation with RANSAC. Finally, the solution is further refined by including all the inlier correspondences in a nonlinear refinement step. We verify our approach by showing comparisons with other existing approaches and results from large-scale real-world datasets.
- Published
- 2015
142. Rolling-Shutter-Aware Differential SfM and Image Rectification
- Author
-
Loong-Fah Cheong, Bingbing Zhuang, and Gim Hee Lee
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 3D reconstruction, Optical flow, Rolling shutter, Acceleration, Shutter, Computer vision, Artificial intelligence, Image rectification, Image warping, Pose - Abstract
In this paper, we develop a modified differential Structure from Motion (SfM) algorithm that can estimate the relative pose from two consecutive frames despite Rolling Shutter (RS) artifacts. In particular, we show that under a constant velocity assumption, the errors induced by the rolling shutter effect can be easily rectified by a linear scaling operation on each optical flow vector. We further propose a 9-point algorithm to recover the relative pose of a rolling shutter camera that undergoes constant acceleration motion. We demonstrate that the dense depth maps recovered from the relative pose of the RS camera can be used in an RS-aware warping for image rectification to recover high-quality Global Shutter (GS) images. Experiments on both synthetic and real RS images show that our RS-aware differential SfM algorithm produces more accurate results for relative pose estimation and 3D reconstruction from images distorted by the RS effect than standard SfM algorithms that assume a GS camera model. We also demonstrate that our RS-aware warping for image rectification outperforms state-of-the-art commercial software products, i.e., Adobe After Effects and Apple iMovie, at removing RS artifacts.
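The per-flow linear rescaling can be sketched as follows for the constant-velocity case; the simple row-time model, the argument names, and the omission of the constant-acceleration case are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def rectify_rs_flow(flow, line_delay, frame_period):
    """Linearly rescale rolling-shutter optical flow toward its global-shutter
    equivalent, assuming constant camera velocity.

    flow: (H, W, 2) array of per-pixel flow (u, v) between frames k and k+1.
    A pixel starting at row y moves to row y + v, so its observed flow spans a
    time of frame_period + v * line_delay rather than frame_period; a per-pixel
    linear scale compensates for this (a sketch of the idea only).
    """
    v = flow[..., 1]                        # row displacement of each pixel
    dt_rs = frame_period + v * line_delay   # actual time spanned by that flow
    scale = frame_period / dt_rs
    return flow * scale[..., None]          # broadcast over the (u, v) components
```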
- Published
- 2017
143. 3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
- Author
-
Torsten Sattler, Gim Hee Lee, Lionel Heng, Marc Pollefeys, Paul Furgale, Christian Häne, and Friedrich Fraundorfer
- Subjects
FOS: Computer and information sciences, Computer science, Visual perception, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Depth map, Perception, Computer vision, Multi-camera system, Fisheye camera, Blind spot, Obstacle detection, Pipeline (software), Mapping, Localization, Obstacle, Calibration, Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, Camera resectioning - Abstract
Cameras are a crucial exteroceptive sensor for self-driving cars as they are low-cost and small, provide appearance information about the environment, and work in various weather conditions. They can be used for multiple purposes such as visual navigation and obstacle detection. We can use a surround multi-camera system to cover the full 360-degree field-of-view around the car. In this way, we avoid blind spots which can otherwise lead to accidents. To minimize the number of cameras needed for surround perception, we utilize fisheye cameras. Consequently, standard vision pipelines for 3D mapping, visual localization, obstacle detection, etc. need to be adapted to take full advantage of the availability of multiple cameras rather than treat each camera individually. In addition, processing of fisheye images has to be supported. In this paper, we describe the camera calibration and subsequent processing pipeline for multi-fisheye-camera systems developed as part of the V-Charge project. This project seeks to enable automated valet parking for self-driving cars. Our pipeline is able to precisely calibrate multi-camera systems, build sparse 3D maps for visual navigation, visually localize the car with respect to these maps, generate accurate dense maps, as well as detect obstacles based on real-time depth map extraction.
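As one small, generic example of why standard pinhole pipelines need adaptation for fisheye images, an equidistant fisheye projection model is sketched below; the V-Charge pipeline uses its own calibrated fisheye camera model, so this is only an illustrative stand-in.

```python
import numpy as np

def project_equidistant_fisheye(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame with an equidistant fisheye
    model (r = f * theta), where theta is the angle from the optical axis.
    Illustrative only; not the calibrated model used in the actual pipeline."""
    x, y, z = point_cam
    theta = np.arctan2(np.hypot(x, y), z)   # angle between the ray and the optical axis
    phi = np.arctan2(y, x)                  # azimuth around the optical axis
    u = cx + fx * theta * np.cos(phi)
    v = cy + fy * theta * np.sin(phi)
    return u, v
```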
- Published
- 2017
- Full Text
- View/download PDF
144. Vision-Controlled Micro Flying Robots: From System Design to Autonomous Navigation and Mapping in GPS-Denied Environments
- Author
-
Petri Tanskanen, Friedrich Fraundorfer, Laurent Kneip, Lorenz Meier, Alessandro Renzaglia, Roland Siegwart, Markus W. Achtelik, Gim Hee Lee, Lionel Heng, Elias B. Kosmatopoulos, Margarita Chli, Chiara Troiani, Jan Stumpf, Savvas A. Chatzichristofis, Agostino Martinelli, Marc Pollefeys, Davide Scaramuzza, Lefteris Doitsidis, Daniel Gurdan, Michael Achtelik, Stephan Weiss, Simon Lynen, University of Zurich, and Scaramuzza, Davide
- Subjects
Engineering, Electrical and Electronic Engineering, Real-time computing, Control and Systems Engineering, Mobile robot, Robotics, Mobile robot navigation, Computer Science Applications, Inertial measurement unit, Global Positioning System, Robot, Onboard camera, Artificial intelligence, Search and rescue, Simulation - Abstract
Autonomous microhelicopters will soon play a major role in tasks such as search and rescue, environment monitoring, security surveillance, and inspection. If they are additionally realized at small scale, they can also be used in narrow outdoor and indoor environments and pose only a limited risk to people. However, for such operations, navigating based only on global positioning system (GPS) information is not sufficient. Fully autonomous operation in cities or other dense environments requires microhelicopters to fly at low altitudes, where GPS signals are often shadowed, or indoors, and to actively explore unknown environments while avoiding collisions and creating maps. This involves a number of challenges at all levels of helicopter design, perception, actuation, control, and navigation, which still have to be solved. The Swarm of Micro Flying Robots (SFLY) project was a European Union-funded project with the goal of creating a swarm of vision-controlled microaerial vehicles (MAVs) capable of autonomous navigation, three-dimensional (3-D) mapping, and optimal surveillance coverage in GPS-denied environments. The SFLY MAVs do not rely on remote control, radio beacons, or motion-capture systems but can fly all by themselves using only a single onboard camera and an inertial measurement unit (IMU). This article describes the technical challenges that have been faced and the results achieved, from hardware design and embedded programming to vision-based navigation and mapping, with an overview of how all the modules work and how they have been integrated into the final system. Code, data sets, and videos are publicly available to the robotics community. Experimental results demonstrating three MAVs navigating autonomously in an unknown GPS-denied environment and performing 3-D mapping and optimal surveillance coverage are presented.
- Published
- 2014
145. Autonomous Visual Mapping and Exploration With a Micro Aerial Vehicle
- Author
-
Petri Tanskanen, Lorenz Meier, Dominik Honegger, Gim Hee Lee, Lionel Heng, Friedrich Fraundorfer, and Marc Pollefeys
- Subjects
Computer science, Payload, Optical flow, Field of view, Computer Science Applications, Octree, Control and Systems Engineering, Inertial measurement unit, Metric (mathematics), Computer vision, Artificial intelligence, Visual odometry, Stereo camera - Abstract
Cameras are a natural fit for micro aerial vehicles (MAVs) due to their low weight, low power consumption, and two-dimensional field of view. However, computationally intensive algorithms are required to infer the 3D structure of the environment from 2D image data. This requirement is made more difficult by the MAV's limited payload, which only allows for one CPU board. Hence, we have to design efficient algorithms for state estimation, mapping, planning, and exploration. We implement a set of algorithms on two different vision-based MAV systems such that these algorithms enable the MAVs to map and explore unknown environments. By using both self-built and off-the-shelf systems, we show that our algorithms can be used on different platforms. All algorithms necessary for autonomous mapping and exploration run on board the MAV. Using a front-looking stereo camera as the main sensor, we maintain a tiled octree-based 3D occupancy map. The MAV uses this map for local navigation and frontier-based exploration. In addition, we use a wall-following algorithm as an alternative exploration strategy in open areas where frontier-based exploration under-performs. During exploration, data is transmitted to the ground station, which runs large-scale visual SLAM. We estimate the MAV's state by fusing inertial data from an IMU with metric velocity measurements from a custom-built optical flow sensor and pose estimates from visual odometry. We verify our approaches with experimental results, which, to the best of our knowledge, demonstrate our MAVs to be the first vision-based MAVs to autonomously explore both indoor and outdoor environments.
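A minimal 2D sketch of the frontier detection behind frontier-based exploration is shown below; the MAV actually maintains a tiled octree-based 3D occupancy map, so the flat grid and cell labels here are simplifying assumptions.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # illustrative cell labels

def find_frontiers(grid):
    """Return the frontier cells of a 2D occupancy grid: free cells that are
    adjacent to at least one unknown cell. These are the candidate goals that
    a frontier-based explorer drives toward."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighbours = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if np.any(neighbours == UNKNOWN):
                frontiers.append((r, c))
    return frontiers
```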
- Published
- 2014
146. Minimal Solutions for Pose Estimation of a Multi-Camera System
- Author
-
Gim Hee Lee, Marc Pollefeys, Friedrich Fraundorfer, and Bo Li
- Subjects
Computer science, Reprojection error, RANSAC, 3D pose estimation, Singular value decomposition, Degree of a polynomial, Plücker lines, Algorithm, Pose, Rigid transformation - Abstract
In this paper, we propose a novel formulation to solve the pose estimation problem of a calibrated multi-camera system. The non-central rays that pass through the 3D world points and the multi-camera system are elegantly represented as Plücker lines. This allows us to solve for the depths of the points along the Plücker lines with a minimal set of three point correspondences. We show that the minimal solution for the depths of the points along the Plücker lines is an eighth-degree polynomial that gives up to eight real solutions. The coordinates of the 3D world points in the multi-camera frame are computed from the known depths. Consequently, the pose of the multi-camera system, i.e., the rigid transformation between the world and multi-camera frames, can be obtained from absolute orientation. We also derive a closed-form minimal solution for the absolute orientation. This removes the need for computationally expensive Singular Value Decompositions (SVD) during the evaluation of the possible solutions for the depths. We identify the correct solution and perform robust estimation with RANSAC. Finally, the solution is further refined by including all the inlier correspondences in a non-linear refinement step. We verify our approach with comparisons against other existing approaches and results from large-scale real-world datasets.
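For reference, the standard SVD-based absolute orientation (Kabsch) step that the paper's closed-form solution is designed to replace can be sketched as follows; this is the conventional baseline formulation, not the paper's own solver.

```python
import numpy as np

def absolute_orientation_svd(world_pts, rig_pts):
    """Standard SVD-based absolute orientation (Kabsch): find R, t such that
    rig_pts ≈ R @ world_pts + t for corresponding (N, 3) point sets.
    The paper replaces this step with a closed-form solution to avoid the SVD."""
    mu_w = world_pts.mean(axis=0)
    mu_r = rig_pts.mean(axis=0)
    H = (world_pts - mu_w).T @ (rig_pts - mu_r)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflections
    R = Vt.T @ D @ U.T
    t = mu_r - R @ mu_w
    return R, t
```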
- Published
- 2016
147. Mobile Robots Navigation, Mapping, and Localization Part II.
- Author
-
Gim Hee Lee and Marcelo H. Ang Jr.
- Published
- 2009
148. Line-sweep: Cross-ratio for wide-baseline matching and 3D reconstruction
- Author
-
Michel Antunes, Daniel Snow, Srikumar Ramalingam, Gim Hee Lee, and Sudeep Pillai
- Subjects
Matching (statistics), 3D reconstruction, Cross-ratio, Sweep line algorithm, Set (abstract data type), Feature (computer vision), Line (geometry), Point (geometry), Computer vision, Artificial intelligence, Mathematics - Abstract
We propose a simple and useful idea based on cross-ratio constraints for wide-baseline matching and 3D reconstruction. Most existing methods exploit feature points and planes from images. Lines have always been considered notorious for both matching and reconstruction due to the lack of good line descriptors. We propose a method to generate and match new points using virtual lines constructed from pairs of keypoints, which are obtained using standard feature point detectors. We use cross-ratio constraints to obtain an initial set of new point matches, which are subsequently used to obtain line correspondences. We develop a method that works for both calibrated and uncalibrated camera configurations. We show compelling line-matching and large-scale 3D reconstruction results.
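The projective invariant at the heart of the method is the cross-ratio of four collinear points; a minimal unsigned-distance version is sketched below (the paper's constraint construction on virtual lines is more involved than this).

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear 2D points, computed from
    unsigned distances. The cross-ratio is preserved under any perspective
    (projective) mapping, which is what makes it usable as a matching constraint."""
    def dist(p, q):
        return np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
    return (dist(a, c) * dist(b, d)) / (dist(b, c) * dist(a, d))
```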
- Published
- 2015
149. Self-Calibration and Visual SLAM with a Multi-Camera System on a Micro Aerial Vehicle
- Author
-
Gim Hee Lee, Lionel Heng, and Marc Pollefeys
- Subjects
Monocular, Stereo cameras, Computer science, Gyroscope, Triangulation (computer vision), Simultaneous localization and mapping, Artificial Intelligence, Feature (computer vision), Robot, Computer vision, Stereo camera - Abstract
The use of a multi-camera system enables a robot to obtain a surround view and thus maximize its perceptual awareness of its environment. If vision-based simultaneous localization and mapping (vSLAM) is expected to provide reliable pose estimates for a micro aerial vehicle (MAV) with a multi-camera system, an accurate calibration of the multi-camera system is a necessary prerequisite. We propose a novel vSLAM-based self-calibration method for a multi-camera system that includes at least one calibrated stereo camera and an arbitrary number of monocular cameras. We assume overlapping fields of view to exist only within stereo cameras. Our self-calibration estimates the inter-camera transforms with metric scale; the metric scale is inferred from the calibrated stereo. On our MAV, we set up each camera pair in a stereo configuration, which facilitates the estimation of the MAV's pose with metric scale. Once the MAV is calibrated, it is able to estimate its global pose via a multi-camera vSLAM implementation based on the generalized camera model. We propose a novel minimal and linear 3-point algorithm that uses relative rotation angle measurements from a 3-axis gyroscope to recover the relative motion of the MAV with metric scale from 2D-2D feature correspondences. This relative motion estimation does not involve scene point triangulation. Our constant-time vSLAM implementation with loop closures runs on board the MAV in real time. To the best of our knowledge, no previously published work has demonstrated real-time on-board vSLAM with loop closures. We show experimental results from simulation experiments and real-world experiments in both indoor and outdoor environments.
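To illustrate how a known rotation simplifies relative motion estimation, the sketch below recovers the translation direction of a single central camera from the epipolar constraint once R is given (e.g., integrated from gyroscope measurements). The paper's 3-point algorithm instead works with the generalized multi-camera model and also recovers metric scale, so this central-camera version is only a simplified analogue.

```python
import numpy as np

def translation_from_known_rotation(R, bearings1, bearings2):
    """Given the relative rotation R and matched unit bearing vectors in two
    views, recover the translation direction from the epipolar constraint
    x2^T [t]x R x1 = 0, which is linear in t once R is known.

    Note: with a single central camera the scale of t is unobservable; this
    is a simplified analogue of the multi-camera 3-point algorithm."""
    rows = []
    for x1, x2 in zip(bearings1, bearings2):
        y = R @ x1
        # x2 . (t x y) = t . (y x x2): each correspondence gives one linear row in t
        rows.append(np.cross(y, x2))
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)       # t lies in the (approximate) null space of A
    t = Vt[-1]
    return t / np.linalg.norm(t)
```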
- Published
- 2014
150. Infrastructure-based calibration of a multi-camera rig
- Author
-
Paul Furgale, Mathias Bürki, Gim Hee Lee, Lionel Heng, Marc Pollefeys, and Roland Siegwart
- Subjects
Set (abstract data type), Camera auto-calibration, Computer science, Calibration, Computer vision, Artificial intelligence, Multi camera, Simultaneous localization and mapping, Fiducial marker - Abstract
The online recalibration of multi-sensor systems is a fundamental problem that must be solved before complex automated systems are deployed in situations such as automated driving. In such situations, accurate knowledge of the calibration parameters is critical for the safe operation of automated systems. However, most existing calibration methods for multi-sensor systems are computationally expensive, use installations of known fiducial patterns, and require expert supervision. We propose an alternative approach called infrastructure-based calibration that is efficient, requires no modification of the infrastructure, and is completely unsupervised. In a survey phase, a computationally expensive simultaneous localization and mapping (SLAM) method is used to build a highly accurate map of a calibration area. Once the map is built, many other vehicles are able to use it for calibration as if it were a known fiducial pattern. We demonstrate the effectiveness of this method by calibrating the extrinsic parameters of a multi-camera system. The method does not assume that the cameras have an overlapping field of view, and it does not require an initial guess. As the camera rig moves through the previously mapped area, we match features between each set of synchronized camera images and the map. Subsequently, we find the camera poses and inlier 2D-3D correspondences. From the camera poses, we obtain an initial estimate of the camera extrinsics and rig poses, and we optimize these extrinsics and rig poses via non-linear refinement. The calibration code is publicly available as a standalone C++ package.
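A rough sketch of the per-camera localization step against the pre-built map, using a generic PnP + RANSAC solver, is shown below. OpenCV is an assumed dependency here, and the function names and frame conventions are illustrative; the actual pipeline's matching, pose estimation, and non-linear refinement are more elaborate.

```python
import numpy as np
import cv2  # assumed dependency; any PnP + RANSAC solver would do

def camera_pose_from_map(map_points_3d, image_points_2d, K, dist_coeffs=None):
    """Estimate one camera's pose w.r.t. the previously built map from 2D-3D
    matches using PnP + RANSAC (a stand-in for the localization step above)."""
    if dist_coeffs is None:
        dist_coeffs = np.zeros((4, 1))
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_points_3d, np.float32),
        np.asarray(image_points_2d, np.float32),
        K, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)            # rotation taking map coords to camera coords
    T_cam_map = np.eye(4)
    T_cam_map[:3, :3] = R
    T_cam_map[:3, 3] = tvec.ravel()
    return T_cam_map, inliers

def relative_extrinsics(T_cam_a_map, T_cam_b_map):
    """Initial extrinsic of camera B expressed in camera A's frame,
    obtained by chaining the two map-relative poses."""
    return T_cam_a_map @ np.linalg.inv(T_cam_b_map)
```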
- Published
- 2014