Descriptor: "monocular" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "monocular" (9,636 results in total)

401. Unsupervised framework for depth estimation and camera motion prediction from video

Author: Dongbing Gu, Delong Yang, Huosheng Hu, Xunyu Zhong, and Xiafu Peng
Subjects: 0209 industrial biotechnology, Ground truth, Monocular, business.industry, Computer science, Cognitive Neuroscience, Epipolar geometry, Supervised learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inference, 02 engineering and technology, Computer Science Applications, Consistency (database systems), 020901 industrial engineering & automation, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Spatial analysis
Abstract: Depth estimation from monocular video plays a crucial role in scene perception. A significant drawback of supervised learning models is that they need vast amounts of manually labeled data (ground truth) for training. To overcome this limitation, unsupervised learning strategies that do not require ground truth have attracted extensive attention from researchers in the past few years. This paper presents a novel unsupervised framework for jointly estimating single-view depth and predicting camera motion. Stereo image sequences are used to train the model, while only monocular images are required for inference. The framework is composed of two CNNs (a depth CNN and a pose CNN) that are trained concurrently and tested independently. The objective function is constructed from the epipolar geometry constraints between stereo image sequences. To improve the accuracy of the model, a left-right consistency loss is added to the objective function. Using stereo image sequences enables us to exploit both the spatial information between stereo images and the temporal photometric warp error from image sequences. Experimental results on the KITTI and Cityscapes datasets show that our model not only outperforms prior unsupervised approaches but also achieves results comparable with several supervised methods. We also train our model on the EuRoC dataset, which is captured in an indoor environment. Experiments in indoor and outdoor scenes are conducted to test the generalization capability of the model.
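A left-right consistency term of the kind mentioned above can be sketched as follows. This is a minimal illustration under assumptions: it uses a commonly used formulation (the left disparity at column x should match the right disparity sampled at column x - d) with nearest-pixel sampling, and the function name and the skipping of out-of-image samples are illustrative choices, not details taken from the paper.

```python
def left_right_consistency(disp_left, disp_right):
    """Mean |d_L(y, x) - d_R(y, x - d_L(y, x))| over valid pixels.

    disp_left, disp_right: equally sized 2D lists of disparities in pixels.
    Uses nearest-pixel sampling; samples that fall outside the image are skipped.
    (Hypothetical helper for illustration, not the paper's loss.)
    """
    total, count = 0.0, 0
    for row_left, row_right in zip(disp_left, disp_right):
        width = len(row_left)
        for x, d_left in enumerate(row_left):
            x_right = int(round(x - d_left))  # matching column in the right view
            if 0 <= x_right < width:
                total += abs(d_left - row_right[x_right])
                count += 1
    return total / count if count else 0.0


# Consistent disparity maps give zero loss; a mismatch is penalised.
print(left_right_consistency([[1, 1, 1]], [[1, 1, 1]]))  # 0.0
print(left_right_consistency([[1, 1, 1]], [[0, 0, 0]]))  # 1.0
```

In a trained network this term would be computed on differentiable, bilinearly sampled tensors; the list-based version only shows the geometry of the check.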
Published: 2020

402. Unseen Salient Object Discovery for Monocular Robot Vision

Author: Darren M. Chan and Laurel D. Riek
Subjects: Control and Optimization, Computer science, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Biomedical Engineering, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Artificial Intelligence, Perception, 0202 electrical engineering, electronic engineering, information engineering, Segmentation, Computer vision, 0105 earth and related environmental sciences, media_common, Monocular, business.industry, Mechanical Engineering, Motion blur, Robotics, Object detection, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Robot, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business
Abstract: A key challenge in robotics is the capability to perceive unseen objects, which can improve a robot's ability to learn from and adapt to its surroundings. One approach is to employ unsupervised salient object discovery methods, which have shown promise in the computer vision literature. However, most state-of-the-art methods are unsuitable for robotics because they must process whole video segments before discovering objects, which can constrain real-time perception. To address these gaps, we introduce Unsupervised Foraging of Objects (UFO), a novel, unsupervised, salient object discovery method designed for monocular robot vision. We designed UFO with a parallel discover-prediction paradigm, permitting it to discover arbitrary salient objects on a frame-by-frame basis, which can help robots engage in scalable object learning. We compared UFO to the two fastest and most accurate methods for unsupervised salient object discovery (Fast Segmentation and Saliency-Aware Geodesic), and show that UFO is 6.5 times faster while achieving state-of-the-art precision, recall, and accuracy. Furthermore, our evaluation suggests that UFO is robust to real-world perception challenges encountered by robots, including moving cameras and moving objects, motion blur, and occlusion. We intend this work to be used with other robot perception methods to design robots that can learn novel object concepts, leading to improved autonomy.
Published: 2020

403. MFuseNet: Robust Depth Estimation With Learned Multiscopic Fusion

Author: Michael Yu Wang, Rui Fan, Qifeng Chen, and Weihao Yuan
Subjects: Control and Optimization, Computer Science - Artificial Intelligence, Computer science, Machine vision, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Biomedical Engineering, 02 engineering and technology, Computer Science - Robotics, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Code (cryptography), Computer vision, Monocular, Heuristic, business.industry, Mechanical Engineering, 020207 software engineering, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Fuse (electrical), Robot, RGB color model, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Parallax, business
Abstract: We design a multiscopic vision system that uses a low-cost monocular RGB camera to obtain accurate depth estimates. Unlike multi-view stereo with images captured at unconstrained camera poses, the proposed system controls the motion of the camera to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. In this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset of multiscopic images. Experiments on the real-world Middlebury dataset and a real-robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
Comment: IEEE International Conference on Robotics and Automation (ICRA) + IEEE Robotics and Automation Letters (RA-L). arXiv admin note: substantial text overlap with arXiv:2001.08212
Published: 2020

404. Pedestrian Planar LiDAR Pose (PPLP) Network for Oriented Pedestrian Detection Based on Planar LiDAR and Monocular Images

Author: Fan Bu, Ram Vasudevan, Trinh Le, Xiaoxiao Du, and Matthew Johnson-Roberson
Subjects: Control and Optimization, Occupancy grid mapping, Computer science, Pedestrian detection, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Biomedical Engineering, Point cloud, 02 engineering and technology, Artificial Intelligence, Minimum bounding box, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, 021101 geological & geomatics engineering, Monocular, Orientation (computer vision), business.industry, Mechanical Engineering, Ranging, Computer Science Applications, Human-Computer Interaction, Lidar, Control and Systems Engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business
Abstract: Pedestrian detection is an important task for human-robot interaction and autonomous driving applications. Most previous pedestrian detection methods rely on data collected from three-dimensional (3D) Light Detection and Ranging (LiDAR) sensors in addition to camera imagery, which can be expensive to deploy. In this letter, we propose a novel Pedestrian Planar LiDAR Pose Network (PPLP Net) based on two-dimensional (2D) LiDAR data and monocular camera imagery, which offers a far more affordable solution to the oriented pedestrian detection problem. The proposed PPLP Net consists of three sub-networks: an orientation detection network (OrientNet), a Region Proposal Network (RPN), and a PredictorNet. The OrientNet leverages state-of-the-art neural-network-based 2D pedestrian detection algorithms, including Mask R-CNN and ResNet, to detect the Bird's Eye View (BEV) orientation of each pedestrian. The RPN converts 2D LiDAR point clouds into an occupancy grid map and uses a frustum-based matching strategy to estimate non-oriented 3D pedestrian bounding boxes. Outputs from both OrientNet and RPN are passed through the PredictorNet for a final regression. The overall outputs of the network are 3D bounding box locations and orientation values for all pedestrians in the scene. We present oriented pedestrian detection results on two datasets, the CMU Panoptic Dataset and a newly collected FCAV M-Air Pedestrian (FMP) Dataset, and show that the proposed PPLP network based on 2D LiDAR and a monocular camera achieves performance similar to or better than previous state-of-the-art 3D-LiDAR-based pedestrian detection methods in both indoor and outdoor environments.
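The first step attributed to the RPN above, turning a 2D LiDAR scan into an occupancy grid, can be illustrated with a minimal rasterization sketch. The grid size, the 0.1 m resolution, the sensor-at-centre layout and the "mark occupied cells only" rule are assumptions for illustration; the abstract does not give the paper's grid parameters.

```python
import math


def scan_to_occupancy_grid(ranges, angles, resolution=0.1, size=40):
    """Rasterize a planar LiDAR scan into a size x size occupancy grid.

    Hypothetical helper: the sensor sits at the grid centre, grid[row][col] is 1
    where a return lands and 0 elsewhere. Free-space ray tracing is omitted.
    """
    grid = [[0] * size for _ in range(size)]
    half = size // 2
    for r, theta in zip(ranges, angles):
        x, y = r * math.cos(theta), r * math.sin(theta)  # polar -> Cartesian (m)
        col = math.floor(x / resolution) + half
        row = math.floor(y / resolution) + half
        if 0 <= row < size and 0 <= col < size:  # drop returns outside the map
            grid[row][col] = 1
    return grid


# Two returns: 1.05 m straight ahead and 0.55 m to the left.
grid = scan_to_occupancy_grid([1.05, 0.55], [0.0, math.pi / 2])
print(grid[20][30], grid[25][20], sum(map(sum, grid)))  # 1 1 2
```

A full occupancy grid map would also trace each beam to mark the cells it passes through as free; that step is left out to keep the sketch short.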
Published: 2020

405. Hybrid Camera Pose Estimation With Online Partitioning for SLAM

Author: Xinyi Li and Haibin Ling
Subjects: FOS: Computer and information sciences, 0209 industrial biotechnology, Control and Optimization, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Biomedical Engineering, Initialization, Bundle adjustment, 02 engineering and technology, Simultaneous localization and mapping, Computer Science - Robotics, 020901 industrial engineering & automation, Artificial Intelligence, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Pose, Monocular, business.industry, Mechanical Engineering, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Robotics (cs.RO)
Abstract: This paper presents a hybrid real-time camera pose estimation framework with a novel partitioning scheme and introduces motion averaging to monocular Simultaneous Localization and Mapping (SLAM) systems. Overcoming the limitations of the fixed-size temporal partitioning used in many conventional SLAM pipelines, our approach significantly improves the accuracy of local bundle adjustment by gathering spatially strongly connected cameras into each block. With dynamic initialization using intermediate computation values, we improve the Levenberg-Marquardt solver to further enhance the efficiency of the local optimization. Moreover, the dense data association between blocks provided by our co-visibility-based partitioning enables us to explore and implement motion averaging to efficiently align the blocks globally, updating camera motion estimates on the fly. Experiments on benchmarks convincingly demonstrate the practicality and robustness of the proposed approach, which significantly outperforms conventional approaches.
Published: 2020

406. A sematic and prior‐knowledge‐aided monocular localization method for construction‐related entities

Author: Chengqian Li, Heng Li, Xiaochun Luo, Qi Fang, and Wangpeng An
Subjects: Monocular, Computational Theory and Mathematics, business.industry, Computer science, Computer vision, Artificial intelligence, business, Computer Graphics and Computer-Aided Design, Computer Science Applications, Civil and Structural Engineering
Published: 2020

407. Building footprint extraction in Yangon city from monocular optical satellite image using deep learning

Author: Sao Hone Pha, Hein Thura Aung, and Wataru Takeuchi
Subjects: Momentum (technical analysis), Monocular, 010504 meteorology & atmospheric sciences, Computer science, business.industry, Deep learning, Geography, Planning and Development, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Training (meteorology), 02 engineering and technology, 01 natural sciences, Footprint, Satellite image, Extraction (military), Computer vision, Artificial intelligence, business, Generative adversarial network, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences, Water Science and Technology
Abstract: In this research, building footprints in Yangon City, Myanmar, are extracted solely from monocular optical satellite imagery using a conditional generative adversarial network (CGAN). Both training dat...
Published: 2020

408. Loosely-Coupled Ultra-wideband-Aided Scale Correction for Monocular Visual Odometry

Author: Lihua Xie, Muqing Cao, Thien-Minh Nguyen, and Thien Hoang Nguyen
Subjects: 0209 industrial biotechnology, Control and Optimization, Monocular, Scale (ratio), Computer science, business.industry, media_common.quotation_subject, Aerospace Engineering, Ultra-wideband, 02 engineering and technology, Ambiguity, Sensor fusion, 020901 industrial engineering & automation, Control and Systems Engineering, Automotive Engineering, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Visual odometry, business, media_common
Abstract: In this paper, we propose a method to address the problem of scale uncertainty in monocular visual odometry (VO), which includes scale ambiguity and scale drift, using distance measurements from a single ultra-wideband (UWB) anchor. A variant of the Levenberg–Marquardt (LM) nonlinear least-squares regression method is proposed to rectify unscaled position data from monocular odometry with 1D point-to-point distance measurements. As a loosely coupled approach, our method is flexible in that each input block can be replaced with one's preferred monocular odometry/SLAM algorithm and UWB sensor. Furthermore, we do not require the location of the UWB anchor as prior knowledge; instead, we estimate both the scale and the anchor location simultaneously. However, a good initial guess for the anchor position can result in more accurate scale estimation. The performance of our method is compared with the state of the art on both public datasets and real-life experiments.
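Because both the scale and the anchor position are unknown, the estimation problem can be illustrated with a closed-form linear least-squares sketch. This is not the paper's Levenberg–Marquardt variant: it substitutes a standard linearization (treating s², s·a and |a|² as independent unknowns), assumes 2D positions expressed in the VO frame and noise-free ranges, and could at most serve as the kind of initial guess the abstract says improves accuracy. All function names are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve the square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimate_scale_and_anchor(positions, ranges):
    """Estimate scale s and 2D anchor a from unscaled VO positions p_i and UWB ranges r_i.

    ||s*p - a||^2 = r^2 expands to u*|p|^2 - 2*p.v + w = r^2 with u = s^2,
    v = s*a and w = |a|^2. Treating (u, v, w) as independent makes the problem
    linear; the constraint w = |a|^2 is ignored (a relaxation).
    """
    rows = [[px * px + py * py, -2.0 * px, -2.0 * py, 1.0] for px, py in positions]
    rhs = [r * r for r in ranges]
    # Normal equations J^T J theta = J^T y for theta = (u, vx, vy, w)
    jtj = [[sum(row[i] * row[j] for row in rows) for j in range(4)] for i in range(4)]
    jty = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(4)]
    u, vx, vy, _w = solve_linear(jtj, jty)
    if u <= 0:
        raise ValueError("degenerate geometry or noise: no positive scale^2 found")
    s = math.sqrt(u)
    return s, (vx / s, vy / s)


# Synthetic, noise-free check: true scale 2, anchor at (3, 1) in the VO frame.
vo_positions = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
uwb_ranges = [math.hypot(2 * x - 3, 2 * y - 1) for x, y in vo_positions]
scale, anchor = estimate_scale_and_anchor(vo_positions, uwb_ranges)
print(round(scale, 6), tuple(round(c, 6) for c in anchor))  # 2.0 (3.0, 1.0)
```

The positions must not all lie on one line or circle, otherwise the linear system is singular; with noisy ranges, a nonlinear refinement such as the LM variant described in the abstract would follow this initialization.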
Published: 2020

409. Effects of simulated anisometropia and aniseikonia on stereopsis

Author: David A. Atchison, Jeongmin Lee, Alex S. Baldwin, Jianing Lu, Ann L. Webber, Robert F. Hess, and Katrina L. Schmid
Subjects: Adult, Male, medicine.medical_specialty, Visual Acuity, Anisometropia, Young Adult, 03 medical and health sciences, 0302 clinical medicine, Ophthalmology, Aniseikonia, medicine, Humans, Computer Simulation, Physics, Depth Perception, Vision, Binocular, Monocular, Middle Aged, medicine.disease, Sensory Systems, Stereoscopic acuity, Eyeglasses, Meridian (perimetry, visual field), Stereopsis, Random dot stereogram, 030221 ophthalmology & optometry, Female, Binocular vision, 030217 neurology & neurosurgery, Optometry
Abstract: Purpose: Stereopsis depends on horizontally disparate retinal images that otherwise agree between the two eyes. Here we investigate the effect of spherical and meridional simulated anisometropia and aniseikonia on stereopsis thresholds. The aims were to determine the effects of meridian and magnitude, and the relative effects of the two conditions. Methods: Ten participants with normal binocular vision viewed McGill modified random dot stereograms through synchronised shutter glasses. Stereoacuities were determined using a four-alternative forced-choice procedure. To induce anisometropia, trial lenses of varying power and axis were placed in front of the right eye. Seventeen combinations were used: zero (no lens), and positive and negative 1 D and 2 D powers at axes 45°, 90° and 180°; spherical lenses were also tested. To induce aniseikonia, 17 magnification power and axis combinations were used: zero (no lens), and 3%, 6%, 9% and 12% at axes 45°, 90° and 180°; overall magnifications were also tested. Results: For induced anisometropia, stereopsis loss increased as the cylindrical axis rotated from 180° to 90°, at which point the loss was similar to that for spherical blur. For example, for 2 D meridional anisometropia, the threshold increased from 1.53 log sec arc (i.e. 34 sec arc) for × 180 to 1.89 log sec arc (78 sec arc) for × 90. Anisometropia induced with either positive or negative lenses had similar detrimental effects on stereopsis. Unlike anisometropia, the stereopsis loss with induced meridional aniseikonia was not affected by axis and was about 64% of that for overall aniseikonia of the same amount. Approximately, each 1 D of induced anisometropia had the same effect on threshold as each 6% of induced aniseikonia. Conclusion: The axis of meridional anisometropia, but not of aniseikonia, affected stereopsis. This suggests differences in the way that monocular blur (anisometropia) and interocular shape differences (aniseikonia) are processed during the production of stereopsis.
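The condition counts and threshold units in this abstract can be checked with a short calculation: 12 cylindrical (or meridional) conditions plus 4 spherical (or overall) conditions plus the no-lens baseline give 17, and a threshold in log sec arc converts back as 10 raised to that value. The assumption that the spherical lenses and overall magnifications reuse the same power steps is ours; it is the reading that reproduces the stated total of 17.

```python
from itertools import product


def log_arcsec_to_arcsec(log_threshold):
    """Convert a stereo threshold from log10(sec arc) to sec arc."""
    return 10 ** log_threshold


AXES = (45, 90, 180)
# Assumption: spherical lenses use the same +/-1 D and +/-2 D powers as the
# cylinders, and overall magnifications use the same 3-12 % steps.
LENS_POWERS_D = (-2, -1, 1, 2)
MAGNIFICATIONS_PCT = (3, 6, 9, 12)

anisometropia = ([("none", 0, None)]
                 + [("cylinder", p, axis) for p, axis in product(LENS_POWERS_D, AXES)]
                 + [("sphere", p, None) for p in LENS_POWERS_D])
aniseikonia = ([("none", 0, None)]
               + [("meridional", m, axis) for m, axis in product(MAGNIFICATIONS_PCT, AXES)]
               + [("overall", m, None) for m in MAGNIFICATIONS_PCT])

print(len(anisometropia), len(aniseikonia))  # 17 17
print(round(log_arcsec_to_arcsec(1.53)))     # 34
print(round(log_arcsec_to_arcsec(1.89)))     # 78
```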
Published: 2020

410. Thermoelectric device for contact cooling of the human eye

Author: N. V. Pasechnikova, Lukyan Anatychuk, V. A. Tiumentsev, O. S. Zadorozhnyi, R. R. Kobylianskyi, S. L. Danyliuk, V. Naumenko, and M. V. Havryliuk
Subjects: Intraocular pressure, Thermoelectric cooling, Monocular, Materials science, genetic structures, thermoelectric device, hypothermia of the human eye, Condensed Matter Physics, eye diseases, lcsh:QC1-999, thermoelectric cooling, medicine.anatomical_structure, Thermoelectric effect, medicine, General Materials Science, Human eye, sense organs, Physical and Theoretical Chemistry, lcsh:Physics, Biomedical engineering
Abstract: The paper presents the development of a thermoelectric device in the form of a monocular dressing for contact cooling of the human eye through the eyelids. The device allows controlled local contact cooling of the eye structures through the eyelids and is designed to treat acute and chronic eye diseases, reduce intraocular pressure, and reduce pain and inflammatory processes of the eye. The design features of the device and its technical characteristics are presented.
Published: 2020

411. Contrast Normalization Accounts for Binocular Interactions in Human Striate and Extra-striate Visual Cortex

Author: Preeti Verghese, Chuan Hou, and Spero Nicholas
Subjects: Adult, Male, genetic structures, Normalization (image processing), Visual evoked potentials, Biology, Stimulus (physiology), 050105 experimental psychology, Contrast Sensitivity, 03 medical and health sciences, 0302 clinical medicine, medicine, Humans, 0501 psychology and cognitive sciences, Research Articles, Visual Cortex, Vision, Binocular, Monocular, General Neuroscience, 05 social sciences, Electroencephalography, Middle Aged, eye diseases, Visual cortex, medicine.anatomical_structure, Excitatory postsynaptic potential, Female, Binocular interaction, Neuroscience, Photic Stimulation, 030217 neurology & neurosurgery, Lateral occipital cortex
Abstract: During binocular viewing, visual inputs from the two eyes interact at the level of visual cortex. Here we studied binocular interactions in human visual cortex, in participants of both sexes, using source-imaged steady-state visual evoked potentials over a wide range of relative contrast between the two eyes. The ROIs included areas V1, V3a, hV4, hMT+, and lateral occipital cortex. Dichoptic parallel grating stimuli in each eye, modulated at distinct temporal frequencies, allowed us to quantify spectral components associated with the individual stimuli from monocular inputs (self-terms) and responses due to interaction between the inputs from the two eyes (intermodulation [IM] terms). Self-term data revealed an interocular suppression effect, in which responses to the stimulus in one eye were reduced when a stimulus was presented simultaneously to the other eye. The suppression magnitude varied with visual area and with the relative contrast between the two eyes. Suppression was strongest in V1 and V3a (50% reduction) and weakest in lateral occipital cortex (20% reduction). IM-term data revealed a second form of binocular interaction, distinct from that captured by the self-terms. The IM response was strongest in V1 and weakest in hV4. Fits of a family of divisive gain control models to both self- and IM-term responses within each cortical area indicated that both forms of binocular interaction shared a common gain control nonlinearity. However, our model fits revealed different patterns of binocular interaction along the cortical hierarchy, particularly in terms of excitatory and suppressive contributions.
SIGNIFICANCE STATEMENT: Using source-imaged steady-state visual evoked potentials and frequency-domain analysis of dichoptic stimuli, we measured two forms of binocular interaction: one is associated with the individual stimuli and represents interocular suppression from each eye, and the other is a direct measure of interocular interaction between inputs from the two eyes. We demonstrated that both forms of binocular interaction share a common gain control mechanism in striate and extra-striate cortex. Furthermore, our model fits revealed different patterns of binocular interaction along the visual cortical hierarchy, particularly in terms of excitatory and suppressive contributions.
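The self-term and intermodulation logic described above can be illustrated with a toy frequency-tagging simulation. The two-channel gain-control model below is a hypothetical member of the divisive-normalization family, not one of the models fitted in the study; the tag frequencies (6 and 7 cycles per analysis window), contrasts, σ and weight w are arbitrary assumptions.

```python
import cmath
import math


def dft_amplitude(signal, k):
    """Amplitude of the k-th DFT component (a sinusoid of amplitude A gives A)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2 * abs(acc) / n


def gain_control(c_left, c_right, sigma=1.0, w=1.0):
    """Hypothetical two-channel divisive gain control: each eye's drive is
    normalized by a pool that also includes the other eye (interocular suppression)."""
    left = c_left ** 2 / (sigma + c_left + w * c_right)
    right = c_right ** 2 / (sigma + c_right + w * c_left)
    return left + right


def linear_sum(c_left, c_right):
    """Linear binocular summation, for comparison."""
    return c_left + c_right


def simulate(model, contrast_left, contrast_right, f1=6, f2=7, n=256):
    """Modulate each eye's contrast at its own frequency (cycles per window)."""
    out = []
    for i in range(n):
        c_l = contrast_left * 0.5 * (1 + math.sin(2 * math.pi * f1 * i / n))
        c_r = contrast_right * 0.5 * (1 + math.sin(2 * math.pi * f2 * i / n))
        out.append(model(c_l, c_r))
    return out


both = simulate(gain_control, 0.5, 0.5)
left_only = simulate(gain_control, 0.5, 0.0)
# Self-term at f1 shrinks when the other eye is also stimulated (suppression).
print(dft_amplitude(both, 6) < dft_amplitude(left_only, 6))  # True
# An intermodulation component at f1 + f2 appears only with the nonlinear model.
print(dft_amplitude(both, 13) > 1e-3)  # True
print(dft_amplitude(simulate(linear_sum, 0.5, 0.5), 13) < 1e-9)  # True
```

The point of the sketch is that a purely linear combination of the two eyes produces no energy at f1 + f2, so any IM response implies a nonlinear binocular interaction.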
Published: 2020

412. A novel standardized test system to evaluate dynamic visual acuity post trifocal or monofocal intraocular lens implantation: a multicenter study

Author: Yuexin Wang, Xuemin Li, Jiarui Yang, Lei Wu, Zhimin Chen, Baohua Wu, Xiaotong Ren, Yanhui Xu, and Dengting Wang
Subjects: medicine.medical_specialty, Visual acuity, Pseudophakia, genetic structures, Mesopic vision, media_common.quotation_subject, medicine.medical_treatment, Visual Acuity, Intraocular lens, Prosthesis Design, Refraction, Ocular, Article, Contrast Sensitivity, Lens Implantation, Intraocular, Ophthalmology, medicine, Humans, Contrast (vision), Prospective Studies, Retrospective Studies, media_common, Lenses, Intraocular, Phacoemulsification, Monocular, business.industry, Glare (vision), eye diseases, Patient Satisfaction, sense organs, medicine.symptom, business, Photopic vision
Abstract: OBJECTIVES: To compare dynamic visual acuity (DVA) following implantation of trifocal versus monofocal intraocular lenses (IOLs) using a novel test system. METHODS: The present research was a retrospective, multicenter clinical study. Two hundred and ten eyes of 149 patients who underwent cataract phacoemulsification and IOL implantation were enrolled. One hundred and ten eyes received trifocal lenses (AT LISA tri839MP, Carl Zeiss Meditec, Germany) and 100 eyes received monofocal lenses (Tecnis ZCB00, Abbott, United States), and all were evaluated 3 months after implantation. Outcome measures included monocular uncorrected distance (UDVA), intermediate (UIVA) and near (UNVA) visual acuity and best corrected distance visual acuity (BCDVA; logMAR units); contrast sensitivity under photopic, mesopic and glare conditions; and dynamic visual acuity using a self-developed system. RESULTS: There was no statistically significant difference in baseline characteristics between groups. Monocular UDVA, UIVA, and UNVA were significantly better (all p
Published: 2020

413. Digging into the multi-scale structure for a more refined depth map and 3D reconstruction

Author: Lu Lin, Yinzhang Ding, Lianghao Wang, Dongxiao Li, and Ming Zhang
Subjects: 0209 industrial biotechnology, Monocular, Computer science, business.industry, Epipolar geometry, Deep learning, 3D reconstruction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 020901 industrial engineering & automation, Artificial Intelligence, Depth map, Scale structure, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Software, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Extracting dense depth from a single image is an important yet challenging computer vision task. Compared with stereo depth estimation, sensing the depth of a scene from monocular images is much more difficult and ambiguous because epipolar geometry constraints cannot be exploited. Recent deep learning technologies have brought significant progress in monocular depth estimation. This paper aims to explore the effects of multi-scale structures on the performance of monocular depth estimation and to obtain a more refined 3D reconstruction by using the predicted depth and its corresponding uncertainty. First, we explore three multi-scale architectures and compare the qualitative and quantitative results of some state-of-the-art approaches. Second, to improve the robustness of the system and quantify the reliability of the predicted depth for subsequent 3D reconstruction, we estimate the uncertainty of noisy data by modeling it in a new loss function. Last, the predicted depth map and the corresponding depth uncertainty are incorporated into a monocular reconstruction system. Monocular depth estimation experiments are mainly performed on the widely used NYU Depth V2 dataset, on which the proposed method achieves state-of-the-art performance. For 3D reconstruction, our proposed framework reconstructs smoother and denser models across various scenes.
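The abstract does not specify its new uncertainty-aware loss. As a stand-in, the sketch below uses a standard heteroscedastic Laplace negative log-likelihood, in which the network predicts a per-pixel log-scale alongside depth, plus inverse-variance weighting as one common way a reconstruction system can consume per-pixel depth uncertainty. Both are assumptions for illustration, not the paper's formulation, and the function names are hypothetical.

```python
import math


def laplace_uncertainty_loss(pred_depth, true_depth, log_scale):
    """Laplace negative log-likelihood per pixel (up to a constant).

    log_scale = log(b) is predicted by the network; a large residual can be
    down-weighted by predicting a larger b, at the cost of the +log_scale term.
    """
    return abs(pred_depth - true_depth) * math.exp(-log_scale) + log_scale


def fuse_depths(estimates):
    """Inverse-variance weighted fusion of (depth, sigma) observations of one point."""
    weights = [1.0 / (sigma * sigma) for _, sigma in estimates]
    return sum(w * d for w, (d, _) in zip(weights, estimates)) / sum(weights)


print(laplace_uncertainty_loss(3.0, 2.0, 0.0))  # 1.0
print(fuse_depths([(2.0, 1.0), (4.0, 1.0)]))  # 3.0
```

For a fixed residual r the loss is minimised at log_scale = log(r), so the predicted uncertainty tracks the expected error; in fusion, an observation with twice the sigma receives a quarter of the weight.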
Published: 2020

414. Ground-Plane-Based Absolute Scale Estimation for Monocular Visual Odometry

Author: Dingfu Zhou, Hongdong Li, and Yuchao Dai
Subjects: 050210 logistics & transportation, Monocular, Scale (ratio), Computer science, business.industry, Mechanical Engineering, 05 social sciences, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Process (computing), Simultaneous localization and mapping, Computer Science Applications, Robustness (computer science), Computer Science::Computer Vision and Pattern Recognition, 0502 economics and business, Automotive Engineering, Computer vision, Artificial intelligence, Visual odometry, business, Absolute scale, Ground plane
Abstract: Recovering an absolute metric scale from a monocular camera is a challenging but highly desirable problem for monocular camera-based systems. Various approaches to scale estimation have been proposed using different kinds of cues, such as camera height and object size. In this paper, we first summarize the different kinds of scale estimation approaches. Then, after analyzing the advantages and disadvantages of these approaches, we propose a robust divide-and-conquer absolute scale estimation method based on the ground plane and camera height. Using the estimated scale, we propose an effective scale correction strategy to reduce scale drift during monocular visual odometry estimation. Finally, the effectiveness and robustness of the proposed method are verified on both public and self-collected image sequences.
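The ground-plane cue can be illustrated as follows: fit a plane to reconstructed ground points, measure the unscaled camera-to-plane distance, and divide the known metric camera height by it. This is a plain least-squares sketch with a median over frames, standing in for the paper's divide-and-conquer method, whose details are not given here; the axis convention (camera y-axis pointing down toward the ground) and all names are assumptions.

```python
import math
import statistics


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def camera_height_from_ground(points):
    """Fit y = a*x + b*z + c to ground points (x, y, z) in the camera frame and
    return the distance from the camera centre (origin) to that plane."""
    rows = [(x, z, 1.0) for x, _, z in points]
    ys = [y for _, y, _ in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    a, b, c = solve3(ata, aty)
    # Plane a*x - y + b*z + c = 0; distance from origin = |c| / ||(a, -1, b)||
    return abs(c) / math.sqrt(a * a + 1.0 + b * b)


def absolute_scale(ground_point_sets, metric_camera_height):
    """Scale = known metric height / reconstructed (unscaled) height, using the
    median over frames as a simple robustness measure (an assumption)."""
    ratios = [metric_camera_height / camera_height_from_ground(pts)
              for pts in ground_point_sets]
    return statistics.median(ratios)


flat_ground = [(x, 0.8, z) for x in (-1.0, 0.0, 1.0) for z in (2.0, 4.0, 6.0)]
print(round(camera_height_from_ground(flat_ground), 6))  # 0.8
print(round(absolute_scale([flat_ground], 1.6), 6))  # 2.0
```

In practice the ground points would first have to be selected (for example by a robust plane fit), since a plain least-squares fit is sensitive to non-ground outliers.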
Published: 2020

415. Perceptual Limits of Optical See-Through Visors for Augmented Reality Guidance of Manual Tasks

Author: Vincenzo Ferrari, Sara Condino, Roberta Piazza, Marina Carbone, and Mauro Ferrari
Subjects: Adult, Male, Adolescent, Computer science, media_common.quotation_subject, 0206 medical engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Biomedical Engineering, Fixation, Ocular, 02 engineering and technology, Task (project management), User-Computer Interface, Young Adult, Perception, Task Performance and Analysis, medicine, Humans, Focal length, Head-mounted Displays, Computer vision, Focus Cues, media_common, Retina, Augmented Reality, Monocular, business.industry, Accommodation, Ocular, Vergence-Accommodation Conflict, Equipment Design, 020601 biomedical engineering, medicine.anatomical_structure, Fixation (visual), Female, Augmented reality, Artificial intelligence, Naked eye, business, Optical See-Through, Accommodation, Binocular vision
Abstract: Objective: The focal length of available optical see-through (OST) head-mounted displays (HMDs) is at least 2 m; therefore, during manual tasks, the user's eye cannot keep both the virtual and the real content in focus at the same time. Another perceptual limitation is the vergence-accommodation conflict, which is present only in binocular vision. This paper investigates the effect of incorrect focus cues on user performance, visual comfort, and workload during the execution of an augmented reality (AR)-guided manual task with one of the most advanced OST HMDs, the Microsoft HoloLens. Methods: An experimental study was designed to investigate the performance of 20 subjects in a connect-the-dots task, with and without the use of AR. The following tests were planned: AR-guided monocular and binocular, and naked-eye monocular and binocular. Each trial was analyzed to evaluate the accuracy in connecting dots. The NASA Task Load Index and Likert questionnaires were used to assess workload and visual comfort. Results: No statistically significant differences in workload or perceived comfort were found between the AR-guided binocular and monocular tests. User performance was significantly better during the naked-eye tests. No statistically significant differences in performance were found between the monocular and binocular tests. The maximum error in AR tests was 5.9 mm. Conclusion: Although there is growing interest in using commercial OST HMDs to guide high-precision manual tasks, attention should be paid to the limitations of the available technology, which was not designed for the peripersonal space.
Published: 2020

416. Toward Improving the Mobility of Patients with Peripheral Visual Field Defects with Novel Digital Spectacles

Author: Vatookarn Roongpoovapatr, Mohamed Abou Shousha, Mostafa Abdel-Mottaleb, Taher Eleiwa, Rashed Kashem, Mohamed Abdel-Mottaleb, Richard K. Parrish, and Ahmed Sayed
Subjects: Adult, Male, Test strategy, medicine.medical_specialty, genetic structures, Wilcoxon signed-rank test, Computer science, Walking, Article, 03 medical and health sciences, 0302 clinical medicine, Physical medicine and rehabilitation, Patient performance, Static testing, medicine, Humans, Prospective Studies, Scotoma, Aged, 030304 developmental biology, 0303 health sciences, Monocular, Automated perimetry, Virtual Reality, Glaucoma, Middle Aged, Visual field, Peripheral, Ophthalmology, Eyeglasses, 030221 ophthalmology & optometry, Female, Visual Fields
Abstract: Purpose: To assess the efficacy of novel digital spectacles (DSpecs) in improving the mobility of patients with peripheral visual field (VF) loss. Design: Prospective case series. Methods: Binocular VF defects were quantified with the DSpecs testing strategy. An algorithm was implemented that generated personalized visual augmentation profiles based on the measured VF. These profiles were achieved by relocating and resizing video signals to fit within the remaining VF in real time. Twenty patients with known binocular VF defects were tested using static test images, followed by dynamic walking simulations to determine whether they could identify objects and avoid obstacles in an environment mimicking a real-life situation. The effect of the DSpecs on visual/hand coordination was assessed with object-grasping tests. Patients performed these tests with and without the DSpecs correction profile. Results: The diagnostic binocular VF testing with the DSpecs was comparable to integrated monocular standard automated perimetry, based on point-by-point assessment, with a mismatch error of 7.0%. Eighteen of 20 patients (90%) could identify peripheral objects in test images with the DSpecs that they could not identify without them. Visual/hand coordination was successful for 17 patients (85%) from the first trial. Object-grasping performance improved to 100% by the third trial. Patient performance, judged by finding and identifying objects in the periphery in a simulated walking environment, was significantly better with the DSpecs (P = 0.02, Wilcoxon rank sum test). Conclusions: DSpecs may improve mobility by helping patients better identify moving, hazardous peripheral objects.
Published: 2020

417. Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention

Author: Zhong Wei, Xin Fan, Mingliang Zhang, and Xinchen Ye
Subjects: 0209 industrial biotechnology, Monocular, Channel (digital image), Color image, business.industry, Computer science, Cognitive Neuroscience, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Convolutional neural network, Computer Science Applications, View synthesis, 020901 industrial engineering & automation, Artificial Intelligence, Feature (computer vision), Depth map, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Stereo camera
Abstract: Most existing methods based on convolutional neural networks (CNNs) are supervised and require a large amount of ground-truth data for training. Recently, some unsupervised methods have used stereo image pairs as input by transforming depth estimation into a view synthesis problem, but they require a stereo camera as additional equipment for data acquisition. Therefore, we use more readily available videos captured by a monocular camera as our input and propose an unsupervised learning framework to predict scene depth maps from monocular video frames. First, we design a novel unsupervised hybrid geometric-refined loss, which explicitly explores a more accurate geometric relationship between the input color image and the predicted depth map and preserves depth boundaries and fine structures in depth maps. Then, we design a contextual attention module to capture nonlocal dependencies along the spatial and channel dimensions in a dual path, which improves feature representation and further preserves fine depth details. In addition, we use an adversarial loss, training a discriminator to distinguish synthesized color images from real ones, so as to produce realistic results. Experimental results demonstrate that the proposed framework achieves results comparable to or better than those of methods trained with monocular videos or stereo image pairs.
Published: 2020

418. Fast, robust, and accurate monocular peer-to-peer tracking for surgical navigation

Author: Simon Hazubski, Simon Strzeletz, José Luis Moctezuma, and Harald Hoppe
Subjects: Matching (graph theory), BitTorrent tracker, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Biomedical Engineering, Monocular tracking, Health Informatics, Image processing, 02 engineering and technology, Peer-to-peer navigation, 03 medical and health sciences, 0202 electrical engineering, electronic engineering, information engineering, Humans, Radiology, Nuclear Medicine and imaging, Computer vision, Pose, Pose estimation, 030304 developmental biology, 0303 health sciences, Monocular, business.industry, Orientation (computer vision), Optical Devices, Navigation system, Marker assignment, General Medicine, Computer Graphics and Computer-Aided Design, Computer Science Applications, Surgery, Computer-Assisted, Feature (computer vision), Original Article, 020201 artificial intelligence & image processing, Surgery, Computer Vision and Pattern Recognition, Artificial intelligence, business, Algorithms
Abstract: Purpose This work presents a new monocular peer-to-peer tracking concept that removes the distinction between tracking tools and tracked tools in optical navigation systems. A marker model concept based on marker triplets is introduced, combined with a fast and robust algorithm for assigning image feature points to the corresponding markers of the tracker. A new, fast algorithm for pose estimation is also included. Methods A peer-to-peer tracker consists of seven markers, which can be tracked by other peers, and one camera, which is used to track the position and orientation of other peers. The special marker layout enables a fast and robust algorithm for assigning image feature points to the correct markers. The iterative pose estimation algorithm is based on point-to-line matching with Lagrange–Newton optimization and does not rely on initial guesses. Uniformly distributed quaternions in 4D (the vertices of a hexacosichoron, i.e., a 600-cell) are used as starting points and always lead to the global minimum. Results Experiments have shown that the marker assignment algorithm robustly assigns image feature points to the correct markers even under challenging conditions. The pose estimation algorithm is fast and robust and always finds the correct pose of the trackers. Image processing, marker assignment, and pose estimation for two trackers are handled in less than 18 ms on an Intel i7-6700 desktop computer at 3.4 GHz. Conclusion The new peer-to-peer tracking concept is a valuable approach to a decentralized navigation system that offers more freedom in the operating room while providing accurate, fast, and robust results.
Published: 2020
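
The quaternion starting points mentioned in this abstract can be generated in closed form. The Python sketch below builds the 120 vertices of the 600-cell (hexacosichoron) as unit quaternions using the standard coordinate construction. It covers only the choice of starting points, not the authors' Lagrange–Newton point-to-line optimization, and the function name `hexacosichoron_vertices` is hypothetical. Because q and -q encode the same rotation, the 120 vertices correspond to 60 distinct orientations.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def _is_even_permutation(perm):
    """An even number of inversions means an even permutation."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 0


def hexacosichoron_vertices():
    """Return the 120 vertices of the 600-cell as unit quaternions (w, x, y, z)."""
    verts = set()
    # 8 vertices: all permutations of (±1, 0, 0, 0)
    for axis in range(4):
        for sign in (1.0, -1.0):
            v = [0.0] * 4
            v[axis] = sign
            verts.add(tuple(v))
    # 16 vertices: (±1/2, ±1/2, ±1/2, ±1/2)
    for v in itertools.product((0.5, -0.5), repeat=4):
        verts.add(v)
    # 96 vertices: even permutations of (±φ/2, ±1/2, ±1/(2φ), 0)
    base = (PHI / 2, 0.5, 1 / (2 * PHI))
    for perm in itertools.permutations(range(4)):
        if not _is_even_permutation(perm):
            continue
        for signs in itertools.product((1.0, -1.0), repeat=3):
            coords = [b * s for b, s in zip(base, signs)] + [0.0]
            v = [0.0] * 4
            for src, dst in enumerate(perm):
                v[dst] = coords[src]  # coordinate src moves to position dst
            verts.add(tuple(v))
    return sorted(verts)


starts = hexacosichoron_vertices()
print(len(starts))  # 120
```

Neighbouring vertices have a dot product of φ/2 ≈ 0.809, so every orientation lies close to at least one starting point, which is the property a multi-start pose solver relies on.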

419. Efficient deformable 3D face model tracking with limited hardware resources

Author: Oihana Otaegui, Luis Unzueta, Jon Goenetxea, and Fadi Dornaika
Subjects: Monocular, Pixel, Computer Networks and Communications, business.industry, Facial motion capture, Computer science, 020207 software engineering, 02 engineering and technology, Hardware and Architecture, Face model, 0202 electrical engineering, electronic engineering, information engineering, Media Technology, 020201 artificial intelligence & image processing, Android (operating system), business, Software, Computer hardware, Gesture, Parametric statistics
Abstract: Face fitting methods align deformable models to faces in images using the information given by the image pixels. However, most algorithms are designed for desktop personal computers (PCs) or hardware with significant computational power. These approaches are therefore too demanding for devices with limited computational power, such as the increasingly common ARM-based devices. Besides the hardware limitations, the particularities of each operating system pose additional challenges for implementing real-time face tracking. To address the lack of methods designed for platforms with limited computational power, we present an efficient way to fit 3D human face models to monocular images. This approach estimates the head pose and gesture in a 3D environment based on a full perspective projection, using parametric non-linear optimisation. We compare the performance of this method by running it on similar ARM-based devices with different operating systems (Linux, Android, and iOS). In all cases, we measured both accuracy and performance. The efficiency of the method makes it possible to run it in real time (~30 fps) on devices with limited computational power, such as smartphones and embedded systems. These kinds of efficient methods are a vital component of human behaviour analysis applications, such as driver monitoring systems and human-machine interfaces for disabled people.
Published: 2020

420. 3D Human Pose Estimation With Generative Adversarial Networks

Author: Hailun Xia and Meng Xiao
Subjects: Monocular, General Computer Science, Computer science, business.industry, General Engineering, Pattern recognition, 02 engineering and technology, 3D human pose estimation, 010501 environmental sciences, Overfitting, 01 natural sciences, Spatial relation, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), 020201 artificial intelligence & image processing, General Materials Science, graph convolutional networks, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, generative adversarial networks, business, lcsh:TK1-9971, Pose, 0105 earth and related environmental sciences
Abstract: 3D human pose estimation from a monocular RGB image is a challenging task in computer vision because of the depth ambiguity in a single RGB image. Because most methods consider joint locations independently, which can lead to overfitting on specific datasets, it is crucial to consider the plausibility of 3D poses in terms of their overall structure. In this paper, we present Generative Adversarial Networks (GANs) for 3D human pose estimation, which learn plausible 3D human body representations through adversarial training. In the GANs, the generator regresses 3D joint positions from a 2D input, and the discriminator aims to distinguish the ground-truth 3D samples from the predicted ones. We leverage Graph Convolutional Networks (GCNs) in both the generator and the discriminator to fully exploit the spatial relations of input and output coordinates. The combination of GANs and GCNs encourages the network to predict more accurate 3D joint locations and learn more reasonable human body structures at the same time. We demonstrate the effectiveness of our approach on standard benchmark datasets (Human3.6M and HumanEva-I), where it outperforms state-of-the-art methods. Furthermore, we propose a new evaluation metric, the distance-based Pose Structure Score (dPSS), for evaluating the structural similarity between the predicted 3D pose and its ground truth.
Published: 2020

421. Automatic Dense Annotation for Monocular 3D Scene Understanding

Author: Md Alimoor Reza, Kai Chen, Akshay Naik, David J. Crandall, and Soon-Heung Jung
Subjects: semi-supervised learning, Conditional random field, Ground truth, Monocular, Training set, General Computer Science, business.industry, Computer science, Scene understanding, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, General Engineering, Object (computer science), computer vision, Annotation, General Materials Science, Computer vision, Segmentation, 3D reconstruction, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971
Abstract: Deep neural networks have revolutionized many areas of computer vision, but they require notoriously large amounts of labeled training data. For tasks such as semantic segmentation and monocular 3D scene layout estimation, collecting high-quality training data is extremely laborious because dense, pixel-level ground truth is required and must be annotated by hand. In this paper, we present two techniques for significantly reducing the manual annotation effort involved in collecting large training datasets. The tools are designed to allow rapid annotation of entire videos collected by RGBD cameras, thus generating thousands of ground-truth frames to use for training. First, we propose a fully automatic approach to produce dense pixel-level semantic segmentation maps. The technique uses noisy evidence from pre-trained object detectors and scene layout estimators and incorporates spatial and temporal context in a conditional random field formulation. Second, we propose a semi-automatic technique for dense annotation of 3D geometry, in particular the 3D poses of planes in indoor scenes. This technique requires a human to quickly annotate just a handful of keyframes per video and then uses the camera poses and geometric reasoning to propagate these labels through an entire video sequence. Experimental results indicate that the technique could be used as an alternative or complementary source of training data, allowing large-scale data to be collected with minimal human effort.
Published: 2020

422. Joint Attention Mechanisms for Monocular Depth Estimation With Multi-Scale Convolutions and Adaptive Weight Adjustment

Author: Zonghua Zhang, Zhaozong Meng, Nan Gao, and Peng Liu
Subjects: 0209 industrial biotechnology, multi-scale convolutions, General Computer Science, Channel (digital image), Computer science, Feature extraction, 02 engineering and technology, Convolutional neural network, Field (computer science), 020901 industrial engineering & automation, Dimension (vector space), 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, Image resolution, Block (data storage), Monocular, business.industry, General Engineering, joint attention mechanisms, Feature (computer vision), weight adjustment, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971, Monocular depth estimation
Abstract: Monocular depth estimation is a fundamental problem for various vision applications and is therefore gaining increasing attention in the field of computer vision. Although great improvements have been made thanks to the rapid progress of deep convolutional neural networks, estimating object depth at finer details remains unsatisfactory, especially in complex scenes that have rich structural information. In this article, we propose a deep end-to-end learning framework that combines multi-scale convolutions and joint attention mechanisms to tackle this challenge. Specifically, we first design a lightweight up-convolution to generate multi-scale feature maps. Then we introduce an attention-based residual block that aggregates different feature maps jointly in the channel and spatial dimensions, which enhances the discriminative ability of feature fusion at finer details. Furthermore, we explore an effective adaptive weight adjustment strategy for the loss function to further improve performance, which adjusts the weight of each loss term during training without additional hyper-parameters. The proposed framework was evaluated on the challenging NYU Depth v2 and KITTI datasets. Experimental results demonstrate that the proposed approach is superior to most state-of-the-art methods.
Published: 2020

423. Soft Regression of Monocular Depth Using Scale-Semantic Exchange Network

Author: Wen Su and Haifeng Zhang
Subjects: mutual attention, soft regression, Monocular, General Computer Science, Scale (ratio), business.industry, Computer science, General Engineering, Pattern recognition, Regression, scale-semantic exchange network, General Materials Science, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Depth estimation, business, Exchange network, lcsh:TK1-9971
Abstract: This paper focuses on depth estimation from a single monocular image. Most existing methods regress depth values or classify depth labels based on single-scale feature representations. However, both regression and classification have inherent defects, and single-scale context and low-level semantics cannot support accurate depth estimation. We address single-image depth estimation by performing soft regression on the classification probability distribution generated by our proposed scale-semantic exchange network (SSE-Net). Our network maintains rich high-resolution representations. By adding high-to-low-resolution branches to form more stages, repeated context fusion ensures that each representation repeatedly receives scale information from the other parallel representations. A mutual channel attention mechanism is proposed to emphasize specific semantics of the feature representation. We allow each depth class to be shifted and scaled adaptively. Depth is calculated as the expected value of the probability distribution. The experimental results verify the effectiveness of each proposed component and show competitive results compared with recent state-of-the-art methods.
Published: 2020
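
The soft-regression step in this abstract amounts to d = Σ_k p_k d_k, where p_k is the softmax probability of depth class k and d_k is that class's (possibly shifted and scaled) bin center. The Python sketch below computes this for a single pixel. The bin centers and the per-class `scales`/`shifts` parameters are hypothetical stand-ins for the adaptive shift and scale the abstract mentions; this illustrates the expectation only and is not SSE-Net itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_regression_depth(logits, bin_centers, scales=None, shifts=None):
    """Expected depth under the class distribution of one pixel.

    `scales` and `shifts` are a hypothetical per-class adjustment of the
    bin centers (d_k' = scale_k * d_k + shift_k); defaults leave them unchanged.
    """
    n = len(bin_centers)
    scales = scales or [1.0] * n
    shifts = shifts or [0.0] * n
    probs = softmax(logits)
    adjusted = [s * d + t for d, s, t in zip(bin_centers, scales, shifts)]
    return sum(p * d for p, d in zip(probs, adjusted))


bins = [1.0, 2.0, 4.0, 8.0]  # hypothetical depth bin centers
print(soft_regression_depth([0.0, 0.0, 0.0, 0.0], bins))  # 3.75
print(soft_regression_depth([0.0, 5.0, 5.0, 0.0], bins))  # lies between 2 and 4
```

Unlike an argmax classifier, which can only return one of the bin centers, the expectation yields a continuous depth: uniform scores over the bins above give 3.75, a value that no single bin represents.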

424. Efficient and High-Quality Monocular Depth Estimation via Gated Multi-Scale Network

Author: Liwei Zhang, Chen Yanjie, He Bingwei, Lin Lixiong, and Huang Guohui
Subjects: General Computer Science, Computer science, 02 engineering and technology, 010501 environmental sciences, super resolution, 01 natural sciences, monocular vision, Depth map, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, gated multi-scale network, 0105 earth and related environmental sciences, Monocular, business.industry, Deep learning, General Engineering, 020207 software engineering, Filter (signal processing), Superresolution, Key (cryptography), Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Depth estimation, Scale (map), business, Monocular vision, lcsh:TK1-9971
Abstract: The key issue in monocular depth estimation is how to better construct the depth image and improve the quality of the depth map. At present, most deep-learning-based monocular depth estimation methods process images at low resolution, which leads to loss of detail and blurred boundaries. However, deep networks with large numbers of parameters have high computational complexity, which makes it difficult to apply them to high-resolution (HR) images for depth estimation. In this work, model accuracy and runtime are the two main factors considered. To improve depth map quality and reduce the network's running time, we introduce super-resolution techniques as the up-sampling method, so that the depth estimation network generates high-quality depth images faster. A novel approach is proposed for collecting high-level features captured under different receptive fields. The gated multi-scale decoder uses a gated module to filter information effectively. By combining the gated module with super-resolution of depth images, our method reduces memory consumption while improving reconstruction quality. Experimental results on the challenging NYU Depth v2 dataset demonstrate that both contributions provide significant performance gains over the state of the art in self-supervised depth estimation.
Published: 2020

425. Image-Based Rendering for Large-Scale Outdoor Scenes With Fusion of Monocular and Multi-View Stereo Depth

Author: Tianlu Mao, Shuang Liu, Minghao Li, Zhaoxin Li, Xiaona Zhang, Shaohua Liu, and Jing Liu
Subjects: Monocular, General Computer Science, business.industry, Computer science, Deep learning, General Engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, monocular depth estimation, Image-based modeling and rendering, Rendering (computer graphics), View synthesis, outdoor scenes, view synthesis, multi-view stereo, General Materials Science, Computer vision, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Image warping, business, lcsh:TK1-9971, Image-based rendering, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Image-based rendering (IBR) attempts to synthesize novel views using a set of observed images. Some IBR approaches (such as light fields) have yielded impressive high-quality results on small-scale scenes with dense photo capture. However, available wide-baseline IBR methods are still restricted by the low geometric accuracy and completeness of multi-view stereo (MVS) reconstruction on low-textured and non-Lambertian surfaces. The issues become more significant in large-scale outdoor scenes due to challenging scene content, e.g., buildings, trees, and sky. To address these problems, we present a novel IBR algorithm that consists of two key components. First, we propose a novel depth refinement method that combines MVS depth maps with monocular depth maps predicted via deep learning. A lookup table remap is proposed for converting the scale of the monocular depths to be consistent with the scale of the MVS depths. Then, the rescaled monocular depth is used as the constraint in the minimum spanning tree (MST)-based nonlocal filter to refine the per-view MVS depth. Second, we present an efficient shape-preserving warping algorithm that uses superpixels to generate the warped images and blend expected novel views of scenes. The proposed method has been evaluated on public MVS and view synthesis datasets, as well as newly captured large-scale outdoor datasets. In comparison with state-of-the-art methods, the experimental results demonstrated that the proposed method can obtain more complete and reliable depth maps for the challenging large-scale outdoor scenes, thereby resulting in more promising novel view synthesis.
Published: 2020
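
The abstract does not specify how the lookup table for rescaling monocular depth is built. One plausible reading, sketched below in Python under that assumption, pairs monocular and MVS depths at pixels where MVS is valid, sorts the pairs by monocular depth, takes per-bin medians as control points, and remaps any monocular depth by piecewise-linear interpolation. The function names, the bin count, and the convention that a non-positive MVS depth marks a missing value are all hypothetical, and the MST-based nonlocal filtering is not shown.

```python
import bisect
import statistics


def build_depth_lut(mono, mvs, num_bins=8):
    """Build (monocular, MVS) control points from per-pixel depth pairs.

    Pixels with non-positive MVS depth are treated as missing (assumption).
    """
    pairs = sorted((m, v) for m, v in zip(mono, mvs) if v > 0)
    n = len(pairs)
    lut = []
    for b in range(num_bins):
        chunk = pairs[b * n // num_bins:(b + 1) * n // num_bins]
        if chunk:  # per-bin medians damp the influence of MVS outliers
            lut.append((statistics.median(m for m, _ in chunk),
                        statistics.median(v for _, v in chunk)))
    return lut


def remap_depth(d, lut):
    """Map a monocular depth into the MVS scale by piecewise-linear lookup."""
    xs = [x for x, _ in lut]
    ys = [y for _, y in lut]
    if d <= xs[0]:
        return ys[0]   # clamp below the table
    if d >= xs[-1]:
        return ys[-1]  # clamp above the table
    i = bisect.bisect_right(xs, d)  # now xs[i-1] <= d < xs[i]
    t = (d - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


mono = list(range(1, 17))
mvs = [3 * m for m in mono]  # toy data: MVS depth is 3x the monocular depth
lut = build_depth_lut(mono, mvs)
print(remap_depth(5.0, lut))  # 15.0
```

A table, rather than a single global scale factor, can absorb a non-linear relation between the two depth sources. Clamping at the table ends avoids extrapolating beyond the range covered by valid MVS depths; a real pipeline might choose a different rule.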

426. Monocular Depth Prediction With Residual DenseASPP Network

Author: Kewei Wu, Shunran Zhang, and Zhao Xie
Subjects: Monocular, General Computer Science, business.industry, Computer science, Deep learning, General Engineering, deep learning, 020207 software engineering, Pattern recognition, feature reuse, 02 engineering and technology, Residual, Convolutional neural network, residual DenseASPP network, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, General Materials Science, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Depth estimation, Focus (optics), business, lcsh:TK1-9971, Block (data storage)
Abstract: Monocular depth estimation is an ill-posed problem because infinitely many 3D scenes can be projected to the same 2D image. Most recent methods focus on image-level information from deep convolutional neural networks, but training them may suffer from slow convergence and accuracy degradation, especially for deeper networks with more feature channels. Based on an encoder-decoder framework, we propose a novel Residual DenseASPP Network. In this network, we divide features into low-, mid-, and high-level vision features and use two kinds of skip connections to learn useful features at certain layers: feature concatenation in the dense block generates more features within the same layer, and feature summation in the residual block strengthens the backward gradient flow. The experimental results show that high-level vision features require more channels through feature concatenation, while low- and mid-level vision features need better convergence through feature summation. Experiments show that our proposed approach achieves state-of-the-art performance on both the NYUv2 and Make3D datasets.
Published: 2020

427. Scene Target 3D Point Cloud Reconstruction Technology Combining Monocular Focus Stack and Deep Learning

Author: Yanzhu Hu, Song Wang, and Yingjian Wang
Subjects: Divide and conquer algorithms, Monocular, General Computer Science, Artificial neural network, business.industry, Computer science, Deep learning, General Engineering, Point cloud, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, light field reconstruction, deep learning, Focus stack image, all focus image, Focal length, General Materials Science, Computer vision, Artificial intelligence, 3D reconstruction, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, Parallax, Focus (optics), lcsh:TK1-9971
Abstract: To obtain the depth information of the target in a scene and realize three-dimensional (3D) reconstruction, this paper proposes a target reconstruction method that combines monocular focus stack images with a deep neural network. The method makes full use of the advantages of light field imaging technology and can generate an all-in-focus image. It first captures multiple frames of the scene at different focal lengths. Following a divide-and-conquer strategy, the uplink uses a YOLO neural network to identify the target in 3D space and track its position, while the downlink reconstructs four-dimensional (4D) light field data by frequency-domain back projection of the focus stack images, uses light field imaging to invert the scene parallax, and then estimates scene depth and reconstructs the all-in-focus image. Finally, the uplink and downlink are merged to reconstruct the 3D point cloud of the target. Experimental results on real scenes show the effectiveness of the proposed algorithm.
Published: 2020

428. Leveraging Contextual Information for Monocular Depth Estimation

Author: Doyeon Kim, Sihaeng Lee, Janghyeon Lee, and Junmo Kim
Subjects: Estimation, contextual information, Monocular, General Computer Science, Computer science, business.industry, General Engineering, Contextual information, General Materials Science, Computer vision, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971, Monocular depth estimation
Abstract: Humans rely strongly on visual cues to understand scenes, for example when segmenting them, detecting objects, or judging the distance to nearby objects. Recent studies suggest that deep neural networks can take advantage of contextual representations to estimate the depth map of a given image, so focusing on scene context can benefit depth estimation. In this study, a novel network architecture is proposed that improves performance by leveraging contextual information for monocular depth estimation. We introduce a depth prediction network with the proposed attentive skip connection and a global context module to obtain meaningful semantic features and enhance the performance of the model. Our model is validated through several experiments on the KITTI and NYU Depth V2 datasets. The experimental results demonstrate the effectiveness of the proposed network, which achieves state-of-the-art monocular depth estimation performance while maintaining a high running speed.
Published: 2020

429. Vision-Enhanced Low-Cost Localization in Crowdsourced Maps

Author: David Betaille, Gijs Dubbelman, Oihana Otaegui, Gorka Velez, Anweshan Das, Axel Koppert, Benedict Flade, Julian Eggert, Mobile Perception Systems Lab, Video Coding & Architectures, EAISI Mobility, and EAISI Foundational
Subjects: Computer science, Geometry, Advanced driver assistance systems, 02 engineering and technology, Overlay, Receivers, 01 natural sciences, Inertial measurement unit, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Monocular, Sensors, business.industry, Mechanical Engineering, 010401 analytical chemistry, 020206 networking & telecommunications, Global navigation satellite system, Kalman filter, Cameras, Sensor fusion, 0104 chemical sciences, Computer Science Applications, Meters, GNSS applications, Automotive Engineering, Geostationary orbit, Three-dimensional displays, Artificial intelligence, business
Abstract: Lane-level localization of vehicles with low-cost sensors is a challenging task. When Global Navigation Satellite Systems (GNSSs) suffer from weak observation geometry or from reflected signals, fusing heterogeneous information is a suitable way to improve localization accuracy. We propose a solution based on a monocular front-facing camera, a low-cost inertial measurement unit (IMU), and a single-frequency GNSS receiver. The sensor data fusion is implemented as a tightly coupled Kalman filter that corrects the IMU-based trajectory with GNSS observations while using correction data from the European Geostationary Navigation Overlay Service (EGNOS). Further, we consider vision-based complementary data as an additional source of information. In contrast to other approaches, the camera is not used to infer the motion of the vehicle but to directly correct the localization results using map information. More specifically, the so-called camera-to-map alignment is done by comparing virtual 3D views (candidates) created from projected map data with lane geometry features extracted from the camera image. One strength of the proposed solution is its compatibility with state-of-the-art map data, which are publicly available from different sources. We validate the approach on real-world data recorded in the Netherlands and show that it is a promising and cost-efficient means of supporting future advanced driver assistance systems.
Published: 2020

430. Transfer2Depth: Dual Attention Network With Transfer Learning for Monocular Depth Estimation

Author: Chuan-Yu Chang, Yao-Pao Huang, Chia-Hung Yeh, and Chih-Yang Lin
Subjects: 0209 industrial biotechnology, General Computer Science, Computer science, monocular depth estimation, 02 engineering and technology, transfer learning, Machine learning, computer.software_genre, Ordinal regression, Convolutional neural network, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Representation (mathematics), Network architecture, Monocular, business.industry, Deep learning, General Engineering, deep learning, Computer vision, 020201 artificial intelligence & image processing, spatial-channel attention module, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Transfer of learning, business, lcsh:TK1-9971, computer
Abstract: Monocular depth estimation is a fundamental problem in many tasks. Although recent convolutional neural network-based methods can achieve high accuracy by using very deep networks and complex architectures to exploit different cues and features, doing so not only makes the model more vulnerable but also makes convergence more difficult. Moreover, recent depth estimation methods for indoor environments are impractical for outdoor environments. In this work, we aim to develop a simple deep network structure that improves the effectiveness of depth estimation. We apply a dual attention module that can be inserted into any type of network to improve its representational power, and additionally propose a training strategy that combines transfer learning and ordinal regression to improve training convergence. Even with a simple end-to-end encoder-decoder architecture, we achieve state-of-the-art performance on two of the largest datasets for indoor and outdoor depth estimation: NYU Depth v2 and KITTI.
Published: 2020

431. Unsupervised Monocular Training Method for Depth Estimation Using Statistical Masks

Author: Xiangtong Wang, Peng Cheng, Wei Li, Menglong Yang, and Binbin Liang
Subjects: General Computer Science, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, statistical masks, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, error map, 0105 earth and related environmental sciences, Estimation, Monocular, Pixel, business.industry, General Engineering, Pattern recognition, Variance (accounting), Training methods, Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, problematic pixels, business, lcsh:TK1-9971, Monocular depth estimation
Abstract: Recently, unsupervised monocular training methods based on convolutional neural networks have shown surprising progress in improving the accuracy of depth estimation. However, the performance of these methods suffers considerably from problematic pixels, such as occluded pixels and low-texture pixels. In this paper, we introduce a method that derives masks from the statistics of error maps to segment the problematic pixels. Unlike conventional methods, which use additional segmentation networks to classify problematic pixels, we use a multi-task learning architecture to generate an identical mask, a mean mask, and a variance mask for filtering the problematic pixels. Experimental results show that our proposed method performs satisfactorily compared with other related methods on the KITTI dataset. Moreover, we apply our method to the UAV dataset VisDrone, and the results also indicate its effectiveness in detecting moving objects.
Published: 2020
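The sketch below shows one plausible reading of the error-map statistics described above. It is a minimal illustration under stated assumptions, not the authors' implementation. In the paper, the masks come from a multi-task network, while this sketch only thresholds given error maps. The abstract does not define the masks, so three interpretations are assumed here:

- The "identical mask" keeps pixels whose warped-frame error is lower than the error of the unwarped frame.
- The mean mask keeps pixels at or below the mean error.
- The variance mask keeps pixels within `k` standard deviations above the mean.

Error maps are flattened to plain lists, and all function and parameter names are hypothetical.

```python
import statistics
from typing import List, Tuple

Mask = List[bool]

def statistical_masks(warp_err: List[float], identity_err: List[float],
                      k: float = 1.0) -> Tuple[Mask, Mask, Mask]:
    """Return (identity, mean, variance) keep-masks; True = use pixel in the loss.

    warp_err:     per-pixel photometric error after warping the source frame
    identity_err: per-pixel error of the unwarped source frame (assumed input)
    k:            hypothetical number of standard deviations tolerated
    """
    mu = statistics.fmean(warp_err)
    sigma = statistics.pstdev(warp_err)
    # Keep pixels where warping explains the target better than assuming no motion.
    identity_mask = [w < i for w, i in zip(warp_err, identity_err)]
    # Keep pixels whose error does not exceed the image-wide mean.
    mean_mask = [w <= mu for w in warp_err]
    # Keep pixels within k standard deviations above the mean (outlier rejection).
    variance_mask = [w <= mu + k * sigma for w in warp_err]
    return identity_mask, mean_mask, variance_mask

def masked_mean(err: List[float], mask: Mask) -> float:
    """Average error over kept pixels (0.0 if every pixel is masked out)."""
    kept = [e for e, keep in zip(err, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0
```

A training loop would then average the photometric loss only over pixels kept by whichever masks are combined. The choice of `k` trades robustness to occlusions against discarding useful pixels.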

432. Integral-Sliding-Mode-Observer-Based Structure and Motion Estimation of a Single Object in General Motion Using a Monocular Dynamic Camera

Author: Dongkyoung Chwa
Subjects: State variable, integral sliding mode observer, structure and motion estimation, Monocular, General Computer Science, Observer (quantum physics), Computer science, business.industry, General Engineering, Camera-object relative motion dynamics, Object (computer science), Motion (physics), Integral sliding mode, Robustness (computer science), Motion estimation, monocular dynamic camera, General Materials Science, Computer vision, object in general motion, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971
Abstract: An integral sliding mode observer (ISMO)-based method for estimating the structure and motion (SaM) of an object in general motion is proposed using a monocular dynamic (moving) camera. As the unknown range and object velocity can be considered as disturbances and should be quickly estimated, it is necessary to maintain the robustness against these disturbances from the start by eliminating the reaching mode. Therefore, the ISMO-based method is proposed on the basis of a relative camera-object motion model. By formulating the relative motion model with three-dimensional measurable state variables, three unknown components among the range, unknown object velocity components, and unknown camera velocity components can be estimated by the proposed method in the following cases: i) the camera is in dynamic motion and the object is in semigeneral motion [i.e., initially static (stationary) and then in general (static or dynamic) motion], ii) the camera is in dynamic motion and the object is in general motion within a constrained space, and iii) both the object and camera are in general motion within a less constrained space when the range information is available. Simulation and experimental results demonstrate that the range and object velocity can be estimated with a satisfactory transient response using a monocular dynamic camera.
Published: 2020

433. Virtual Stereovision Pose Measurement of Noncooperative Space Targets for a Dual-Arm Space Robot

Author: Ai-Guo Wu, Jianqing Peng, Wenfu Xu, and Bin Liang
Subjects: Monocular, Quadrilateral, Spacecraft, business.industry, Computer science, 020208 electrical & electronic engineering, 02 engineering and technology, Laser tracker, Position (vector), 0202 electrical engineering, electronic engineering, information engineering, Robot, Computer vision, Artificial intelligence, Electrical and Electronic Engineering, business, Instrumentation
Abstract: Noncooperative target pose (position and attitude) measurement is very important for on-orbit servicing tasks, such as capturing and repairing malfunctioning spacecraft and removing space debris. Traditional stereovision-based methods require the two cameras to observe the same object and form a common observation area, which limits system flexibility and measurement distance. In this paper, a virtual stereovision (VSV) modeling and pose measurement method is proposed for a dual-arm space robotic system. Each arm carries only one camera, and the two cameras observe different objects. The geometric features (i.e., triangle, quadrilateral, or circular features) are extracted independently from the images of each camera. The two separate cameras are mapped to a stereo measurement range, and the features identified by each camera are converted into an equivalent “virtual common feature.” Then, a “VSV measurement system” is constructed. The rocket nozzle and the triangular support of the satellite are treated as the different objects observed by the two cameras of a dual-arm space robot. The poses of the nozzles and the triangular brackets are then estimated from the VSV information. Finally, we developed an experimental system composed of a satellite mockup, two monocular cameras, and a high-precision laser tracker (used to evaluate the accuracy of the vision measurements). The experimental results verified the proposed method. The average position error is less than 6.1367 mm, and the average attitude error is less than 2.415°.
Published: 2020

434. Vehicle global 6-DoF pose estimation under traffic surveillance camera

Author: Shanxin Zhang, Qing Li, Cheng Wang, Jonathan Li, Xiuhong Lin, Xin Li, Juyong Zhang, Zijian He, and Chenhui Yang
Subjects: Monocular, Landmark, 010504 meteorology & atmospheric sciences, Computer science, business.industry, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 0211 other engineering and technologies, Point cloud, 02 engineering and technology, 01 natural sciences, Atomic and Molecular Physics, and Optics, Computer Science Applications, Six degrees of freedom, Computer vision, Artificial intelligence, Computers in Earth Sciences, Architecture, business, Engineering (miscellaneous), Intelligent transportation system, Pose, 021101 geological & geomatics engineering, 0105 earth and related environmental sciences
Abstract: Accurately sensing the global position and posture of vehicles in traffic surveillance videos is a challenging but valuable problem for future intelligent transportation systems. In recent years, deep learning has brought major breakthroughs in six-degrees-of-freedom (6-DoF) pose estimation of objects from monocular images. Even so, accurately estimating the geographic 6-DoF poses of vehicles from traffic surveillance camera images remains challenging. We present an architecture that computes continuous global 6-DoF poses through joint 2D landmark estimation and 3D pose reconstruction. The architecture infers the 6-DoF pose of a vehicle from the vehicle's appearance in the image together with 3D information. The architecture does not rely on intrinsic camera parameters and can be applied to all surveillance cameras with a pre-trained model. With the help of 3D information from the point clouds and the 3D model itself, the architecture can also predict landmarks that have little and/or blurred texture. Because public training datasets are lacking, we release a large-scale dataset, ADFSC, that contains 120K groups of data with random viewing angles. On both 2D and 3D metrics, our architecture outperforms existing state-of-the-art algorithms in vehicle 6-DoF pose estimation.
Published: 2020

435. MonoFENet: Monocular 3D Object Detection With Feature Enhancement Networks

Author: Wentao Bao, Bin Xu, and Zhenzhong Chen
Subjects: Monocular, Artificial neural network, business.industry, Computer science, Point cloud, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Object detection, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), 020201 artificial intelligence & image processing, Point (geometry), Computer vision, Artificial intelligence, business, Software
Abstract: Monocular 3D object detection has the merit of low cost and can serve as an auxiliary module for autonomous driving systems, which has made it a growing concern in recent years. In this paper, we present a monocular 3D object detection method with feature enhancement networks, which we call MonoFENet. Specifically, the method estimates disparity from the input monocular image and uses it to enhance the features of both the 2D and 3D streams for accurate 3D localization. For the 2D stream, the input image is used to generate 2D region proposals and to extract appearance features. For the 3D stream, the estimated disparity is transformed into a dense 3D point cloud, which is then enhanced by the associated front-view maps. With the RoI Mean Pooling layer, the 3D geometric features of RoI point clouds are further enhanced by the proposed point feature enhancement (PointFE) network. The region-wise features of the image and point cloud are fused for the final 2D and 3D bounding box regression. Experimental results on the KITTI benchmark show that our method achieves state-of-the-art performance for monocular 3D object detection.
Published: 2020
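The disparity-to-point-cloud step in the 3D stream follows standard stereo geometry. Depth is Z = f * B / d, and each pixel (u, v) is back-projected through a pinhole camera. The minimal sketch below assumes a single focal length `f` in pixels, a (virtual) stereo baseline `baseline` in metres, and a principal point (`cx`, `cy`). It does not reproduce MonoFENet's actual parameters, its handling of invalid disparities, or the front-view enhancement. `disparity_to_points` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]

def disparity_to_points(disparity: List[List[float]], f: float, baseline: float,
                        cx: float, cy: float, min_disp: float = 1e-6) -> List[Point]:
    """Back-project a disparity map (rows indexed by v, columns by u) to 3D points.

    Pixels with disparity <= min_disp are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= min_disp:
                continue
            z = f * baseline / d          # depth from disparity
            x = (u - cx) * z / f          # pinhole back-projection
            y = (v - cy) * z / f
            points.append((x, y, z))
    return points
```

Because depth is inversely proportional to disparity, a small disparity error at far range produces a large depth error. This is one reason a learned enhancement of the resulting point cloud can help.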

436. Temporally Refined Graph U-Nets for Human Shape and Pose Estimation From Monocular Videos

Author: Jiashi Feng, Yong Dou, and Yang Zhao
Subjects: Monocular, Computer science, business.industry, Applied Mathematics, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, 02 engineering and technology, Solid modeling, Convolutional neural network, Graph, Vertex (geometry), Signal Processing, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), RGB color model, Computer vision, Polygon mesh, Artificial intelligence, Electrical and Electronic Engineering, business, Pose
Abstract: This work addresses the challenging problem of estimating full 3D human shape and pose from monocular videos. Because real-world 3D mesh-labeled datasets are limited, most current methods for 3D human shape reconstruction focus only on single RGB images and lose all temporal information. In contrast, we propose temporally refined Graph U-Nets, comprising an image-level module and a video-level module, to solve this problem. The image-level module is Graph U-Nets for human shape and pose estimation from images. In this module, the Graph Convolutional Neural Network (Graph CNN) facilitates information exchange between neighboring vertices, and the U-Nets architecture enlarges the receptive field of each vertex and fuses high-level and low-level features. The video-level module is a small Residual Temporal Graph CNN (Residual TG-CNN), which learns temporal dynamics from both structural and temporal neighbors. The temporal dynamics of each vertex are continuous in the temporal dimension and closely related to its structural neighbors. Fusing these temporal dynamics therefore helps reduce the ambiguity of the body in single images. Our algorithm makes full use of labels from image-level datasets and refines the image-level results through the video-level module. Evaluated on the Human3.6M and 3DPW datasets, our model produces accurate 3D human meshes and achieves superior 3D human pose estimation accuracy compared with state-of-the-art methods.
Published: 2020
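The abstract does not give the Graph CNN's layer definition. The sketch below shows how a graph convolution lets neighboring mesh vertices exchange information. It implements the widely used symmetric-normalized graph convolution, ReLU(D^-1/2 (A + I) D^-1/2 H W), with plain Python lists. This layer is an illustrative assumption and not necessarily the one used in the temporally refined Graph U-Nets. `gcn_layer` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency matrix of the mesh vertices
    h:   n x f_in vertex features
    w:   f_in x f_out weight matrix
    """
    n = len(adj)
    # Self-loops let each vertex keep its own feature while mixing in neighbors.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization keeps feature magnitudes stable across vertex degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]
```

Stacking such layers spreads information further across the mesh with each layer. The U-Net-style pooling and unpooling described in the abstract enlarges this receptive field more quickly.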

437. Blind Binocular Visual Quality Predictor Using Deep Fusion Network

Author: Ting Luo, Lu Yu, Qiuping Jiang, Wujie Zhou, and Jingsheng Lei
Subjects: Monocular, genetic structures, business.industry, Computer science, Deep learning, media_common.quotation_subject, Feature extraction, Pattern recognition, Convolutional neural network, eye diseases, Computer Science Applications, Visualization, Computational Mathematics, Feature (computer vision), Encoding (memory), Signal Processing, Contrast (vision), Artificial intelligence, business, media_common
Abstract: Blind binocular visual quality prediction (BVQP) is more challenging than blind monocular visual quality prediction (MVQP). Recently, applying convolutional neural networks (CNNs) to blind MVQP has led to significant progress in that area. In contrast, the adoption of deep learning for blind BVQP has received scant attention. In this study, we devised an end-to-end deep fusion network (DFNet) model for blind BVQP, trained in a unified framework. This core prediction engine comprises monocular feature encoding networks and binocular feature fusion networks, followed by a quality prediction layer. The monocular feature encoding networks first capture the low- and high-level monocular features of the left and right retinal views, respectively. These monocular features are then integrated by the binocular feature fusion networks to obtain binocular deep features. Finally, the quality prediction networks predict the binocular visual quality. Experiments on two standard subjectively rated BVQP datasets indicate that the proposed DFNet architecture aligns closely with human assessment and outperforms most relevant existing models.
Published: 2020

438. Attention-Based Dense Decoding Network for Monocular Depth Estimation

Author: Ge Zhang, Jianrong Wang, Tianyi Xu, Mei Yu, and Luo Tao
Subjects: Atrous spatial pyramid pooling, self-attention, Monocular, General Computer Science, business.industry, Computer science, Pooling, General Engineering, Process (computing), depth estimation, dense decoding module, General Materials Science, Computer vision, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, Representation (mathematics), Scale (map), Spatial analysis, lcsh:TK1-9971, Decoding methods, Communication channel
Abstract: Depth estimation is a classic computer vision task that provides a rich representation of objects and the environment. In recent years, the performance of end-to-end depth estimation has improved significantly. However, stacked convolution and pooling operations lose local spatial detail, which is extremely important for monocular depth estimation. To overcome this problem, we propose an encoder-decoder framework with skip connections. Based on the self-attention mechanism, we apply a channel-spatial attention module as a transition layer. This module captures depth and spatial positional relationships and improves the representation ability of the channel and spatial dimensions. We then propose a dense decoding module that makes full use of the attention features at different scales in the decoding process. It achieves a larger and denser receptive field while obtaining multi-scale information. Finally, a novel distance-aware loss is introduced to predict more precise edges and local details in distant regions. Experiments demonstrate that the proposed method outperforms the state of the art on the KITTI and NYU Depth V2 datasets.
Published: 2020

439. SRHandNet: Real-Time 2D Hand Pose Estimation With Simultaneous Region Localization

Author: Cong Peng, Baowen Zhang, and Yangang Wang
Subjects: Boosting (machine learning), Monocular, Color image, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Minimum bounding box, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Pose, Software
Abstract: This paper introduces SRHandNet, a novel method for real-time 2D hand pose estimation from monocular color images. Existing methods cannot obtain satisfactory results for small hands in a time-efficient manner. Our key idea is to simultaneously regress the hand regions of interest (RoIs) and hand keypoints for a given color image with a single encoder-decoder network architecture. The hand RoIs are then used iteratively as feedback to boost the performance of hand keypoint estimation. Unlike previous region proposal networks (RPNs), we propose a new lightweight bounding box representation called a region map. The region map and the hand keypoint heatmaps are combined into unified multi-channel feature maps. These maps can be obtained with a single forward pass of the network, which improves runtime efficiency. Without implementation optimization, SRHandNet runs at 40 fps for hand bounding box detection and up to 30 fps for accurate hand keypoint estimation in a desktop environment. Experiments demonstrate the effectiveness of the proposed method, which achieves state-of-the-art results and outperforms all recent methods.
Published: 2020

440. Simple But Effective Scale Estimation for Monocular Visual Odometry in Road Driving Scenarios

Author: Sung-Tae Kim, Sung-Jea Ko, Ming Fan, Seung Wook Kim, and Jee-Young Sun
Subjects: Monocular, General Computer Science, Scale (ratio), Computer science, business.industry, scale estimation, General Engineering, Filter (signal processing), Simultaneous localization and mapping, Robustness (computer science), Monocular SLAM, General Materials Science, Computer vision, Segmentation, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Visual odometry, business, Normal, 3D plane fitting, lcsh:TK1-9971
Abstract: In large-scale environments, scale drift is a crucial problem in monocular visual simultaneous localization and mapping (SLAM). A common solution is to use the camera height as prior knowledge. The height can be obtained from the 3D ground points (3DGPs) reconstructed from two successive frames. A natural extension of this solution is to estimate the camera height more precisely by collecting more 3DGPs from more frames. However, merely employing multiple frames with conventional methods is difficult to apply directly in real-world scenarios. Vehicle motion and inaccurate feature matching inevitably cause large uncertainty and noisy 3DGPs. In this study, we propose an elaborate method to collect confident 3DGPs from multiple frames for robust scale estimation. First, we gather 3DGP candidates that are visible in more than a predefined number of frames. To verify the 3DGP candidates, we filter out 3D points that lie outside the road region obtained by a deep-learning-based road segmentation model. In addition, we formulate an optimization problem constrained by a simple but effective geometric assumption: the normal vector of the ground plane lies in the null space of the movement vector of the camera center. We also provide a closed-form solution. ORB-SLAM with the proposed scale estimation method achieves an average translation error of 1.19% on the KITTI dataset, outperforming state-of-the-art conventional monocular visual SLAM methods in road driving scenarios.
Published: 2020
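The geometric constraint described above says that the ground-plane normal lies in the null space of the camera-centre movement vector, i.e., it is orthogonal to the motion direction. A small constrained plane fit can illustrate this. The sketch is not the paper's closed-form solution and rests on three assumptions:

- The 3DGPs are already filtered and expressed in the camera frame, with the camera centre at the origin.
- The least-squares plane passes through their centroid.
- The caller supplies the known camera height as `true_height`.

All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a: Vec) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def fit_ground_normal(points: List[Vec], motion: Vec) -> Tuple[Vec, Vec]:
    """Least-squares plane normal constrained to be orthogonal to `motion`.

    Returns (unit normal, centroid) for 3D ground points in the camera frame.
    """
    t = _unit(motion)
    helper = (1.0, 0.0, 0.0) if abs(t[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _unit(_cross(t, helper))  # e1, e2: orthonormal basis of the plane
    e2 = _cross(t, e1)             # orthogonal to the motion direction
    n = len(points)
    c = (sum(p[0] for p in points) / n,
         sum(p[1] for p in points) / n,
         sum(p[2] for p in points) / n)
    # 2x2 scatter matrix [[s11, s12], [s12, s22]] of centered points in (e1, e2).
    s11 = s12 = s22 = 0.0
    for p in points:
        q = (p[0] - c[0], p[1] - c[1], p[2] - c[2])
        a, b = _dot(q, e1), _dot(q, e2)
        s11 += a * a
        s12 += a * b
        s22 += b * b
    # theta is the principal-axis angle; the normal is the perpendicular
    # (smallest-eigenvalue) direction, i.e. the direction of least spread.
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    y1, y2 = -math.sin(theta), math.cos(theta)
    normal = (y1 * e1[0] + y2 * e2[0],
              y1 * e1[1] + y2 * e2[1],
              y1 * e1[2] + y2 * e2[2])
    return normal, c

def scale_from_camera_height(points: List[Vec], motion: Vec, true_height: float) -> float:
    """Ratio of the known camera height to the height estimated from the plane."""
    normal, c = fit_ground_normal(points, motion)
    est_height = abs(_dot(normal, c))  # distance from the camera centre (origin)
    return true_height / est_height
```

The returned factor would then rescale the up-to-scale translation of the monocular SLAM system. For instance, ground points reconstructed at a height of 0.75 with a known camera height of 1.5 yield a scale factor of 2.0.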

441. Deep Monocular Visual Odometry for Ground Vehicle

Author: Xiangwei Wang and Hui Zhang
Subjects: 0209 industrial biotechnology, Monocular, General Computer Science, business.industry, Computer science, motion analysis, Visual odometry, General Engineering, Robotics, 02 engineering and technology, Frame rate, machine learning, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Robot, 020201 artificial intelligence & image processing, General Materials Science, Computer vision, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, Focus (optics), lcsh:TK1-9971, Camera resectioning
Abstract: Monocular visual odometry, which helps robots localize themselves in unexplored environments, has been a crucial research problem in robotics. Existing learning-based end-to-end methods can reduce engineering effort, such as accurate camera calibration and tedious case-by-case parameter tuning, but their accuracy is still limited. One of the main reasons is that previous works aim to learn six-degrees-of-freedom motions, even though the motion of a ground vehicle is constrained by its mechanical structure and dynamics. To push the limit, we analyze the motion pattern of a ground vehicle and focus on learning two-degrees-of-freedom motions through the proposed motion focusing and decoupling. Experiments on the KITTI dataset show that the proposed motion focusing and decoupling approach improves visual odometry performance by reducing the relative pose error. Moreover, because the learning objective has fewer dimensions, our network is much lighter, with only four convolution layers. It converges quickly during training and runs in real time at over 200 frames per second during testing.
Published: 2020
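The sketch below chains per-frame planar motions into a trajectory to illustrate why two degrees of freedom can suffice for a ground vehicle. It assumes the two learned quantities are forward translation and yaw change, since the abstract does not name them. Within each step it applies the translation before the rotation, which is one of several reasonable integration orders. `integrate_2dof` is a hypothetical name.

```python
import math
from typing import List, Tuple

Pose2D = Tuple[float, float, float]  # (x, y, heading in radians)

def integrate_2dof(motions: List[Tuple[float, float]],
                   start: Pose2D = (0.0, 0.0, 0.0)) -> List[Pose2D]:
    """Chain per-frame (forward distance, yaw change) pairs into planar poses."""
    x, y, yaw = start
    poses = [start]
    for forward, dyaw in motions:
        # Translate along the current heading, then apply the yaw change.
        x += forward * math.cos(yaw)
        y += forward * math.sin(yaw)
        yaw += dyaw
        poses.append((x, y, yaw))
    return poses
```

For example, moving 1 m, turning 90°, and moving another 1 m ends at approximately (1, 1) with a heading of π/2. The remaining four degrees of freedom (lateral and vertical translation, roll, and pitch) are implicitly assumed to be negligible.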

442. Blind Stereoscopic Image Quality Assessment Accounting for Human Monocular Visual Properties and Binocular Interactions

Author: Weiqing Yan, Yun Liu, Baoqing Huang, Zhi Zheng, and Hongwei Yu
Subjects: Visual perception, General Computer Science, Image quality, Computer science, media_common.quotation_subject, Stereoscopic image quality, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Stereoscopy, human visual system, law.invention, law, Perception, Distortion, General Materials Science, Computer vision, media_common, Monocular, business.industry, General Engineering, Visualization, Support vector machine, monocular feature, binocular feature, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971
Abstract: The human visual perception model is a key factor in evaluating stereoscopic image quality. This paper focuses on how monocular and binocular properties contribute to quality perception. It proposes a novel blind stereoscopic image quality assessment model that comprehensively explores the relationship between visual features and quality perception. Statistical quality-aware monocular features are extracted from both the left and right views to reveal monocular quality perception. These include color statistical features that most previous models overlook. Multiple features of the summation signal and entropy features of the difference signal are extracted to quantify binocular quality perception. Finally, support vector regression (SVR) is used to train a regression model on the extracted features and the subjective scores. Three public databases, LIVE 3D Phase I, LIVE 3D Phase II, and MCL 3D Database, are used to demonstrate the effectiveness of the proposed model. Experimental results show that the proposed model is superior to other existing state-of-the-art quality metrics.
Published: 2020

443. Perspective Distortion Modeling for Image Measurements

Author: Steve Davis, Samia Nefti-Meziani, Theodoros Theodoridis, and Alexandre Bousaid
Subjects: Image formation, General Computer Science, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, pose effect, 02 engineering and technology, pose estimation, 01 natural sciences, 010309 optics, Perspective distortion, Distortion, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, distance effect, General Materials Science, Computer vision, Set (psychology), Pose, Monocular, business.industry, Perspective (graphical), General Engineering, Perspective distortion modeling, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, projective rotation, lcsh:TK1-9971, Monocular vision, foreshortening effect
Abstract: This work presents a perspective distortion model for monocular views based on the fundamentals of perspective projection. Perspective projection is considered the most ideal and realistic of the models that depict image formation in monocular vision. Many approaches try to model and estimate the perspective effects in images. Some learn and model the distortion parameters from a set of training data and work only for a predefined structure. None of the existing methods provides a deep understanding of the nature of perspective problems. Perspective distortions can, in fact, be described by three different perspective effects: pose, distance, and foreshortening. These effects cause the aberrant appearance of object shapes in images. Understanding these phenomena has long been of interest to artists, designers, and scientists. In many cases, this problem must be taken into account in image diagnostics, high-accuracy image measurement, and accurate pose estimation from images. In this work, a perspective distortion model is developed for each effect while elaborating on the nature of perspective effects. A distortion factor is derived for each effect. Methods are then proposed that extract the true target pose and distance and correct image measurements.
Published: 2020
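Two of the three effects named above, distance and foreshortening, follow directly from the ideal pinhole model u = f * X / Z. The sketch below works in a 1D slice, with X horizontal and Z along the optical axis. It projects both endpoints of a segment centred on the optical axis and reports the segment's image length. It is only an illustration under these simplifying assumptions and does not reproduce the paper's distortion factors. Function names are hypothetical.

```python
import math

def project(f: float, x: float, z: float) -> float:
    """Ideal pinhole projection of a point (x, z) in a 1D slice: u = f * x / z."""
    return f * x / z

def projected_length(f: float, length: float, center_z: float, tilt_rad: float) -> float:
    """Image length of a segment centred on the optical axis at depth center_z.

    tilt_rad = 0 means the segment is parallel to the image plane; larger tilts
    rotate it about its centre so that one end recedes from the camera.
    """
    half = 0.5 * length
    dx, dz = half * math.cos(tilt_rad), half * math.sin(tilt_rad)
    u_near = project(f, -dx, center_z - dz)
    u_far = project(f, dx, center_z + dz)
    return abs(u_far - u_near)
```

With f = 1000 px, a 1 m segment facing the camera at 10 m spans 100 px. Moving it to 20 m halves that length (distance effect). Tilting it by 60° at 10 m shrinks it to about 50 px, close to the weak-perspective estimate of 100·cos 60° (foreshortening).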

444. Stereo Visual-Inertial Fusion for UAV State Estimation

Author: Chao Yao, Klaus Janschek, and Jinyao Zhu
Subjects: 0209 industrial biotechnology, Fusion, Inertial frame of reference, Monocular, SIMPLE (military communications protocol), Computer science, business.industry, 020208 electrical & electronic engineering, Estimator, 02 engineering and technology, Fusion system, 020901 industrial engineering & automation, Control and Systems Engineering, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, State (computer science), Artificial intelligence, business
Abstract: Visual-inertial fusion is frequently used for state estimation in aerial robotic applications because of its low-cost, simple hardware setup and high accuracy. This work proposes a stereo visual-inertial fusion system based on the monocular method VINS-Mono, which tightly couples the visual and inertial measurements. Timing statistics are provided for the system running on an Intel NUC mini PC. The system's real-time capability meets the requirements of closed-loop control for a UAV. The proposed fusion system is evaluated on the public EuRoC MAV dataset and compared with several representative state-of-the-art open-source state estimators. According to the results, our method achieves competitive performance with relatively low estimation errors in a computationally efficient manner.
Published: 2020

445. Monocular-based pose determination of uncooperative space objects

Author: Kyunam Kim, Alexei Harvard, Vincenzo Capuano, and Soon-Jo Chung
Subjects: 020301 aerospace & aeronautics, Monocular, Spacecraft, business.industry, Computer science, 3D reconstruction, Optical flow, Aerospace Engineering, 02 engineering and technology, Filter (signal processing), 01 natural sciences, 0203 mechanical engineering, 0103 physical sciences, Structure from motion, A priori and a posteriori, Computer vision, Artificial intelligence, business, 010303 astronomy & astrophysics, Pose
Abstract: Vision-based methods to determine the relative pose of an uncooperative orbiting object are investigated for spacecraft proximity operations such as on-orbit servicing, spacecraft formation flying, and small-body exploration. Depending on whether the object is known or unknown, a shape model of the orbiting target may have to be constructed autonomously in real time using only optical measurements. The Simultaneous Estimation of Pose and Shape (SEPS) algorithm is presented; it does not require a priori knowledge of the target's pose and shape. The algorithm uses a novel measurement equation and filter that efficiently combine optical flow information with a star tracker. Together, these estimate the target's angular rotational and translational relative velocity as well as its center of gravity. Depending on the mission constraints, SEPS can be augmented by a more accurate offline, on-board 3D reconstruction of the target shape, which allows the target's pose to be estimated as for a known target. The use of Structure from Motion (SfM) for this purpose is discussed. A model-based approach for pose estimation of known targets is also presented. The architecture and implementation of both proposed approaches are described. Their performance metrics are evaluated through numerical simulations on a dataset of images synthetically generated according to a chaser/target relative motion in Geosynchronous Orbit (GEO).
Published: 2020

446. Learning Depth for Scene Reconstruction Using an Encoder-Decoder Model

Author: Xiaohan Tu, Cheng Xu, Siping Liu, Guoqi Xie, Jing Huang, Renfa Li, and Junsong Yuan
Subjects: General Computer Science, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, Simultaneous localization and mapping, 01 natural sciences, depth estimation, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, decoder, 0105 earth and related environmental sciences, Feature detection (computer vision), Monocular, Pixel, business.industry, General Engineering, encoder, Feature (computer vision), RGB color model, Convolutional neural networks, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, Scale (map), lcsh:TK1-9971, Encoder, simultaneous localization and mapping
Abstract: Depth estimation has received considerable attention and is often applied to visual simultaneous localization and mapping (SLAM) for scene reconstruction. To our knowledge, previous depth estimation methods fail to provide sufficiently reliable depth for SLAM based on monocular depth estimation. In these methods, new image features are rarely re-exploited effectively, local features are easily lost, and relative depth relationships among pixels are readily ignored. With inaccurate monocular depth estimates, SLAM still faces scale ambiguity problems. To achieve accurate scene reconstruction based on monocular depth estimation, this paper makes three contributions. (1) We design a depth estimation model (DEM) consisting of a precise encoder that re-exploits new features and a decoder that learns local features effectively. (2) We propose a loss function that uses the depth relationships between pixels to guide the training of the DEM. (3) We design a modular SLAM system for pixel-level scene reconstruction. It contains the DEM, feature detection, descriptor computation, feature matching, pose prediction, keyframe extraction, loop closure detection, and pose-graph optimization. Extensive experiments demonstrate that the DEM and the DEM-based SLAM are effective. (1) On public datasets, our DEM predicts more reliable depth than state-of-the-art methods when the inputs are RGB images, sparse depth, or a fusion of both. (2) The DEM-based SLAM system achieves accuracy comparable to that of well-known modular SLAM systems.
Published: 2020
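The abstract states that the DEM loss uses depth relationships between pixels but does not give its form. A common way to express such relationships is a pairwise ranking loss. For an ordered pair, the prediction is penalized with softplus(-r * (z_i - z_j)), and for an equal pair, the squared difference is penalized. The sketch below implements that formulation as an assumption, not as the paper's loss. `pairwise_depth_loss` and the (i, j, r) pair format are hypothetical.

```python
import math
from typing import List, Tuple

def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_depth_loss(pred: List[float], pairs: List[Tuple[int, int, int]]) -> float:
    """Mean ranking loss over pixel pairs.

    pred:  predicted depths, indexed by (flattened) pixel
    pairs: (i, j, r) with r = +1 if pixel i is farther than pixel j,
           r = -1 if it is closer, and r = 0 if both have equal depth
    """
    if not pairs:
        return 0.0
    total = 0.0
    for i, j, r in pairs:
        diff = pred[i] - pred[j]
        if r == 0:
            total += diff * diff            # equal depth: penalize any gap
        else:
            total += _softplus(-r * diff)   # small when the order matches r
    return total / len(pairs)
```

Unlike a per-pixel regression loss, this term depends only on the relative order of the two predictions. It constrains relative depth without fixing the absolute scale.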

447. Monocular Depth Estimation Based on Multi-Scale Graph Convolution Networks

Author: Jun Liang, Junwei Fu, and Ziyang Wang
Subjects: 0209 industrial biotechnology, Monocular, General Computer Science, Computer science, business.industry, General Engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Latent variable, Topological graph, Convolutional neural network, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), 020201 artificial intelligence & image processing, General Materials Science, Computer vision, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, reconstruction strategy, lcsh:TK1-9971, graph convolution network, Monocular depth estimation
Abstract: Monocular depth estimation is a fundamental task in three-dimensional (3D) reconstruction, which is used to improve the accuracy of environment perception. Because its hardware requirements are simpler, it is more suitable than multi-view methods. In this study, a new monocular depth estimation algorithm based on a graph convolution network (GCN) is proposed. The pixel-wise depth relationship is introduced into a conventional convolutional neural network (CNN) to compensate for its weakness in processing non-Euclidean data. The remaining depth topological graph information in the spatial latent variables is then extracted using a multi-scale reconstruction strategy. Results on the NYU-v2 and KITTI depth datasets demonstrate that our algorithm improves the quality of monocular depth estimation, especially when several small objects coexist in a scene.
Published: 2020
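
Illustrative sketch: the abstract does not say how the graph is built or which graph-convolution variant is used. Assuming the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W), one layer can be written in plain Python as below. Treating graph nodes as pixels or image regions, and all names and values, are hypothetical.

```python
import math
from typing import List

# Hypothetical sketch of a standard GCN layer; not the paper's multi-scale GCN.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    Each node (e.g. a pixel or region) takes a normalized weighted combination
    of its own and its neighbours' features, followed by a shared linear map.
    """
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Two connected nodes with one scalar feature each (hypothetical values).
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weight = [[1.0]]
print(gcn_layer(adj, features, weight))  # [[2.0], [2.0]]
```

Unlike a fixed convolution kernel, the neighbourhood here comes from the adjacency matrix, which is what lets a GCN operate on non-grid (non-Euclidean) structure.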

448. LW-Net: A Lightweight Network for Monocular Depth Estimation

Author: Cheng Feng, Congxuan Zhang, Zhen Chen, Ming Li, Hao Chen, and Bingbing Fan
Subjects: iterative decoder, General Computer Science, Computer science, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, self-supervised learning, convolutional neural networks, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), lightweight, 0105 earth and related environmental sciences, Monocular, General Engineering, Function (mathematics), Construct (python library), Frame rate, Robot, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971, Encoder, Algorithm, Decoding methods, Monocular depth estimation
Abstract: Existing self-supervised monocular depth estimation methods usually use increasingly large networks to achieve accurate estimation results. However, larger networks are more difficult to train and require more storage space. To balance network size and computational accuracy, we propose in this article a compact, lightweight network for monocular depth estimation, named LW-Net. First, we construct a compact network by designing an iterative decoder with shared weights and a lightweight pyramid encoder. The proposed network has significantly fewer parameters than most existing monocular depth estimation networks. Second, we adopt a self-supervised training strategy that combines the proposed LW-Net model with a pose network, and we use a hybrid loss function to train the decoder and encoder separately. With this training strategy, LW-Net achieves better estimation accuracy than other methods. Finally, we run LW-Net on the KITTI and Make3D datasets for a comprehensive comparison with several state-of-the-art methods. The experimental results demonstrate that our method achieves the best computational accuracy while using the fewest parameters. Specifically, compared with the existing state-of-the-art method, our method reduces the number of model parameters by 46.6%, decreases the time cost by 7.69%, and increases the frame rate by 5.19%.
Published: 2020
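
Illustrative sketch: the abstract does not give LW-Net's layer configuration, so the 3x3 convolutions, 64 channels, and four iterations below are hypothetical. The sketch only shows why an iterative decoder with shared weights saves parameters: the same block is reused on every iteration, so its parameter count does not grow with the number of iterations. Real decoders also upsample and change channel widths between stages, which this toy count ignores, and it does not reproduce the 46.6% figure reported in the paper.

```python
from typing import Callable

# Hypothetical layer sizes; not LW-Net's actual configuration.


def conv_params(c_in: int, c_out: int, k: int = 3, bias: bool = True) -> int:
    """Learnable parameters of a k x k 2D convolution: weights plus optional bias."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def decoder_params(num_iterations: int, channels: int, shared: bool) -> int:
    """Parameters of a toy decoder that runs one conv block per iteration.

    With shared weights the same block is reused every iteration, so the count
    does not grow with num_iterations; otherwise each iteration owns a copy.
    """
    block = conv_params(channels, channels)
    return block if shared else block * num_iterations


def iterate_shared(block: Callable[[float], float], x: float, num_iterations: int) -> float:
    """Apply the same block (one set of weights) repeatedly, as an iterative decoder would."""
    for _ in range(num_iterations):
        x = block(x)
    return x


print(decoder_params(4, 64, shared=False))  # 147712
print(decoder_params(4, 64, shared=True))   # 36928

# Toy "refinement" with fixed, hypothetical weights 0.5 and 1.0, reused 3 times.
print(iterate_shared(lambda d: 0.5 * d + 1.0, 0.0, 3))  # 1.75
```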

449. Adversarial Learning for Joint Optimization of Depth and Ego-Motion

Author: Shanshe Wang, Siwei Ma, Zhijun Fang, Yongbin Gao, Songchao Tan, Jenq-Neng Hwang, and Anjie Wang
Subjects: Monocular, Computer science, business.industry, Deep learning, Estimator, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Backpropagation, Transformation (function), Depth map, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Visual odometry, business, Absolute scale, Software
Abstract: In recent years, supervised deep learning methods have shown great promise in dense depth estimation. However, large amounts of high-quality training data are expensive and impractical to acquire. Alternatively, self-supervised depth estimators can learn the latent transformation from monocular or binocular video sequences by minimizing the photometric warp error between consecutive frames, but they suffer from scale ambiguity or have difficulty estimating precise pose changes between frames. In this paper, we propose a joint self-supervised deep learning pipeline for depth and ego-motion estimation that exploits adversarial learning and joint optimization with spatial-temporal geometric constraints. The stereo reconstruction error provides the spatial geometric constraint for estimating depth at absolute scale. Meanwhile, the absolute-scale depth map and a pre-trained pose network serve as a good starting point for direct visual odometry (DVO). DVO optimization based on spatial geometric constraints can yield fine-grained ego-motion estimates and provides additional backpropagation signals to the depth estimation network. Finally, the reconstructed views from the spatial and temporal domains are concatenated, and an iterative coupled optimization is performed together with adversarial learning to obtain accurate depth and precise ego-motion estimates. Experimental results on the KITTI dataset show superior performance compared with state-of-the-art methods for monocular depth and ego-motion estimation, as well as strong generalization ability of the proposed approach.
Published: 2020
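
Illustrative sketch: the abstract gives no formulas, so the Python below uses textbook rectified-stereo relations as an assumption. Depth follows Z = f * B / d from disparity d, focal length f (pixels), and baseline B (metres); the known baseline is what fixes absolute scale. A stereo reconstruction error can then be measured by warping one scanline into the other view and taking a mean L1 photometric difference. Real pipelines warp whole 2D images with differentiable sampling; this 1D version, its function names, and its values are hypothetical.

```python
from typing import List, Sequence

# Hypothetical 1D sketch of stereo warping and photometric error; not the paper's code.


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Metric depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def sample_linear(row: Sequence[float], x: float) -> float:
    """Linearly interpolate row at sub-pixel position x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)  # floor, since x >= 0 after clamping
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def reconstruct_left(right_row: Sequence[float], disparity_row: Sequence[float]) -> List[float]:
    """Warp one right-image scanline into the left view.

    In a rectified pair, left pixel x sees the same point as right pixel x - d.
    """
    return [sample_linear(right_row, x - d) for x, d in enumerate(disparity_row)]


def photometric_l1(target: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute intensity difference between the real and warped scanlines."""
    return sum(abs(t - r) for t, r in zip(target, reconstructed)) / len(target)


right = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
left = [0.0, 0.0, 10.0, 20.0, 30.0, 40.0]  # same scene shifted by one pixel

good = reconstruct_left(right, [1.0] * 6)  # correct disparity
bad = reconstruct_left(right, [0.0] * 6)   # wrong disparity
print(photometric_l1(left, good))  # 0.0
print(photometric_l1(left, bad))   # larger error for the wrong disparity
print(depth_from_disparity(10.0, focal_px=100.0, baseline_m=0.5))  # 5.0
```

Minimizing this error with respect to the predicted disparity is what turns the stereo pair into a self-supervised training signal.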

450. Autonomous Reinforcement Control of Underwater Vehicles based on Monocular Depth Vision

Author: Xiaoling Liang, Pengli Zhu, Yao Shuhan, Siyuan Liu, and Yancheng Liu
Subjects: Scheme (programming language), 0209 industrial biotechnology, Monocular, Computer science, business.industry, 020208 electrical & electronic engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 020901 industrial engineering & automation, Control and Systems Engineering, Control theory, 0202 electrical engineering, electronic engineering, information engineering, Reinforcement learning, RGB color model, Computer vision, Artificial intelligence, Image sensor, Underwater, Reinforcement, business, computer, computer.programming_language
Abstract: In this paper, an end-to-end reinforcement control framework based on monocular depth prediction is proposed for autonomous control of underwater vehicles in unknown environments. Within the framework, a monocular depth prediction network takes RGB video from the camera sensor as input and generates underwater depth images, and a sequential reinforcement learning controller is developed for autonomous obstacle-avoiding navigation and movement control. Simulation and experimental results demonstrate that the proposed control scheme achieves remarkable performance in collision-avoidance navigation and autonomous control in unknown environments.
Published: 2020
