Descriptor: "Monocular depth estimation" / Language: undetermined - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Monocular depth estimation": 32 results in total

1. Energy-Quality Scalable Monocular Depth Estimation on Low-Power CPUs

Author: Valentino Peluso, Fabio Tosi, Stefano Mattoccia, Antonio Cipolletta, Andrea Calimera, Filippo Aleotti, and Matteo Poggi
Subjects: Computer Networks and Communications, Computer science, Deep learning, Process (computing), monocular depth estimation, Program optimization, Convolutional neural networks (CNNs), Convolutional neural network, Computer Science Applications, Computer engineering, Hardware and Architecture, Signal Processing, Scalability, Memory footprint, Energy-Quality Scaling, Embedded Systems, Low-Power CPUs, Artificial intelligence, Quantization (image processing), Information Systems, Efficient energy use
Abstract: Recent advances in deep learning have shown that inferring high-quality depth maps from a single image has become feasible and accurate thanks to convolutional neural networks (CNNs), but how to process such compute- and memory-intensive models on portable and low-power devices remains a concern. Dynamic energy-quality scaling is an interesting yet less explored option in this field. It can improve efficiency through opportunistic computing policies in which performance is boosted only when needed, achieving substantial energy savings on average. Implementing such a computing paradigm requires a scalable inference model, which is the target of this work. Specifically, we describe and characterize the design of an energy-quality scalable pyramidal network (EQPyD-Net), a lightweight CNN capable of modulating its computational effort at runtime with minimal memory resources. We describe the architecture of the network and the optimization flow, covering the aspects that enable dynamic scaling, namely the optimized training procedures, the compression stage via fixed-point quantization, and the code optimization for deployment on commercial low-power CPUs used in the edge segment. To assess the effect of the proposed design knobs, we evaluated the prediction quality on the standard KITTI dataset and the energy and memory resources on the ARM Cortex-A53 CPU. The collected results demonstrate the flexibility of the proposed network and its energy efficiency. EQPyD-Net can be shifted across five operating points, ranging from a maximum accuracy of 82.2% at 0.4 Frame/J up to 92.6% energy savings with 6.1% accuracy loss, while keeping a compact memory footprint of 5.2 MB for the weights and 38.3 MB (in the worst case) for processing.
Published: 2022
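
The runtime policy described above, boosting performance only when needed, can be sketched as picking among operating points. This is an illustration, not EQPyD-Net's actual controller: `OperatingPoint`, `fpj_after_savings`, and `select_operating_point` are hypothetical names, only the two endpoints quoted in the abstract are used, and it assumes the 92.6% saving is measured against the energy per frame of the maximum-accuracy point and that the 6.1% loss is in percentage points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    accuracy: float          # prediction accuracy in percent
    frames_per_joule: float  # energy efficiency (Frame/J)


def fpj_after_savings(base_fpj: float, savings: float) -> float:
    """Frame/J after cutting energy per frame by `savings` (a fraction in [0, 1))."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    # Energy per frame becomes (1 - savings) of the baseline, so Frame/J grows by 1 / (1 - savings).
    return base_fpj / (1.0 - savings)


def select_operating_point(points, required_fpj: float) -> OperatingPoint:
    """Most accurate point meeting the efficiency target; else fall back to the most efficient one."""
    feasible = [p for p in points if p.frames_per_joule >= required_fpj]
    if not feasible:
        return max(points, key=lambda p: p.frames_per_joule)
    return max(feasible, key=lambda p: p.accuracy)


# Endpoints derived from the abstract; the three intermediate operating points are not reproduced.
best = OperatingPoint("max-accuracy", 82.2, 0.4)
# Assumption: the 6.1% accuracy loss is read as percentage points.
cheapest = OperatingPoint("max-savings", 82.2 - 6.1, fpj_after_savings(0.4, 0.926))

print(round(cheapest.frames_per_joule, 2))  # 5.41
print(select_operating_point([best, cheapest], required_fpj=1.0).name)  # max-savings
```

Under these assumptions, the cheapest point would deliver roughly 5.4 Frame/J, about 13.5 times the efficiency of the maximum-accuracy point.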

2. SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings

Author: Lorenzo Papa, Edoardo Alati, Paolo Russo, and Irene Amerini
Subjects: General Computer Science, Computer vision, monocular depth estimation, fast-throughput, edge devices, General Engineering, General Materials Science
Published: 2022

3. Improved deep depth estimation for environments with sparse visual cues

Author: Niclas Joswig, Juuso Autiosalo, and Laura Ruotsalainen
Affiliations: University of Helsinki, Department of Electronics and Nanoengineering, Department of Mechanical Engineering, Aalto University, Department of Computer Science, Spatiotemporal Data Analysis, Doctoral Programme in Computer Science, Sustainable Urban Development Emerging from the Merger of Cutting-Edge Climate, Social and Computer Sciences, and Helsinki Institute of Sustainability Science (HELSUS)
Subjects: Deep Learning, Monocular depth, Hardware and Architecture, Monocular Depth Estimation, Computer vision, Computer Vision and Pattern Recognition, 113 Computer and information sciences, Visual SLAM, Software, Computer Science Applications
Abstract: Most deep learning-based depth estimation models that learn scene structure from monocular video in a self-supervised manner base their estimates on visual cues such as vanishing points. In established depth estimation benchmarks depicting, for example, street navigation or indoor offices, these cues are consistently present, which enables neural networks to predict depth maps from single images. In this work, we address the challenge of depth estimation from a real-world bird’s-eye perspective in an industrial environment whose special geometry provides only minimal visual cues and therefore requires incorporating the temporal domain for structure-from-motion estimation. To enable the system to recover structure from motion through pixel translation in context-sparse (i.e., visual-cue-sparse) scenery, we propose a novel architecture built upon the structure-from-motion learner, which uses temporal pairs of jointly unrotated and stacked images for depth prediction. To increase overall performance and avoid blurred depth edges that lie between the edges of the two input images, we integrate a geometric consistency loss into our pipeline. We assess the model’s ability to learn structure from motion by introducing a novel industrial dataset whose perspective, orthogonal to the floor, contains only minimal visual cues. Through evaluation against ground-truth depth, we show that our proposed method outperforms the state of the art in difficult context-sparse environments. Funding: This work has been supported by a donation from Konecranes, the Finnish Center for Artificial Intelligence (FCAI), the University of Helsinki, and Aalto University. Publisher copyright: © 2022, The Author(s).
Published: 2023
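
The two ingredients named in this abstract, stacking a temporal pair of frames into one input and a geometric consistency loss, can be illustrated with NumPy. This is a sketch under assumptions: the loss follows the normalized depth-difference form popularized by SC-SfMLearner, which may differ from the authors' exact formulation; the unrotation step and the warping of depth between views are omitted, so `depth_warped` is assumed to be already projected into the second frame. Function names are hypothetical.

```python
import numpy as np


def stack_temporal_pair(frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
    """Concatenate two (H, W, 3) frames along channels into one (H, W, 6) network input."""
    if frame_t.shape != frame_t1.shape:
        raise ValueError("frames must share a shape")
    return np.concatenate([frame_t, frame_t1], axis=-1)


def geometric_consistency_loss(depth_warped: np.ndarray, depth_pred: np.ndarray,
                               eps: float = 1e-7) -> float:
    """Mean normalized depth difference |a - b| / (a + b), bounded in [0, 1) for positive depths."""
    diff = np.abs(depth_warped - depth_pred) / (depth_warped + depth_pred + eps)
    return float(diff.mean())


pair = stack_temporal_pair(np.zeros((4, 5, 3)), np.ones((4, 5, 3)))
print(pair.shape)  # (4, 5, 6)
print(geometric_consistency_loss(np.array([[1.0, 2.0]]), np.array([[1.0, 4.0]])))
```

Normalizing by the depth sum keeps the penalty scale-invariant, so distant and nearby structures contribute comparably.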

4. Monocular Depth Estimation for Tilted Images via Gravity Rectifier

Author: Yuki Saito, Hideo Saito, and Vincent Frémont
Affiliations: Keio University; Laboratoire des Sciences du Numérique de Nantes (LS2N), ARMEN team (Autonomie des Robots et Maîtrise des interactions avec l’ENvironnement); Nantes Université; École Centrale de Nantes; IMT Atlantique; Institut Mines-Télécom (IMT); Institut National de Recherche en Informatique et en Automatique (Inria); Centre National de la Recherche Scientifique (CNRS)
Programmes: IUEP EU-Japan, MEXT, and Erasmus+
Other listed contributors: Petia Radeva, Giovanni Maria Farinella, and Kadi Bouatouch
Subjects: tilted images, convolutional neural network, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], monocular depth estimation, gravity prediction, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Monocular depth estimation is a challenging task in computer vision. Although many approaches using convolutional neural networks (CNNs) have been proposed, most of them are trained on large-scale datasets composed mainly of gravity-aligned images. Consequently, conventional approaches fail to predict reliable depth for tilted images containing large pitch and roll camera rotations. To tackle this problem, we propose a novel refinement method based on the distribution of gravity directions in the training sets. We designed a gravity rectifier that learns to transform the gravity direction of a tilted image into a rectified one that matches the gravity-aligned training data distribution. For the evaluation, we employed public datasets and also created our own dataset composed of large pitch and roll camera movements. Our experiments showed that our approach successfully rectified the camera rotation and outperformed our baselines, achieving a 29% improvement in absolute relative error (abs rel) over the vanilla model. Additionally, our method achieved accuracy competitive with state-of-the-art monocular depth prediction approaches that consider camera rotation.
Published: 2023
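
Once a gravity direction has been estimated, one purely geometric way to rectify a tilted image is to rotate a virtual camera so that gravity points down in the image, then warp the image with the rotation-only homography H = K R K^-1. This is not the paper's learned rectifier; it assumes an OpenCV-style camera frame (x right, y down, z forward), a known intrinsic matrix `K`, and hypothetical function names.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rotation_aligning(a, b) -> np.ndarray:
    """Rotation R with R @ a_hat == b_hat (Rodrigues formula for two unit vectors)."""
    a = np.asarray(a, float) / np.linalg.norm(a)
    b = np.asarray(b, float) / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    s = float(np.linalg.norm(v))
    if s < 1e-12:
        if c > 0:
            return np.eye(3)  # already aligned
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    vx = skew(v)
    return np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)


def rectifying_homography(K: np.ndarray, gravity_cam, canonical=(0.0, 1.0, 0.0)) -> np.ndarray:
    """Homography K R K^-1 re-rendering the image from a virtual camera whose
    gravity direction equals `canonical` (image +y, i.e. down, by default)."""
    R = rotation_aligning(gravity_cam, canonical)
    return K @ R @ np.linalg.inv(K)
```

The resulting 3x3 matrix could be passed to any perspective-warp routine; the depth predicted on the rectified image would then need the inverse warp to map back to the original view.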

5. Partial 3D-reconstruction of the colon from monoscopic colonoscopy videos using shape-from-motion and deep learning

Author: Walluscheck, S., Wittenberg, T., Bruns, V., Eixelberger, T., and Hackner, R.
Subjects: colonoscopy, Biomedical Engineering, Medicine, panorama, monocular depth estimation, endoscopy
Abstract: For image-based documentation of a colonoscopy procedure, a 3D reconstruction of the hollow colon structure from endoscopic video streams is desirable. To obtain this reconstruction, 3D information about the colon has to be extracted from monocular colonoscopy image sequences. This information can be provided by estimating depth through shape-from-motion approaches, using the image information from two successive frames and exact knowledge of their disparity. However, during a standard colonoscopy the spatial offset between successive frames changes continuously. Therefore, in this work deep convolutional neural networks (DCNNs) are applied to obtain piecewise depth maps and point clouds of the colon. These pieces can then be fused into a partial 3D reconstruction.
Published: 2021
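
Turning a piecewise depth map into a point cloud, as described above, amounts to back-projecting each pixel through a pinhole camera model. A minimal sketch, assuming calibrated intrinsics `fx`, `fy`, `cx`, `cy` and an already undistorted image (colonoscopes typically have strong lens distortion); the function name is hypothetical.

```python
import numpy as np


def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth map into an (N, 3) point cloud with a pinhole model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]            # pixel row (v) and column (u) indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]       # drop pixels without valid depth
```

Each piece produced this way lives in its own camera frame; fusing pieces into a partial reconstruction additionally requires the relative camera motion between frames.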

6. Supervised Object-Specific Distance Estimation from Monocular Images for Autonomous Driving

Author: Yury Davydov, Wen-Hui Chen, and Yu-Chen Lin
Subjects: monocular depth estimation, autonomous driving, computer vision, convolutional neural networks, Automobile Driving, Motor Vehicles, Motorcycles, Humans, Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Abstract: Accurate distance estimation is a requirement for advanced driver assistance systems (ADAS) to provide drivers with safety-related functions such as adaptive cruise control and collision avoidance. Radars and lidars can provide distance information; however, they are either expensive or provide poor object information compared with image sensors. In this study, we propose a lightweight convolutional deep learning model that can extract object-specific distance information from monocular images. We explore a variety of training settings and five structural variants of the model and conduct various tests on the KITTI dataset, evaluating seven different road agents: person, bicycle, car, motorcycle, bus, train, and truck. In all experiments, a comparison with the Monodepth2 model is also carried out. Experimental results show that the proposed model outperforms Monodepth2 by 15% in terms of the average weighted mean absolute error (MAE).
Published: 2022

7. A Self-Supervised Monocular Depth Estimation Approach Based on UAV Aerial Images

Author: Yuhang Zhang, Qing Yu, Low Kin Huat, and Chen Lv
Affiliations: School of Mechanical and Aerospace Engineering; Air Traffic Management Research Institute
Venue: 2022 IEEE/AIAA 41st Digital Avionics Systems Conference (DASC)
Subjects: Self-Supervised Learning, Monocular Depth Estimation, Mechanical engineering [Engineering], Multi-Scale Upsampling, Aerial Images, Unmanned Aerial Vehicles
Abstract: Unmanned aerial vehicles (UAVs) have gained increasing attention recently, and depth estimation is one of the essential tasks for their safe operation, especially for drones flying at low altitudes. Given the limited size and payload of UAVs, deep learning-based methods have replaced traditional sensors as the mainstream approach for predicting per-pixel depth. Because supervised depth estimation methods require massive amounts of ground-truth depth as the supervisory signal, this article proposes an unsupervised framework that predicts depth maps from a sequence of monocular images. Our model addresses scale ambiguity by training the depth subnetwork jointly with the pose subnetwork. Moreover, we introduce a modified loss function that combines a weighted photometric loss with an edge-aware smoothness loss to optimize training. The evaluation results are compared with the model without the weighted loss and with other unsupervised monocular depth estimation models (Monodepth and Monodepth2). Our model outperforms the others, indicating its potential to enhance UAVs' ability to estimate distances to the surrounding environment. This research is supported by the National Research Foundation, Singapore, and the Civil Aviation Authority of Singapore (CAAS), under the Aviation Transformation Programme.
Published: 2022
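
The loss described above pairs a photometric term with an edge-aware smoothness term. The sketch uses the Monodepth-style edge-aware smoothness on mean-normalized disparity and a plain L1 photometric error; the SSIM component usually included in Monodepth2's photometric loss is omitted for brevity. The per-pixel `weights` map and the `smoothness_weight` default of 1e-3 are assumptions, since the abstract does not say how the photometric loss is weighted.

```python
from typing import Optional

import numpy as np


def weighted_photometric_loss(target: np.ndarray, reconstructed: np.ndarray,
                              weights: Optional[np.ndarray] = None) -> float:
    """L1 photometric error between (H, W, C) images, optionally weighted per pixel."""
    err = np.abs(target - reconstructed).mean(axis=-1)  # (H, W)
    if weights is None:
        return float(err.mean())
    return float((weights * err).sum() / (weights.sum() + 1e-7))


def edge_aware_smoothness(disp: np.ndarray, image: np.ndarray) -> float:
    """Penalize disparity gradients, down-weighted where the image itself has strong edges."""
    d = disp / (disp.mean() + 1e-7)                      # mean-normalized disparity
    grad_dx = np.abs(d[:, 1:] - d[:, :-1])
    grad_dy = np.abs(d[1:, :] - d[:-1, :])
    img_dx = np.abs(image[:, 1:] - image[:, :-1]).mean(axis=-1)
    img_dy = np.abs(image[1:, :] - image[:-1, :]).mean(axis=-1)
    return float((grad_dx * np.exp(-img_dx)).mean() + (grad_dy * np.exp(-img_dy)).mean())


def total_loss(target, reconstructed, disp, weights=None, smoothness_weight=1e-3):
    """Hypothetical combination: photometric term plus a small smoothness penalty."""
    return (weighted_photometric_loss(target, reconstructed, weights)
            + smoothness_weight * edge_aware_smoothness(disp, target))
```

The exponential of the image gradient lets depth change sharply at image edges, where object boundaries are likely, while encouraging smooth depth in textureless regions.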

8. Depth estimation from a single SEM image using pixel-wise fine-tuning with multimodal data

Author: Tim Houben, Thomas Huisman, Maxim Pisarenco, Fons van der Sommen, and Peter H. N. de With
Affiliation: Video Coding & Architectures
Subjects: Domain adaptation, Weakly supervised learning, Hardware and Architecture, Computer Vision and Pattern Recognition, Scatterometry, Optical critical dimension, Software, Monocular depth estimation, SEM images, Computer Science Applications
Abstract: To support the ongoing size reduction of integrated circuits, accurate depth measurements of on-chip structures are becoming increasingly important. Unfortunately, present metrology tools do not offer a practical solution. In the semiconductor industry, critical dimension scanning electron microscopes (CD-SEMs) are predominantly used for 2D imaging at a local scale. The main objective of this work is to investigate whether a single SEM image contains sufficient 3D information for accurate surface reconstruction of the device topology. We present a method that produces depth maps from synthetic and experimental SEM images and demonstrate that the proposed neural network architecture, together with a tailored training procedure, yields accurate depth predictions. The training procedure includes a weakly supervised domain adaptation step, referred to as pixel-wise fine-tuning, which employs scatterometry data to address the scarcity of ground truth. We first tested this method on a synthetic contact hole dataset, achieving a mean relative error below 6.2% at realistic noise levels. We further show that the method is well suited for other important semiconductor metrics, such as top critical dimension (CD), bottom CD, and sidewall angle. To the best of our knowledge, we are the first to achieve accurate depth estimation results on real experimental data by combining data from SEM and scatterometry measurements. An experiment on a dense line-space dataset yields a mean relative error below 1%.
Published: 2022
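
The mean relative error quoted above, and the link between a depth profile and the top CD, bottom CD, and sidewall angle, can be written down directly. This sketch assumes a symmetric trapezoidal contact-hole profile with CDs and depth in the same unit; it illustrates the geometry only and is not the paper's extraction procedure. Function names are hypothetical.

```python
import math

import numpy as np


def mean_relative_error(pred, target) -> float:
    """Mean of |pred - target| / target over all measurements."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean(np.abs(pred - target) / target))


def sidewall_angle_deg(top_cd: float, bottom_cd: float, depth: float) -> float:
    """Sidewall angle, measured from horizontal, of a symmetric trapezoidal hole profile."""
    run = (top_cd - bottom_cd) / 2.0  # horizontal offset of each sidewall
    return math.degrees(math.atan2(depth, run))


print(sidewall_angle_deg(top_cd=40.0, bottom_cd=20.0, depth=10.0))
```

With equal top and bottom CDs the function returns 90 degrees (vertical walls); a bottom CD wider than the top gives an angle above 90 degrees, i.e. a re-entrant profile.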

9. Real-to-virtual domain transfer-based depth estimation for real-time 3D annotation in transnasal surgery: a study of annotation accuracy and stability

Author: Jason Y. K. Chan, Zhiyu Liu, Ka-Wai Kwok, Hon-Sing Tong, Po-Ling Chan, Justin D. L. Ho, and Yui-Lun Ng
Subjects: Computer science, Transnasal surgery, 0206 medical engineering, Video Recording, Biomedical Engineering, Stability (learning theory), Health Informatics, Augmented reality, Surgical annotation, 02 engineering and technology, Imaging phantom, 030218 nuclear medicine & medical imaging, Domain (software engineering), 03 medical and health sciences, Annotation, Imaging, Three-Dimensional, 0302 clinical medicine, Monitoring, Intraoperative, Cadaver, Image Processing, Computer-Assisted, Humans, Radiology, Nuclear Medicine and imaging, Computer vision, Point (geometry), Ground truth, Monocular, Phantoms, Imaging, Reproducibility of Results, Endoscopy, General Medicine, 020601 biomedical engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications, Calibration, Original Article, Surgery, Domain transfer learning, Computer Vision and Pattern Recognition, Artificial intelligence, Tomography, X-Ray Computed, Monocular depth estimation
Abstract: Purpose: Surgical annotation promotes effective communication between medical personnel during surgical procedures. However, existing approaches to 2D annotation are mostly static with respect to the display. In this work, we propose a method to achieve 3D annotations that anchor rigidly and stably to target structures as the camera moves in a transnasal endoscopic surgery setting. Methods: This is accomplished through intraoperative endoscope tracking and monocular depth estimation. A virtual endoscopic environment is used to train a supervised depth estimation network. An adversarial network transfers the style of the real endoscopic view to a synthetic-like view that is fed into the depth estimation network, so that framewise depth can be obtained in real time. Results: (1) Accuracy: framewise depth was predicted from images captured within a nasal airway phantom and compared with ground truth, achieving an SSIM value of 0.8310 ± 0.0655. (2) Stability: the mean absolute error (MAE) between the reference and predicted depth of a target point was 1.1330 ± 0.9957 mm. Conclusion: Both the accuracy and stability evaluations demonstrated the feasibility and practicality of the proposed method for achieving 3D annotations.
Published: 2021
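
The anchoring idea above combines a depth estimate with the tracked endoscope pose: a 2D annotation is lifted to a fixed 3D point once, then re-projected into every subsequent frame. A minimal sketch, assuming a pinhole intrinsic matrix `K` and 4x4 camera-to-world poses `T_world_cam` supplied by the tracker; these names and functions are hypothetical.

```python
import numpy as np


def backproject(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Pixel (u, v) with known depth -> 3D point in the camera frame."""
    return depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))


def anchor_annotation(u, v, depth, K, T_world_cam) -> np.ndarray:
    """Lift a 2D annotation to a fixed world point using depth and a camera-to-world pose."""
    p_cam = backproject(u, v, depth, K)
    return (T_world_cam @ np.append(p_cam, 1.0))[:3]


def project_annotation(p_world, K, T_world_cam) -> np.ndarray:
    """Re-project the anchored world point into a (possibly moved) camera."""
    T_cam_world = np.linalg.inv(T_world_cam)
    p_cam = (T_cam_world @ np.append(p_world, 1.0))[:3]
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]])
anchor = anchor_annotation(60, 40, 2.0, K, np.eye(4))
moved = np.eye(4)
moved[0, 3] = 0.2  # camera translated 0.2 units along world x
print(project_annotation(anchor, K, moved))
```

Because the annotation is stored in world coordinates, its on-screen position follows the anatomy as the endoscope moves; errors in either the depth or the pose show up directly as drift.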

10. Monocular Depth Estimation for 3D Map Construction at Underground Parking Structures

Author: Jingwen Li, Xuedong Song, Ruipeng Gao, and Dan Tao
Subjects: Computer Networks and Communications, Hardware and Architecture, Control and Systems Engineering, Signal Processing, monocular depth estimation, 3D scene map, geometric consistency, Electrical and Electronic Engineering
Abstract: Converting real scenes into three-dimensional models has become one of the fundamental requirements of autonomous driving. At present, the main obstacle to large-scale deployment is the high cost of lidar for environment sensing. Monocular depth estimation aims to predict scene depth and construct a 3D map using only a monocular camera. In this paper, we add geometric consistency constraints to address non-Lambertian surface problems in depth estimation. We also use imaging principles and conversion rules to produce a 3D scene model from multiple images. We built a prototype and conducted extensive experiments in a corridor and an underground parking structure, and the results demonstrate the method's effectiveness for indoor location-based services.
Published: 2023
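
Producing a 3D scene model from multiple images, as described above, requires bringing per-image point clouds into a common world frame. The sketch assumes camera-to-world poses are already known for each view and uses simple voxel-grid deduplication to merge overlapping points; both the pose source and the `voxel_size` default are assumptions, not details from the paper, and the function names are hypothetical.

```python
import numpy as np


def transform_points(points: np.ndarray, T_world_cam: np.ndarray) -> np.ndarray:
    """Apply a 4x4 camera-to-world pose to an (N, 3) array of camera-frame points."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ T_world_cam.T)[:, :3]


def fuse_views(clouds, poses, voxel_size: float = 0.05) -> np.ndarray:
    """Merge per-view point clouds into one world-frame map, keeping one point per voxel."""
    world = np.vstack([transform_points(c, T) for c, T in zip(clouds, poses)])
    keys = np.floor(world / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return world[np.sort(first_idx)]  # preserve the original point order
```

Keeping the first point per voxel is the simplest merge rule; averaging points within a voxel would reduce noise at the cost of a little more code.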

11. Deep Monocular Depth Estimation Based on Content and Contextual Features

Author: Saddam Abdulwahab, Hatem A. Rashwan, Najwa Sharaf, Saif Khalid, and Domenec Puig
Subjects: deep learning, monocular depth estimation, autoencoder network, contextual semantic information, Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Abstract: Recently, significant progress has been made in deep learning-based approaches for estimating depth maps from monocular images. However, many existing methods rely on content and structure information extracted from RGB photographs, which often results in inaccurate depth estimates, particularly for regions with low texture or occlusions. To overcome these limitations, we propose a novel method that exploits contextual semantic information to predict precise depth maps from monocular images. Our approach leverages a deep autoencoder network that incorporates high-quality semantic features from the state-of-the-art HRNet-v2 semantic segmentation model. By feeding these features to the autoencoder network, our method effectively preserves depth discontinuities and enhances monocular depth estimation. Specifically, we exploit semantic features related to the localization and boundaries of objects in the image to improve the accuracy and robustness of depth estimation. To validate the effectiveness of our approach, we tested our model on two publicly available datasets, NYU Depth v2 and SUN RGB-D. Our method outperformed several state-of-the-art monocular depth estimation techniques, achieving an accuracy of 85% with Rel, RMS, and log10 errors of 0.12, 0.523, and 0.0527, respectively. Our approach also demonstrated exceptional performance in preserving object boundaries and faithfully detecting small object structures in the scene.
Published: 2023
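
The Rel, RMS, log10, and accuracy figures quoted above are the standard NYU Depth v2 metrics. The abstract does not say which threshold its 85% accuracy uses; the sketch reports delta < 1.25, the most common convention, and assumes pixels with zero ground truth are invalid and predictions are strictly positive. The function name is hypothetical.

```python
import numpy as np


def depth_metrics(pred, gt) -> dict:
    """Standard monocular depth metrics over pixels with valid (positive) ground truth."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric ratio for threshold accuracy
    return {
        "rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rms": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "log10": float(np.mean(np.abs(np.log10(pred) - np.log10(gt)))),
        "delta1": float(np.mean(ratio < 1.25)),
    }
```

Rel and log10 are scale-relative, so they weight errors on near and far surfaces more evenly than RMS, which is dominated by large absolute errors at long range.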

12. Self-Supervised Monocular Depth Estimation With Extensive Pretraining

Author: Hyukdoo Choi
Subjects: Estimation, Monocular, General Computer Science, Computer science, Sensing applications, Supervised learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, General Engineering, Optical flow, Reprojection error, unsupervised learning, TK1-9971, Lidar, depth prediction, convolutional neural networks, self-supervised learning, Key (cryptography), General Materials Science, Computer vision, Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Monocular depth estimation
Abstract: Although depth estimation is a key technology for three-dimensional sensing applications involving motion, active sensors such as LiDAR and depth cameras tend to be expensive and bulky. Here, we explore the potential of monocular depth estimation (MDE) based on a self-supervised approach. MDE is a promising technology, but supervised learning requires accurate ground-truth depth data. Recent studies have enabled self-supervised training of MDE models using only monocular image sequences and image-reconstruction errors. We pretrained networks on multiple datasets, including monocular and stereo image sequences. The main challenges for self-supervised MDE models are occlusions and dynamic objects. We propose novel loss functions to handle these problems in the form of min-over-all and min-with-flow losses, both based on the per-pixel minimum reprojection error of Monodepth2 and extended to stereo images and optical flow. With extensive pretraining and the novel losses, our model outperformed existing unsupervised approaches in quantitative depth estimation and in the ability to distinguish small objects against the background, as evaluated on KITTI 2015.
Published: 2021
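
The per-pixel minimum reprojection error that both proposed losses build on keeps, at each pixel, the smallest error among several candidate reconstructions, so a pixel occluded in one source view is scored by a view in which it is visible. A minimal sketch; which candidates enter the minimum (temporal neighbours, the stereo partner, flow-based warps) is an assumption based on the abstract's description, and the function name is hypothetical.

```python
import numpy as np


def min_reprojection_loss(error_maps) -> float:
    """Per-pixel minimum over candidate reconstruction errors, averaged over pixels.

    error_maps: list of (H, W) photometric error maps, one per candidate source.
    """
    stacked = np.stack(error_maps, axis=0)  # (num_sources, H, W)
    return float(stacked.min(axis=0).mean())


prev_frame_err = np.array([[1.0, 2.0], [3.0, 4.0]])
next_frame_err = np.array([[2.0, 1.0], [0.0, 5.0]])
print(min_reprojection_loss([prev_frame_err, next_frame_err]))
```

Averaging errors across sources instead would let a single occluded view inflate the loss; taking the minimum avoids penalizing the network for pixels that simply are not visible in some views.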

13. Monocular Depth Estimation Based on Multi-Scale Depth Map Fusion

Author: Xin Yang, Chang Qingling, Yan Cui, Siyuan He, and Liu Xinglin
Subjects: General Computer Science, Machine vision, Computer science, Feature extraction, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, depth adaptive fusion module, Depth map, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, Pyramid (image processing), indoor, 0105 earth and related environmental sciences, Monocular, General Engineering, multi-scale depth maps, dense feature fusion network, TK1-9971, Feature (computer vision), 020201 artificial intelligence & image processing, Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Scale (map), Focus (optics), Monocular depth estimation
Abstract: Monocular depth estimation is a basic task in machine vision, and its performance has improved greatly in recent years. However, most depth estimation networks rely on very deep feature extractors, which lose a large amount of information. The loss of object information is particularly serious in the encoding and decoding process, leaving the estimated depth maps lacking object structure detail and with unclear edges. The consequences are especially severe in complex indoor environments, which are the research focus of this paper. To solve this problem, we propose a dense feature fusion network that uses a feature pyramid to aggregate features at various scales. Furthermore, to improve the fusion of decoded object contour information and depth information, we propose an adaptive depth fusion module, which allows the fusion network to adaptively fuse depth maps at various scales and thus increase the object information in the predicted depth map. Unlike other work that predicts depth maps with a U-Net architecture, our method predicts the depth map by fusing multi-scale depth maps, each with its own characteristics. By fusing them, we can estimate depth maps that not only contain accurate depth information but also have rich object contour and structure detail. Experiments indicate that the proposed model predicts depth maps with more object information than prior work while achieving competitive accuracy. Furthermore, compared with other contemporary techniques, our method achieves state-of-the-art edge accuracy on the NYU Depth V2 dataset.
Published: 2021
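
The adaptive depth fusion described above can be illustrated as a per-pixel softmax over depth maps predicted at several scales. In the paper the fusion weights come from a learned module; here `logits` are simply passed in, nearest-neighbour upsampling stands in for whatever upsampling the network uses, and each coarse map is assumed to divide the finest resolution exactly. Function names are hypothetical.

```python
import numpy as np


def upsample_nearest(depth: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor along both axes."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def softmax(x: np.ndarray, axis: int = 0) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_fuse(depth_maps, logits) -> np.ndarray:
    """Fuse multi-scale depth maps with per-pixel softmax weights.

    depth_maps: list of 2D maps, finest first; coarser sizes divide the finest size.
    logits: (K, H, W) per-scale, per-pixel scores (learned in a real network).
    """
    h, _ = depth_maps[0].shape
    upsampled = np.stack([upsample_nearest(d, h // d.shape[0]) for d in depth_maps])
    weights = softmax(np.asarray(logits, float), axis=0)
    return (weights * upsampled).sum(axis=0)
```

Because the weights vary per pixel, the fused map can favour fine-scale predictions near object contours and coarse-scale predictions in smooth regions.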

14. Joint Attention Mechanisms for Monocular Depth Estimation With Multi-Scale Convolutions and Adaptive Weight Adjustment

Author: Zonghua Zhang, Zhaozong Meng, Nan Gao, and Peng Liu
Subjects: 0209 industrial biotechnology, multi-scale convolutions, General Computer Science, Channel (digital image), Computer science, Feature extraction, 02 engineering and technology, Convolutional neural network, Field (computer science), 020901 industrial engineering & automation, Dimension (vector space), 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Computer vision, Image resolution, Block (data storage), Monocular, General Engineering, joint attention mechanisms, Feature (computer vision), weight adjustment, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, lcsh:TK1-9971, Monocular depth estimation
Abstract: Monocular depth estimation is a fundamental problem for various vision applications and is therefore gaining increasing attention in computer vision. Although great improvements have been made thanks to the rapid progress of deep convolutional neural networks, estimating the depth of objects at finer detail remains unsatisfactory, especially in complex scenes with rich structure information. In this article, we propose a deep end-to-end learning framework that combines multi-scale convolutions and joint attention mechanisms to tackle this challenge. Specifically, we first design a lightweight up-convolution to generate multi-scale feature maps. We then introduce an attention-based residual block that aggregates different feature maps jointly in the channel and spatial dimensions, enhancing the discriminative ability of feature fusion at finer details. Furthermore, we explore an effective adaptive weight adjustment strategy for the loss function that adjusts the weight of each loss term during training without additional hyperparameters, further improving performance. The proposed framework was evaluated on the challenging NYU Depth v2 and KITTI datasets. Experimental results demonstrate that the proposed approach outperforms most state-of-the-art methods.
Published: 2020
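
The attention-based residual block above reweights features first along the channel dimension and then spatially before adding the result back to the input. The sketch follows the general CBAM pattern rather than the paper's exact block: `w1`, `w2`, `a`, and `b` are hypothetical stand-ins for learned layers, and the spatial branch uses a weighted sum of channel-mean and channel-max maps instead of a convolution.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """x: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C, 1, 1) channel weights."""
    pooled = x.mean(axis=(1, 2))            # global average pooling -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)   # bottleneck + ReLU
    return sigmoid(w2 @ hidden)[:, None, None]


def spatial_attention(x: np.ndarray, a: float, b: float) -> np.ndarray:
    """Returns (1, H, W) spatial weights from channel-mean and channel-max maps."""
    return sigmoid(a * x.mean(axis=0) + b * x.max(axis=0))[None, :, :]


def attention_residual_block(x, w1, w2, a, b) -> np.ndarray:
    y = x * channel_attention(x, w1, w2)   # reweight channels
    y = y * spatial_attention(y, a, b)     # reweight spatial positions
    return x + y                           # residual connection
```

The residual connection means that even when attention suppresses a feature, the original signal still passes through, which generally keeps training stable.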

15. Leveraging Contextual Information for Monocular Depth Estimation

Author: Doyeon Kim, Sihaeng Lee, Janghyeon Lee, and Junmo Kim
Subjects: Estimation, contextual information, Monocular, General Computer Science, Computer science, General Engineering, General Materials Science, Computer vision, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, lcsh:TK1-9971, Monocular depth estimation
Abstract: Humans rely strongly on visual cues to understand scenes, for example when segmenting or detecting objects or measuring the distance to nearby objects. Recent studies suggest that deep neural networks can take advantage of contextual representations when estimating the depth map of a given image. Therefore, focusing on scene context can benefit depth estimation. In this study, we propose a novel network architecture that improves performance by leveraging contextual information for monocular depth estimation. We introduce a depth prediction network with the proposed attentive skip connection and a global context module to obtain meaningful semantic features and enhance the model's performance. We validate our model through several experiments on the KITTI and NYU Depth V2 datasets. The experimental results demonstrate the effectiveness of the proposed network, which achieves state-of-the-art monocular depth estimation performance while maintaining a high running speed.
Published: 2020
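
A global context module of the kind mentioned above typically pools the whole feature map into one descriptor and injects it back at every spatial position, so each location sees scene-level information. A minimal sketch under that assumption; `w` and `b` stand in for a learned 1x1 projection, the function name is hypothetical, and the paper's actual module may differ.

```python
import numpy as np


def global_context(features: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Add a projected global descriptor to every spatial position of (C, H, W) features.

    w: (C, C) and b: (C,) are hypothetical stand-ins for a learned 1x1 projection.
    """
    pooled = features.mean(axis=(1, 2))         # global average pooling -> (C,)
    context = np.maximum(w @ pooled + b, 0.0)   # projection + ReLU -> (C,)
    return features + context[:, None, None]    # broadcast over H and W
```

The added term is identical at every position, so it shifts the features by a scene-dependent offset rather than altering local detail.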

16. Transfer2Depth: Dual Attention Network With Transfer Learning for Monocular Depth Estimation

Author: Chuan-Yu Chang, Yao-Pao Huang, Chia-Hung Yeh, and Chih-Yang Lin
Subjects: 0209 industrial biotechnology, General Computer Science, Computer science, monocular depth estimation, 02 engineering and technology, transfer learning, Machine learning, Ordinal regression, Convolutional neural network, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Representation (mathematics), Network architecture, Monocular, Deep learning, General Engineering, Computer vision, 020201 artificial intelligence & image processing, spatial-channel attention module, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Transfer of learning, lcsh:TK1-9971
Abstract: Monocular depth estimation is a fundamental problem underlying many tasks. Although recent convolutional neural network-based methods can achieve high accuracy by using very deep networks and complex architectures to exploit different cues and features, doing so not only increases the vulnerability of the model but also makes convergence more difficult. Moreover, recent depth estimation methods for indoor environments are impractical for outdoor environments. In this work, we aim to develop a simple deep network structure that improves the effectiveness of depth estimation. We apply a dual attention module that can be inserted into any type of network to improve its representational power, and we propose a training strategy that combines transfer learning and ordinal regression to improve training convergence. Even with a simple end-to-end encoder-decoder architecture, we achieve state-of-the-art performance on two of the largest datasets for indoor and outdoor depth estimation: NYU Depth v2 and KITTI.
Published: 2020
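
Ordinal regression for depth usually discretizes depth into bins that widen with distance and predicts, for each bin boundary, whether the depth exceeds it. The sketch uses the spacing-increasing discretization introduced by DORN as one common choice; the abstract does not specify the paper's discretization, so `alpha`, `beta`, and `k` are illustrative parameters and the function names are hypothetical.

```python
import numpy as np


def sid_thresholds(alpha: float, beta: float, k: int) -> np.ndarray:
    """Spacing-increasing discretization: k + 1 thresholds spaced uniformly in log-depth."""
    i = np.arange(k + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / k)


def decode_ordinal(probs: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """probs: (k, ...) estimates of P(depth > t_i). Returns the midpoint of the selected bin."""
    label = (probs > 0.5).sum(axis=0)                  # number of boundaries exceeded
    label = np.clip(label, 0, len(thresholds) - 2)     # keep the bin index in range
    return (thresholds[label] + thresholds[label + 1]) / 2.0


t = sid_thresholds(1.0, 100.0, 2)
print(t)
print(decode_ordinal(np.array([0.9, 0.1]), t))
```

Log-spaced bins give finer resolution at short range, where depth errors matter most, and coarser resolution far away.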

17. Unsupervised Monocular Training Method for Depth Estimation Using Statistical Masks

Author: Xiangtong Wang, Peng Cheng, Wei Li, Menglong Yang, and Binbin Liang
Subjects: General Computer Science, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, statistical masks, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Segmentation, error map, 0105 earth and related environmental sciences, Estimation, Monocular, Pixel, General Engineering, Pattern recognition, Variance (accounting), Training methods, Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, problematic pixels, lcsh:TK1-9971, Monocular depth estimation
Abstract: Recently, unsupervised monocular training methods based on convolutional neural networks have shown surprising progress in improving the accuracy of depth estimation. However, the performance of these methods suffers greatly from problematic pixels, such as occluded pixels and low-texture pixels. In this paper, we introduce a method that builds masks from the statistics of error maps to segment the problematic pixels. Unlike conventional methods, which use additional segmentation networks to classify problematic pixels, we use a multi-task learning architecture to generate an identical mask, a mean mask, and a variance mask for filtering the problematic pixels. Experimental results show that our proposed method performs satisfactorily compared with other related methods on the KITTI dataset. Moreover, we also apply our method to the UAV dataset VisDrone, and the results also indicate the effectiveness of the method in detecting moving objects.
Published: 2020
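
The abstract does not define its three masks. As an illustration only, the sketch below assumes a per-pixel photometric error map and builds three boolean masks:
- an "identical" mask that keeps pixels where view synthesis beats the unwarped source frame (similar in spirit to Monodepth2's auto-masking);
- a mean mask that keeps pixels whose error is at or below the image mean;
- a variance mask that drops pixels more than `k` standard deviations above the mean.

These rules, the parameter `k` and the function names are hypothetical stand-ins, not the paper's definitions.

```python
import numpy as np

def statistical_masks(err_warped, err_identity, k=1.0):
    """Hypothetical masks built from error-map statistics.

    err_warped:   (H, W) photometric error after warping the source into the target view
    err_identity: (H, W) error between the target and the unwarped source frame
    """
    # Assumed "identical" mask: warping must reduce error versus doing nothing
    # (catches static-camera frames and objects moving with the camera).
    identical_mask = err_warped < err_identity
    mu, sigma = err_warped.mean(), err_warped.std()
    # Assumed mean mask: keep pixels at or below the image-wide mean error.
    mean_mask = err_warped <= mu
    # Assumed variance mask: discard outliers above mean + k * std.
    variance_mask = err_warped <= mu + k * sigma
    return identical_mask, mean_mask, variance_mask

def masked_loss(err_warped, mask):
    """Average error over valid pixels only; returns 0.0 for an empty mask."""
    n = mask.sum()
    return float((err_warped * mask).sum() / n) if n > 0 else 0.0
```

In a real training loop the masked loss would be computed on framework tensors, so gradients only flow through pixels that pass the masks.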

18. LW-Net: A Lightweight Network for Monocular Depth Estimation

Author: Cheng Feng, Congxuan Zhang, Zhen Chen, Ming Li, Hao Chen, and Bingbing Fan
Subjects: iterative decoder, General Computer Science, Computer science, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, self-supervised learning, convolutional neural networks, Pyramid, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Pyramid (image processing), lightweight, 0105 earth and related environmental sciences, Monocular, General Engineering, Function (mathematics), Construct (python library), Frame rate, Robot, 020201 artificial intelligence & image processing, lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971, Encoder, Algorithm, Decoding methods, Monocular depth estimation
Abstract: Existing self-supervised monocular depth estimation methods usually explore increasingly large networks to achieve accurate estimation results. However, larger networks are more difficult to train and require more storage space. To balance network size and computational accuracy, we propose in this article a compact, lightweight network for monocular depth estimation, named LW-Net. First, we construct a compact network by designing an iterative decoder with shared weights and a lightweight pyramid encoder. The proposed network includes significantly fewer parameters than most existing monocular depth estimation networks. Second, we exploit a self-supervised training strategy by combining the proposed LW-Net model with a pose network, and we then use a hybrid loss function to train the decoder and encoder separately. This training strategy gives the LW-Net model better estimation accuracy than other methods. Finally, we run the proposed LW-Net model on the KITTI and Make3D datasets to conduct a comprehensive comparison with several state-of-the-art methods. The experimental results demonstrate that our method achieves the best computational accuracy while using the fewest parameters. Specifically, compared with the existing state-of-the-art method, our method reduces the number of model parameters by 46.6% and the time cost by 7.69%, and increases the frame rate by 5.19%.
Published: 2020

19. Lightweight Monocular Depth with a Novel Neural Architecture Search Method

Author: Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, neural architecture search, Computer Vision and Pattern Recognition (cs.CV), assisted Tabu search, Computer Science - Computer Vision and Pattern Recognition, monocular depth estimation, 113 Computer and information sciences, lightweight
Abstract: This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks is computationally very demanding, the introduced Assisted Tabu Search leads to efficient architecture exploration. Moreover, we construct the search space on a pre-defined backbone network to balance layer diversity and search space size. The LiDNAS method outperforms the state-of-the-art NAS approach proposed for disparity and depth estimation in terms of search efficiency and output model performance. The LiDNAS-optimized models achieve results superior to the compact depth estimation state of the art on NYU-Depth-v2, KITTI, and ScanNet, while being 7%-500% more compact in size, i.e., in the number of model parameters. Comment: 11 pages, 10 figures
Published: 2022

20. Monocular Depth Estimation for Autonomous Driving

Author: Gurram, Akhil, López Peña, Antonio M. (Antonio Manuel), and Urfalioglu, Onay
Subjects: Monocular depth estimation, Autonomous driving, Technologies, Deep learning
Abstract: 3D geometric information is essential for on-board perception in autonomous driving and driver assistance. Autonomous vehicles (AVs) are equipped with calibrated sensor suites. As part of these suites, we can find LiDARs, which are expensive active sensors in charge of providing the 3D geometric information. Depending on the operational conditions for the AV, calibrated stereo rigs may also be sufficient for obtaining 3D geometric information, since these rigs are less expensive and easier to install than LiDARs. However, ensuring proper maintenance and calibration of these types of sensors is not trivial. Accordingly, there is increasing interest in performing monocular depth estimation (MDE) to obtain 3D geometric information on-board. MDE is very appealing since it puts appearance and depth in direct pixelwise correspondence without further calibration. Moreover, a set of single cameras with MDE capabilities would still be a cheap solution for on-board perception, relatively easy to integrate and maintain in an AV. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Accordingly, the overall goal of this PhD is to study methods for improving CNN-based MDE accuracy under different training settings. More specifically, this PhD addresses the research questions described below. When we started this PhD, state-of-the-art methods for MDE were already based on CNNs. In fact, a promising line of work consisted of using image-based semantic supervision (i.e., pixel-level class labels) while training CNNs for MDE using LiDAR-based supervision (i.e., depth). It was common practice to assume that the same raw training data are complemented by both types of supervision, i.e., with depth and semantic labels. However, in practice, it was more common to find heterogeneous datasets with either only depth supervision or only semantic supervision. Therefore, our first work was to research whether we could train CNNs for MDE by leveraging depth and semantic information from heterogeneous datasets. We show that this is indeed possible, and we surpassed the state-of-the-art results on MDE at the time we did this research. To achieve our results, we proposed a particular CNN architecture and a new training protocol. After this research, it was clear that the upper-bound setting to train CNN-based MDE models consists of using LiDAR data as supervision. However, it would be cheaper and more scalable if we were able to train such models from monocular sequences. Obviously, this is far more challenging, but worth researching. Training MDE models using monocular sequences is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. To alleviate these problems, we perform MDE by virtual-world supervision and real-world SfM self-supervision. We call our proposal MonoDEVSNet. We compensate for the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision, as well as by addressing the virtual-to-real domain gap. MonoDEVSNet outperformed previous MDE CNNs trained on monocular and even stereo sequences. We have publicly released MonoDEVSNet at . Finally, since MDE is performed to produce 3D information for use in downstream on-board perception tasks, we also address the question of whether the standard metrics for MDE assessment are a good indicator for future MDE-based driving-related perception tasks. By using 3D object detection on point clouds as a proxy for on-board perception, we conclude that, indeed, MDE evaluation metrics give rise to a ranking of methods which reflects relatively well the 3D object detection results we may expect. Universitat Autònoma de Barcelona, Programa de Doctorat en Informàtica (Doctoral Programme in Computer Science)
Published: 2022

21. Depth from Mono Accuracy Analysis by Changing Camera Parameters in the CARLA Simulator

Author: Ivan Marković, Juraj Peršić, Ivan Petrović, and Zvonimir Grskovic
Subjects: Monocular depth estimation, Self-supervised training, CARLA simulator, Artificial neural network, business.industry, Computer science, Deep learning, Network on, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Robotics, Extrinsic calibration, Task (computing), Single camera, Vehicle frame, Artificial intelligence, business, Simulation
Abstract: Depth estimation is an important task in robotics and autonomous driving. By estimating depth from a single camera, it is no longer necessary to add and calibrate additional sensors, usually a second camera. However, such an approach requires training on extensive datasets, and obtaining real-world datasets is time-consuming and costly. Photorealistic simulators can therefore be beneficial, since a wide variety of scenes can be created. In this paper we present an approach to training a deep neural network based on the ResNet architecture for estimating depth from a single camera. We target road vehicle scenes and use the CARLA simulator. We evaluate the trained network on real-world KITTI dataset images and in the CARLA simulator. In the simulated experiments, we compare performance as the camera's intrinsic parameters and its extrinsic calibration with respect to the ego-vehicle frame are changed.
Published: 2021

22. Gaussian Weighted Deep Modeling for Improved Depth Estimation in Monocular Images

Author: Jianmin Jiang, Ehab H. El-Shazly, and Xiaoyan Zhang
Subjects: Estimation, Monocular, General Computer Science, Computer science, business.industry, Gaussian, General Engineering, monocular depth estimation, Gaussian weighted deep modelling, symbols.namesake, Deep learning applications, symbols, General Materials Science, Computer vision, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971
Abstract: Capturing RGB images and estimating their corresponding depth data for training deep models is a challenging task. To overcome the scarcity of ground-truth depth, several recently reported deep network models formulate depth estimation as an image reconstruction problem. These models involve multiple design decisions and parameters that are selected empirically; as a result, they fail to capture the varying nature of the input, which limits their adaptability. In this paper, we propose an automatically Gaussian-weighted deep model to achieve improved solutions for monocular depth estimation. In comparison with the existing state of the art, our proposed very deep model is supported by novel components, including a hybrid and integrated loss function and a fine training strategy. The hybrid and integrated loss function balances an appropriate assessment of perceptual similarity with modest resilience to both small- and large-scale errors. The different loss terms are weighted automatically, and their integration is optimized via Gaussian-distribution-based modelling. The fine training strategy adaptively screens all training images via an error-clustering mechanism to sustain an effective and efficient training process. Extensive experiments show that our proposed model outperforms the seven compared benchmarks, representative of the existing state of the art, across all assessment metrics.
Published: 2019

23. An Adaptive Unsupervised Learning Framework for Monocular Depth Estimation

Author: Xiafu Peng, Xunyu Zhong, Lin Lixiong, and Delong Yang
Subjects: Ground truth, Monocular, General Computer Science, Pixel, Computer science, business.industry, image sequences, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, General Engineering, monocular depth estimation, unsupervised learning, Convolutional neural network, image processing, Image (mathematics), Constraint (information theory), Computer Science::Computer Vision and Pattern Recognition, adaptive algorithm, Unsupervised learning, General Materials Science, Computer vision, lcsh:Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, business, lcsh:TK1-9971, Neural networks
Abstract: Depth estimation from a single image plays an important role in 3D scene perception. Owing to the development of deep convolutional neural networks (CNNs), monocular depth estimation models have achieved many impressive results. However, the requirement for manually labeled per-pixel datasets (ground truth) limits the application of these supervised methods. Based on a geometric constraint between consecutive stereo images, we propose an unsupervised method to infer scene structure. We train the model with consecutive stereo images as input, while only a single image is required at test time. In contrast to previous works, this paper presents an adaptive loss function to handle regions that do not overlap between consecutive images. Moreover, by exploiting the discontinuity of pixels in edge regions and their continuity in non-edge regions of a depth image, we propose a novel depth smoothness loss to improve the accuracy of the model. In addition, as an auxiliary task, our model also estimates the camera motion between consecutive images. Experimental results on the KITTI and Cityscapes datasets show that our model outperforms other unsupervised frameworks and some supervised frameworks.
Published: 2019
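
The abstract describes a smoothness loss that allows depth discontinuities at image edges but enforces continuity elsewhere. The paper's exact formulation is not given here. A widely used edge-aware variant, popularized by Godard et al.'s Monodepth, weights disparity gradients by exp(-|image gradient|), and the sketch below assumes that form. Normalizing disparity by its mean (as Monodepth2 does) is also an assumption.

```python
import numpy as np

def edge_aware_smoothness(disp, img):
    """Edge-aware smoothness: penalize disparity gradients, less so at image edges.

    disp: (H, W) predicted disparity; img: (H, W, C) image with values in [0, 1].
    """
    # Mean-normalize disparity so the loss does not reward shrinking its scale.
    d = disp / (disp.mean() + 1e-7)
    # First-order finite differences of disparity along x and y.
    dx_d = np.abs(d[:, 1:] - d[:, :-1])
    dy_d = np.abs(d[1:, :] - d[:-1, :])
    # Image gradients, averaged over color channels.
    dx_i = np.abs(img[:, 1:, :] - img[:, :-1, :]).mean(axis=2)
    dy_i = np.abs(img[1:, :, :] - img[:-1, :, :]).mean(axis=2)
    # A strong image edge gives a small weight, so depth may jump there.
    return float((dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean())
```

A disparity step that coincides with an image edge is penalized less than the same step over a textureless region. This is the behavior the abstract attributes to edge versus non-edge regions.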

24. Depth Estimation Using Feature Pyramid U-Net and Polarized Self-Attention for Road Scenes

Author: Bo Tao, Yunfei Shen, Xiliang Tong, Du Jiang, and Baojia Chen
Subjects: monocular depth estimation, self-supervision, attention mechanism, target detection module, Radiology, Nuclear Medicine and imaging, Instrumentation, Atomic and Molecular Physics, and Optics
Abstract: Studies have shown that observed image texture details and semantic information are of great significance for depth estimation in road scenes. However, previous methods produce ambiguous and inaccurate boundaries for observed objects. We therefore aim to design a new depth estimation method that achieves higher accuracy and more accurate boundaries for detected objects. Based on polarized self-attention (PSA) and a feature pyramid U-Net, we propose a new self-supervised monocular depth estimation model that extracts more accurate texture details and semantic information. First, we add a PSA module at the end of the depth encoder and of the pose encoder so that the network can extract more accurate semantic information. Then, based on the U-Net, we feed the multi-scale images obtained by the object detection module FPN (Feature Pyramid Network) directly into the decoder. This guides the model to learn semantic information, thus sharpening object boundaries in the image. We evaluated our method on the KITTI 2015 and Make3D datasets, and our model achieved better results than previous studies. To verify the generalization of the model, we ran monocular, stereo, and monocular-plus-stereo experiments. The results show that our model achieves better results on several main evaluation metrics and clearer boundary information. To compare different forms of the PSA mechanism, we conducted ablation experiments: adding the PSA module improved the evaluation metrics compared with the model without it. We also found that our model performs better with monocular training than with stereo or monocular-plus-stereo training.
Published: 2022

25. Self-Supervised Monocular Depth Estimation Based on Channel Attention

Author: Bo Tao, Xinbo Chen, Xiliang Tong, Du Jiang, and Baojia Chen
Subjects: Radiology, Nuclear Medicine and imaging, monocular depth estimation, deep learning, channel attention, self-supervision, Instrumentation, Atomic and Molecular Physics, and Optics
Abstract: Scene structure and local details are important factors in producing high-quality depth estimates and in avoiding blurry artifacts in depth predictions. We propose a new network structure that combines two channel attention modules in a depth prediction network. The structure perception module (SPM) uses a frequency channel attention network. It analyzes the channel representation as a compression process using frequencies from different perspectives, which enhances perception of the scene structure and yields more feature information. The detail emphasis module (DEM) adopts a global attention mechanism. It improves the performance of deep neural networks by reducing irrelevant information and magnifying global interactive representations. Emphasizing important details effectively fuses features at different scales to achieve more accurate and clearer depth predictions. Experiments show that our network produces clearer depth estimates, and its accuracy on the KITTI benchmark improves from 98.1% to 98.3% on the δ < 1.25³ metric.
Published: 2022
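
The δ < 1.25³ figure in the abstract above is a threshold accuracy. It is the fraction of valid pixels whose ratio max(d/d*, d*/d) between predicted depth d and ground truth d* falls below 1.25^k. Papers usually report k = 1, 2 and 3. The minimal NumPy sketch below assumes two things: missing ground truth is encoded as zero (as in sparse KITTI depth maps), and predictions are strictly positive.

```python
import numpy as np

def threshold_accuracy(pred, gt, k=1, base=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < base**k."""
    valid = gt > 0  # assumption: 0 marks pixels without ground truth
    p, g = pred[valid], gt[valid]  # assumption: predictions are > 0
    ratio = np.maximum(p / g, g / p)
    return float((ratio < base ** k).mean())
```

Take a ground truth of 10 m at three pixels and predictions of 11, 14 and 20 m. The ratios are 1.1, 1.4 and 2.0, so δ < 1.25 is 1/3, δ < 1.25² is 2/3 and δ < 1.25³ is still 2/3, because 2.0 exceeds 1.25³ ≈ 1.953.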

26. Monocular Depth and Velocity Estimation Based on Multi-Cue Fusion

Author: Chunyang Qi, Hongxiang Zhao, Chuanxue Song, Naifu Zhang, Sinxin Song, Haigang Xu, and Feng Xiao
Subjects: Control and Optimization, Control and Systems Engineering, Mechanical Engineering, Computer Science (miscellaneous), Electrical and Electronic Engineering, Industrial and Manufacturing Engineering, monocular depth estimation, driver assistance systems, computer vision, attention mechanisms
Abstract: Many consumers and scholars currently focus on driving assistance systems (DAS) and intelligent transportation technologies. Measuring the distance and speed of the vehicle ahead is an important part of a DAS. Existing monocular-camera algorithms for estimating vehicle distance and speed still have limitations, such as ignoring the relationship between the underlying features of vehicle speed and distance. We propose a multi-cue fusion framework for monocular velocity and distance estimation to improve the accuracy of monocular ranging and velocity measurement. We use an attention mechanism to fuse different feature information. The network is trained jointly with a distance-velocity regression loss, with a depth loss as an auxiliary loss. Finally, experimental validation is performed on the TuSimple dataset and the KITTI dataset. On the TuSimple dataset, the average velocity mean squared error of the proposed method is less than 0.496 m²/s², and the average distance mean squared error is 5.695 m². On the KITTI dataset, the average velocity mean squared error of our method is less than 0.40 m²/s². In addition, we test the network in different scenarios and confirm its effectiveness.
Published: 2022

27. Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

Author: Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila, Tampere University, and Computing Sciences
Subjects: FOS: Computer and information sciences, 3D Point Fusion, Monocular Depth Estimation, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Depth Completion, 113 Computer and information sciences
Abstract: In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points. We achieve this by introducing a novel multi-scale 3D point fusion network that is both lightweight and efficient. We demonstrate its versatility on two different depth estimation problems where the 3D points have been acquired with conventional structure-from-motion and LiDAR. In both cases, our network performs on par with state-of-the-art depth completion methods and achieves significantly higher accuracy when only a small number of points is used, while being more compact in terms of the number of parameters. We show that our method outperforms some contemporary deep-learning-based multi-view stereo and structure-from-motion methods both in accuracy and in compactness. Comment: 10 pages, 9 figures
Published: 2020

28. SemanticDepth: Fusing Semantic Segmentation and Monocular Depth Estimation for Enabling Autonomous Driving in Roads without Lane Lines

Author: Pablo R. Palafox, Johannes Betz, Felix Nobis, Konstantin Riedl, and Markus Lienkamp
Subjects: situational awareness, autonomous driving, Advanced Driver Assistance Systems (ADAS), fusion architecture, deep learning, lcsh:TP1-1185, monocular depth estimation, scene understanding, lcsh:Chemical technology, Article, computer vision, semantic segmentation
Abstract: Typically, lane departure warning systems rely on lane lines being present on the road. However, in many scenarios, e.g., secondary roads or some streets in cities, lane lines are either not present or not sufficiently well signaled. In this work, we present a vision-based method to locate a vehicle within the road when no lane lines are present, using only RGB images as input. To this end, we propose to fuse together the outputs of a semantic segmentation and a monocular depth estimation architecture to reconstruct locally a semantic 3D point cloud of the viewed scene. We only retain points belonging to the road and, additionally, to any kind of fences or walls that might be present right at the sides of the road. We then compute the width of the road at a certain point on the planned trajectory and, additionally, what we denote as the fence-to-fence distance. Our system is suited to any kind of motoring scenario and is especially useful when lane lines are not present on the road or do not signal the path correctly. The additional fence-to-fence distance computation is complementary to the road's width estimation. We quantitatively test our method on a set of images featuring streets of the city of Munich that contain a road-fence structure, so as to compare our two proposed variants, namely the road's width and the fence-to-fence distance computation. In addition, we also validate our system qualitatively on the Stuttgart sequence of the publicly available Cityscapes dataset, where no fences or walls are present at the sides of the road, thus demonstrating that our system can be deployed in a standard city-like environment. For the benefit of the community, we make our software open source.
Published: 2019
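
The geometric step this abstract relies on is turning per-pixel depth and semantic labels into a local 3D point cloud. It can be sketched with the pinhole camera model: a pixel (u, v) with depth Z maps to X = (u - c_x) Z / f_x and Y = (v - c_y) Z / f_y. The sketch then estimates road width as the lateral extent of road points inside a thin depth slab around a chosen distance. The intrinsics, the `ROAD_ID` class id, the slab thickness and the width rule are all illustrative assumptions, not SemanticDepth's actual implementation.

```python
import numpy as np

ROAD_ID = 0  # hypothetical class id for "road" in the segmentation output

def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection to (H, W, 3) camera-frame points (X right, Y down, Z forward)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def road_width_at(depth, labels, z0, fx, fy, cx, cy, slab=0.5):
    """Hypothetical width rule: lateral extent of road points with |Z - z0| <= slab (metres)."""
    pts = backproject(depth, fx, fy, cx, cy)
    sel = (labels == ROAD_ID) & (np.abs(depth - z0) <= slab)
    if not sel.any():
        return None  # no road pixels at that distance
    xs = pts[..., 0][sel]
    return float(xs.max() - xs.min())
```

A fence-to-fence distance could be computed the same way by selecting a fence or wall class instead of `ROAD_ID`. A robust version would also reject outlier points before taking the extent.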

29. How do neural networks see depth in single images?

Author: Tom van Dijk and Guido C. H. E. de Croon
Subjects: FOS: Computer and information sciences, Artificial neural network, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, monocular depth estimation, 02 engineering and technology, 010501 environmental sciences, neural networks, 01 natural sciences, Image (mathematics), Visualization, Computer Science - Robotics, Depth perception, Position (vector), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Robotics (cs.RO), 0105 earth and related environmental sciences
Abstract: Deep neural networks have led to a breakthrough in depth estimation from single images. Recent work often focuses on the accuracy of the depth map, where an evaluation on a publicly available test set such as the KITTI vision benchmark is often the main result of the article. While such an evaluation shows how well neural networks can estimate depth, it does not show how they do this. To the best of our knowledge, no work currently exists that analyzes what these networks have learned. In this work, we take the MonoDepth network by Godard et al. and investigate what visual cues it exploits for depth estimation. We find that the network ignores the apparent size of known obstacles in favor of their vertical position in the image. Using the vertical position requires the camera pose to be known; however, we find that MonoDepth only partially corrects for changes in camera pitch and roll and that these influence the estimated depth towards obstacles. We further show that MonoDepth's use of the vertical image position allows it to estimate the distance towards arbitrary obstacles, even those not appearing in the training set, but that it requires a strong edge at the ground contact point of the object to do so. In future work, we will investigate whether these observations also apply to other neural networks for monocular depth estimation. Comment: Submitted
Published: 2019

30. Unsupervised Monocular Depth Estimation for Colonoscope System Using Feedback Network

Author: Sung-Jun Park, Gyu-Min Kim, Joong-Hwan Baek, and Seung-Jun Hwang
Subjects: Adenoma, 0209 industrial biotechnology, Computer science, Skill level, Colonoscopy, monocular depth estimation, 02 engineering and technology, lcsh:Chemical technology, Biochemistry, Article, Feedback, 030218 nuclear medicine & medical imaging, Analytical Chemistry, 03 medical and health sciences, 020901 industrial engineering & automation, 0302 clinical medicine, Artificial Intelligence, colonoscopy, Consistency (statistics), medicine, Humans, lcsh:TP1-1185, Electrical and Electronic Engineering, Instrumentation, Estimation, Monocular, Colonoscopes, medicine.diagnostic_test, business.industry, Frame (networking), Pattern recognition, Atomic and Molecular Physics, and Optics, unsupervised deep learning, Robot, Artificial intelligence, Detection rate, business
Abstract: A colonoscopy is a medical examination used to check for disease or abnormalities in the large intestine. If necessary, polyps or adenomas can be removed through the scope during a colonoscopy, which helps prevent colorectal cancer. However, the polyp detection rate differs depending on the condition and skill level of the endoscopist; some endoscopists even have a 90% chance of missing an adenoma. Artificial intelligence and robot technologies for colonoscopy are being studied to compensate for these problems. In this study, we propose a self-supervised monocular depth estimation method using spatiotemporal consistency in the colon environment. Our contributions are a loss function for reconstruction errors between adjacent predicted depths and a depth feedback network that uses the predicted depth information of the previous frame to predict the depth of the next frame. We performed a quantitative and qualitative evaluation of our approach, and the proposed FBNet (depth FeedBack Network) outperformed state-of-the-art unsupervised depth estimation methods on the UCL datasets.
Published: 2021

31. Persistent self-supervised learning: From stereo to monocular vision for obstacle avoidance

Author: van Hecke, K.G., de Croon, G.C.H.E., van der Maaten, L.J.P., Hennes, Daniel, and Izzo, Dario
Subjects: Persistent self-supervised learning, robotics, 0209 industrial biotechnology, Monocular, Computer science, business.industry, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Aerospace Engineering, monocular depth estimation, Robotics, Context (language use), 02 engineering and technology, stereo vision, 020901 industrial engineering & automation, Stereopsis, Obstacle avoidance, 0202 electrical engineering, electronic engineering, information engineering, Robot, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Monocular vision
Abstract: Self-supervised learning is a reliable learning mechanism in which a robot uses an original, trusted sensor cue for training to recognize an additional, complementary sensor cue. For the first time in self-supervised learning, we study how a robot’s learning behavior should be organized so that the robot can keep performing its task in case the original cue becomes unavailable. We study this persistent form of self-supervised learning in the context of a flying robot that has to avoid obstacles based on distance estimates from the visual cue of stereo vision. Over time, it learns to also estimate distances based on monocular appearance cues. A strategy is introduced that has the robot switch from flight based on stereo vision to flight based on monocular vision, with stereo vision used purely as “training wheels” to avoid imminent collisions. This strategy is shown to be an effective approach to the “feedback-induced data bias” problem also experienced in learning from demonstration. Both simulations and real-world experiments with a stereo-vision-equipped ARDrone2 show the feasibility of this approach, with the robot successfully using monocular vision to avoid obstacles in a 5 × 5 m room. The experiments show the potential of persistent self-supervised learning as a robust learning approach to enhance the capabilities of robots. Moreover, the abundant training data coming from the robot’s own sensors make it possible to gather the large data sets necessary for deep learning approaches.
Published: 2018

32. Depth Estimation for Egocentric Rehabilitation Monitoring Using Deep Learning Algorithms

Author: Yasaman Izadmehr, Héctor F. Satizábal, Kamiar Aminian, and Andres Perez-Uribe
Subjects: Fluid Flow and Transfer Processes, single-image depth prediction, Process Chemistry and Technology, monocular depth estimation, free-living monitoring, wearable devices, context awareness, upper-limb neurological disorders, quality of movement, rehabilitation, General Engineering, stroke, Computer Science Applications, General Materials Science, activity recognition, Instrumentation
Abstract: Upper limb impairment is one of the most common problems for people with neurological disabilities, affecting their activity, quality of life (QOL), and independence. Objective assessment of upper limb performance is a promising way to help patients with neurological upper limb disorders. By using wearable sensors, such as an egocentric camera, it is possible to monitor and objectively assess patients’ actual performance in activities of daily life (ADLs). We analyzed the possibility of using deep learning models for depth estimation based on a single RGB image to allow the monitoring of patients with 2D (RGB) cameras. We conducted experiments placing objects at different distances from the camera and varying the lighting conditions to evaluate the performance of the depth estimation provided by two deep learning models (MiDaS & Alhashim). Finally, we integrated the best-performing depth estimation model (MiDaS) with other deep learning models for hand detection (MediaPipe) and object detection (YOLO) and evaluated the system in a hand-object interaction task. Our tests showed that the final system achieves 78% performance in detecting interactions, while the reference performance using a 3D (depth) camera is 84%.
