9,638 results for "Monocular"
Search Results
202. Constant-time monocular object detection using scene geometry.
- Author
- Nieto, Marcos, Ortega, Juan Diego, Leškovský, Peter, and Senderos, Orti
- Subjects
- *PHYSICAL constants, *MONOCULARS, *GEOMETRY, *PARAMETER estimation, *CLASSIFICATION algorithms
- Abstract
This paper presents a structured approach for efficiently exploiting the perspective information of a scene to enhance the detection of objects in monocular systems. It defines a finite grid of 3D positions on the dominant ground plane and computes occupancy maps from which object location estimates are extracted. This method works on top of any detection technique, whether pixel-wise (e.g. background subtraction) or region-wise (e.g. detection-by-classification), which can be linked to the proposed scheme with minimal fine-tuning. Its flexibility thus allows this approach to be applied in a wide variety of applications and sectors, such as surveillance (e.g. person detection) or driver assistance systems (e.g. vehicle or pedestrian detection). Extensive results provide evidence of its excellent performance and its ease of use in combination with different image processing techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
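The ground-plane occupancy scheme in the abstract above can be sketched briefly. This is a hedged reconstruction, not the authors' code: the pinhole camera matrix `P`, the grid layout, and sampling a per-pixel detection-score image are all illustrative assumptions.

```python
import numpy as np

def project_to_image(P, X):
    """Project 3D points X (N, 3) with a 3x4 camera matrix P (assumed)."""
    Xh = np.hstack([X, np.ones((len(X), 1))])   # homogeneous coordinates
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]                 # perspective divide

def occupancy_map(score_img, P, xs, zs):
    """Sample a detection-score image at the projection of each ground cell.

    The grid lies on the dominant ground plane (y = 0); the returned map
    holds one occupancy value per (z, x) cell.
    """
    grid = np.array([[x, 0.0, z] for z in zs for x in xs])
    uv = np.round(project_to_image(P, grid)).astype(int)
    h, w = score_img.shape
    occ = np.zeros(len(grid))
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    occ[inside] = score_img[uv[inside, 1], uv[inside, 0]]
    return occ.reshape(len(zs), len(xs))
```

Local maxima of the returned map then serve as object location estimates on the ground plane, independent of which underlying detector produced the score image.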
203. DepthCut: improved depth edge estimation using multiple unreliable channels.
- Author
- Guerrero, Paul, Winnemöller, Holger, Li, Wilmot, and Mitra, Niloy J.
- Subjects
- *DEEP learning, *GENETIC algorithms, *GEOMETRIC analysis, *VIRTUAL reality, *VISUAL analytics
- Abstract
In the context of scene understanding, a variety of methods exist to estimate different information channels from mono or stereo images, including disparity, depth, and normals. Although several advances have been reported in recent years for these tasks, the estimated information is often imprecise, particularly near depth discontinuities or creases. However, studies have shown that precisely such depth edges carry critical cues for the perception of shape and play important roles in tasks like depth-based segmentation or foreground selection. Unfortunately, the currently extracted channels often carry conflicting signals, making it difficult for subsequent applications to use them effectively. In this paper, we focus on the problem of obtaining high-precision depth edges (i.e., depth contours and creases) by jointly analyzing such unreliable information channels. We propose DEPTHCUT, a data-driven fusion of the channels using a convolutional neural network trained on a large dataset with known depth. The resulting depth edges can be used for segmentation, for decomposing a scene into depth layers with relatively flat depth, or for improving the accuracy of the depth estimate near depth edges by constraining its gradients to agree with these edges. Quantitatively, we compare against 18 variants of baselines and demonstrate that our depth edges result in improved segmentation performance and an improved depth estimate near depth edges compared to data-agnostic channel fusion. Qualitatively, we demonstrate that the depth edges result in superior segmentation and depth orderings. (Code and datasets will be made available.) [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
204. Linear SFM: A hierarchical approach to solving structure-from-motion problems by decoupling the linear and nonlinear components.
- Author
- Zhao, Liang, Huang, Shoudong, and Dissanayake, Gamini
- Subjects
- *ALGORITHMS, *STEREO image, *LEAST squares, *MONOCULARS, *MATHEMATICAL optimization
- Abstract
This paper presents a novel hierarchical approach to solving structure-from-motion (SFM) problems. The algorithm begins with small local reconstructions based on nonlinear bundle adjustment (BA). These are then joined in a hierarchical manner using a strategy that requires solving a linear least squares optimization problem followed by a nonlinear transform. The algorithm can handle ordered monocular and stereo image sequences. Two stereo images or three monocular images are adequate for building each initial reconstruction. The bulk of the computation involves solving a linear least squares problem and, therefore, the proposed algorithm avoids three major issues associated with most of the nonlinear optimization algorithms currently used for SFM: the need for a reasonably accurate initial estimate, the need for iterations, and the possibility of being trapped in a local minimum. Also, by summarizing all the original observations into the small local reconstructions with associated information matrices, the proposed Linear SFM manages to preserve all the information contained in the observations. The paper also demonstrates that the proposed problem formulation results in a sparse structure that leads to an efficient numerical implementation. The experimental results using publicly available datasets show that the proposed algorithm yields solutions that are very close to those obtained using a global BA starting with an accurate initial estimate. The C/C++ source code of the proposed algorithm is publicly available at https://github.com/LiangZhaoPKUImperial/LinearSFM . [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
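The core linear step described in the abstract above, joining local reconstructions that summarize all observations with associated information matrices, amounts to a standard information-form least-squares fusion. The following is a minimal sketch of that general technique, not the paper's implementation (which also applies a nonlinear transform between reconstructions; see the linked repository for the real code).

```python
import numpy as np

def fuse_information(x1, I1, x2, I2):
    """Fuse two Gaussian estimates given in information form.

    Solves the linear least-squares problem
        min_x (x - x1)^T I1 (x - x1) + (x - x2)^T I2 (x - x2),
    whose closed-form solution is x = (I1 + I2)^{-1} (I1 x1 + I2 x2).
    No initial guess and no iterations are required, which is exactly
    why a linear formulation avoids local minima.
    """
    I = I1 + I2
    x = np.linalg.solve(I, I1 @ x1 + I2 @ x2)
    return x, I
```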
205. Deep Neural Network for Autonomous UAV Navigation in Indoor Corridor Environments.
- Author
- Padhy, Ram Prasad, Verma, Sachin, Ahmad, Shahzad, Choudhury, Suman Kumar, and Sa, Pankaj Kumar
- Subjects
- DRONE aircraft, ARTIFICIAL neural networks, AUTONOMOUS robots, GLOBAL Positioning System, CAMERAS
- Abstract
In recent years, UAV technology has rapidly emerged as a revolutionary development within the research community. In this paper, we propose a method that enables UAVs with a monocular camera to navigate autonomously in previously unknown and GPS-denied indoor corridor environments. The proposed system uses a state-of-the-art Convolutional Neural Network (CNN) model to achieve this task. We propose a novel approach that takes the video feed from the front camera of the UAV and passes it through a deep neural network model to decide on the next maneuver. The entire process is treated as a classification task, where the deep neural network model is responsible for classifying the image as the left, right, or center of the corridor. Training is performed over a dataset of images collected from various indoor corridor environments. Apart from the front-facing camera, the model does not depend on any other sensor. We demonstrate the efficacy of the proposed system in real-time indoor corridor scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
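The decision stage described in the abstract above (classify the view as left, center, or right of the corridor, then maneuver) can be sketched as follows. The class ordering, softmax readout, and yaw-rate gain are hypothetical illustrations, not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def maneuver(scores, yaw_rate=0.5):
    """Map [left, center, right] class scores to a yaw-rate command.

    If the UAV sees itself drifting toward the left wall it yaws right
    (positive command in this sketch), and vice versa; a centred view
    means flying straight ahead.
    """
    probs = softmax(scores)
    label = max(range(3), key=lambda i: probs[i])
    return {0: +yaw_rate, 1: 0.0, 2: -yaw_rate}[label]
```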
206. Callosal Influence on Visual Receptive Fields Has an Ocular, an Orientation-and Direction Bias.
- Author
- Conde-Ocazionez, Sergio A., Jungen, Christiane, Wunderle, Thomas, Eriksson, David, Neuenschwander, Sergio, and Schmidt, Kerstin E.
- Subjects
- BINOCULAR vision disorders, RECEPTIVE fields (Neurology), ACTION potentials
- Abstract
One leading hypothesis on the nature of visual callosal connections (CC) is that they replicate features of intrahemispheric lateral connections. However, CC also act in the central part of the binocular visual field. In agreement, early experiments in cats indicated that they provide the ipsilateral-eye part of binocular receptive fields (RFs) at the vertical midline (Berlucchi and Rizzolatti, 1968) and play a key role in stereoscopic function. But until today, callosal inputs to receptive fields activated by one or both eyes had never been compared simultaneously, because callosal function has often been studied by cutting or lesioning either the corpus callosum or the optic chiasm, which does not allow such a comparison. To investigate the functional contribution of CC in the intact cat visual system, we recorded both monocular and binocular neuronal spiking responses and receptive fields in the 17/18 transition zone during reversible deactivation of the contralateral hemisphere. Unexpectedly, given many of the previous reports, we observed no change in ocular dominance during CC deactivation. Throughout the transition zone, a majority of RFs shrink, but several also increase in size. RFs are significantly more affected by ipsi- as opposed to contralateral stimulation, but changes are also observed with binocular stimulation. Notably, the RF shrinkages are tiny and not correlated with the profound decreases in monocular and binocular firing rates. They depend more on the orientation and direction preference than on the eccentricity or ocular dominance of the receiving neuron's RF. Our findings confirm that in binocularly viewing mammals, binocular RFs near the midline are constructed via the direct geniculo-cortical pathway. They also support the idea that inputs from the two eyes complement each other through CC: rather than linking parts of RFs separated by the vertical meridian, CC convey a modulatory influence, reflecting the feature selectivity of lateral circuits, with a strong cardinal bias. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
207. A multicentre study of long‐term follow‐up and owner satisfaction following enucleation in horses.
- Author
- Wright, K., Ireland, J. L., and Rendle, D. I.
- Abstract
Summary: Background: Horses are reported to return to a variety of disciplines following unilateral enucleation; however, owners of horses with ocular disease are frequently reluctant to consider the procedure. There is little published information investigating owners' attitudes towards, and satisfaction following, enucleation. Objectives: To investigate the hypotheses: 1) horses return to their previous level of work following unilateral enucleation and 2) their owners are satisfied with the post‐operative outcome. Study design: Retrospective case series with cross‐sectional survey. Methods: Clinical records from eight equine referral centres in the United Kingdom were reviewed to identify horses that underwent enucleation between August 2006 and March 2015. Owner questionnaires were completed to corroborate information provided by medical records and obtain information on client perceptions. Results: A total of 170 cases were identified and 119 owner questionnaires completed. The most frequent primary uses of horses in the study were pleasure/leisure riding, showjumping and dressage, with 25.2% (n = 30) of horses used for competition. Following enucleation, 77.3% (n = 92) of horses were performing at an equivalent or higher level than prior to enucleation and 60.0% (n = 18/30) of competition horses were competing at an equivalent or higher level. Complications related to the surgical site (predominantly mild post‐operative swelling) were reported in 3.7% of cases and nonocular complications in 3.7% of cases. Of owners who reported being concerned or very concerned about certain factors prior to surgery, ≥86.8% reported that these factors caused little or no issue post‐surgery. Most owners (90.8%, n = 108) were pleased with the outcome following surgery, with 21.3% (n = 23/108) wishing the procedure had been undertaken sooner. Main limitations: Retrospective data collection from clinical records and the potential for recall bias.
Conclusions: Horses can return successfully to a variety of disciplines following enucleation. Owners are satisfied with the outcome and pleased that enucleation was performed. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
208. The Visual–Inertial Canoe Dataset.
- Author
- Miller, Martin, Chung, Soon-Jo, and Hutchinson, Seth
- Subjects
- *GLOBAL Positioning System, *CANOES & canoeing
- Abstract
We present a dataset collected from a canoe along the Sangamon River in Illinois. The canoe was equipped with a stereo camera, an inertial measurement unit (IMU), and a global positioning system (GPS) device, which provide visual data suitable for stereo or monocular applications, inertial measurements, and position data for ground truth. We recorded a canoe trip up and down the river for 44 minutes covering a 2.7 km round trip. The dataset adds to those previously recorded in unstructured environments and is unique in that it is recorded on a river, which provides its own set of challenges and constraints that are described in this paper. The dataset is stored on the Illinois Data Bank and can be accessed at:
https://doi.org/10.13012/B2IDB-9342111_V1. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
209. Keyframe-based monocular SLAM: design, survey, and future directions.
- Author
- Younes, Georges, Asmar, Daniel, Shammas, Elie, and Zelek, John
- Subjects
- *VISUAL perception, *SLAM (Robotics), *LOCALIZATION problems (Robotics), *COGNITIVE robotics, *ROBOTIC trajectory control, *MOTION analysis, *MATHEMATICAL models
- Abstract
Extensive research in the field of monocular SLAM over the past fifteen years has yielded workable systems that have found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common for some time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation and critically assessing the specific strategies adopted in each proposed solution. Third, the paper provides insight into the direction of future research in this field, addressing the major limitations still facing monocular SLAM; namely, issues of illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
210. Real-time Obstacle Avoidance on a Quadrotor Using CNN-based Monocular Depth Estimation
- Author
- Hyeonbeom Lee and Hyeongjin Kim
- Subjects
- Estimation, Monocular, Control and Systems Engineering, Computer science, Applied Mathematics, Obstacle avoidance, Computer vision, Artificial intelligence, Software
- Published
- 2021
211. Dynamical mechanisms of a monolayer binocular rivalry model with fixed and time-dependent stimuli
- Author
- Fang Han, Wenlian Lu, Qinghua Zhu, Zhijie Wang, and Kaleem Kashif
- Subjects
- Binocular rivalry, Physics, Monocular, Applied Mathematics, Mechanical Engineering, Flicker, Aerospace Engineering, Ocean Engineering, Stimulus (physiology), Visual cortex, Control and Systems Engineering, Oscillation (cell signaling), Electrical and Electronic Engineering, Rivalry, Neuroscience, Bifurcation
- Abstract
Current research has revealed that the neural mechanisms driving visual consciousness alternations in binocular rivalry originate in the primary visual cortex, i.e., stimulus rivalry can be induced by monocular neurons. However, the competition mechanisms of the monocular neurons remain unclear. In this paper, we probe the dynamical characteristics of a monolayer binocular rivalry model (which contains four monocular neurons) with different types of stimuli: fixed inputs, swap, flicker, swap and flicker, and swap and blanks. Firstly, we study the dynamic effects of the traditional stimuli with fixed inputs (but with different grating conditions) on the monolayer rivalry model. Results show that Hopf bifurcations can induce three types of dynamical behaviors: winner-take-all (WTA), rivalry oscillation (RIV), and same activity (SAM), similar to other binocular rivalry models. Moreover, the simulation results indicate that the competition mechanisms of the monolayer binocular rivalry model are more consistent with experimental results than those of the hierarchical rivalry model proposed by Wilson. Secondly, the dynamical mechanisms of the monolayer rivalry model with four types of time-dependent stimuli are investigated. More complex dynamical behaviors appear with different types of periodic stimuli, including WTA-Mod, RIV-Mod, and SAM-Mod induced by torus bifurcations, as well as cycle skipping, multi-cycle skipping, and chaos induced by period-doubling bifurcation. Finally, we analyze the perceptual alternation mechanisms based on the temporal characteristics of the monolayer rivalry model; the results are again more consistent with empirical findings than those of the hierarchical rivalry model. Our simulations and analysis provide a new opportunity for future experiments to investigate the neural mechanisms of binocular rivalry.
- Published
- 2021
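For intuition about the winner-take-all and rivalry-oscillation regimes discussed in entry 211, here is a hedged sketch of a generic two-population mutual-inhibition model with slow adaptation. The equations, gain function, and parameters are textbook-style assumptions for illustration, not the paper's four-neuron monolayer model.

```python
import numpy as np

def simulate_rivalry(i1, i2, beta=3.0, g=2.0, tau_a=50.0, dt=0.1, steps=20000):
    """Euler-integrate two populations with cross-inhibition and adaptation.

    e1, e2: population activities driven by inputs i1, i2;
    a1, a2: slow adaptation variables that can destabilize a
    winner-take-all state into alternating (rivalry) oscillations.
    """
    f = lambda u: max(u, 0.0)  # rectifying gain function (assumed)
    e1 = e2 = a1 = a2 = 0.0
    trace = np.empty((steps, 2))
    for t in range(steps):
        d1 = -e1 + f(i1 - beta * e2 - g * a1)
        d2 = -e2 + f(i2 - beta * e1 - g * a2)
        e1, e2 = e1 + dt * d1, e2 + dt * d2
        a1 += dt / tau_a * (e1 - a1)
        a2 += dt / tau_a * (e2 - a2)
        trace[t] = e1, e2
    return trace
```

Depending on the inhibition strength `beta` and adaptation gain `g`, the pair settles into one dominant population (WTA-like) or alternates (RIV-like), which is the qualitative behavior the entry's bifurcation analysis makes precise.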
212. Efficient Monocular Depth Estimation with Transfer Feature Enhancement
- Author
- Ming Yin
- Subjects
- Monocular, Computer science, Process (computing), Boundary (topology), Word error rate, Pattern recognition, Image (mathematics), Upsampling, Feature (computer vision), Robustness (computer science), Signal Processing, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Estimating the depth of a scene from a monocular image is an essential step in image semantic understanding. In practice, some existing methods for this highly ill-posed problem still lack robustness and efficiency. This paper proposes a novel end-to-end depth estimation model with skip connections from a pre-trained Xception model for dense feature extraction, and three new modules are designed to improve the upsampling process. In addition, ELU activation and convolutions with smaller kernel sizes are added to improve the pixel-wise regression process. The experimental results show that our model has fewer network parameters and a lower error rate than the most advanced networks, and requires only half the training time. The evaluation is based on the NYU v2 dataset, and our proposed model can achieve clearer boundary details with state-of-the-art effects and robustness.
- Published
- 2021
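The ELU activation mentioned in entry 212 has a standard closed form: elu(x) = x for x > 0 and α(eˣ − 1) otherwise (α = 1 assumed here). Unlike ReLU it passes small negative values, which can help a pixel-wise regression head.

```python
import math

def elu(x, alpha=1.0):
    """ELU activation: identity for positive inputs, alpha*(exp(x)-1) otherwise.

    Saturates smoothly to -alpha for large negative inputs instead of
    zeroing them out like ReLU.
    """
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```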
213. Excitatory Contribution to Binocular Interactions in Human Visual Cortex Is Reduced in Strabismic Amblyopia
- Author
- Spero Nicholas, Preeti Verghese, Chuan Hou, Ismet Joan Üner, and Terence L. Tyson
- Subjects
- Adult, Male, genetic structures, Binocular summation, Biology, Amblyopia, Young Adult, Cortex (anatomy), Humans, Contrast (vision), Research Articles, Aged, Visual Cortex, Vision, Binocular, Monocular, General Neuroscience, Electroencephalography, Middle Aged, eye diseases, Strabismus, Electrophysiology, Visual cortex, Excitatory postsynaptic potential, Evoked Potentials, Visual, Female, Neuroscience, Binocular vision, Photic Stimulation
- Abstract
Binocular summation in strabismic amblyopia is typically reported as absent or greatly reduced in behavioral studies, which is thought to be due to a preferential loss of excitatory interactions between the eyes. Here, we studied how excitatory and suppressive interactions contribute to binocular contrast interactions along the visual cortical hierarchy of humans with strabismic and anisometropic amblyopia in both sexes, using source-imaged steady-state visual evoked potentials (SSVEP) over a wide range of relative contrast between the two eyes. Dichoptic parallel grating stimuli modulated at unique temporal frequencies in each eye allowed us to quantify spectral response components associated with monocular inputs (self-terms) and the response components arising from the interaction of the inputs of the two eyes [intermodulation (IM) terms]. Although anisometropic amblyopes revealed a similar pattern of responses to normal-vision observers, strabismic amblyopes exhibited substantially reduced IM responses across cortical regions of interest (V1, V3a, hV4, hMT+ and lateral occipital cortex), indicating reduced interocular interactions in visual cortex. A contrast gain control model that simultaneously fits self- and IM-term responses within each cortical area revealed different patterns of binocular interactions between individuals with normal and disrupted binocularity. Our model fits show that in strabismic amblyopia, the excitatory contribution to binocular interactions is significantly reduced in both V1 and extra-striate cortex, whereas suppressive contributions remain intact. Our results provide robust electrophysiological evidence supporting the view that the disruption of binocular interactions in strabismus or amblyopia is due to a preferential loss of excitatory interactions between the eyes. SIGNIFICANCE STATEMENT: We studied how excitatory and suppressive interactions contribute to binocular contrast interactions along the visual cortical hierarchy of humans with normal and amblyopic vision, using source-imaged SSVEP and frequency-domain analysis of dichoptic stimuli over a wide range of relative contrast between the two eyes. A dichoptic contrast gain control model was used to characterize these interactions in amblyopia and provided a quantitative comparison to normal vision. Our model fits revealed different patterns of binocular interactions between normal and amblyopic vision. Strabismic amblyopia significantly reduced excitatory contributions to binocular interactions, whereas suppressive contributions remained intact. Our results provide robust evidence supporting the view that a preferential loss of excitatory interactions disrupts binocular interactions in strabismic amblyopia.
- Published
- 2021
214. Perceptual and cognitive processes in augmented reality – comparison between binocular and monocular presentations
- Author
- Tsukasa Kimura, Akihiko Dempo, and Kazumitsu Shinohara
- Subjects
- Linguistics and Language, Monocular, genetic structures, Experimental and Cognitive Psychology, Cognition, Stimulus (physiology), Audiology, behavioral disciplines and activities, eye diseases, Sensory Systems, Language and Linguistics, Task (project management), Ocular dominance, Perception, sense organs, Psychology, Physiological psychology, Oddball paradigm, psychological phenomena and processes
- Abstract
In the present study, we investigated the difference between monocular augmented reality (AR) and binocular AR in terms of perception and cognition by using a task that combines the flanker task with the oddball task. A right- or left-facing arrowhead was presented as a central stimulus at the central vision, and participants were instructed to press a key only when the direction in which the arrowhead faced was a target. In a small number of trials, arrowheads that were facing in the same or opposite direction (flanker stimuli) were presented beside the central stimulus binocularly or monocularly as an AR image. In the binocular condition, the flanker stimuli were presented to both eyes, and, in the monocular condition, only to the dominant eye. The results revealed that participants could respond faster in the binocular condition than in the monocular one; however, only when the flanker stimuli were in the opposite direction was the response faster in the monocular condition. Moreover, the results of event-related brain potentials (ERPs) showed that all stimuli were processed in both the monocular and the binocular conditions in the perceptual stage; however, the influence of the flanker stimuli was attenuated in the monocular condition in the cognitive stage. The influence of flanker stimuli might be more unstable in the monocular condition than in the binocular condition, but more precise examination should be conducted in a future study.
- Published
- 2021
215. Self-supervised Monocular Trained Depth Estimation Using Triplet Attention and Funnel Activation
- Author
- Xiangdong Kong, Kaixu Zhang, Xuezhi Xiang, Yujian Qiu, and Ning Lv
- Subjects
- Ground truth, Monocular, Computer Networks and Communications, Computer science, General Neuroscience, GRASP, Complex system, Context (language use), Computational intelligence, Convolution, Artificial Intelligence, Computer vision, Artificial intelligence, Funnel, Software
- Abstract
Dense depth estimation from a single image is a basic problem in computer vision and has exciting applications in many robotic tasks. Training fully supervised models requires the acquisition of large and accurate ground-truth datasets, which is often complex and expensive. On the other hand, self-supervised learning has emerged as a promising alternative for monocular depth estimation, as it does not require ground-truth depth data. In this paper, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos. Our architecture leverages two new ideas for improvement: (1) triplet attention and (2) funnel activation (FReLU). By adding triplet attention to the depth and pose networks, this module captures the importance of features across dimensions in a tensor without any information bottlenecks, making the optimization learning framework more reliable. FReLU is used at the non-linear activation layer to grasp the local context in images adaptively, rather than using more complex convolutions at the convolution layer. FReLU extracts the spatial structure of objects through the pixel-wise modeling capacity provided by its spatial condition, enriching the details of complex images. The experimental results show that the proposed method is comparable with state-of-the-art self-supervised monocular depth estimation methods.
- Published
- 2021
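The funnel activation (FReLU) referenced in entry 215 computes f(x) = max(x, T(x)), where T(·) is a spatial operator over each pixel's neighbourhood. In the original formulation T is a learned depthwise convolution; the sketch below substitutes a fixed 3×3 mean filter purely for illustration.

```python
import numpy as np

def frelu(x):
    """max(x, T(x)) on a 2D map, with T a fixed 3x3 mean filter (assumed).

    Zero padding keeps the output the same shape as the input; each pixel
    competes against a summary of its spatial neighbourhood, which is the
    "funnel" condition replacing ReLU's fixed zero threshold.
    """
    padded = np.pad(x, 1)  # zero-pad so the window is defined at borders
    window_sum = sum(
        padded[i:i + x.shape[0], j:j + x.shape[1]]
        for i in range(3)
        for j in range(3)
    )
    return np.maximum(x, window_sum / 9.0)
```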
216. Vision-based positioning system for auto-docking of unmanned surface vehicles (USVs)
- Author
- Thor I. Fossen, Øystein Volden, and Annette Stahl
- Subjects
- Monocular, Positioning system, Computer science, 3D reconstruction, Navigation system, Modular design, Object detection, Field (computer science), Computer Science Applications, Lidar, Artificial Intelligence, Computer vision, Artificial intelligence
- Abstract
This paper presents an independent stereo-vision-based positioning system for docking operations. The low-cost system consists of an object detector and different 3D reconstruction techniques. To address the challenge of robust detection in an unstructured and complex outdoor environment, a learning-based object detection model is proposed. The system employs a complementary modular approach that uses data-driven methods, utilizing data wherever required, and traditional computer vision methods when the scope and complexity of the environment are reduced. Both monocular and stereo-vision-based methods are investigated for comparison. Furthermore, easily identifiable markers are utilized to obtain reference points, thus simplifying the localization task. A small unmanned surface vehicle (USV) with a LiDAR-based positioning system was used to verify that the proposed vision-based positioning system produces accurate measurements under various docking scenarios. Field experiments have shown that the developed system performs well and can supplement the traditional navigation system for safety-critical docking operations.
- Published
- 2021
217. Improving outdoor plane estimation without manual supervision
- Author
- Mustafa Özuysal and Furkan Eren Uzyıldırım
- Subjects
- Ground truth, Monocular, Exploit, Computer science, Point cloud, Process (computing), Convolutional neural network, Signal Processing, Range (statistics), Segmentation, Computer vision, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Recently, great progress has been made in the automatic detection and segmentation of planar regions from monocular images of indoor scenes. This has been achieved thanks to the development of convolutional neural network architectures for the task and the availability of large amounts of training data, usually obtained with the help of active depth sensors. Unfortunately, it is much harder to obtain large image sets outdoors, partly due to the limited range of active sensors. Therefore, there is a need to develop techniques that transfer features learned from indoor datasets to the segmentation of outdoor images. We propose such an approach that does not require manual annotations on the outdoor datasets. Instead, we exploit a network trained on indoor images and an automatically reconstructed point cloud to estimate the training ground truth on the outdoor images in an energy minimization framework. We show that the resulting ground-truth estimate is good enough to improve the network weights. Moreover, the process can be repeated multiple times to further improve plane detection and segmentation accuracy on monocular images of outdoor scenes.
- Published
- 2021
218. SportsCap: Monocular 3D Human Motion Capture and Fine-Grained Understanding in Challenging Sports Videos
- Author
- Lan Xu, Wei Yang, Jingyi Yu, Anqi Pang, Xin Chen, and Yuexin Ma
- Subjects
- Monocular, Computer science, Motion capture, Motion (physics), Artificial Intelligence, Pattern recognition (psychology), Graph (abstract data type), Embedding, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Classifier (UML), Software, Block (data storage)
- Abstract
Markerless motion capture and understanding of professional non-daily human movements is an important yet unsolved task, which suffers from complex motion patterns and severe self-occlusion, especially for the monocular setting. In this paper, we propose SportsCap—the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input. Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding in a data-driven multi-task manner. To enable robust capture under complex motion patterns, we propose an effective motion embedding module to recover both the implicit motion embedding and explicit 3D motion details via a corresponding mapping function as well as a sub-motion classifier. Based on such hybrid motion information, we introduce a multi-stream spatial-temporal graph convolutional network to predict the fine-grained semantic action attributes, and adopt a semantic attribute mapping block to assemble various correlated action attributes into a high-level action label for the overall detailed understanding of the whole sequence, so as to enable various applications like action assessment or motion scoring. Comprehensive experiments on both public and our proposed datasets show that with a challenging monocular sports video input, our novel approach not only significantly improves the accuracy of 3D human motion capture, but also recovers accurate fine-grained semantic action attribute.
- Published
- 2021
219. MULTI-TASK LEARNING FROM FIXED-WING UAV IMAGES FOR 2D/3D CITY MODELLING
- Author
- Mehdi Khoshboresh-Masouleh and Mohammad R. Bayanlou
- Subjects
- Technology, Computer Science - Artificial Intelligence, Generalization, Computer science, Computer Vision and Pattern Recognition (cs.CV), Point cloud, Multi-task learning, Machine learning, Annotation, Segmentation, Applied optics. Photonics, Monocular, Artificial neural network, Engineering (General). Civil engineering (General), Artificial Intelligence (cs.AI), Artificial intelligence, Change detection
- Abstract
Single-task learning in artificial neural networks can learn a single model very well, but the benefits brought by transferring knowledge then become limited. In this regard, when the number of tasks increases (e.g., semantic segmentation, panoptic segmentation, monocular depth estimation, and 3D point clouds), duplicate information may exist across tasks, and the improvement becomes less significant. Multi-task learning has emerged as a solution to knowledge-transfer issues and is an approach to scene understanding that involves multiple related tasks, each with potentially limited training data. Multi-task learning improves generalization by leveraging the domain-specific information contained in the training data of related tasks. In urban management applications such as infrastructure development, traffic monitoring, smart 3D cities, and change detection, automated multi-task data analysis for scene understanding based on semantic, instance, and panoptic annotation, as well as monocular depth estimation, is required to generate precise urban models. In this study, a common framework for the performance assessment of multi-task learning methods from fixed-wing UAV images for 2D/3D city modelling is presented.
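As one concrete illustration of the hard parameter sharing that typically delivers the generalization benefit described above, a single shared encoder can feed several task-specific heads. This is a minimal hypothetical sketch, not the paper's architecture; every name and dimension is made up.

```python
import numpy as np

# Minimal hard-parameter-sharing sketch: one shared encoder, several
# task-specific heads. All layer names and sizes below are assumptions
# for illustration only.
rng = np.random.default_rng(0)
W_shared = rng.standard_normal((16, 8))          # shared encoder weights
heads = {
    "semantic_segmentation": rng.standard_normal((8, 5)),
    "depth_estimation":      rng.standard_normal((8, 1)),
}

def forward(x):
    """Run one input through the shared trunk, then every task head."""
    z = np.maximum(x @ W_shared, 0.0)            # shared representation (ReLU)
    return {task: z @ W for task, W in heads.items()}

outputs = forward(rng.standard_normal(16))       # one prediction per task
```

Because the trunk `W_shared` receives gradients from every head during training, each task acts as a regularizer on the representation the others use.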
- Published
- 2021
220. GABAergic inhibition in the human visual cortex relates to eye dominance
- Author
-
I. Betina Ip, Uzay E. Emir, Andrew Parker, Claudia Lunghi, Holly Bridge, Laboratoire des systèmes perceptifs (LSP), Département d'Etudes Cognitives - ENS Paris (DEC), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Male ,Magnetic Resonance Spectroscopy ,genetic structures ,medicine.medical_treatment ,Striate cortex ,030218 nuclear medicine & medical imaging ,0302 clinical medicine ,GABAergic Neurons ,gamma-Aminobutyric Acid ,media_common ,Visual Cortex ,Vision, Binocular ,Multidisciplinary ,medicine.diagnostic_test ,Reciprocal inhibition ,Magnetic Resonance Imaging ,Dominance, Ocular ,Dominance (ethology) ,medicine.anatomical_structure ,Cerebral cortex ,Medicine ,Sensory processing ,Female ,Adult ,media_common.quotation_subject ,Science ,Biology ,Inhibitory postsynaptic potential ,Article ,Ocular dominance ,Young Adult ,03 medical and health sciences ,Perception ,medicine ,Humans ,Monocular ,[SCCO.NEUR]Cognitive science/Neuroscience ,Neural Inhibition ,eye diseases ,Oxygen ,Visual cortex ,sense organs ,Visual system ,Functional magnetic resonance imaging ,Neuroscience ,Binocular vision ,Photic Stimulation ,030217 neurology & neurosurgery - Abstract
Our binocular world is seamlessly assembled from two retinal images that remain segregated until the cerebral cortex. Despite the coherence of this input, there is often an imbalance between the strength of these connections in the brain. ‘Eye dominance’ provides a measure of the perceptual dominance of one eye over the other. Theoretical models suggest that eye dominance is related to reciprocal inhibition between monocular units in the primary visual cortex, the first location where the binocular input is combined. As the specific inhibitory interactions in the binocular visual system critically depend on the presence of visual input, we sought to test the role of inhibition by measuring the concentrations of the inhibitory neurotransmitter GABA during monocular visual stimulation of the dominant and the non-dominant eye. GABA levels were acquired in V1 using a combined functional magnetic resonance imaging (fMRI) and magnetic resonance spectroscopy (MRS) sequence on a 7-Tesla MRI scanner. Individuals with stronger eye dominance had a greater difference in GABAergic inhibition between the eyes. This relationship was present only when the visual system was actively processing sensory input and was not present at rest. We provide the first evidence that imbalances in GABA levels during ongoing sensory processing are related to eye dominance in the human visual cortex. This provides strong support for the view that intracortical inhibition underlies normal eye dominance. SIGNIFICANCE STATEMENT: What we see is shaped by excitation and inhibition in our brain. We investigated how eye dominance, the perceptual preference of one eye’s input over the other, is related to levels of the inhibitory neurotransmitter GABA during monocular visual stimulation. GABAergic inhibition is related to eye dominance, but only when the visual system is actively processing sensory input. This provides key support for the view that imbalances in visual competition observed in the normal visual system arise from an inability of GABA signalling to suppress the stronger sensory representation.
- Published
- 2021
221. Nearby contours abolish the binocular advantage
- Author
-
Maria Lev, Jian Ding, Dennis M. Levi, and Uri Polat
- Subjects
genetic structures ,Vision ,Computer science ,media_common.quotation_subject ,Science ,Context (language use) ,Models, Biological ,Luminance ,Article ,Multiplicative noise ,Contrast Sensitivity ,Low contrast ,Models ,Clinical Research ,Global configuration ,Psychology ,Humans ,Contrast (vision) ,Computer vision ,media_common ,Vision, Binocular ,Multidisciplinary ,Monocular ,business.industry ,Detection threshold ,Biological ,Binocular ,eye diseases ,Sensory Thresholds ,Medicine ,Artificial intelligence ,business ,Neuroscience - Abstract
That binocular viewing confers an advantage over monocular viewing for detecting isolated low-luminance or low-contrast objects has been known for well over a century; however, the processes involved in combining the images from the two eyes are still not fully understood. Importantly, in natural vision, objects are rarely isolated but appear in context. It is well known that nearby contours can either facilitate or suppress detection, depending on their distance from the target and the global configuration. Here we report that at close distances collinear (but not orthogonal) flanking contours suppress detection more under binocular than under monocular viewing, completely abolishing the binocular advantage at both threshold and suprathreshold levels. In contrast, more distant flankers facilitate both monocular and binocular detection, preserving a binocular advantage up to about four times the detection threshold. Our results for monocular and binocular viewing, for threshold contrast discrimination without nearby flankers, can be explained by a gain control model with uncertainty and internal multiplicative noise adding additional constraints on detection. However, in context with nearby flankers, both contrast detection thresholds and suprathreshold contrast appearance matching require the addition of both target-to-target and flank-to-target interactions occurring before the site of binocular combination. To test an alternative model, in which the interactions occur after the site of binocular combination, we performed a dichoptic contrast matching experiment, with the target presented to one eye and the flanks to the other. The two models make very different predictions for abutting flanks under dichoptic conditions. Interactions after the combination site predict that the perceived contrast of the flanked target will be strongly suppressed, while interactions before the site predict that the perceived contrast will be more or less veridical. The data are consistent with the latter model, strongly suggesting that the interactions take place before the site of binocular combination.
- Published
- 2021
222. A variational approach for estimation of monocular depth and camera motion in autonomous driving
- Author
-
Chuan Hu, Xuetao Zhang, and Huijuan Hu
- Subjects
Monocular ,business.industry ,Computer science ,Mechanical Engineering ,3D reconstruction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Aerospace Engineering ,020206 networking & telecommunications ,02 engineering and technology ,Motion (physics) ,Computer Science::Computer Vision and Pattern Recognition ,0202 electrical engineering, electronic engineering, information engineering ,Structure from motion ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In this paper, a new direct computational approach to dense 3D reconstruction in autonomous driving is proposed to simultaneously estimate the depth and the camera motion for the motion stereo problem. A traditional Structure from Motion framework is utilized to establish geometric constraints for our variational model. The architecture is mainly composed of a texture constancy constraint, a first-order motion smoothness constraint, a second-order depth regularization constraint, and a soft constraint. The texture constancy constraint improves robustness against illumination changes. The first-order motion smoothness constraint reduces noise in the estimation of dense correspondence. The depth regularization constraint is used to handle inherent ambiguities and guarantee a smooth or piecewise-smooth surface, and the soft constraint provides a dense correspondence as an initial estimate for the camera matrix, further improving robustness. Compared to traditional dense Structure from Motion approaches and popular stereo approaches, our monocular depth estimation results are more accurate and more robust. Even compared with popular single-image depth networks, our variational approach still performs well in estimating monocular depth and camera motion.
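The four constraints listed in the abstract are typically combined into a single variational energy to be minimized. The following is a generic form of such an energy with assumed notation (ψ a robust penalty, u the dense motion field, d the depth, u₀ the initial correspondence for the soft constraint, α, β, γ weights), not the paper's exact formulation:

```latex
E(d,\mathbf{u}) = \int_\Omega \Big[
  \underbrace{\psi\big(|I_2(\mathbf{x}+\mathbf{u}) - I_1(\mathbf{x})|^2\big)}_{\text{texture constancy}}
  + \alpha\,\underbrace{\psi\big(|\nabla\mathbf{u}|^2\big)}_{\text{first-order motion smoothness}}
  + \beta\,\underbrace{\psi\big(|\nabla^2 d|^2\big)}_{\text{second-order depth regularization}}
  + \gamma\,\underbrace{|\mathbf{u}-\mathbf{u}_0|^2}_{\text{soft constraint}}
\Big]\, d\mathbf{x}
```

Minimizing such an energy couples the depth and motion estimates: the data term ties them to the images, while the smoothness and regularization terms resolve the ambiguities the abstract mentions.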
- Published
- 2021
223. Realtime Object-aware Monocular Depth Estimation in Onboard Systems
- Author
-
Chung-Keun Lee, H. Jin Kim, Haram Kim, and Sangil Lee
- Subjects
Monocular ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Object (computer science) ,Object detection ,Computer Science Applications ,Control and Systems Engineering ,Bounding overwatch ,Feature (computer vision) ,Minimum bounding box ,Depth map ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Artificial intelligence ,Visual odometry ,business - Abstract
This paper proposes real-time object depth estimation using only a monocular camera on an onboard computer with a low-cost GPU. Our algorithm estimates scene depth from a sparse feature-based visual odometry algorithm and detects/tracks objects’ bounding boxes by utilizing an existing object detection algorithm in parallel. Both algorithms share their results, i.e., features, motion, and bounding boxes, to handle static and dynamic objects in the scene. We validate the scene depth accuracy of the sparse features quantitatively on KITTI against its ground-truth depth map made from LiDAR observations, and the depth of detected objects qualitatively with the Hyundai driving datasets and satellite maps. We compare the depth map of our algorithm with the results of (un-)supervised monocular depth estimation algorithms. The validation shows that, in terms of error and accuracy, our performance is comparable to that of monocular depth estimation algorithms which train depth indirectly (or directly) from stereo image pairs (or depth images), and better than that of algorithms trained with monocular images only. Also, we confirm that our computational load is much lighter than that of the learning-based methods, while showing comparable performance.
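The core idea of combining the two parallel algorithms can be sketched as follows: take the sparse visual-odometry feature points that fall inside a detected bounding box and aggregate their depths robustly. This is a hypothetical illustration with made-up data, not the paper's implementation; the median is one common robust choice.

```python
# Hypothetical sketch: estimate an object's depth from the sparse VO
# feature points that fall inside its detected bounding box, using the
# median so that a few mismatched background features do not dominate.
def object_depth(features, box):
    """features: list of (u, v, depth); box: (u_min, v_min, u_max, v_max)."""
    u0, v0, u1, v1 = box
    depths = sorted(d for (u, v, d) in features
                    if u0 <= u <= u1 and v0 <= v <= v1)
    if not depths:
        return None  # no features observed on this object
    mid = len(depths) // 2
    return depths[mid] if len(depths) % 2 else 0.5 * (depths[mid - 1] + depths[mid])

pts = [(10, 10, 5.2), (12, 11, 5.0), (14, 12, 40.0),  # 40.0 is a background outlier
       (80, 80, 9.9)]                                  # lies outside the box
d = object_depth(pts, (5, 5, 20, 20))  # median of [5.0, 5.2, 40.0] -> 5.2
```

Because the median of the in-box depths is taken, the spurious 40.0 m background point does not corrupt the estimate the way a mean would.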
- Published
- 2021
224. Transferring knowledge from monocular completion for self-supervised monocular depth estimation
- Author
-
Lin Sun, Jie Zhu, Zhe Zhang, Bingzheng Liu, Liying Xu, and Yi Li
- Subjects
Monocular ,Computer Networks and Communications ,business.industry ,Computer science ,Supervised learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Task (project management) ,Hardware and Architecture ,Feature (computer vision) ,Measured depth ,Media Technology ,Leverage (statistics) ,Segmentation ,Computer vision ,Artificial intelligence ,Image warping ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Monocular depth estimation is a very challenging task in computer vision, with the goal of predicting per-pixel depth from a single RGB image. Supervised learning methods require large amounts of depth measurements, which are time-consuming and expensive to obtain. Self-supervised methods are showing great promise, exploiting geometry to provide supervision signals through image warping. Moreover, several works leverage other visual tasks (e.g. stereo matching and semantic segmentation) to further advance self-supervised monocular depth estimation. In this paper, we propose a novel framework utilizing monocular depth completion as an auxiliary task to assist monocular depth estimation. In particular, a knowledge transfer strategy is employed to enable monocular depth estimation to benefit from the effective feature representations learned by the monocular depth completion task. The correlation between monocular depth completion and monocular depth estimation can thus be fully and effectively utilized. Only unlabeled stereo images are used in the proposed framework, which achieves a self-supervised learning paradigm. Experimental results on a publicly available dataset show that the proposed approach achieves superior performance to state-of-the-art self-supervised methods and comparable performance to supervised methods.
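The "supervision through image warping" that self-supervised methods rely on can be illustrated with a toy 1-D example: warp one stereo view by a candidate disparity and penalize the photometric difference from the other view. This sketch uses made-up data and a crude border rule; it shows the supervision signal generically, not this paper's network or loss.

```python
import numpy as np

# Toy 1-D illustration of the photometric supervision signal behind
# self-supervised depth methods: warp one stereo view by the predicted
# disparity and penalize the difference from the other view.
def photometric_l1(left, right, disparity):
    """left, right: 1-D intensity arrays; disparity: int shift per pixel."""
    idx = np.arange(left.size) - disparity      # sampling locations in the right view
    idx = np.clip(idx, 0, right.size - 1)       # crude handling of the image border
    warped = right[idx]                         # right view warped into the left frame
    return float(np.abs(left - warped).mean())  # L1 photometric error

left = np.array([0., 1., 2., 3., 4.])
right = np.array([1., 2., 3., 4., 5.])          # same scene, shifted by one pixel
loss_good = photometric_l1(left, right, np.ones(5, dtype=int))   # correct disparity
loss_bad = photometric_l1(left, right, np.zeros(5, dtype=int))   # wrong disparity
# loss_good < loss_bad: the photometric error rewards the correct disparity
```

Gradients of this loss with respect to the predicted disparity (depth) are what train the network, with no ground-truth depth labels involved.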
- Published
- 2021
225. RGB+D and deep learning-based real-time detection of suspicious event in Bank-ATMs
- Author
-
Pushpajit A. Khaire and Praveen Kumar
- Subjects
Monocular ,business.industry ,Computer science ,Event (computing) ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Computer graphics ,Pattern recognition (psychology) ,RGB color model ,Statistical analysis ,Computer vision ,Artificial intelligence ,Unavailability ,business ,Information Systems - Abstract
Real-time detection of human activities has become very important for the surveillance and security of Bank Automated Teller Machines (ATMs) and public offices because of the day-to-day increase in criminal activities. Such constrained environments are currently monitored through monocular CCTV cameras, which capture only RGB video. An RGB+D sensor provides depth data of the scene in addition to RGB data. To address the problem of online detection of abnormal activities in bank ATMs, we propose a supervised deep learning framework based on multi-stream CNNs and an RGB+D sensor. From the online video stream of RGB+D data, motion templates are created from RGB and depth video segments and then trained on CNNs to detect a suspicious event in ongoing activity. Moreover, due to the unavailability of any dataset for analyzing human activities in ATMs, we also contribute a novel RGB+D dataset in this paper. The proposed deep learning-based framework is evaluated on qualitative and quantitative statistical evaluation parameters and detects suspicious events with a precision of 0.932 and an accuracy of 94.2%. Detailed statistical analysis of the results shows that the proposed framework can detect a suspicious event online, in real time, before the abnormal activity is completed.
- Published
- 2021
226. VIR-SLAM: visual, inertial, and ranging SLAM for single and multi-robot systems
- Author
-
Yanjun Cao and Giovanni Beltrame
- Subjects
Monocular ,Inertial frame of reference ,business.industry ,Computer science ,Ranging ,Pipeline (software) ,Computer Science::Robotics ,Transformation matrix ,Transformation (function) ,Odometry ,Artificial Intelligence ,Robot ,Computer vision ,Artificial intelligence ,business - Abstract
Monocular cameras coupled with inertial measurements generally give high-performance visual-inertial odometry. However, drift can be significant over long trajectories, especially when the environment is visually challenging. In this paper, we propose a system that leverages Ultra-WideBand (UWB) ranging with one static anchor placed in the environment to correct the accumulated error whenever the anchor is visible. We also use this setup for collaborative SLAM: different robots use mutual ranging (when available) and the common anchor to estimate the transformation between each other, facilitating map fusion. Our system consists of two modules: a double-layer ranging, visual, and inertial odometry module for single robots, and a transformation estimation module for collaborative SLAM. We test our system on public datasets by simulating UWB measurements, as well as on real robots in different environments. Experiments validate our system and show that our method can outperform pure visual-inertial odometry by more than 20%, and that in visually challenging environments our method works even when the visual-inertial pipeline has significant drift. Furthermore, we can compute the inter-robot transformation matrices for collaborative SLAM at almost no extra computation cost.
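To see how a single-anchor range can correct accumulated drift, consider the simplest possible correction: pull the drifting position estimate onto the sphere of the measured range around the known anchor. This is a hypothetical geometric illustration only; the paper's system uses a full optimization over ranging, visual, and inertial factors, not this closed-form nudge.

```python
import math

# Illustrative sketch (not the paper's optimizer): nudge a drifting VIO
# position estimate onto the sphere defined by a UWB range measurement
# to a single static anchor at a known location.
def range_correct(p_est, anchor, measured_range, weight=1.0):
    """Blend p_est toward the closest point consistent with the UWB range.
    weight=1.0 trusts the range fully; 0.0 keeps the VIO estimate."""
    dx = [p - a for p, a in zip(p_est, anchor)]
    dist = math.sqrt(sum(c * c for c in dx))
    if dist == 0.0:
        return list(p_est)  # direction undefined; keep the estimate
    target = [a + measured_range * c / dist for a, c in zip(anchor, dx)]
    return [(1 - weight) * p + weight * t for p, t in zip(p_est, target)]

# VIO drifted to (3, 4, 0) while the anchor at the origin measures range 10:
p = range_correct((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), 10.0)
```

A single range constrains only the distance to the anchor, not the direction, which is why the real system fuses it with visual-inertial factors rather than applying it alone.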
- Published
- 2021
227. Hybrid Monocular SLAM Using Double Window Optimization
- Author
-
Hang Luo, Eduard Reithmeier, and Christian Pape
- Subjects
0209 industrial biotechnology ,Control and Optimization ,Computer science ,Feature extraction ,Biomedical Engineering ,Initialization ,02 engineering and technology ,Simultaneous localization and mapping ,020901 industrial engineering & automation ,Artificial Intelligence ,Robustness (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Pose ,Monocular ,business.industry ,Mechanical Engineering ,Computer Science Applications ,Visualization ,Human-Computer Interaction ,Control and Systems Engineering ,Feature (computer vision) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business - Abstract
This letter presents a hybrid framework, in both the front-end and back-end, for monocular simultaneous localization and mapping (SLAM), capable of combining the robustness of feature matching with the accuracy of direct alignment. In the front-end, the feature-based method is first used for coarse pose estimation, which is subsequently taken by the direct alignment module as initialization for further refinement. In the back-end, a double window structure is constructed based on the maintained semi-dense map and the sparse feature map, whose states are optimized via a multi-layer optimization scheme based on the reprojection constraints and the relative pose constraints. Our evaluation on several public datasets demonstrates that this hybrid design retains the superior resilience of salient features to scene variations, and achieves better tracking accuracy thanks to the integration of the direct modules, leading to performance comparable with the state of the art.
- Published
- 2021
228. Results of Using Alternating Presentation of Stereostimuli in Children with Functional Scotoma in Non-Paralytic Strabismus
- Author
-
S. I. Rychkova and V. G. Likhvantseva
- Subjects
medicine.medical_specialty ,genetic structures ,media_common.quotation_subject ,Audiology ,050105 experimental psychology ,functional scotoma ,03 medical and health sciences ,0302 clinical medicine ,Perception ,medicine ,0501 psychology and cognitive sciences ,Strabismus ,stereovision ,Mathematics ,media_common ,Monocular ,Blind spot ,05 social sciences ,RE1-994 ,eye diseases ,strabismus ,alternating stimuli presentation ,Left eye ,Interval (music) ,Ophthalmology ,Duration (music) ,030221 ophthalmology & optometry ,sense organs ,Monocular vision - Abstract
The work is devoted to one of the actual problems of modern strabismology: the study of the capability for stereo perception in children with non-paralytic strabismus.Purpose: to study the capability for stereovision with alternating presentation of stereostimuli in children with functional scotoma in non-paralytic strabismus.Patients and methods. 113 children with functional scotoma (FS) in non-paralytic strabismus were observed. We used stereostimuli with different characteristics in the following regimes of presentation: 1) the regime of simple monocular alternation (alternate presentation of an image for the right eye and the left eye); 2) the regime having an “empty” interval (black background) between monocular phases; 3) the regime having a binocular phase (a binocular image containing details corresponding to the stimuli for the right eye and the left eye) between monocular phases.Results. It was found that in 23 (20.3%) children, the capability for stereo perception was completely absent. All these children had stable total FS (monocular vision). In the remaining 90 children (with unstable or regional FS), stereo perception was demonstrated with some stimuli in some modes of their alternating presentation. For stimuli with a central arrangement of linear parts, the stereo effect was possible when they were presented in an alternating mode with an “empty” interval lasting from 20 to 70 ms, in combination with monocular phase durations from 30 to 90 ms. For stimuli with a peripheral arrangement of linear elements, 22.1% of children were capable of stereo perception not only in the “empty” interval mode but also in the simple alternation mode. The greatest number of children capable of stereo perception was detected when using the mode with an “empty” interval of 30–60 ms and monocular phase durations of 40–60 ms. With random-dot stimuli, none of the children in this group were capable of stereo perception.Conclusion. Our results suggest that even in patients with FS in non-paralytic strabismus, stereo perception is possible under conditions of alternating presentation of stimuli with certain characteristics. In this case, a stereo effect is most likely to appear with stimuli containing peripheral linear elements, presented in an alternating mode with an empty interval between monocular phases.
- Published
- 2021
229. Athletes Demonstrate Superior Dynamic Visual Acuity
- Author
-
Benjamin Thompson, Alan Yee, Kristine Dalton, and Elizabeth L. Irving
- Subjects
medicine.medical_specialty ,Refractive error ,Monocular ,Visual acuity ,genetic structures ,biology ,Athletes ,Visual Acuity ,Emmetropia ,Audiology ,Refractive Errors ,medicine.disease ,biology.organism_classification ,Pursuit, Smooth ,Smooth pursuit ,Ophthalmology ,Post-hoc analysis ,medicine ,Humans ,medicine.symptom ,Psychology ,Video game ,Optometry - Abstract
Athletes exhibit better dynamic visual acuity (DVA) than nonathletes, whereas action video game players (VGPs) perform more similarly to controls, despite all groups having similar static visual acuity and refractive errors. The differences in DVA between groups were not related to differences in static visual acuity, refractive error, or smooth pursuit gain. The purpose of the study was to examine whether athletes and VGPs have superior DVA compared with controls (nonathletes, nongamers). Forty-six participants (15 athletes, 11 VGPs, 20 controls) aged 21.7 years (standard deviation, 2.8 years) were recruited. Participants were emmetropic, with equivalent monocular and binocular static visual acuity between groups. Dynamic visual acuity was assessed using predictable (horizontal) and unpredictable (random) motion targets at velocities of 5, 10, 20, and 30°/s. Smooth pursuit eye movements were assessed using a horizontal motion step-ramp stimulus at the same speeds. This study was pre-registered with the Center for Open Science (https://osf.io/eu7qc). At 30°/s, there were significant main effects of group (F = 4.762, P = .01) and motion type (F = 9.538, P = .004). Tukey post hoc analysis for groups indicated that athletes performed better than the control group (t = -2.919, P < .02). An omnibus (group × motion type × speed) repeated-measures ANOVA showed a main effect of speed (F = 110.137, P < .001) and a speed × motion-type interaction (F = 27.825, P < .001). Dynamic visual acuity decreased as speed increased, and the slope of the change was greater for random than for horizontal motion. Smooth pursuit gains were not significantly different between groups (P > .05). Athletes have superior dynamic visual acuity compared with controls at 30°/s. This between-group difference cannot be fully explained by differences in smooth pursuit eye movements and may therefore reflect other differences between the groups.
- Published
- 2021
230. Polarimetric Monocular Dense Mapping Using Relative Deep Depth Prior
- Author
-
Shing Yang Loo, Moein Shakeri, Hong Zhang, and Kangkang Hu
- Subjects
Control and Optimization ,Monocular ,Computer science ,business.industry ,Mechanical Engineering ,Biomedical Engineering ,Polarimetry ,020207 software engineering ,02 engineering and technology ,Iterative reconstruction ,Simultaneous localization and mapping ,Computer Science Applications ,Human-Computer Interaction ,Azimuth ,Artificial Intelligence ,Control and Systems Engineering ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Specular reflection ,business ,Normal ,Surface reconstruction - Abstract
This letter is concerned with polarimetric dense map reconstruction based on a polarization camera, using relative depth information as a prior. In general, polarization imaging is able to reveal information about the surface normal, such as the azimuth and zenith angles, which can support the development of solutions to the dense reconstruction problem, especially in texture-poor regions. However, polarimetric shape cues are ambiguous due to the two types of polarized reflection (specular/diffuse). Although methods have been proposed to address this issue, they either are offline and therefore impractical in robotics applications, or use incomplete polarimetric cues, leading to sub-optimal performance. In this letter, we propose an online reconstruction method that uses the full polarimetric cues available from the polarization camera. With our online method, we can propagate sparse depth values both along and perpendicular to iso-depth contours. Through comprehensive experiments on challenging image sequences, we demonstrate that our method is able to significantly improve the accuracy of the depth map as well as increase its density, especially in regions of poor texture.
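The polarimetric cues the letter refers to are conventionally recovered from the four polarizer-angle intensities that a polarization camera measures per pixel, via the linear Stokes parameters. The sketch below shows that standard computation (degree and angle of linear polarization); it is generic background, not the authors' code, and the example intensities are made up.

```python
import math

# Standard Stokes-based recovery of per-pixel polarimetric cues from the
# four polarizer-angle intensities (0°, 45°, 90°, 135°) of a division-of-
# focal-plane polarization camera. Generic sketch, not the paper's code.
def polarization_cues(i0, i45, i90, i135):
    """Return (DoLP, AoLP in radians) for one pixel."""
    s0 = (i0 + i45 + i90 + i135) / 2.0  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # diagonal component
    dolp = math.hypot(s1, s2) / s0      # degree of linear polarization
    aolp = 0.5 * math.atan2(s2, s1)     # angle of linear polarization
    return dolp, aolp

# Fully horizontally polarized light: 0° bright, 90° dark, diagonals equal.
dolp, aolp = polarization_cues(1.0, 0.5, 0.0, 0.5)
```

The AoLP constrains the azimuth of the surface normal (up to the specular/diffuse ambiguity the abstract mentions), which is what lets depth be propagated along iso-depth contours.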
- Published
- 2021
231. GazeBase, a large-scale, multi-stimulus, longitudinal eye movement dataset
- Author
-
Evgeniy Abdulin, Oleg V. Komogortsev, Henry Griffith, and Dillon J. Lohr
- Subjects
Adult ,Male ,FOS: Computer and information sciences ,Statistics and Probability ,Data Descriptor ,Adolescent ,Eye Movements ,Computer science ,Science ,0211 other engineering and technologies ,Computer Science - Human-Computer Interaction ,02 engineering and technology ,Library and Information Sciences ,050105 experimental psychology ,Pupil ,Human-Computer Interaction (cs.HC) ,Education ,Task (project management) ,Young Adult ,Humans ,0501 psychology and cognitive sciences ,Computer vision ,Longitudinal Studies ,Eye-Tracking Technology ,021110 strategic, defence & security studies ,Monocular ,business.industry ,05 social sciences ,Eye movement ,Middle Aged ,Electrical and electronic engineering ,Computer Science Applications ,Reading ,Data quality ,Saccade ,Fixation (visual) ,Eye tracking ,Female ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,Information Systems - Abstract
This manuscript presents GazeBase, a large-scale longitudinal dataset containing 12,334 monocular eye-movement recordings captured from 322 college-aged participants. Participants completed a battery of seven tasks in two contiguous sessions during each round of recording: (1) a fixation task, (2) a horizontal saccade task, (3) a random oblique saccade task, (4) a reading task, (5/6) free viewing of cinematic video, and (7) a gaze-driven gaming task. Nine rounds of recording were conducted over a 37-month period, with participants in each subsequent round recruited exclusively from prior rounds. All data were collected using an EyeLink 1000 eye tracker at a 1,000 Hz sampling rate, with a calibration and validation protocol performed before each task to ensure data quality. Due to its large number of participants and longitudinal nature, GazeBase is well suited for exploring research hypotheses in eye movement biometrics, along with other applications applying machine learning to eye movement signal analysis. Classification labels produced by the instrument’s real-time parser are provided for a subset of GazeBase, along with pupil area. Measurement(s): eye movement measurement. Technology Type(s): eye tracking device. Factor Type(s): round, participant. Sample Characteristic (Organism): Homo sapiens. Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.14761866
- Published
- 2021
232. Utilization of Semantic Planes: Improved Localization and Dense Semantic Map for Monocular SLAM in Urban Environment
- Author
-
Bao Yaoqi, Yun Pan, Zhe Yang, and Ruohong Huan
- Subjects
Control and Optimization ,Monocular ,Artificial neural network ,Pixel ,Computer science ,business.industry ,Mechanical Engineering ,Biomedical Engineering ,Simultaneous localization and mapping ,Semantics ,Computer Science Applications ,Human-Computer Interaction ,Consistency (database systems) ,Odometry ,Artificial Intelligence ,Control and Systems Engineering ,Point (geometry) ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business - Abstract
In this letter, we propose a novel semantic direct monocular simultaneous localization and mapping (SLAM) system that fuses the semantic information obtained by an advanced deep neural network (DNN) into direct sparse odometry with loop closure (LDSO), with the purpose of improving localization accuracy and building a dense semantic map of the urban environment. For localization, we apply a point reselection strategy based on coarse semantic plane (CSP) constraints to discard static points inconsistent with nearby co-plane points of the same semantic class, as well as dynamic points beyond the visible range. Moreover, a point group movement consistency (PGMC) check is utilized to decrease the impact of moving dynamic objects. For the dense semantic map, we model numerous small semantic planes from well-estimated points to measure the depth of each static pixel, rather than conducting stereo matching. Experimental results show that our method is more accurate than LDSO and comparable with ORB-SLAM in terms of localization. Moreover, it is capable of building a dense semantic map of the urban environment for better scene understanding.
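Densifying depth from small planes, as described above, amounts to fitting a plane to a few well-estimated sparse points and then reading off a depth for every pixel on that plane. The sketch below shows the generic least-squares version of that idea with toy data; the plane model and all values are assumptions, not the paper's formulation.

```python
import numpy as np

# Sketch of plane-based densification: fit a plane z = a*x + b*y + c to
# well-estimated sparse points of one semantic plane, then read off a
# depth for any pixel on it. Toy data; assumed model, not the paper's.
def fit_plane(points):
    """points: (N, 3) array of (x, y, z); returns (a, b, c) by least squares."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

def plane_depth(coeffs, x, y):
    """Depth of the pixel at (x, y) under the fitted plane."""
    a, b, c = coeffs
    return a * x + b * y + c

pts = np.array([[0., 0., 1.], [1., 0., 2.], [0., 1., 3.], [1., 1., 4.]])
coeffs = fit_plane(pts)            # these points lie exactly on z = x + 2y + 1
z = plane_depth(coeffs, 0.5, 0.5)  # dense depth at a pixel with no sparse point
```

This is why the method needs no stereo matching for the dense map: once a plane's coefficients are estimated from sparse points, every static pixel assigned to that plane gets a depth for free.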
- Published
- 2021
233. Results of Using Different Modes of Presentation of Stereostimuli in the Study of Stereo Vision in Normal Children and in Children with Non-Paralytic Strabismus without Functional Scotoma
- Author
-
S. I. Rychkova and V. G. Likhvantseva
- Subjects
medicine.medical_specialty ,Visual perception ,Monocular ,genetic structures ,Blind spot ,010401 analytical chemistry ,05 social sciences ,Audiology ,RE1-994 ,01 natural sciences ,050105 experimental psychology ,eye diseases ,strabismus ,0104 chemical sciences ,alternating stimuli presentation ,Functional Treatment ,Interval (music) ,Left eye ,Ophthalmology ,medicine ,0501 psychology and cognitive sciences ,Strabismus ,Psychology ,stereovision - Abstract
The work is devoted to one of the actual problems of current ophthalmology: creating effective methods of studying stereovision.The purpose: a comparative analysis of the capability for stereoperception under different regimes of alternating presentation of stereostimuli with different characteristics in children with strabismus and in children without ophthalmopathology.Patients and methods. 294 schoolchildren were observed: 167 children of the control group (without ophthalmopathology) and 127 children with non-paralytic strabismus without functional scotoma (FSS). We used stereostimuli with different characteristics in the following regimes of presentation: 1) the regime of simple monocular alternation (alternate presentation of an image for the right and left eye); 2) the regime having an “empty” interval (black background) between monocular phases; 3) the regime having a binocular phase (a binocular image containing details corresponding to the stimuli for the right eye and the left eye) between monocular phases.Results. It was found that the majority of children with non-paralytic strabismus, who are incapable of stereoperception with the classic Fly test and Lang test, can perceive the stereoeffect with alternating presentation of stereostimuli within individual ranges of durations of the monocular phases, the binocular phase, and the “empty” interval. In children of the control group, when switching from the simple alternation regime to the “empty” interval regime, the maximal durations of monocular phases that preserved the stereoeffect decreased, and when switching to the binocular phase regime they significantly increased. In children with strabismus, as in children of the control group, linear images are simpler for stereoperception than random-dot images (p < 0.001); stimuli creating the effect of frontoparallel separation of details are perceived better than those creating the decline effect or the turning effect (p < 0.001); and stimuli creating the effect of vertical stripes declining are perceived better than those creating the effect of horizontal stripes turning (p < 0.001). However, as opposed to children of the control group, in children with strabismus the stereoeffect forms better under peripheral localization of linear details than under central localization.Conclusion. Using computer programs with different regimes of alternating presentation of stereostimuli with certain characteristics makes it possible to effectively evaluate the individual capability for stereoperception, which is necessary for a personalized approach to the selection of visual stimuli and presentation regimes in the functional treatment of patients with non-paralytic strabismus.
- Published
- 2021
234. Research on the Monocular Ranging Method of the Leading Vehicle in Multi-weather
- Author
-
Yong Tian, Gongrou Fu, Shuman Guo, Junkai Guo, Quancai Li, and Shichang Wang
- Subjects
Monocular ,Computer science ,business.industry ,Pixel mapping ,Ranging ,Computer vision ,Artificial intelligence ,business - Abstract
To improve the accuracy of monocular distance measurement of the vehicle ahead under sunny, cloudy, rainy, snowy, and foggy weather, an improved pixel-mapping monocular distance measurement method is proposed. The method detects the vehicle ahead using eight-connected domains, obtains the pixel row of the target vehicle in the image, fits the image row pixels to the corresponding real longitudinal distance function, and combines the fitted function with the intrinsic and extrinsic parameters of the camera to obtain an improved pixel-mapping monocular ranging model. A test environment was set up under different weather conditions to verify the feasibility of the algorithm. The results show that in the four environments, the detectable distances are within 70 m, 60 m, 30 m, and 40 m respectively; the error of the improved pixel-mapping monocular ranging method is reduced by 0.6% on average compared with the unimproved method, and by up to 0.92%; and the ranging errors of the improved algorithm under the four weather conditions are 1.8513%, 2.6987%, 4.0137%, and 2.5795% respectively, achieving the goal of improving the accuracy of monocular distance measurement of the vehicle ahead under multiple weather conditions.
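The row-to-distance fitting step described in this abstract can be sketched as follows. The calibration pairs, the assumed horizon row, and the inverse-row model are all illustrative assumptions; the paper does not publish its exact fitted function or camera parameters.

```python
import numpy as np

# Hypothetical calibration pairs: image row (pixels) of the lead
# vehicle's bottom edge vs. measured longitudinal distance (m).
rows = np.array([400.0, 345.0, 312.0, 301.0])
dists = np.array([10.0, 20.0, 50.0, 100.0])

# Under a flat-road pinhole model, distance varies roughly as
# 1 / (v - v_horizon); fit the two parameters by least squares.
v_horizon = 290.0  # assumed horizon row from camera calibration
A = np.vstack([1.0 / (rows - v_horizon), np.ones_like(rows)]).T
k, b = np.linalg.lstsq(A, dists, rcond=None)[0]

def row_to_distance(v):
    """Map a detected bottom-edge row to longitudinal distance (m)."""
    return k / (v - v_horizon) + b

print(round(row_to_distance(400.0), 2))  # → 10.0 (a calibration point)
```

Once the mapping is fitted, ranging a detected vehicle reduces to reading off the bottom-edge row of its bounding region and evaluating `row_to_distance`.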
- Published
- 2021
235. Self-supervised monocular depth estimation based on image texture detail enhancement
- Author
-
Huan-huan Wu, Chunxia Xiao, Li Wenjie, Fei Luo, Shenjie Zheng, and Yuanzhen Li
- Subjects
Monocular ,Artificial neural network ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Texture (music) ,Computer Graphics and Computer-Aided Design ,View synthesis ,Image (mathematics) ,Computer graphics ,Image texture ,Computer Science::Computer Vision and Pattern Recognition ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Focus (optics) ,business ,Software - Abstract
We present a new self-supervised monocular depth estimation method with multi-scale texture detail enhancement. Based on the observation that image texture detail and semantic information are essential for depth estimation, we provide both to the network so that it learns sharper and more structurally complete depth. First, we generate filtered images and detail images by multi-scale decomposition and use a deep neural network to automatically learn their weights to construct the texture-detail-enhanced image. Then, we incorporate semantic features by feeding deep features from the VGG-19 network into a self-attention network, guiding the depth decoder to focus on the integrity of objects in the scene. Finally, we propose a scale-invariant smoothness loss to improve the structural integrity of the predicted depth. We evaluate our method on the KITTI 2015 and Make3D datasets and apply the predicted depth to novel view synthesis. The experimental results show that it achieves satisfactory results compared with existing methods.
- Published
- 2021
236. Unsupervised Scale-Consistent Depth Learning from Video
- Author
-
Huangying Zhan, Ian Reid, Jia-Wang Bian, Le Zhang, Zhichao Li, Chunhua Shen, Ming-Ming Cheng, and Naiyan Wang
- Subjects
FOS: Computer and information sciences ,Monocular ,Source code ,business.industry ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,media_common.quotation_subject ,Computer Science - Computer Vision and Pattern Recognition ,Inference ,Pattern recognition ,02 engineering and technology ,Tracking (particle physics) ,Consistency (database systems) ,Artificial Intelligence ,Component (UML) ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Scale (map) ,business ,Software ,media_common - Abstract
We propose a monocular depth estimator, SC-Depth, which requires only unlabelled videos for training and enables scale-consistent prediction at inference time. Our contributions include: (i) we propose a geometry consistency loss, which penalizes the inconsistency of predicted depths between adjacent views; (ii) we propose a self-discovered mask to automatically localize moving objects that violate the underlying static-scene assumption and cause noisy signals during training; (iii) we demonstrate the efficacy of each component with a detailed ablation study and show high-quality depth estimation results on both the KITTI and NYUv2 datasets. Moreover, thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system for more robust and accurate tracking. The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI, and it generalizes well to the KAIST dataset without additional training. Finally, we provide several demos for qualitative evaluation. (Comment: Accepted to IJCV. The source code is available at https://github.com/JiawangBian/SC-SfMLearner-Release)
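Contribution (i), the geometry consistency loss, can be sketched in NumPy as below. This is a minimal version of the commonly published normalized form, assuming the reference-view depth has already been warped into the target view; the actual implementation operates on network tensors with differentiable warping.

```python
import numpy as np

def geometry_consistency_loss(d_warp, d_target):
    """Normalized inconsistency between the reference depth projected
    into the target view (d_warp) and the target view's own prediction
    (d_target). Identical, scale-consistent predictions cost nothing."""
    diff = np.abs(d_warp - d_target) / (d_warp + d_target)
    return diff.mean()

d = np.full((4, 4), 5.0)
print(geometry_consistency_loss(d, d))        # → 0.0
print(geometry_consistency_loss(d, 2.0 * d))  # ~0.333: scale drift is penalized
```

Because the term is symmetric and bounded in [0, 1), it penalizes relative rather than absolute depth differences, which is what enforces a single consistent scale across the video.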
- Published
- 2021
237. Underwater Depth Estimation for Spherical Images
- Author
-
Lei Jin, Sören Schwertfeger, Jiadi Cui, Haofei Kuang, and Qingwen Xu
- Subjects
Ground truth ,Monocular ,Article Subject ,General Computer Science ,business.industry ,Computer science ,020207 software engineering ,Robotics ,02 engineering and technology ,Stereopsis ,Control and Systems Engineering ,TJ1-1570 ,0202 electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,RGB color model ,020201 artificial intelligence & image processing ,Computer vision ,Mechanical engineering and machinery ,Artificial intelligence ,Underwater ,Scale (map) ,business - Abstract
This paper proposes a method for monocular underwater depth estimation, which is an open problem in robotics and computer vision. To this end, we leverage publicly available in-air RGB-D image pairs for underwater depth estimation in the spherical domain with an unsupervised approach. As the first step, the in-air images are style-transferred to the underwater style. Given those synthetic underwater images and their ground truth depth, we then train a network to estimate the depth. This way, our learning model is designed to obtain depth up to scale, without the need for corresponding ground truth underwater depth data, which is typically not available. We test our approach on style-transferred in-air images as well as on our own real underwater dataset, for which we computed sparse ground truth depth data via stereopsis. This dataset is provided for download. Experiments with this data against a state-of-the-art in-air network as well as different artificial inputs show that both the style transfer and the depth estimation exhibit promising performance.
- Published
- 2021
238. Boosting unsupervised monocular depth estimation with auxiliary semantic information
- Author
-
Nan Gao, Hui Ren, and Jia Li
- Subjects
Boosting (machine learning) ,Monocular ,Computer Networks and Communications ,Computer science ,business.industry ,Feature extraction ,02 engineering and technology ,Image segmentation ,Semantics ,Machine learning ,computer.software_genre ,Visualization ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,020201 artificial intelligence & image processing ,Segmentation ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer - Abstract
Learning-based multi-task models have been widely used in various scene understanding tasks, where the tasks complement each other, e.g., prior semantic information allows us to better infer depth. We boost unsupervised monocular depth estimation using semantic segmentation as an auxiliary task. To address the lack of cross-domain datasets and the catastrophic forgetting problems encountered in multi-task training, we utilize an existing methodology to obtain redundant segmentation maps and build our cross-domain dataset, which not only provides a new way to conduct multi-task training but also helps us evaluate results against those of other algorithms. In addition, to comprehensively use the extracted features of the two tasks in the early perception stage, we share weights in the network to fuse cross-domain features, and we introduce a novel multi-task loss function to further smooth the depth values. Extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art performance in the depth estimation task, as well as improved semantic segmentation.
- Published
- 2021
239. Projection Invariant Feature and Visual Saliency-Based Stereoscopic Omnidirectional Image Quality Assessment
- Author
-
Yang Zhou, Yo-Sung Ho, Yun Zhang, Na Li, Xu Wang, and Xuemei Zhou
- Subjects
Monocular ,Image quality ,business.industry ,Computer science ,Distortion (optics) ,Scale-invariant feature transform ,020206 networking & telecommunications ,Stereoscopy ,02 engineering and technology ,law.invention ,law ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Chrominance ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,Invariant (mathematics) ,Projection (set theory) ,business - Abstract
In this article, we propose a quality assessment model based on projection invariant features and visual saliency for Stereoscopic Omnidirectional Images (SOIs). First, the projection invariant monocular and binocular features of an SOI are derived from Scale-Invariant Feature Transform (SIFT) points to tackle the inconsistency between the stretched projection formats and the viewports. Second, a visual saliency model combining chrominance and contrast perceptual factors is used to improve prediction accuracy. Third, according to the characteristics of the panoramic image, we generate a weight map and use it as a location prior, which can be adapted to different projection formats. Finally, the proposed SOI quality assessment model fuses the projection invariant features, visual saliency, and location prior. Experimental results on both the NingBo University SOI Database (NBU-SOID) and the Stereoscopic OmnidirectionaL Image quality assessment Database (SOLID) demonstrate that the proposed metric on the equirectangular projection format outperforms state-of-the-art schemes; the Pearson linear correlation coefficient and Spearman rank-order correlation coefficient are 0.933 and 0.933 on SOLID, and 0.907 and 0.910 on NBU-SOID, respectively. The proposed algorithm also extends to another five representative projection formats and achieves superior performance.
- Published
- 2021
240. Monocular Diplopia: An Optical Correction Modality
- Author
-
Haile Woretaw Alemu and Preetam Kumar
- Subjects
Refractive error ,medicine.medical_specialty ,genetic structures ,medicine.medical_treatment ,Intraocular lens ,Case Report ,Trauma ,Ophthalmology ,Cornea ,medicine ,Diplopia ,Monocular Diplopia ,Monocular ,Corectopia ,business.industry ,Irregular pupil ,Contact lens ,RE1-994 ,medicine.disease ,Anisocoria ,eye diseases ,medicine.anatomical_structure ,sense organs ,medicine.symptom ,business - Abstract
Post-surgical or traumatic corectopia is among the rare causes of monocular diplopia. A 26-year-old student presented to the Institute with a complaint of monocular double vision in the left eye. He had sustained a penetrating ocular injury in the left eye and subsequently underwent multiple ocular surgeries. Following the final intraocular lens implantation, he experienced monocular double vision in his left eye. On presentation to the contact lens clinic, visual acuities were 20/20 in the right eye and 20/320 in the left eye (improving to 20/25 with pinhole). Slit-lamp examination of the left eye revealed scarring in the superior nasal quadrant of the cornea and an irregular mid-dilated pupil with exposed aphakic and pseudophakic portions. A range of optical management options was tried to eliminate the monocular diplopia and correct the refractive error. Finally, a combination of a prosthetic soft contact lens and spectacle correction eliminated the diplopia and provided binocular single vision.
- Published
- 2021
241. Unsupervised-Learning-Based Continuous Depth and Motion Estimation With Monocular Endoscopy for Virtual Reality Minimally Invasive Surgery
- Author
-
Ling Li, Shuai Ding, Xiaojian Li, Xi Zheng, Shanlin Yang, and Alireza Jolfaei
- Subjects
Monocular ,business.industry ,Computer science ,020208 electrical & electronic engineering ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Virtual reality ,Motion (physics) ,Computer Science Applications ,Control and Systems Engineering ,Robustness (computer science) ,Motion estimation ,0202 electrical engineering, electronic engineering, information engineering ,Unsupervised learning ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Fusion mechanism ,Information Systems - Abstract
Three-dimensional display and virtual reality technology have been applied in minimally invasive surgery to provide doctors with a more immersive surgical experience. One of the most popular systems based on this technology is the da Vinci surgical robot system. The key to building an in vivo 3-D virtual reality model with a monocular endoscope is accurate estimation of depth and motion. In this article, a fully unsupervised learning method for depth and motion estimation from continuous monocular endoscopic video is proposed. After detection of highlighted regions, EndoMotionNet and EndoDepthNet are designed to estimate ego-motion and depth, respectively. EndoMotionNet considers the timing information between consecutive frames with a long short-term memory layer to enhance the accuracy of ego-motion estimation. The estimated depth of the previous frame is used by EndoDepthNet to estimate the depth of the next frame via a multi-mode fusion mechanism. A custom loss function is defined to improve the robustness and accuracy of the proposed method. Experiments with public datasets verify that the proposed unsupervised-learning-based continuous depth and motion estimation method effectively improves the accuracy of depth and motion estimation, especially when processing continuous frames.
- Published
- 2021
242. Real-Time Model-Based Monocular Pose Tracking for an Asteroid by Contour Fitting
- Author
-
Jia Liu, Chang Liu, Rongliang Chen, Wulong Guo, and Weiduo Hu
- Subjects
020301 aerospace & aeronautics ,Monocular ,Computer science ,business.industry ,Template matching ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Aerospace Engineering ,02 engineering and technology ,Solid modeling ,Tracking (particle physics) ,Extended Kalman filter ,0203 mechanical engineering ,Asteroid ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Pose - Abstract
This article presents an innovative framework for real-time vision-based pose tracking of an asteroid using the contour information of the asteroid image. At the first time instant, tracking is initialized by distance-based template matching and contour-based pose optimization. Subsequently, at each time instant, with the prediction of the extended Kalman filter (EKF) as the initial guess, the pose of the asteroid is obtained in real time by geometrically fitting the contour of the projected asteroid CAD model to the contour of the asteroid image with M-estimation. The variance of the pose is calculated by first-order approximation, which enables the EKF to generate the final pose estimate and predict the pose at the next time instant. Extensive experiments validate the accuracy and efficiency of the proposed method.
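The M-estimation used in the contour fitting can be illustrated with a standard Huber reweighting step. The paper does not specify its robust kernel or threshold, so the function below is a generic stand-in rather than the authors' implementation.

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """Huber M-estimator weights: contour-point residuals within k keep
    full weight, while larger residuals are down-weighted so that
    outliers have less influence on the fitted pose."""
    r = np.abs(residuals)
    w = np.ones_like(r)
    big = r > k
    w[big] = k / r[big]
    return w

# Inlier residuals keep weight 1.0; the gross outlier is damped to 0.1.
print(huber_weights(np.array([0.5, -1.0, 13.45])))
```

In an iteratively reweighted least-squares loop, these weights multiply each point's contribution before re-solving for the pose, which is what makes the fit robust to spurious contour fragments.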
- Published
- 2021
243. Monocular Depth Estimation Using Information Exchange Network
- Author
-
Wenzhen Yang, Zengfu Wang, Quan Zhou, Wen Su, and Haifeng Zhang
- Subjects
050210 logistics & transportation ,Monocular ,Channel (digital image) ,Computer science ,business.industry ,Mechanical Engineering ,05 social sciences ,Context (language use) ,Pattern recognition ,Semantics ,Convolutional neural network ,Computer Science Applications ,Feature (computer vision) ,0502 economics and business ,Automotive Engineering ,Artificial intelligence ,Representation (mathematics) ,business ,Information exchange - Abstract
Depth estimation from a single monocular image attracts increasing attention in autonomous driving and computer vision. While most existing approaches regress depth values or classify depth labels based on features extracted from a limited image area, the resulting depth maps are still perceptually unsatisfying: neither local context nor low-level semantic information alone is sufficient to predict depth, and learning-based approaches suffer from inherent defects in their supervision signals. This paper addresses monocular depth estimation with a general information exchange convolutional neural network. We maintain a high-resolution prediction throughout the network, while both low-resolution features capturing long-range context and fine-grained features describing local context are refined stage by stage along an information exchange path. A mutual channel attention mechanism is applied to emphasize interdependent feature maps and improve the feature representation of specific semantics. The network is trained under the supervision of an improved log-cosh loss and gradient constraints, so that abnormal predictions have less impact and the estimation is consistent to high order. The results of the ablation studies verify the efficiency of every proposed component. Experiments on popular indoor and street-view datasets show competitive results compared with recent state-of-the-art approaches.
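The log-cosh supervision term mentioned above can be sketched as follows. Only the base form is shown; the paper's "improved" variant and its gradient constraints are not reproduced here.

```python
import numpy as np

def log_cosh_loss(pred, target):
    """log(cosh(error)): approximately quadratic near zero like L2 and
    linear for large errors like L1, so abnormal depth predictions have
    less impact. Computed stably as |x| + log(1 + exp(-2|x|)) - log(2),
    which avoids overflow in cosh for large errors."""
    x = np.abs(pred - target)
    return np.mean(x + np.log1p(np.exp(-2.0 * x)) - np.log(2.0))

print(round(log_cosh_loss(np.array([1.0]), np.array([0.0])), 4))  # → 0.4338
```

The stable rewriting matters in practice: `np.cosh(800.0)` overflows to infinity, while the softplus form stays finite for arbitrarily large residuals.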
- Published
- 2021
244. Monocular Patching Attenuates Vertical Nystagmus in Wernicke's Encephalopathy via Release of Activity in Subcortical Visual Pathways
- Author
-
Björn Machner, Christoph Helmchen, Andreas Sprenger, and David S. Zee
- Subjects
Wernicke's encephalopathy ,Monocular ,business.industry ,Visual system ,medicine.disease ,monocular viewing ,Clinical Vignette ,Neurology ,subcortical visual pathways ,Clinical Vignettes ,Vertical nystagmus ,Medicine ,Neurology (clinical) ,business ,Neuroscience - Published
- 2021
245. Cue-dependent effects of VR experience on motion-in-depth sensitivity
- Author
-
Jacqueline M. Fulvio, Mohan Ji, Lowell Thompson, Bas Rokers, and Ari Rosenberg
- Subjects
Man-Computer Interface ,Vision Disparity ,Eye Movements ,genetic structures ,Vision ,Physiology ,Visual System ,Sensory Physiology ,Motion Perception ,Social Sciences ,Audiology ,Computer Architecture ,0302 clinical medicine ,Medicine and Health Sciences ,Psychophysics ,Psychology ,media_common ,Vision, Binocular ,Multidisciplinary ,05 social sciences ,Virtual Reality ,Sensory Systems ,Medicine ,Engineering and Technology ,Sensory Perception ,Anatomy ,Cues ,Research Article ,Computer and Information Sciences ,medicine.medical_specialty ,Science ,media_common.quotation_subject ,050105 experimental psychology ,03 medical and health sciences ,Sensory Cues ,Ocular System ,Perception ,medicine ,Humans ,0501 psychology and cognitive sciences ,Motion perception ,Sensory cue ,Depth Perception ,Monocular ,Biology and Life Sciences ,Eye movement ,eye diseases ,Human Factors Engineering ,Signal Processing ,Eyes ,Depth perception ,Head ,030217 neurology & neurosurgery ,Neuroscience ,User Interfaces - Abstract
The visual system exploits multiple signals, including monocular and binocular cues, to determine the motion of objects through depth. In the laboratory, sensitivity to different three-dimensional (3D) motion cues varies across observers and is often weak for binocular cues. However, laboratory assessments may reflect factors beyond inherent perceptual sensitivity. For example, the appearance of weak binocular sensitivity may relate to extensive prior experience with two-dimensional (2D) displays in which binocular cues are not informative. Here we evaluated the impact of experience on motion-in-depth (MID) sensitivity in a virtual reality (VR) environment. We tested a large cohort of observers who reported having no prior VR experience and found that binocular cue sensitivity was substantially weaker than monocular cue sensitivity. As expected, sensitivity was greater when monocular and binocular cues were presented together than in isolation. Surprisingly, the addition of motion parallax signals appeared to cause observers to rely almost exclusively on monocular cues. As observers gained experience in the VR task, sensitivity to monocular and binocular cues increased. Notably, most observers were unable to distinguish the direction of MID based on binocular cues above chance level when tested early in the experiment, whereas most showed statistically significant sensitivity to binocular cues when tested late in the experiment. This result suggests that observers may discount binocular cues when they are first encountered in a VR environment. Laboratory assessments may thus underestimate the sensitivity of inexperienced observers to MID, especially for binocular cues.
- Published
- 2022
- Full Text
- View/download PDF
246. Self-supervised monocular depth estimation with occlusion mask and edge awareness
- Author
-
Mitsunori Mizumachi, Zhen Li, Lifeng Zhang, Miaomiao Zhu, He Li, and Shi Zhou
- Subjects
Self-supervised learning ,Computer science ,0206 medical engineering ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Convolutional neural network ,02 engineering and technology ,General Biochemistry, Genetics and Molecular Biology ,Image (mathematics) ,03 medical and health sciences ,0302 clinical medicine ,3d vision ,Artificial Intelligence ,Robustness (computer science) ,Occlusion ,Computer vision ,Estimation ,Monocular ,business.industry ,Geometric consistency ,020601 biomedical engineering ,Edge awareness ,Artificial intelligence ,Enhanced Data Rates for GSM Evolution ,business ,030217 neurology & neurosurgery ,Monocular depth estimation - Abstract
Depth estimation is one of the basic and important tasks in 3D vision. Recently, much work has been done on self-supervised depth estimation based on geometric consistency between frames. However, these methods still have difficulties in ill-posed areas, such as occluded and texture-less regions. This work proposes a novel self-supervised monocular depth estimation method based on an occlusion mask and edge awareness to overcome these difficulties. The occlusion mask divides the image into two classes, making the training of the network more reasonable. The edge awareness loss function is designed around edges obtained by a traditional method, giving the method strong robustness to various lighting conditions. We evaluate the proposed method on the KITTI dataset; both the occlusion mask and the edge awareness prove beneficial for finding corresponding points in ill-posed areas.
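Edge-aware terms of this kind are commonly written as image-gradient-weighted depth smoothness; the sketch below shows that widely used variant. Note the paper derives its edge map from a traditional edge detector instead, so this is an illustrative stand-in, not the authors' loss.

```python
import numpy as np

def edge_aware_smoothness(depth, image):
    """Penalize depth gradients, down-weighted where the image itself
    has strong gradients, so depth discontinuities are allowed to
    align with image edges rather than being smoothed away."""
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    wx = np.exp(-np.abs(np.diff(image, axis=1)))
    wy = np.exp(-np.abs(np.diff(image, axis=0)))
    return (dx_d * wx).mean() + (dy_d * wy).mean()

flat = np.ones((4, 4))
print(edge_aware_smoothness(flat, flat))  # → 0.0: flat depth costs nothing
```

A depth ramp over a uniform image is penalized at full weight, whereas the same ramp across a strong image edge is penalized exponentially less.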
- Published
- 2021
247. Articulated Object Tracking by High-Speed Monocular RGB Camera
- Author
-
Yang Liu and Akio Namiki
- Subjects
Monocular ,Computer science ,business.industry ,010401 analytical chemistry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Kinematics ,Degrees of freedom (mechanics) ,Tracking (particle physics) ,Object (computer science) ,01 natural sciences ,0104 chemical sciences ,Video tracking ,RGB color model ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Instrumentation ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
In recent years, tracking of articulated objects at high speed with monocular cameras has been gaining attention. This study presents a novel method for high-frame-rate articulated object tracking with a monocular camera. The method is an extended version of our previous research on high-speed monocular rigid object tracking. In this study, to realize tracking of an articulated object, we integrate dual-quaternion kinematics with our previous fast pixel-wise-posteriors (fast-PWP3D) tracking framework, and propose an auto-regressive (AR) process to encode the dynamic propagation of the estimated state vectors. We give a full three-dimensional derivation of the mathematical formulation of our method and show that our method is capable of tracking an articulated object having a large number of degrees of freedom with only a monocular camera, and is robust against dynamic environmental changes (e.g., illumination/partial occlusion). Moreover, we show an efficient implementation strategy of our method. The results of real-time experiments show that we achieved nearly 350 Hz performance when tracking a four degrees-of-freedom (4-DOF) articulated object with a monocular camera.
- Published
- 2021
248. Simultaneous Attack on CNN-Based Monocular Depth Estimation and Optical Flow Estimation
- Author
-
Ryuraroh Matsumoto, Koichiro Yamanaka, Keita Takahashi, and Toshiaki Fujii
- Subjects
Estimation ,Optical flow estimation ,Monocular ,Artificial Intelligence ,Hardware and Architecture ,business.industry ,Computer science ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Software - Published
- 2021
249. Deep learning for monocular depth estimation: A review
- Author
-
Yue Ming, Hui Yu, Xuyang Meng, and Chunxiao Fan
- Subjects
Estimation ,0209 industrial biotechnology ,Monocular ,Computer science ,business.industry ,Cognitive Neuroscience ,Deep learning ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Science Applications ,Task (project management) ,020901 industrial engineering & automation ,Categorization ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Augmented reality ,Depth of field ,Artificial intelligence ,business ,Depth perception ,computer - Abstract
Depth estimation is a classic task in computer vision and is of great significance for many applications such as augmented reality, target tracking and autonomous driving. Traditional monocular depth estimation methods rely on depth cues with strict requirements; e.g., shape-from-focus/defocus methods require a low depth of field in the scene and images. Recently, a large body of deep learning methods has been proposed and has shown great promise in handling this traditionally ill-posed problem. This paper reviews the state of the art in deep learning-based monocular depth estimation. We give an overview of papers published between 2014 and 2020 in terms of training manner and task type. We first summarize the deep learning models for monocular depth estimation. Second, we categorize the various deep learning-based methods. Third, we introduce the publicly available datasets and the evaluation metrics. We also analyze the properties of these methods and compare their performance. Finally, we highlight the challenges in order to inform future research directions.
- Published
- 2021
250. Real-Time Monocular Obstacle Detection Based on Horizon Line and Saliency Estimation for Unmanned Surface Vehicles
- Author
-
Hengyu Li, Jun Liu, Liu Jingyi, Jun Luo, and Shaorong Xie
- Subjects
Monocular ,Computer Networks and Communications ,Computer science ,Sun glitter ,business.industry ,020206 networking & telecommunications ,02 engineering and technology ,Filter (signal processing) ,Interference (wave propagation) ,Mixture model ,Hardware and Architecture ,Obstacle ,Line (geometry) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Monocular vision ,Software ,Information Systems - Abstract
Recently, real-time obstacle detection by monocular vision has shown promise for enhancing the safety of unmanned surface vehicles (USVs). Since obstacles that may threaten USVs generally appear below the water edge, most existing methods first detect the horizon line and then search for obstacles below it. However, these methods detect the horizon line using only edge or line features, which are susceptible to interference edges from clouds, waves, and land, eventually resulting in poor obstacle detection. To avoid being affected by interference edges, we propose a novel horizon line detection method based on semantic segmentation. The method fits a Gaussian mixture model (GMM) with spatial smoothness constraints to the semantic structure of marine images and simultaneously generates a water segmentation mask. The horizon line is estimated from the water boundary points via straight-line fitting. Further, inspired by human visual attention mechanisms, an efficient saliency detection method based on background and contrast priors is presented to detect obstacles below the estimated horizon line. To reduce false positives caused by sun glitter, waves and foam, the continuity of adjacent frames is employed to filter the detected obstacles. An extensive evaluation was conducted on a large marine image dataset collected by our 'Jinghai VIII' USV. The experimental results show that the proposed method outperformed the recent state-of-the-art marine obstacle detection method by 22.07% in F-score while running at over 24 fps on an NVIDIA GTX1080Ti GPU.
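The final geometric step, straight-line fitting of the water boundary points, can be sketched as below. The boundary coordinates are made up for illustration; in the paper they come from the GMM water segmentation mask.

```python
import numpy as np

# Hypothetical water-boundary points (column x, row y) extracted from
# the segmentation mask, with small wave-induced noise in the rows.
xs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
ys = np.array([120.0, 121.0, 119.5, 120.5, 120.0])

# Least-squares fit of y = m*x + c gives the estimated horizon line.
m, c = np.polyfit(xs, ys, 1)

def horizon_row(x):
    """Image row of the estimated horizon at column x; the obstacle
    search is then restricted to rows below this line."""
    return m * x + c

print(round(horizon_row(200.0), 1))  # → 120.2 (the mean boundary row)
```

A least-squares line always passes through the centroid of the points, so noisy wave pixels shift the estimate far less than they would shift any single boundary sample.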
- Published
- 2021