Author: "Gordon Wetzstein" / Topic: artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Gordon Wetzstein"' showing total 98 results

Start Over Author "Gordon Wetzstein" Topic artificial intelligence

98 results on '"Gordon Wetzstein"'

1. Acorn

Author: Gordon Wetzstein, Marco Monteiro, Julien N. P. Martel, David B. Lindell, Eric R. Chan, and Connor Z. Lin
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 010103 numerical & computational mathematics, 02 engineering and technology, 01 natural sciences, Machine Learning (cs.LG), Rendering (computer graphics), Octree, Computer Science - Graphics, 0202 electrical engineering, electronic engineering, information engineering, Quadtree, Computer vision, Polygon mesh, 0101 mathematics, Block (data storage), Network architecture, business.industry, 020207 software engineering, Computer Graphics and Computer-Aided Design, Graphics (cs.GR), Artificial intelligence, Geometric modeling, business, Encoder
Abstract: Neural representations have emerged as a new paradigm for applications in rendering, imaging, geometric modeling, and simulation. Compared to traditional representations such as meshes, point clouds, or volumes they can be flexibly incorporated into differentiable learning-based pipelines. While recent improvements to neural representations now make it possible to represent signals with fine details at moderate resolutions (e.g., for images and 3D shapes), adequately representing large-scale or complex scenes has proven a challenge. Current neural representations fail to accurately represent images at resolutions greater than a megapixel or 3D scenes with more than a few hundred thousand polygons. Here, we introduce a new hybrid implicit-explicit network architecture and training strategy that adaptively allocates resources during training and inference based on the local complexity of a signal of interest. Our approach uses a multiscale block-coordinate decomposition, similar to a quadtree or octree, that is optimized during training. The network architecture operates in two stages: using the bulk of the network parameters, a coordinate encoder generates a feature grid in a single forward pass. Then, hundreds or thousands of samples within each block can be efficiently evaluated using a lightweight feature decoder. With this hybrid implicit-explicit network architecture, we demonstrate the first experiments that fit gigapixel images to nearly 40 dB peak signal-to-noise ratio. Notably this represents an increase in scale of over 1000x compared to the resolution of previously demonstrated image-fitting experiments. Moreover, our approach is able to represent 3D shapes significantly faster and better than previous techniques; it reduces training times from days to hours or minutes and memory requirements by over an order of magnitude., Comment: J. N. P. Martel and D. B. Lindell equally contributed to this work
Published: 2021
Full Text: View/download PDF

2. Neural holography with camera-in-the-loop training

Author: Gordon Wetzstein, Suyeon Choi, Nitish Padmanaban, and Yifan Peng
Subjects: Wave propagation, business.industry, Computer science, Image quality, 1080p, Holography, 020207 software engineering, 02 engineering and technology, Virtual reality, Computer Graphics and Computer-Aided Design, Image (mathematics), law.invention, law, 0202 electrical engineering, electronic engineering, information engineering, Holographic display, Computer vision, Augmented reality, Artificial intelligence, business
Abstract: Holographic displays promise unprecedented capabilities for direct-view displays as well as virtual and augmented reality applications. However, one of the biggest challenges for computer-generated holography (CGH) is the fundamental tradeoff between algorithm runtime and achieved image quality, which has prevented high-quality holographic image synthesis at fast speeds. Moreover, the image quality achieved by most holographic displays is low, due to the mismatch between the optical wave propagation of the display and its simulated model. Here, we develop an algorithmic CGH framework that achieves unprecedented image fidelity and real-time framerates. Our framework comprises several parts, including a novel camera-in-the-loop optimization strategy that allows us to either optimize a hologram directly or train an interpretable model of the optical wave propagation and a neural network architecture that represents the first CGH algorithm capable of generating full-color high-quality holographic images at 1080p resolution in real time.
Published: 2020
Full Text: View/download PDF

3. Optimizing depth perception in virtual and augmented reality through gaze-contingent stereo rendering

Author: Petr Kellnhofer, Brooke Krajancich, and Gordon Wetzstein
Subjects: Computer science, business.industry, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Wearable computer, 020207 software engineering, 02 engineering and technology, Virtual reality, Computer Graphics and Computer-Aided Design, Gaze, Rendering (computer graphics), medicine.anatomical_structure, Perception, 0202 electrical engineering, electronic engineering, information engineering, medicine, Computer vision, Augmented reality, Human eye, Artificial intelligence, Depth perception, business, ComputingMethodologies_COMPUTERGRAPHICS, media_common
Abstract: Virtual and augmented reality (VR/AR) displays crucially rely on stereoscopic rendering to enable perceptually realistic user experiences. Yet, existing near-eye display systems ignore the gaze-dependent shift of the no-parallax point in the human eye. Here, we introduce a gaze-contingent stereo rendering technique that models this effect and conduct several user studies to validate its effectiveness. Our findings include experimental validation of the location of the no-parallax point, which we then use to demonstrate significant improvements of disparity and shape distortion in a VR setting, and consistent alignment of physical and digitally rendered objects across depths in optical see-through AR. Our work shows that gaze-contingent stereo rendering improves perceptual realism and depth perception of emerging wearable computing systems.
Published: 2020
Full Text: View/download PDF

4. Neural Sensors: Learning Pixel Exposures for HDR Imaging and Video Compressive Sensing With Programmable Sensors

Author: Gordon Wetzstein, Lorenz K. Muller, Stephen J. Carey, Julien N. P. Martel, and Piotr Dudek
Subjects: high-speed imaging, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, end-to-end optimization, vision chip, Image processing, 02 engineering and technology, Computational photography, Optical imaging, Artificial Intelligence, High-dynamic-range imaging, Shutter, High-dynamic range imaging, 0202 electrical engineering, electronic engineering, information engineering, Vision chip, Computer vision, Image sensor, programmable sensors, Pixel, business.industry, Applied Mathematics, Rolling shutter, video compressive sensing, deep neural networks, Computational Theory and Mathematics, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: Camera sensors rely on global or rolling shutter functions to expose an image. This fixed function approach severely limits the sensors' ability to capture high-dynamic-range (HDR) scenes and resolve high-speed dynamics. Spatially varying pixel exposures have been introduced as a powerful computational photography approach to optically encode irradiance on a sensor and computationally recover additional information of a scene, but existing approaches rely on heuristic coding schemes and bulky spatial light modulators to optically implement these exposure functions. Here, we introduce neural sensors as a methodology to optimize per-pixel shutter functions jointly with a differentiable image processing method, such as a neural network, in an end-to-end fashion. Moreover, we demonstrate how to leverage emerging programmable and re-configurable sensor-processors to implement the optimized exposure functions directly on the sensor. Our system takes specific limitations of the sensor into account to optimize physically feasible optical codes and we evaluate its performance for snapshot HDR and high-speed compressive imaging both in simulation and experimentally with real scenes.
Published: 2020
Full Text: View/download PDF

5. Factored Occlusion: Single Spatial Light Modulator Occlusion-capable Optical See-through Augmented Reality Display

Author: Brooke Krajancich, Gordon Wetzstein, and Nitish Padmanaban
Subjects: Spatial light modulator, Computer science, business.industry, Digital content, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Metaverse, Computer Graphics and Computer-Aided Design, Digital micromirror device, law.invention, law, Signal Processing, Augmented reality, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Depth perception, business, Software, Realism, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Occlusion is a powerful visual cue that is crucial for depth perception and realism in optical see-through augmented reality (OST-AR). However, existing OST-AR systems additively overlay physical and digital content with beam combiners - an approach that does not easily support mutual occlusion, resulting in virtual objects that appear semi-transparent and unrealistic. In this work, we propose a new type of occlusion-capable OST-AR system. Rather than additively combining the real and virtual worlds, we employ a single digital micromirror device (DMD) to merge the respective light paths in a multiplicative manner. This unique approach allows us to simultaneously block light incident from the physical scene on a pixel-by-pixel basis while also modulating the light emitted by a light-emitting diode (LED) to display digital content. Our technique builds on mixed binary/continuous factorization algorithms to optimize time-multiplexed binary DMD patterns and their corresponding LED colors to approximate a target augmented reality (AR) scene. In simulations and with a prototype benchtop display, we demonstrate hard-edge occlusions, plausible shadows, and also gaze-contingent optimization of this novel display mode, which only requires a single spatial light modulator.
Published: 2020
Full Text: View/download PDF

6. Deep learning multi-shot 3D localization microscopy using hybrid optical-electronic computing

Author: Michael Broxton, Takamasa Kudo, Hayato Ikoma, Gordon Wetzstein, and Yifan Peng
Subjects: Physics, 0303 health sciences, Microscopy, business.industry, Deep learning, 01 natural sciences, Atomic and Molecular Physics, and Optics, 010309 optics, 03 medical and health sciences, Optics, Deep Learning, Shot (pellet), 0103 physical sciences, Artificial intelligence, Electronics, business, Algorithms, 030304 developmental biology, 3d localization
Abstract: Current 3D localization microscopy approaches are fundamentally limited in their ability to image thick, densely labeled specimens. Here, we introduce a hybrid optical–electronic computing approach that jointly optimizes an optical encoder (a set of multiple, simultaneously imaged 3D point spread functions) and an electronic decoder (a neural-network-based localization algorithm) to optimize 3D localization performance under these conditions. With extensive simulations and biological experiments, we demonstrate that our deep-learning-based microscope achieves significantly higher 3D localization accuracy than existing approaches, especially in challenging scenarios with high molecular density over large depth ranges.
Published: 2021

7. Holographic near-eye displays based on overlap-add stereograms

Author: Gordon Wetzstein, Yifan Peng, and Nitish Padmanaban
Subjects: Image quality, Computer science, business.industry, Short-time Fourier transform, Holography, 020207 software engineering, 02 engineering and technology, Computer Graphics and Computer-Aided Design, law.invention, law, Face (geometry), 0202 electrical engineering, electronic engineering, information engineering, Angular resolution, Computer vision, Artificial intelligence, business, Image resolution, Light field
Abstract: Holographic near-eye displays are a key enabling technology for virtual and augmented reality (VR/AR) applications. Holographic stereograms (HS) are a method of encoding a light field into a hologram, which enables them to natively support view-dependent lighting effects. However, existing HS algorithms require the choice of a hogel size, forcing a tradeoff between spatial and angular resolution. Based on the fact that the short-time Fourier transform (STFT) connects a hologram to its observable light field, we develop the overlap-add stereogram (OLAS) as the correct method of "inverting" the light field into a hologram via the STFT. The OLAS makes more efficient use of the information contained within the light field than previous HS algorithms, exhibiting better image quality at a range of distances and hogel sizes. Most remarkably, the OLAS does not degrade spatial resolution with increasing hogel size, overcoming the spatio-angular resolution tradeoff that previous HS algorithms face. Importantly, the optimal hogel size of previous methods typically varies with the depth of every object in a scene, making the OLAS not only a hogel size-invariant method, but also nearly scene independent. We demonstrate the performance of the OLAS both in simulation and on a prototype near-eye display system, showing focusing capabilities and view-dependent effects.
Published: 2019
Full Text: View/download PDF

8. Wave-based non-line-of-sight imaging using fast f-k migration

Author: Gordon Wetzstein, David B. Lindell, and Matthew O'Toole
Subjects: Image formation, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Inversion (meteorology), 02 engineering and technology, Computer Graphics and Computer-Aided Design, Computational photography, Non-line-of-sight propagation, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, Specular reflection, business
Abstract: Imaging objects outside a camera's direct line of sight has important applications in robotic vision, remote sensing, and many other domains. Time-of-flight-based non-line-of-sight (NLOS) imaging systems have recently demonstrated impressive results, but several challenges remain. Image formation and inversion models have been slow or limited by the types of hidden surfaces that can be imaged. Moreover, non-planar sampling surfaces and non-confocal scanning methods have not been supported by efficient NLOS algorithms. With this work, we introduce a wave-based image formation model for the problem of NLOS imaging. Inspired by inverse methods used in seismology, we adapt a frequency-domain method, f-k migration, for solving the inverse NLOS problem. Unlike existing NLOS algorithms, f-k migration is both fast and memory efficient, it is robust to specular and other complex reflectance properties, and we show how it can be used with non-confocally scanned measurements as well as for non-planar sampling surfaces. f-k migration is more robust to measurement noise than alternative methods, generally produces better quality reconstructions, and is easy to implement. We experimentally validate our algorithms with a new NLOS imaging system that records room-sized scenes outdoors under indirect sunlight, and scans persons wearing retroreflective clothing at interactive rates.
Published: 2019
Full Text: View/download PDF

9. Non-line-of-sight Imaging with Partial Occluders and Surface Normals

Author: Steven Diamond, Kai Zang, Matthew O'Toole, Gordon Wetzstein, Felix Heide, and David B. Lindell
Subjects: FOS: Computer and information sciences, Surface (mathematics), Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, 01 natural sciences, Computer Graphics and Computer-Aided Design, Reflectivity, 010309 optics, Non-line-of-sight propagation, Computational photography, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, Representation (mathematics), business, Search and rescue, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Imaging objects obscured by occluders is a significant challenge for many applications. A camera that could “see around corners” could help improve navigation and mapping capabilities of autonomous vehicles or make search and rescue missions more effective. Time-resolved single-photon imaging systems have recently been demonstrated to record optical information of a scene that can lead to an estimation of the shape and reflectance of objects hidden from the line of sight of a camera. However, existing non-line-of-sight (NLOS) reconstruction algorithms have been constrained in the types of light transport effects they model for the hidden scene parts. We introduce a factored NLOS light transport representation that accounts for partial occlusions and surface normals. Based on this model, we develop a factorization approach for inverse time-resolved light transport and demonstrate high-fidelity NLOS reconstructions for challenging scenes both in simulation and with an experimental NLOS imaging system.
Published: 2019
Full Text: View/download PDF

10. Suremap: Predicting Uncertainty in Cnn-Based Image Reconstructions Using Stein’s Unbiased Risk Estimate

Author: Gordon Wetzstein, Frank Ong, Christopher A. Metzler, and Ruangrawee Kitichotkul
Subjects: Signal processing, Artifact (error), Computer science, business.industry, Message passing, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Image (mathematics), Compressed sensing, Stein's unbiased risk estimate, 0202 electrical engineering, electronic engineering, information engineering, Medical imaging, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
Abstract: Convolutional neural networks (CNN) have emerged as a powerful tool for solving computational imaging reconstruction problems. However, CNNs are generally difficult-to-understand black-boxes. Accordingly, it is challenging to know when they will work and, more importantly, when they will fail. This limitation is a major barrier to their use in safety-critical applications like medical imaging: Is that blob in the reconstruction an artifact or a tumor?In this work we use Stein’s unbiased risk estimate (SURE) to develop per-pixel confidence intervals, in the form of heatmaps, for compressive sensing reconstruction using the approximate message passing (AMP) framework with CNN-based denoisers. These heatmaps tell end-users how much to trust an image formed by a CNN, which could greatly improve the utility of CNNs in various computational imaging applications.
Published: 2021
Full Text: View/download PDF

11. AutoInt: Automatic Integration for Fast Neural Volume Rendering

Author: David B. Lindell, Julien N. P. Martel, and Gordon Wetzstein
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, business.industry, Computer science, Image quality, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Volume rendering, Graphics (cs.GR), Antiderivative, Machine Learning (cs.LG), Rendering (computer graphics), View synthesis, Computer Science - Graphics, Computer engineering, Fundamental theorem of calculus, Graph (abstract data type), Artificial intelligence, business
Abstract: Numerical integration is a foundational technique in scientific computing and is at the core of many computer vision applications. Among these applications, neural volume rendering has recently been proposed as a new paradigm for view synthesis, achieving photorealistic image quality. However, a fundamental obstacle to making these methods practical is the extreme computational and memory requirements caused by the required volume integrations along the rendered rays during training and inference. Millions of rays, each requiring hundreds of forward passes through a neural network are needed to approximate those integrations with Monte Carlo sampling. Here, we propose automatic integration, a new framework for learning efficient, closed-form solutions to integrals using coordinate-based neural networks. For training, we instantiate the computational graph corresponding to the derivative of the coordinate-based network. The graph is fitted to the signal to integrate. After optimization, we reassemble the graph to obtain a network that represents the antiderivative. By the fundamental theorem of calculus, this enables the calculation of any definite integral in two evaluations of the network. Applying this approach to neural rendering, we improve a tradeoff between rendering speed and image quality: improving render times by greater than 10× with a tradeoff of reduced image quality.
Published: 2021
Full Text: View/download PDF

12. Neural Lumigraph Rendering

Author: Ryan Spicer, Kari Pulli, Andrew Jones, Petr Kellnhofer, Lars C. Jebe, and Gordon Wetzstein
Subjects: FOS: Computer and information sciences, Computer science, business.industry, Image quality, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Volume rendering, Facial recognition system, Graphics pipeline, Graphics (cs.GR), Rendering (computer graphics), View synthesis, Computer Science - Graphics, Face (geometry), Computer vision, Artificial intelligence, Graphics, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Novel view synthesis is a challenging and ill-posed inverse rendering problem. Neural rendering techniques have recently achieved photorealistic image quality for this task. State-of-the-art (SOTA) neural volume rendering approaches, however, are slow to train and require minutes of inference (i.e., rendering) time for high image resolutions. We adopt high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene supervised exclusively with posed 2D images. Our neural rendering pipeline accelerates SOTA neural volume rendering by about two orders of magnitude and our implicit surface representation is unique in allowing us to export a mesh with view-dependent texture information. Thus, like other implicit surface representations, ours is compatible with traditional graphics pipelines, enabling real-time rendering rates, while achieving unprecedented image quality compared to other surface methods. We assess the quality of our approach using existing datasets as well as high-quality 3D face data captured with a custom multi-camera rig., Comment: Project website: http://www.computationalimaging.org/publications/nlr/
Published: 2021
Full Text: View/download PDF

13. Event-Based Near-Eye Gaze Tracking Beyond 10,000 Hz

Author: Gordon Wetzstein, Julien N. P. Martel, Jörg Conradt, Amit P. S. Kohli, and Anastasios Angelopoulos
Subjects: Event (computing), business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Tracking system, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Gaze, Rendering (computer graphics), Signal Processing, Parametric model, 0202 electrical engineering, electronic engineering, information engineering, Eye tracking, Computer vision, Augmented reality, Computer Vision and Pattern Recognition, Artificial intelligence, Microsaccade, business, Software
Abstract: The cameras in modern gaze-tracking systems suffer from fundamental bandwidth and power limitations, constraining data acquisition speed to 300 Hz realistically. This obstructs the use of mobile eye trackers to perform, e.g., low latency predictive rendering, or to study quick and subtle eye motions like microsaccades using head-mounted devices in the wild. Here, we propose a hybrid frame-event-based near-eye gaze tracking system offering update rates beyond 10,000 Hz with an accuracy that matches that of high-end desktop-mounted commercial trackers when evaluated in the same conditions. Our system, previewed in Figure 1, builds on emerging event cameras that simultaneously acquire regularly sampled frames and adaptively sampled events. We develop an online 2D pupil fitting method that updates a parametric model every one or few events. Moreover, we propose a polynomial regressor for estimating the point of gaze from the parametric pupil model in real time. Using the first event-based gaze dataset, we demonstrate that our system achieves accuracies of 0.45°-1.75° for fields of view from 45° to 98°. With this technology, we hope to enable a new generation of ultra-low-latency gaze-contingent rendering and display techniques for virtual and augmented reality.
Published: 2021

14. High-quality holographic displays using double SLMs and camera-in-the-loop optimization

Author: Yifan Peng, Gordon Wetzstein, Suyeon Choi, and Jonghyun Kim
Subjects: Diffraction, Loop optimization, Spatial light modulator, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Holography, Image plane, Diffraction efficiency, law.invention, Speckle pattern, law, Holographic display, Computer vision, Artificial intelligence, business
Abstract: Holographic displays have recently shown remarkable progress in the research field. However, images reconstructed from existing display systems using phase-only spatial light modulators (SLMs) are with noticeable speckles and low contrast due to the non-trivial diffraction efficiency loss. In this work, we investigate a novel holographic display architecture that uses two phase-only SLMs to enable high-quality, contrast-enhanced dis- play experiences. Our system builds on emerging camera-in-the-loop optimization techniques that capture both diffracted and undiffracted light on the image plane with a camera and use this to update the hologram patterns on the SLMs in an iterative fashion. Our experimental results demonstrate that the proposed display architecture can deliver higher-contrast and holographic images with little speckle without the need for extra optical filtering.
Published: 2021
Full Text: View/download PDF

15. A Perceptual Model for Eccentricity-dependent Spatio-temporal Flicker Fusion and its Applications to Foveated Graphics

Author: Brooke Krajancich, Gordon Wetzstein, and Petr Kellnhofer
Subjects: FOS: Computer and information sciences, Computer science, Image quality, Computer Science - Human-Computer Interaction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Flicker fusion threshold, 02 engineering and technology, Virtual reality, Luminance, 01 natural sciences, Human-Computer Interaction (cs.HC), 010309 optics, Computer Science - Graphics, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, Computer vision, Graphics, business.industry, Image and Video Processing (eess.IV), 020207 software engineering, Electrical Engineering and Systems Science - Image and Video Processing, Computer Graphics and Computer-Aided Design, Graphics (cs.GR), Human visual system model, 020201 artificial intelligence & image processing, Augmented reality, Spatial frequency, Artificial intelligence, business
Abstract: Virtual and augmented reality (VR/AR) displays strive to provide a resolution, framerate and field of view that matches the perceptual capabilities of the human visual system, all while constrained by limited compute budgets and transmission bandwidths of wearable computing systems. Foveated graphics techniques have emerged that could achieve these goals by exploiting the falloff of spatial acuity in the periphery of the visual field. However, considerably less attention has been given to temporal aspects of human vision, which also vary across the retina. This is in part due to limitations of current eccentricity-dependent models of the visual system. We introduce a new model, experimentally measuring and computationally fitting eccentricity-dependent critical flicker fusion thresholds jointly for both space and time. In this way, our model is unique in enabling the prediction of temporal information that is imperceptible for a certain spatial frequency, eccentricity, and range of luminance levels. We validate our model with an image quality user study, and use it to predict potential bandwidth savings 7X higher than those afforded by current spatial-only foveated models. As such, this work forms the enabling foundation for new temporally foveated graphics techniques.
Published: 2021
Full Text: View/download PDF

16. pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

Author: Gordon Wetzstein, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Eric R. Chan
Subjects: FOS: Computer and information sciences, Network architecture, business.industry, Computer science, Image quality, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, Volume rendering, 02 engineering and technology, Graphics (cs.GR), Rendering (computer graphics), Visualization, Generative model, Computer Science - Graphics, 0202 electrical engineering, electronic engineering, information engineering, Artificial intelligence, Representation (mathematics), business, Generative grammar
Abstract: We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches however fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon representation network architectures that are not expressive enough, and their results thus lack in image quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks ($\pi$-GAN or pi-GAN), for high-quality 3D-aware image synthesis. $\pi$-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent 3D representations with fine detail. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis with multiple real and synthetic datasets.
Published: 2020

17. Semantic Implicit Neural Scene Representations With Semi-Supervised Training

Author: Vincent Sitzmann, Gordon Wetzstein, and Amit P. S. Kohli
Subjects: FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, 02 engineering and technology, 010501 environmental sciences, Semantics, 01 natural sciences, 0202 electrical engineering, electronic engineering, information engineering, Segmentation, Representation (mathematics), 0105 earth and related environmental sciences, I.2.10, Artificial neural network, business.industry, I.4.5, I.4.6, Trilinear interpolation, 020207 software engineering, Pattern recognition, Image segmentation, I.4.10, I.4.8, Task analysis, Artificial intelligence, business
Abstract: The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network which can be queried at any coordinate to produce these same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry information in a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics., 3DV 2020 Camera Ready https://www.computationalimaging.org/publications/
Published: 2020
Full Text: View/download PDF

18. Non-Line-of-Sight Surface Reconstruction Using the Directional Light-Cone Transform

Author: David Taubman, Sean I. Young, Gordon Wetzstein, Bernd Girod, and David B. Lindell
Subjects: Surface (mathematics), Current (mathematics), business.industry, Computer science, 020207 software engineering, 02 engineering and technology, Iterative reconstruction, Albedo, Non-line-of-sight propagation, Light cone, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Deconvolution, business, Algorithm, Surface reconstruction
Abstract: We propose a joint albedo–normal approach to non-line-of-sight (NLOS) surface reconstruction using the directional light-cone transform (D-LCT). While current NLOS imaging methods reconstruct either the albedo or surface normals of the hidden scene, the two quantities provide complementary information of the scene, so an efficient method to estimate both simultaneously is desirable. We formulate the recovery of the two quantities as a vector deconvolution problem, and solve it via Cholesky–Wiener decomposition. We demonstrate that surfaces fitted non-parametrically using our recovered normals are more accurate than those produced with NLOS surface reconstruction methods recently proposed, and are 1,000 times faster to compute than using inverse rendering.
Published: 2020
Full Text: View/download PDF

19. Non-line-of-sight Imaging

Author: Gordon Wetzstein, Andreas Velten, and Daniele Faccio
Subjects: Line-of-sight, Photon, Computer science, business.industry, Perspective (graphical), Detector, Image and Video Processing (eess.IV), Process (computing), General Physics and Astronomy, FOS: Physical sciences, Electrical Engineering and Systems Science - Image and Video Processing, Non-line-of-sight propagation, Computer Science::Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, business, Inverse method, Optics (physics.optics), Physics - Optics
Abstract: Emerging single-photon-sensitive sensors produce picosecond-accurate time-stamped photon counts. Applying advanced inverse methods to process these data has resulted in unprecedented imaging capabilities, such as non-line-of-sight (NLOS) imaging. Rather than imaging photons that travel along direct paths from a source to an object and back to the detector, NLOS methods analyse photons that travel along indirect light paths, scattered from multiple surfaces, to estimate 3D images of scenes outside the direct line of sight of a camera, hidden by a wall or other obstacles. We review the transient imaging techniques that underlie many NLOS imaging approaches, discuss methods for reconstructing hidden scenes from time-resolved measurements, describe some other methods for NLOS imaging that do not require transient imaging and discuss the future of ‘seeing around corners’. Non-line-of-sight (NLOS) imaging methods use light scattered from multiple surfaces to reconstruct images of scenes that are hidden by another object. This Perspective summarizes existing NLOS imaging techniques and discusses which directions show most promise for future developments.
Published: 2020

20. Deep Adaptive LiDAR: End-to-end Optimization of Sampling and Depth Completion at Low Sampling Rates

Author: Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein
Subjects: Adaptive sampling, business.industry, Computer science, Deep learning, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, Inpainting, Sampling (statistics), 02 engineering and technology, 01 natural sciences, Task (project management), 010309 optics, Lidar, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Upstream (networking), Artificial intelligence, business
Abstract: Current LiDAR systems are limited in their ability to capture dense 3D point clouds. To overcome this challenge, deep learning-based depth completion algorithms have been developed to inpaint missing depth guided by an RGB image. However, these methods fail for low sampling rates. Here, we propose an adaptive sampling scheme for LiDAR systems that demonstrates state-of-the-art performance for depth completion at low sampling rates. Our system is fully differentiable, allowing the sparse depth sampling and the depth inpainting components to be trained end-to-end with an upstream task.
Published: 2020
Full Text: View/download PDF

21. Comparison of head pose tracking methods for mixed-reality neuronavigation for transcranial magnetic stimulation

Author: Brian A. Hargreaves, Bruce L. Daniel, Gordon Wetzstein, Supriya Sathyanarayana, Amit Etkin, Mahendra T. Bhati, Jennifer A. McNab, and Christoph Leuze
Subjects: Tracking error, Neuronavigation, Computer science, business.industry, Headset, Head (vessel), Computer vision, Artificial intelligence, Focus (optics), Tracking (particle physics), business, Mixed reality, Visualization
Abstract: Purpose: Repetitive Transcranial Magnetic Stimulation (rTMS) is an important treatment option for medication resistant depression. It uses an electromagnetic coil that needs to be positioned accurately at a specific location and angle next to the head such that specific brain areas are stimulated. Existing image-guided neuronavigation systems allow accurate targeting but add cost, training and setup times, preventing their wide-spread use in the clinic. Mixed-reality neuronavigation can help mitigate these issues and thereby enable more widespread use of image-based neuronavigation by providing a much more intuitive and streamlined visualization of the target. A mixed-reality neuronavigation system requires two core functionalities: 1) tracking of the patient's head and 2) visualization of targeting-related information. Here we focus on the head tracking functionality and compare three different head tracking methods for a mixed-reality neuronavigation system. Methods: We integrated three head tracking methods into the mixed reality neuronavigation framework and measured their accuracy. Specifically, we experimented with (a) marker-based tracking with a mixed reality headset (optical see-through head-mounted display (OST-HMD)) camera, (b) marker-based tracking with a world-anchored camera and (c) markerless RGB-depth (RGB-D) tracking with a world-anchored camera. To measure the accuracy of each approach, we measured the distance between real-world and virtual target points on a mannequin head. Results: The mean tracking error for the initial head pose and the head rotated by 10° and 30° for the three methods respectively was: (a) 3.54±1.10 mm, 3.79±1.78 mm and 4.08±1.88 mm, (b) 3.97±1.41 mm, 6.01±2.51 mm and 6.84±3.48 mm, (c) 3.16±2.26 mm, 4.46±2.30 mm and 5.83±3.70 mm. Conclusion: For the initial head pose, all three methods achieved the required accuracy of < 5 mm for TMS treatment. For smaller head rotations of 10°, only the marker-based (a) and markerless method (c) delivered sufficient accuracy for TMS treatment. For larger head rotations of 30°, only the marker-based method (a) achieved sufficient accuracy. While the markerless method (c) did not provide sufficient accuracy for TMS at the larger head rotations, it offers significant advantages such as occlusion-handling and stability and could potentially meet the accuracy requirements with further methodological refinements.
Published: 2020
Full Text: View/download PDF

22. End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging

Author: Stephen Boyd, Felix Heide, Gordon Wetzstein, Xiong Dun, Vincent Sitzmann, Wolfgang Heidrich, Steven Diamond, and Yifan Peng
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Image processing, Reconstruction algorithm, 02 engineering and technology, 01 natural sciences, Computer Graphics and Computer-Aided Design, Superresolution, law.invention, 010309 optics, End-to-end principle, Achromatic lens, law, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Depth of field, Artificial intelligence, business
Abstract: In typical cameras the optical system is designed first; once it is fixed, the parameters in the image processing algorithm are tuned to get good image reproduction. In contrast to this sequential design approach, we consider joint optimization of an optical system (for example, the physical shape of the lens) together with the parameters of the reconstruction algorithm. We build a fully-differentiable simulation model that maps the true source image to the reconstructed one. The model includes diffractive light propagation, depth and wavelength-dependent effects, noise and nonlinearities, and the image post-processing. We jointly optimize the optical parameters and the image processing algorithm parameters so as to minimize the deviation between the true and reconstructed image, over a large set of images. We implement our joint optimization method using autodifferentiation to efficiently compute parameter gradients in a stochastic optimization algorithm. We demonstrate the efficacy of this approach by applying it to achromatic extended depth of field and snapshot super-resolution imaging.
Published: 2018
Full Text: View/download PDF

23. Single-photon 3D imaging with deep sensor fusion

Author: Matthew O'Toole, Gordon Wetzstein, and David B. Lindell
Subjects: business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Process (computing), Reconstruction algorithm, 02 engineering and technology, Sensor fusion, 01 natural sciences, Computer Graphics and Computer-Aided Design, Signal, 010309 optics, Computational photography, Gesture recognition, Depth map, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Pose
Abstract: Sensors which capture 3D scene information provide useful data for tasks in vehicle navigation, gesture recognition, human pose estimation, and geometric reconstruction. Active illumination time-of-flight sensors in particular have become widely used to estimate a 3D representation of a scene. However, the maximum range, density of acquired spatial samples, and overall acquisition time of these sensors is fundamentally limited by the minimum signal required to estimate depth reliably. In this paper, we propose a data-driven method for photon-efficient 3D imaging which leverages sensor fusion and computational reconstruction to rapidly and robustly estimate a dense depth map from low photon counts. Our sensor fusion approach uses measurements of single photon arrival times from a low-resolution single-photon detector array and an intensity image from a conventional high-resolution camera. Using a multi-scale deep convolutional network, we jointly process the raw measurements from both sensors and output a high-resolution depth map. To demonstrate the efficacy of our approach, we implement a hardware prototype and show results using captured data. At low signal-to-background levels, our depth reconstruction algorithm with sensor fusion outperforms other methods for depth estimation from noisy measurements of photon arrival times.
Published: 2018
Full Text: View/download PDF

24. Snapshot difference imaging using correlation time-of-flight sensors

Author: Gordon Wetzstein, Felix Heide, Matthias B. Hullin, and Clara Callenberg
Subjects: Computer science, business.industry, 020208 electrical & electronic engineering, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Computer Graphics and Computer-Aided Design, Correlation, Time of flight, Computational photography, 0202 electrical engineering, electronic engineering, information engineering, Snapshot (computer storage), Computer vision, Artificial intelligence, business
Abstract: Computational photography encompasses a diversity of imaging techniques, but one of the core operations performed by many of them is to compute image differences. An intuitive approach to computing such differences is to capture several images sequentially and then process them jointly. In this paper, we introduce a snapshot difference imaging approach that is directly implemented in the sensor hardware of emerging time-of-flight cameras. With a variety of examples, we demonstrate that the proposed snapshot difference imaging technique is useful for direct-global illumination separation, for direct imaging of spatial and temporal image gradients, for direct depth edge imaging, and more.
Published: 2017
Full Text: View/download PDF

25. Optically sensing neural activity without imaging

Author: Gordon Wetzstein and Isaac Kauvar
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, 021001 nanoscience & nanotechnology, 01 natural sciences, Atomic and Molecular Physics, and Optics, Electronic, Optical and Magnetic Materials, 010309 optics, Neural activity, 0103 physical sciences, Artificial intelligence, 0210 nano-technology, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Advanced computational imaging techniques have the potential to extract neural activity patterns from scattered data without reconstructing images.
Published: 2020
Full Text: View/download PDF

26. Real-Time Unknown-View Tomography Using Recurrent Neural Networks with Applications to Keyhole Imaging

Author: Gordon Wetzstein and Christopher A. Metzler
Subjects: Recurrent neural network, Computer science, business.industry, Computer vision, Artificial intelligence, Tomography, business, Keyhole
Abstract: Unknown-view tomography is an important but computationally intensive reconstruction problem. We demonstrate that recurrent neural networks (RNNs) can perform unknown-view tomography in real time and validate our solution on simulated non-line-of- sight imaging-through-a-keyhole data.
Published: 2020
Full Text: View/download PDF

27. Disambiguating Monocular Depth Estimation with a Single Transient

Author: Mark Nishimura, Christopher A. Metzler, Gordon Wetzstein, and David B. Lindell
Subjects: Monocular, Computer science, business.industry, media_common.quotation_subject, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Ambiguity, Sensor fusion, Computer Science::Computer Vision and Pattern Recognition, Histogram, Metric (mathematics), Computer vision, Transient (computer programming), Artificial intelligence, business, Scale (map), ComputingMethodologies_COMPUTERGRAPHICS, media_common
Abstract: Monocular depth estimation algorithms successfully predict the relative depth order of objects in a scene. However, because of the fundamental scale ambiguity associated with monocular images, these algorithms fail at correctly predicting true metric depth. In this work, we demonstrate how a depth histogram of the scene, which can be readily captured using a single-pixel time-resolved detector, can be fused with the output of existing monocular depth estimation algorithms to resolve the depth ambiguity problem. We validate this novel sensor fusion technique experimentally and in extensive simulation. We show that it significantly improves the performance of several state-of-the-art monocular depth estimation algorithms.
Published: 2020
Full Text: View/download PDF

28. Neural Holographic Display and Image Synthesis

Author: Suyeon Choi, Nitish Padmanaban, Yifan Peng, and Gordon Wetzstein
Subjects: business.industry, Computer science, Image quality, Aperture synthesis, media_common.quotation_subject, Holography, Fidelity, 02 engineering and technology, Virtual reality, 021001 nanoscience & nanotechnology, 01 natural sciences, law.invention, 010309 optics, Stochastic gradient descent, law, 0103 physical sciences, Holographic display, Augmented reality, Computer vision, Artificial intelligence, 0210 nano-technology, business, media_common
Abstract: Holographic displays promise unprecedented capabilities for direct-view displays as well as virtual and augmented reality applications. We develop Neural Holography, an algorithmic computer-generated holography (CGH) framework that uses camera-in-the-loop training to achieve unprecedented image fidelity and real-time framerates.
Published: 2020
Full Text: View/download PDF

29. Inference in artificial intelligence with deep optics and photonics

Author: Cornelia Denz, Sylvain Gigan, Gordon Wetzstein, Marin Soljacic, Aydogan Ozcan, Demetri Psaltis, Shanhui Fan, Dirk Englund, and David A. B. Miller
Subjects: neural-networks, Multidisciplinary, Artificial neural network, business.industry, Computer science, Optical computing, Inference, 02 engineering and technology, 021001 nanoscience & nanotechnology, 01 natural sciences, Visual computing, 010309 optics, 0103 physical sciences, microscopy, Artificial intelligence, Applications of artificial intelligence, Photonics, recognition, 0210 nano-technology, business, implementation
Abstract: Artificial intelligence tasks across numerous applications require accelerators for fast and low-power execution. Optical computing systems may be able to meet these domain-specific needs but, despite half a century of research, general-purpose optical computing systems have yet to mature into a practical technology. Artificial intelligence inference, however, especially for visual computing applications, may offer opportunities for inference based on optical and photonic systems. In this Perspective, we review recent work on optical computing for artificial intelligence applications and discuss its promise and challenges.
Published: 2019

30. Gaze-Contingent Ocular Parallax Rendering for Virtual Reality

Author: Gordon Wetzstein, Robert Konrad, and Anastasios Angelopoulos
Subjects: FOS: Computer and information sciences, J.4, Computer science, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Human-Computer Interaction, 02 engineering and technology, Virtual reality, 050105 experimental psychology, Human-Computer Interaction (cs.HC), Rendering (computer graphics), Computer graphics, InformationSystems_MODELSANDPRINCIPLES, Computer Science - Graphics, H.5.1, Perception, 0202 electrical engineering, electronic engineering, information engineering, Immersion (virtual reality), medicine, 0501 psychology and cognitive sciences, Computer vision, media_common, ComputingMethodologies_COMPUTERGRAPHICS, Retina, business.industry, 05 social sciences, Perspective (graphical), I.3.7, 020207 software engineering, Computer Graphics and Computer-Aided Design, Gaze, Graphics (cs.GR), medicine.anatomical_structure, Eye tracking, Augmented reality, Artificial intelligence, Parallax, business, Depth perception
Abstract: Immersive computer graphics systems strive to generate perceptually realistic user experiences. Current-generation virtual reality (VR) displays are successful in accurately rendering many perceptually important effects, including perspective, disparity, motion parallax, and other depth cues. In this article, we introduce ocular parallax rendering, a technology that accurately renders small amounts of gaze-contingent parallax capable of improving depth perception and realism in VR. Ocular parallax describes the small amounts of depth-dependent image shifts on the retina that are created as the eye rotates. The effect occurs because the centers of rotation and projection of the eye are not the same. We study the perceptual implications of ocular parallax rendering by designing and conducting a series of user experiments. Specifically, we estimate perceptual detection and discrimination thresholds for this effect and demonstrate that it is clearly visible in most VR applications. Additionally, we show that ocular parallax rendering provides an effective ordinal depth cue and it improves the impression of realistic depth in VR., Video: https://www.youtube.com/watch?v=FvBYYAObJNM&feature=youtu.be Project Page: http://www.computationalimaging.org/publications/gaze-contingent-ocular-parallax-rendering-for-virtual-reality/
Published: 2019

31. Acoustic Non-Line-Of-Sight Imaging

Author: Gordon Wetzstein, David B. Lindell, and Vladlen Koltun
Subjects: Image formation, business.industry, Geophysical imaging, Computer science, Machine vision, Real-time computing, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Photodetector, law.invention, Computational photography, Non-line-of-sight propagation, Orders of magnitude (time), law, Medical imaging, Artificial intelligence, Radar, business
Abstract: Non-line-of-sight (NLOS) imaging enables unprecedented capabilities in a wide range of applications, including robotic and machine vision, remote sensing, autonomous vehicle navigation, and medical imaging. Recent approaches to solving this challenging problem employ optical time-of-flight imaging systems with highly sensitive time-resolved photodetectors and ultra-fast pulsed lasers. However, despite recent successes in NLOS imaging using these systems, widespread implementation and adoption of the technology remains a challenge because of the requirement for specialized, expensive hardware. We introduce acoustic NLOS imaging, which is orders of magnitude less expensive than most optical systems and captures hidden 3D geometry at longer ranges with shorter acquisition times compared to state-of-the-art optical methods. Inspired by hardware setups used in radar and algorithmic approaches to model and invert wave-based image formation models developed in the seismic imaging community, we demonstrate a new approach to seeing around corners.
Published: 2019
Full Text: View/download PDF

32. Computational imaging with multi-camera time-of-flight systems

Author: Shikhar Shrestha, Felix Heide, Gordon Wetzstein, and Wolfgang Heidrich
Subjects: Computer science, business.industry, Global illumination, Machine vision, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Motion detection, 02 engineering and technology, Virtual reality, Interference (wave propagation), 01 natural sciences, Computer Graphics and Computer-Aided Design, 010309 optics, Computational photography, symbols.namesake, Time of flight, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, symbols, Waveform, Computer vision, Artificial intelligence, business, Doppler effect
Abstract: Depth cameras are a ubiquitous technology used in a wide range of applications, including robotic and machine vision, human-computer interaction, autonomous vehicles as well as augmented and virtual reality. In this paper, we explore the design and applications of phased multi-camera time-of-flight (ToF) systems. We develop a reproducible hardware system that allows for the exposure times and waveforms of up to three cameras to be synchronized. Using this system, we analyze waveform interference between multiple light sources in ToF applications and propose simple solutions to this problem. Building on the concept of orthogonal frequency design, we demonstrate state-of-the-art results for instantaneous radial velocity capture via Doppler time-of-flight imaging and we explore new directions for optically probing global illumination, for example by de-scattering dynamic scenes and by non-line-of-sight motion detection via frequency gating.
Published: 2016
Full Text: View/download PDF

33. Tensor low-rank and sparse light field photography

Author: Mahdad Hosseini Kamal, Barmak Heshmat, Gordon Wetzstein, Pierre Vandergheynst, and Ramesh Raskar
Subjects: Low-rank tensor factorization, Imagination, Computer science, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, High dimensional, Low-rank and sparse decomposition, law.invention, Redundancy (information theory), Computational photography, law, 0202 electrical engineering, electronic engineering, information engineering, Image acquisition, Computer vision, media_common, Light-field camera, business.industry, 020207 software engineering, Compressive sensing, Compressed sensing, Computer Science::Computer Vision and Pattern Recognition, Signal Processing, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software, Light field, Curse of dimensionality
Abstract: We present a computational camera system for efficient light field image and video acquisition.Our mathematical framework models the intrinsic low dimensionality of light fields using tensor low-rank and sparse priors.We design and implement a prototype compressive light field camera that avoids capturing redundancy of high-dimensional plenoptic function. High-quality light field photography has been one of the most difficult challenges in computational photography. Conventional methods either sacrifice resolution, use multiple devices, or require multiple images to be captured. Combining coded image acquisition and compressive reconstruction is one of the most promising directions to overcome limitations of conventional light field cameras. We present a new approach to compressive light field photography that exploits a joint tensor low-rank and sparse prior (LRSP) on natural light fields. As opposed to recently proposed light field dictionaries, our method does not require a computationally expensive learning stage but rather models the redundancies of high dimensional visual signals using a tensor low-rank prior. This is not only computationally more efficient but also more flexible in that the proposed techniques are easily applicable to a wide range of different imaging systems, camera parameters, and also scene types.
Published: 2016
Full Text: View/download PDF

34. SPADnet: deep RGB-SPAD sensor fusion assisted by monocular depth estimation

Author: Gordon Wetzstein, Olav Solgaard, Zhanghao Sun, and David B. Lindell
Subjects: Monocular, Artificial neural network, Computer science, business.industry, Noise reduction, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Ranging, 02 engineering and technology, 021001 nanoscience & nanotechnology, Sensor fusion, 01 natural sciences, Atomic and Molecular Physics, and Optics, 010309 optics, Optics, Lidar, 0103 physical sciences, RGB color model, Computer vision, Artificial intelligence, 0210 nano-technology, business
Abstract: Single-photon light detection and ranging (LiDAR) techniques use emerging single-photon detectors (SPADs) to push 3D imaging capabilities to unprecedented ranges. However, it remains challenging to robustly estimate scene depth from the noisy and otherwise corrupted measurements recorded by a SPAD. Here, we propose a deep sensor fusion strategy that combines corrupted SPAD data and a conventional 2D image to estimate the depth of a scene. Our primary contribution is a neural network architecture—SPADnet—that uses a monocular depth estimation algorithm together with a SPAD denoising and sensor fusion strategy. This architecture, together with several techniques in network training, achieves state-of-the-art results for RGB-SPAD fusion with simulated and captured data. Moreover, SPADnet is more computationally efficient than previous RGB-SPAD fusion networks.
Published: 2020
Full Text: View/download PDF

35. LiFF: Light Field Features in Scale and Depth

Author: Donald G. Dansereau, Gordon Wetzstein, and Bernd Girod
Subjects: FOS: Computer and information sciences, Scale (ratio), Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Perspective (graphical), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, Scale invariance, Computational photography, Feature (computer vision), Path (graph theory), 0202 electrical engineering, electronic engineering, information engineering, Structure from motion, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Light field
Abstract: Feature detectors and descriptors are key low-level vision tools that many higher-level tasks build on. Unfortunately these fail in the presence of challenging light transport effects including partial occlusion, low contrast, and reflective or refractive surfaces. Building on spatio-angular imaging modalities offered by emerging light field cameras, we introduce a new and computationally efficient 4D light field feature detector and descriptor: LiFF. LiFF is scale invariant and utilizes the full 4D light field to detect features that are robust to changes in perspective. This is particularly useful for structure from motion (SfM) and other tasks that match features across viewpoints of a scene. We demonstrate significantly improved 3D reconstructions via SfM when using LiFF instead of the leading 2D or 4D features, and show that LiFF runs an order of magnitude faster than the leading 4D approach. Finally, LiFF inherently estimates depth for each feature, opening a path for future research in light field-based SfM.
Published: 2019

36. Deep Optics for Single-shot High-dynamic-range Imaging

Author: Gordon Wetzstein, Hayato Ikoma, Yifan Peng, and Christopher A. Metzler
Subjects: Point spread function, FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Convolutional neural network, law.invention, Optical imaging, High-dynamic-range imaging, law, 0202 electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering, Computer vision, Image resolution, Dynamic range, business.industry, Deep learning, Image and Video Processing (eess.IV), 020207 software engineering, Electrical Engineering and Systems Science - Image and Video Processing, Lens (optics), Hallucinating, 020201 artificial intelligence & image processing, Artificial intelligence, business, Encoder
Abstract: High-dynamic-range (HDR) imaging is crucial for many computer graphics and vision applications. Yet, acquiring HDR images with a single shot remains a challenging problem. Whereas modern deep learning approaches are successful at hallucinating plausible HDR content from a single low-dynamic-range (LDR) image, saturated scene details often cannot be faithfully recovered. Inspired by recent deep optical imaging approaches, we interpret this problem as jointly training an optical encoder and electronic decoder where the encoder is parameterized by the point spread function (PSF) of the lens, the bottleneck is the sensor with a limited dynamic range, and the decoder is a convolutional neural network (CNN). The lens surface is then jointly optimized with the CNN in a training phase; we fabricate this optimized optical element and attach it as a hardware add-on to a conventional camera during inference. In extensive simulations and with a physical prototype, we demonstrate that this end-to-end deep optical imaging approach to single-shot HDR imaging outperforms both purely CNN-based approaches and other PSF engineering approaches.
Published: 2019
Full Text: View/download PDF

37. Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Author: Gordon Wetzstein, Christopher A. Metzler, and David B. Lindell
Subjects: Signal Processing (eess.SP), FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Iterative reconstruction, Tracking (particle physics), 01 natural sciences, 010309 optics, 03 medical and health sciences, Non-line-of-sight propagation, Optical path, Position (vector), 0103 physical sciences, FOS: Electrical engineering, electronic engineering, information engineering, Transient (computer programming), Computer vision, Electrical Engineering and Systems Science - Signal Processing, 030304 developmental biology, 0303 health sciences, business.industry, Object (computer science), Computer Science Applications, Computational Mathematics, Signal Processing, Artificial intelligence, business, Keyhole
Abstract: Non-line-of-sight (NLOS) imaging and tracking is an emerging technology that allows the shape or position of objects around corners or behind diffusers to be recovered from transient, time-of-flight measurements. However, existing NLOS approaches require the imaging system to scan a large area on a visible surface, where the indirect light paths of hidden objects are sampled. In many applications, such as robotic vision or autonomous driving, optical access to a large scanning area may not be available, which severely limits the practicality of existing NLOS techniques. Here, we propose a new approach, dubbed keyhole imaging, that captures a sequence of transient measurements along a single optical path, for example, through a keyhole. Assuming that the hidden object of interest moves during the acquisition time, we effectively capture a series of time-resolved projections of the object's shape from unknown viewpoints. We derive inverse methods based on expectation-maximization to recover the object's shape and location using these measurements. Then, with the help of long exposure times and retroreflective tape, we demonstrate successful experimental results with a prototype keyhole imaging system.
Published: 2019
Full Text: View/download PDF

38. DeepVoxels: Learning Persistent 3D Feature Embeddings

Author: Gordon Wetzstein, Felix Heide, Matthias NieBner, Vincent Sitzmann, Michael Zollhöfer, and Justus Thies
Subjects: FOS: Computer and information sciences, Artificial neural network, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Perspective (graphical), 3D reconstruction, Representation (systemics), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, 020207 software engineering, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, View synthesis, Feature (computer vision), 0202 electrical engineering, electronic engineering, information engineering, Embedding, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings based on adversarial loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction of the scene, using a 2D re-rendering loss and enforces perspective and multi-view geometry in a principled manner. We apply our persistent 3D scene representation to the problem of novel view synthesis demonstrating high-quality results for a variety of challenging scenes., Video: https://www.youtube.com/watch?v=HM_WsZhoGXw Supplemental material: https://drive.google.com/file/d/1BnZRyNcVUty6-LxAstN83H79ktUq8Cjp/view?usp=sharing Code: https://github.com/vsitzmann/deepvoxels Project page: https://vsitzmann.github.io/deepvoxels/
Published: 2018

39. Real-time non-line-of-sight imaging

Author: Gordon Wetzstein, David B. Lindell, and Matthew O'Toole
Subjects: Computer science, business.industry, Confocal, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Inverse problem, 01 natural sciences, 010309 optics, Non-line-of-sight propagation, Computational photography, Lidar, Orders of magnitude (time), Light cone, 0103 physical sciences, Computer Science::Networking and Internet Architecture, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, business
Abstract: Non-line-of-sight (NLOS) imaging aims at recovering the shape of objects hidden outside the direct line of sight of a camera. In this work, we report on a new approach for acquiring time-resolved measurements that are suitable for NLOS imaging. The system uses a confocalized single-photon detector and pulsed laser. As opposed to previously-proposed NLOS imaging systems, our setup is very similar to LIDAR systems used for autonomous vehicles and it facilitates a closed-form solution of the associated inverse problem, which we derive in this work. This algorithm, dubbed the Light Cone Transform, is three orders of magnitude faster and more memory efficient than existing methods. We demonstrate experimental results for indoor and outdoor scenes captured and reconstructed with the proposed confocal NLOS imaging system.
Published: 2018
Full Text: View/download PDF

40. Confocal non-line-of-sight imaging

Author: Matthew O'Toole, David B. Lindell, and Gordon Wetzstein
Subjects: 0301 basic medicine, Computer science, business.industry, Orders of magnitude (temperature), Confocal, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Inverse problem, 01 natural sciences, 010309 optics, 03 medical and health sciences, Non-line-of-sight propagation, Computational photography, 030104 developmental biology, Lidar, Light cone, 0103 physical sciences, Computer Science::Networking and Internet Architecture, Computer vision, Artificial intelligence, business
Abstract: Non-line-of-sight (NLOS) imaging aims at recovering the shape of objects hidden outside the direct line of sight of a camera. In this work, we report on a new approach for acquiring time-resolved measurements that are suitable for NLOS imaging. The system uses a confocalized single-photon detector and pulsed laser. As opposed to previously-proposed NLOS imaging systems, our setup is very similar to LIDAR systems used for autonomous vehicles and it facilitates a closed-form solution of the associated inverse problem, which we derive in this work. This algorithm, dubbed the Light Cone Transform, is three orders of magnitude faster and more memory efficient than existing methods. We demonstrate experimental results for indoor and outdoor scenes captured and reconstructed with the proposed confocal NLOS imaging system.
Published: 2018
Full Text: View/download PDF

41. Deep End-to-End Time-of-Flight Imaging

Author: Felix Heide, Gordon Wetzstein, Shuochen Su, and Wolfgang Heidrich
Subjects: Normalization (statistics), Computer science, business.industry, Noise reduction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Normalization (image processing), 020207 software engineering, Image processing, 02 engineering and technology, Iterative reconstruction, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Rendering (computer graphics), Computer Science::Computer Vision and Pattern Recognition, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, business, 0105 earth and related environmental sciences
Abstract: We present an end-to-end image processing framework for time-of-flight (ToF) cameras. Existing ToF image processing pipelines consist of a sequence of operations including modulated exposures, denoising, phase unwrapping and multipath interference correction. While this cascaded modular design offers several benefits, such as closed-form solutions and power-efficient processing, it also suffers from error accumulation and information loss as each module can only observe the output from its direct predecessor, resulting in erroneous depth estimates. We depart from a conventional pipeline model and propose a deep convolutional neural network architecture that recovers scene depth directly from dual-frequency, raw ToF correlation measurements. To train this network, we simulate ToF images for a variety of scenes using a time-resolved renderer, devise depth-specific losses, and apply normalization and augmentation strategies to generalize this model to real captures. We demonstrate that the proposed network can efficiently exploit the spatio-temporal structures of ToF frequency measurements, and validate the performance of the joint multipath removal, denoising and phase unwrapping method on a wide range of challenging scenes.
Published: 2018
Full Text: View/download PDF

42. Towards a Machine-Learning Approach for Sickness Prediction in 360° Stereoscopic Videos

Author: Timon Ruban, Anthony M. Norcia, Nitish Padmanaban, Gordon Wetzstein, and Vincent Sitzmann
Subjects: Adult, Male, Visual perception, Databases, Factual, Computer science, Nausea, Motion Sickness, media_common.quotation_subject, Video Recording, Stereoscopy, 02 engineering and technology, Virtual reality, Stimulus (physiology), Machine learning, computer.software_genre, law.invention, Machine Learning, User-Computer Interface, Young Adult, law, Perception, 0202 electrical engineering, electronic engineering, information engineering, medicine, Computer Graphics, Humans, media_common, Depth Perception, business.industry, Virtual Reality, 020206 networking & telecommunications, Middle Aged, medicine.disease, Computer Graphics and Computer-Aided Design, Motion sickness, Signal Processing, Simulator sickness, 020201 artificial intelligence & image processing, Female, Computer Vision and Pattern Recognition, Artificial intelligence, Headaches, medicine.symptom, Depth perception, business, computer, Software, Algorithms
Abstract: Virtual reality systems are widely believed to be the next major computing platform. There are, however, some barriers to adoption that must be addressed, such as that of motion sickness - which can lead to undesirable symptoms including postural instability, headaches, and nausea. Motion sickness in virtual reality occurs as a result of moving visual stimuli that cause users to perceive self-motion while they remain stationary in the real world. There are several contributing factors to both this perception of motion and the subsequent onset of sickness, including field of view, motion velocity, and stimulus depth. We verify first that differences in vection due to relative stimulus depth remain correlated with sickness. Then, we build a dataset of stereoscopic 3D videos and their corresponding sickness ratings in order to quantify their nauseogenicity, which we make available for future use. Using this dataset, we train a machine learning algorithm on hand-crafted features (quantifying speed, direction, and depth as functions of time) from each video, learning the contributions of these various features to the sickness ratings. Our predictor generally outperforms a naive estimate, but is ultimately limited by the size of the dataset. However, our result is promising and opens the door to future work with more extensive datasets. This and further advances in this space have the potential to alleviate developer and end user concerns about motion sickness in the increasingly commonplace virtual world.
Published: 2018

43. Adaptive color display via perceptually-driven factored spectral projection

Author: Samuel Yang, Liang Shi, Ian E. McDowall, Gordon Wetzstein, and Isaac Kauvar
Subjects: Pixel, Computer science, business.industry, Dynamic range, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Graphics and Computer-Aided Design, law.invention, Set (abstract data type), Gamut, Projector, law, Color depth, Computer vision, Artificial intelligence, business, Throughput (business), Selection (genetic algorithm), ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Fundamental display characteristics are constantly being improved, especially resolution, dynamic range, and color reproduction. However, whereas high resolution and high-dynamic range displays have matured as a technology, it remains largely unclear how to extend the color gamut of a display without either sacrificing light throughput or making other tradeoffs. In this paper, we advocate for adaptive color display; with hardware implementations that allow for color primaries to be dynamically chosen, an optimal gamut and corresponding pixel states can be computed in a content-adaptive and user-centric manner. We build a flexible gamut projector and develop a perceptually-driven optimization framework that robustly factors a wide color gamut target image into a set of time-multiplexed primaries and corresponding pixel values. We demonstrate that adaptive primary selection has many benefits over fixed gamut selection and show that our algorithm for joint primary selection and gamut mapping performs better than existing methods. Finally, we evaluate the proposed computational display system extensively in simulation and, via photographs and user experiments, with a prototype adaptive color projector.
Published: 2015
Full Text: View/download PDF

44. The light field stereoscope

Author: Gordon Wetzstein, Kevin Chen, and Fu-Chung Huang
Subjects: Image formation, Focus (computing), Computer science, Orientation (computer vision), business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Retinal, Stereoscopy, Virtual reality, Computer Graphics and Computer-Aided Design, Gaze, law.invention, Computer graphics, chemistry.chemical_compound, chemistry, law, Computer graphics (images), Immersion (virtual reality), Computer vision, Artificial intelligence, Depth perception, business, Stereoscope, Image resolution, Light field
Abstract: Over the last few years, virtual reality (VR) has re-emerged as a technology that is now feasible at low cost via inexpensive cellphone components. In particular, advances of high-resolution micro displays, low-latency orientation trackers, and modern GPUs facilitate immersive experiences at low cost. One of the remaining challenges to further improve visual comfort in VR experiences is the vergence-accommodation conflict inherent to all stereoscopic displays. Accurate reproduction of all depth cues is crucial for visual comfort. By combining well-known stereoscopic display principles with emerging factored light field technology, we present the first wearable VR display supporting high image resolution as well as focus cues. A light field is presented to each eye, which provides more natural viewing experiences than conventional near-eye displays. Since the eye box is just slightly larger than the pupil size, rank-1 light field factorizations are sufficient to produce correct or nearly-correct focus cues; no time-multiplexed image display or gaze tracking is required. We analyze lens distortions in 4D light field space and correct them using the afforded high-dimensional image formation. We also demonstrate significant improvements in resolution and retinal blur quality over related near-eye displays. Finally, we analyze diffraction limits of these types of displays.
Published: 2015
Full Text: View/download PDF

45. 28-3:Invited Paper: Light Field, Focus-tunable, and Monovision Near-eye Displays

Author: Gordon Wetzstein
Subjects: Liquid-crystal display, business.industry, Computer science, 05 social sciences, Visual Discomfort, 020207 software engineering, Stereoscopy, 02 engineering and technology, Virtual reality, law.invention, law, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Computer vision, Augmented reality, Artificial intelligence, business, Accommodation, 050107 human factors, Light field
Abstract: Emerging virtual and augmented reality (VR/VR) displays must overcome the prevalent issue of visual discomfort to provide comfortable user experiences. In particular, the mismatch between vergence and accommodation cues inherent to most stereoscopic displays has been a long-standing challenge. We evaluate several display modes that have the promise to mitigate visual discomfort caused by the vergence-accommodation conflict (VAC), and improve user comfort as well as performance in VR/AR applications. In particular, we explore light field, focus-tunable, and monovision display modes. Whereas light field displays seek to synthesize the 4D light field over the eye box of the viewer, for example with a stacked liquid crystal display architecture, focus-tunable display mode use either programmable liquid lenses or actuated displays to adaptively place the accommodation plane at different distances during the VR/AR experience. Monovision is an unconventional mode that accommodates each eye of the observer at different depths, a technique commonly used in ophthalmology. We evaluate the effectiveness of several different display modes enabled by light field and focus-tunable near eye displays.
Published: 2016
Full Text: View/download PDF

46. Convolutional Sparse Coding for RGB+NIR Imaging

Author: Gordon Wetzstein, Xuemei Hu, Felix Heide, and Qionghai Dai
Subjects: Computer science, business.industry, Noise reduction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Iterative reconstruction, 01 natural sciences, Computer Graphics and Computer-Aided Design, Facial recognition system, Rendering (computer graphics), 010309 optics, Computational photography, Color gel, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, Neural coding, business, Software, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Emerging sensor designs increasingly rely on novel color filter arrays (CFAs) to sample the incident spectrum in unconventional ways. In particular, capturing a near-infrared (NIR) channel along with conventional RGB color is an exciting new imaging modality. RGB+NIR sensing has broad applications in computational photography, such as low-light denoising, it has applications in computer vision, such as facial recognition and tracking, and it paves the way toward low-cost single-sensor RGB and depth imaging using structured illumination. However, cost-effective commercial CFAs suffer from severe spectral cross talk. This cross talk represents a major challenge in high-quality RGB+NIR imaging, rendering existing spatially multiplexed sensor designs impractical. In this work, we introduce a new approach to RGB+NIR image reconstruction using learned convolutional sparse priors. We demonstrate high-quality color and NIR imaging for challenging scenes, even including high-frequency structured NIR illumination. The effectiveness of the proposed method is validated on a large data set of experimental captures, and simulated benchmark results which demonstrate that this work achieves unprecedented reconstruction quality.
Published: 2018

47. Optical Convolutional Neural Networks with Optimized Phase Masks for Image Classification

Author: Gordon Wetzstein, Julie Chang, and Vincent Sitzmann
Subjects: Point spread function, Contextual image classification, Artificial neural network, business.industry, Computer science, Machine vision, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Phase (waves), Image processing, Pattern recognition, Convolutional neural network, Artificial intelligence, Spatial frequency, business
Abstract: Convolutional neural networks excel in many computer vision applications but exert high computational demands. We propose a zero-power optical convolutional layer that can be incorporated for increased efficiency and demonstrate its potential in simulations.
Published: 2018
Full Text: View/download PDF

48. Saliency in VR: How do people explore virtual environments?

Author: Amy Pavel, Ana Serrano, Vincent Sitzmann, Maneesh Agrawala, Gordon Wetzstein, Diego Gutierrez, and Belen Masia
Subjects: Adult, Male, Panorama, Adolescent, Computer science, Video Recording, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Stereoscopy, 02 engineering and technology, Fixation, Ocular, Virtual reality, law.invention, User-Computer Interface, Young Adult, Salience (neuroscience), law, 0202 electrical engineering, electronic engineering, information engineering, Immersion (virtual reality), Computer Graphics, Visual attention, Humans, Computer vision, Computer Simulation, ComputingMethodologies_COMPUTERGRAPHICS, Depth Perception, business.industry, Orientation (computer vision), Virtual Reality, 020207 software engineering, Fixation (psychology), Middle Aged, Computer Graphics and Computer-Aided Design, Gaze, Visualization, Stereopsis, Signal Processing, Exploratory Behavior, 020201 artificial intelligence & image processing, Female, Computer Vision and Pattern Recognition, Artificial intelligence, business, Software
Abstract: Understanding how people explore immersive virtual environments is crucial for many applications, such as designing virtual reality (VR) content, developing new compression algorithms, or learning computational models of saliency or visual attention. Whereas a body of recent work has focused on modeling saliency in desktop viewing conditions, VR is very different from these conditions in that viewing behavior is governed by stereoscopic vision and by the complex interaction of head orientation, gaze, and other kinematic constraints. To further our understanding of viewing behavior and saliency in VR, we capture and analyze gaze and head orientation data of 169 users exploring stereoscopic, static omni-directional panoramas, for a total of 1980 head and gaze trajectories for three different viewing conditions. We provide a thorough analysis of our data, which leads to several important insights, such as the existence of a particular fixation bias, which we then use to adapt existing saliency predictors to immersive VR conditions. In addition, we explore other applications of our data and analysis, including automatic alignment of VR video cuts, panorama thumbnails, panorama video synopsis, and saliency-basedcompression.
Published: 2018

49. Consensus Convolutional Sparse Coding

Author: Gordon Wetzstein, Wolfgang Heidrich, Biswarup Choudhury, Felix Heide, and Robin Swanson
Subjects: Demosaicing, Optimization problem, Computer science, business.industry, Multispectral image, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Iterative reconstruction, Machine learning, computer.software_genre, View synthesis, Convolutional code, Encoding (memory), 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, 020201 artificial intelligence & image processing, Artificial intelligence, business, Neural coding, Feature learning, computer
Abstract: Convolutional sparse coding (CSC) is a promising direction for unsupervised learning in computer vision. In contrast to recent supervised methods, CSC allows for convolutional image representations to be learned that are equally useful for high-level vision tasks and low-level image reconstruction and can be applied to a wide range of tasks without problem-specific retraining. Due to their extreme memory requirements, however, existing CSC solvers have so far been limited to low-dimensional problems and datasets using a handful of low-resolution example images at a time. In this paper, we propose a new approach to solving CSC as a consensus optimization problem, which lifts these limitations. By learning CSC features from large-scale image datasets for the first time, we achieve significant quality improvements in a number of imaging tasks. Moreover, the proposed method enables new applications in high-dimensional feature learning that has been intractable using existing CSC methods. This is demonstrated for a variety of reconstruction problems across diverse problem domains, including 3D multispectral demosaicing and 4D light field view synthesis.
Published: 2017
Full Text: View/download PDF

50. Confocal non-line-of-sight imaging based on the light-cone transform

Author: David B. Lindell, Gordon Wetzstein, and Matthew O'Toole
Subjects: Multidisciplinary, Computer science, business.industry, Detector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Ranging, 02 engineering and technology, 021001 nanoscience & nanotechnology, Tracking (particle physics), 01 natural sciences, Signal, 010309 optics, Non-line-of-sight propagation, Lidar, 0103 physical sciences, Medical imaging, Computer vision, Noise (video), Artificial intelligence, 0210 nano-technology, business
Abstract: How to image objects that are hidden from a camera's view is a problem of fundamental importance to many fields of research, with applications in robotic vision, defence, remote sensing, medical imaging and autonomous vehicles. Non-line-of-sight (NLOS) imaging at macroscopic scales has been demonstrated by scanning a visible surface with a pulsed laser and a time-resolved detector. Whereas light detection and ranging (LIDAR) systems use such measurements to recover the shape of visible objects from direct reflections, NLOS imaging reconstructs the shape and albedo of hidden objects from multiply scattered light. Despite recent advances, NLOS imaging has remained impractical owing to the prohibitive memory and processing requirements of existing reconstruction algorithms, and the extremely weak signal of multiply scattered light. Here we show that a confocal scanning procedure can address these challenges by facilitating the derivation of the light-cone transform to solve the NLOS reconstruction problem. This method requires much smaller computational and memory resources than previous reconstruction methods do and images hidden objects at unprecedented resolution. Confocal scanning also provides a sizeable increase in signal and range when imaging retroreflective objects. We quantify the resolution bounds of NLOS imaging, demonstrate its potential for real-time tracking and derive efficient algorithms that incorporate image priors and a physically accurate noise model. Additionally, we describe successful outdoor experiments of NLOS imaging under indirect sunlight.
Published: 2017

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

Publisher

98 results on '"Gordon Wetzstein"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources