Author: "Fouhey, David F." - Searchworks@Jio Institute Digital Library Search Results

Your search for the author "Fouhey, David F." returned 147 results.

1. Multi-Object Hallucination in Vision-Language Models

Author: Chen, Xuweiyi, Ma, Ziqiao, Zhang, Xuejun, Xu, Sihan, Qian, Shengyi, Yang, Jianing, Fouhey, David F., and Chai, Joyce
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent objects or become distracted) when tasked with focusing on multiple objects simultaneously. We introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within a single image during testing and uses visual referring prompts to eliminate ambiguity. With comprehensive empirical studies and analysis of potential factors leading to multi-object hallucination, we found that (1) LVLMs suffer more hallucinations when focusing on multiple objects than when focusing on a single object; (2) the tested object class distribution affects hallucination behaviors, indicating that LVLMs may follow shortcuts and spurious correlations; and (3) hallucinatory behaviors are influenced by data-specific factors (salience and frequency) and by intrinsic model behaviors. We hope to enable LVLMs to recognize and reason about the multiple objects that often occur in realistic visual scenes, provide insights, and quantify our progress towards mitigating these issues.
Comment: Accepted to NeurIPS 2024 | Project page: https://multi-object-hallucination.github.io/
Published: 2024

2. 3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

Author: Qian, Shengyi, Mo, Kaichun, Blukis, Valts, Fouhey, David F., Fox, Dieter, and Goyal, Ankit
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage Robotic View Transformer (RVT), which uses a multi-view transformer to understand the 3D scene and predict gripper pose actions. We split RVT's multi-view transformer into a visual encoder and an action decoder, and pretrain its visual encoder using masked autoencoding on large-scale 3D datasets such as Objaverse. We evaluate 3D-MVP on a suite of virtual robot manipulation tasks and demonstrate improved performance over baselines. We also show promising results on a real robot platform with minimal finetuning. Our results suggest that 3D-aware pretraining is a promising approach to improve sample efficiency and generalization of vision-based robotic manipulation policies. We will release code and pretrained models for 3D-MVP to facilitate future research. Project site: https://jasonqsy.github.io/3DMVP
Published: 2024

3. 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Author: Yang, Jianing, Chen, Xuweiyi, Madaan, Nikhil, Iyengar, Madhavan, Qian, Shengyi, Fouhey, David F., and Chai, Joyce
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world. While large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, their adaptation to 3D environments (3D-LLMs) remains in its early stages. A primary challenge is the absence of large-scale datasets that provide dense grounding between language and 3D scenes. In this paper, we introduce 3D-GRAND, a pioneering large-scale dataset comprising 40,087 household scenes paired with 6.2 million densely-grounded scene-language instructions. Our results show that instruction tuning with 3D-GRAND significantly enhances grounding capabilities and reduces hallucinations in 3D-LLMs. As part of our contributions, we propose a comprehensive benchmark 3D-POPE to systematically evaluate hallucination in 3D-LLMs, enabling fair comparisons among future models. Our experiments highlight a scaling effect between dataset size and 3D-LLM performance, emphasizing the critical role of large-scale 3D-text datasets in advancing embodied AI research. Notably, our results demonstrate early signals for effective sim-to-real transfer, indicating that models trained on large synthetic data can perform well on real-world 3D scans. Through 3D-GRAND and 3D-POPE, we aim to equip the embodied AI community with essential resources and insights, setting the stage for more reliable and better-grounded 3D-LLMs. Project website: https://3d-grand.github.io
Published: 2024

4. FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation

Author: Rockwell, Chris, Kulkarni, Nilesh, Jin, Linyi, Park, Jeong Joon, Johnson, Justin, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Estimating relative camera poses between images has been a central problem in computer vision. Methods that find correspondences and solve for the fundamental matrix offer high precision in most cases. Conversely, methods predicting pose directly using neural networks are more robust to limited overlap and can infer absolute translation scale, but at the expense of reduced precision. We show how to combine the best of both methods; our approach yields results that are both precise and robust, while also accurately inferring translation scales. At the heart of our model lies a Transformer that (1) learns to balance between solved and learned pose estimations, and (2) provides a prior to guide a solver. A comprehensive analysis supports our design choices and demonstrates that our method adapts flexibly to various feature extractors and correspondence estimators, showing state-of-the-art performance in 6DoF pose estimation on Matterport3D, InteriorNet, StreetLearn, and Map-free Relocalization.
Comment: Accepted to CVPR 2024. Project Page: https://crockwell.github.io/far/
Published: 2024

5. LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

Author: Yang, Jianing, Chen, Xuweiyi, Qian, Shengyi, Madaan, Nikhil, Iyengar, Madhavan, Fouhey, David F., and Chai, Joyce
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Robotics
Abstract: 3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline. LLM-Grounder utilizes an LLM to decompose complex natural language queries into semantic constituents and employs a visual grounding tool, such as OpenScene or LERF, to identify objects in a 3D scene. The LLM then evaluates the spatial and commonsense relations among the proposed objects to make a final grounding decision. Our method does not require any labeled training data and can generalize to novel 3D scenes and arbitrary text queries. We evaluate LLM-Grounder on the ScanRefer benchmark and demonstrate state-of-the-art zero-shot grounding accuracy. Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics. Videos and interactive demos can be found on the project website: https://chat-with-nerf.github.io/
Published: 2023

6. Learning to Predict Scene-Level Implicit 3D from Posed RGBD Data

Author: Kulkarni, Nilesh, Jin, Linyi, Johnson, Justin, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce a method that can learn to predict scene-level implicit functions for 3D reconstruction from posed RGBD data. At test time, our system maps a previously unseen RGB image to a 3D reconstruction of a scene via implicit functions. While implicit functions for 3D reconstruction have often been tied to meshes, we show that we can train one using only a set of posed RGBD images. This setting may help 3D reconstruction unlock the sea of accelerometer+RGBD data that is coming with new phones. Our system, D2-DRDF, can match and sometimes outperform current methods that use mesh supervision and shows better robustness to sparse data.
Comment: Project page: https://nileshkulkarni.github.io/d2drdf/
Published: 2023

7. Understanding 3D Object Interaction from a Single Image

Author: Qian, Shengyi and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Humans can easily understand a single image as depicting multiple potential objects permitting interaction. We use this skill to plan our interactions with the world and to accelerate our understanding of new objects without engaging in interaction. In this paper, we would like to endow machines with a similar ability, so that intelligent agents can better explore the 3D scene or manipulate objects. Our approach is a transformer-based model that predicts the 3D location, physical properties and affordance of objects. To power this model, we collect a dataset with Internet videos, egocentric videos and indoor images to train and validate our approach. Our model yields strong performance on our data, and generalizes well to robotics data. Project site: https://jasonqsy.github.io/3DOI/
Comment: ICCV 2023
Published: 2023

8. Perspective Fields for Single Image Camera Calibration

Author: Jin, Linyi, Zhang, Jianming, Hold-Geoffroy, Yannick, Wang, Oliver, Matzen, Kevin, Sticha, Matthew, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Geometric camera calibration is often required for applications that understand the perspective of the image. We propose perspective fields as a representation that models the local perspective properties of an image. Perspective Fields contain per-pixel information about the camera view, parameterized as an up vector and a latitude value. This representation has a number of advantages as it makes minimal assumptions about the camera model and is invariant or equivariant to common image editing operations like cropping, warping, and rotation. It is also more interpretable and aligned with human perception. We train a neural network to predict Perspective Fields, and the predicted Perspective Fields can easily be converted to calibration parameters. We demonstrate the robustness of our approach under various scenarios compared with camera calibration-based methods and show example applications in image compositing.
Comment: CVPR 2023 Camera Ready. Project Page: https://jinlinyi.github.io/PerspectiveFields/
Published: 2022

9. Large-Scale Spatial Cross-Calibration of Hinode/SOT-SP and SDO/HMI

Author: Fouhey, David F., Higgins, Richard E. L., Antiochos, Spiro K., Barnes, Graham, DeRosa, Marc L., Hoeksema, J. Todd, Leka, K. D., Liu, Yang, Schuck, Peter W., and Gombosi, Tamas I.
Subjects: Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Computer Vision and Pattern Recognition
Abstract: We investigate the cross-calibration of the Hinode/SOT-SP and SDO/HMI instrument metadata, specifically the correspondence of the scaling and pointing information. Accurate calibration of these datasets gives the correspondence needed by inter-instrument studies and learning-based magnetogram systems, and is required for physically meaningful photospheric magnetic field vectors. We approach the problem by robustly fitting geometric models on correspondences between images from each instrument's pipeline. This technique is common in computer vision, but several critical details are required when using scanning slit spectrograph data like Hinode/SOT-SP. We apply this technique to data spanning a decade of the Hinode mission. Our results suggest corrections to the published Level 2 Hinode/SOT-SP data. First, an analysis of approximately 2,700 scans suggests that the reported pixel size in Hinode/SOT-SP Level 2 data is incorrect by around 1%. Second, analysis of over 12,000 scans shows that the pointing information is often incorrect by dozens of arcseconds with a strong bias. Regression of these corrections indicates that thermal effects have caused secular and cyclic drift in Hinode/SOT-SP pointing data over its mission. We offer two solutions. First, direct co-alignment with SDO/HMI data via our procedure can improve alignments for many Hinode/SOT-SP scans. Second, since the pointing errors are predictable, simple post-hoc corrections can substantially improve the pointing. We conclude by illustrating the impact of this updated calibration on derived physical data products needed for research and interpretation. Among other things, our results suggest that the pointing errors induce a hemispheric bias in estimates of radial current density.
Comment: Under revision at ApJS
Published: 2022
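
The abstract above describes robustly fitting a geometric model to correspondences between the two instruments' images. As a minimal sketch only, assuming the simplest model suggested by "scaling and pointing" (an isotropic scale s plus a 2D offset t, so dst ≈ s·src + t) and using RANSAC as the robust estimator, the idea looks like this. The function names, the threshold, and the synthetic data are hypothetical; the paper's actual model and its handling of scanning-slit data are not reproduced here.

```python
import random


def fit_scale_translation(src, dst):
    """Least-squares fit of dst ~= s * src + t (isotropic scale s, 2D offset t)."""
    n = len(src)
    if n < 2:
        raise ValueError("need at least two correspondences")
    mx = sum(p[0] for p in src) / n
    my = sum(p[1] for p in src) / n
    nx = sum(q[0] for q in dst) / n
    ny = sum(q[1] for q in dst) / n
    num = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        num += (x - mx) * (u - nx) + (y - my) * (v - ny)
        den += (x - mx) ** 2 + (y - my) ** 2
    if den == 0.0:
        raise ValueError("source points are degenerate (all identical)")
    s = num / den
    return s, (nx - s * mx, ny - s * my)


def residual(p, q, s, t):
    """Euclidean distance between q and the transformed point s * p + t."""
    dx = q[0] - (s * p[0] + t[0])
    dy = q[1] - (s * p[1] + t[1])
    return (dx * dx + dy * dy) ** 0.5


def ransac_scale_translation(src, dst, thresh=1.0, iters=200, seed=0):
    """RANSAC: fit random 2-point samples, keep the largest inlier set, then refit."""
    rng = random.Random(seed)
    idx = list(range(len(src)))
    best_inliers = []
    for _ in range(iters):
        i, j = rng.sample(idx, 2)
        if src[i] == src[j]:
            continue  # degenerate minimal sample
        s, t = fit_scale_translation([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in idx if residual(src[k], dst[k], s, t) < thresh]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    s, t = fit_scale_translation([src[k] for k in best_inliers],
                                 [dst[k] for k in best_inliers])
    return s, t, best_inliers


# Synthetic example (made-up numbers): a 1% scale error plus a fixed pointing
# offset, with every fifth correspondence corrupted to act as an outlier.
src = [(float(x), float(y)) for x in range(0, 100, 10) for y in range(0, 100, 10)]
dst = [(1.01 * x + 30.0, 1.01 * y - 12.0) for x, y in src]
for k in range(0, len(dst), 5):
    dst[k] = (dst[k][0] + 5.0 + k % 7, dst[k][1] - 5.0 - k % 11)

s, t, inliers = ransac_scale_translation(src, dst)
```

The minimal sample is two correspondences because two points fix one scale and one 2D offset; the final least-squares refit on all inliers is the usual way to reduce the noise of that minimal estimate.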

10. The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

Author: Rockwell, Chris, Johnson, Justin, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images. Deep methods have recently shown strong progress but often require complex or multi-stage architectures. We show that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm. This inductive bias enables a simple method to be competitive in multiple settings, often substantially improving over the state of the art with strong performance gains in limited data regimes.
Comment: Accepted to 3DV 2022; Project Page: https://crockwell.github.io/rel_pose/. Revision: fixed epipolar lines in Figures 3 and 10.
Published: 2022
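
The classical Eight-Point Algorithm referenced above rests on one observation: each correspondence x1 ↔ x2 gives one equation x2ᵀ E x1 = 0 that is linear in the nine entries of the essential matrix E, so eight correspondences stack into an 8 × 9 matrix A whose null space contains vec(E). The sketch below is the textbook construction, not the paper's ViT modifications. It only builds A and checks that a known E lies in its null space; the SVD solve and rank-2 projection that complete the algorithm are omitted to stay dependency-free. The helper names and the two-view setup are hypothetical.

```python
def epipolar_row(x1, x2):
    """Coefficients of x2^T E x1 = 0 as a linear equation in the 9 entries of E.

    x1, x2 are homogeneous normalized image points (u, v, 1). Because
    x2^T E x1 = sum_ij x2[i] * E[i][j] * x1[j], the coefficient of E[i][j]
    is x2[i] * x1[j]; entries are listed in row-major order.
    """
    return [x2[i] * x1[j] for i in range(3) for j in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) times v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def project(P):
    """Pinhole projection to normalized homogeneous coordinates (u, v, 1)."""
    X, Y, Z = P
    return (X / Z, Y / Z, 1.0)


# Made-up two-view setup: identity rotation and translation t, so the second
# camera sees each point at P + t and the essential matrix is E = [t]_x R = [t]_x.
t = (1.0, 0.2, 0.1)
E = skew(t)
points = [(0.5, -0.3, 4.0), (-1.0, 0.8, 5.0), (1.2, 1.1, 6.0), (-0.7, -1.3, 4.5),
          (0.0, 0.4, 7.0), (2.0, -0.5, 8.0), (-1.5, 0.2, 5.5), (0.9, -1.0, 6.5)]

# Eight correspondences -> an 8 x 9 matrix A with A @ vec(E) = 0 (up to rounding).
A = [epipolar_row(project(P), project(tuple(p + d for p, d in zip(P, t))))
     for P in points]
vec_E = [E[i][j] for i in range(3) for j in range(3)]
residuals = [sum(a * e for a, e in zip(row, vec_E)) for row in A]
```

In a full implementation, vec(E) would be recovered as the right singular vector of A with the smallest singular value, and, for pixel coordinates and the fundamental matrix, the points would first be normalized as in Hartley's variant.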

11. PlaneFormers: From Sparse View Planes to 3D Reconstruction

Author: Agarwala, Samir, Jin, Linyi, Rockwell, Chris, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present an approach for the planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging since it requires jointly reasoning about single image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer, that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. Our experiments show that our approach is substantially more effective than prior work, and that several 3D-specific design decisions are crucial for its success.
Comment: Accepted to ECCV 2022
Published: 2022

12. Sound Localization by Self-Supervised Time Delay Estimation

Author: Chen, Ziyang, Fouhey, David F., and Owens, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Sounds reach one microphone in a stereo pair sooner than the other, resulting in an interaural time delay that conveys their directions. Estimating a sound's time delay requires finding correspondences between the signals recorded by each microphone. We propose to learn these correspondences through self-supervision, drawing on recent techniques from visual tracking. We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings. We also propose a multimodal contrastive learning model that solves a visually-guided localization task: estimating the time delay for a particular person in a multi-speaker mixture, given a visual representation of their face. Project site: https://ificl.github.io/stereocrw/
Comment: ECCV 2022
Published: 2022
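
For context on the time-delay problem described above, the classical non-learned baseline finds the correspondence between the two channels by picking the lag that maximizes their cross-correlation, then converts that lag to a direction with the far-field relation sin(θ) = c·τ / d. This is plainly a swapped-in textbook technique, not the paper's contrastive random walk. The function names, the 16 kHz sample rate, the 0.2 m microphone spacing, and 343 m/s for the speed of sound are all illustrative assumptions.

```python
import math
import random


def estimate_delay(left, right, max_lag):
    """Lag (in samples) maximizing the cross-correlation of the two channels.

    A positive lag means the sound reached the left microphone first,
    i.e. right[n] ~= left[n - lag].
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(right[i] * left[i - lag]
                    for i in range(n) if 0 <= i - lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field direction of arrival in radians from broadside: sin(theta) = c * tau / d."""
    s = speed_of_sound * (lag / sample_rate) / mic_spacing
    return math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding or noise


# Made-up stereo recording: a random burst that arrives 7 samples later on the right.
rng = random.Random(1)
burst = [rng.uniform(-1.0, 1.0) for _ in range(64)]
left = [0.0] * 50 + burst + [0.0] * 50
right = [0.0] * 57 + burst + [0.0] * 43

lag = estimate_delay(left, right, max_lag=20)
theta = delay_to_angle(lag, sample_rate=16000, mic_spacing=0.2)
```

Plain cross-correlation tends to degrade with reverberation and multiple simultaneous sources, which is the kind of "in the wild" setting the abstract targets with a learned representation.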

13. Understanding 3D Object Articulation in Internet Videos

Author: Qian, Shengyi, Jin, Linyi, Rockwell, Chris, Chen, Siyi, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos. While seemingly easy for humans, this problem poses many challenges for computers. We propose to approach this problem by combining a top-down detection system that finds planes that can be articulated with an optimization approach that solves for a 3D plane that can explain a sequence of observed articulations. We show that this system can be trained on a combination of videos and 3D scan datasets. When tested on a dataset of challenging Internet videos and the Charades dataset, our approach obtains strong performance. Project site: https://jasonqsy.github.io/Articulation3D
Comment: CVPR 2022
Published: 2022

14. What's Behind the Couch? Directed Ray Distance Functions (DRDF) for 3D Scene Reconstruction

Author: Kulkarni, Nilesh, Johnson, Justin, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: We present an approach for full 3D scene reconstruction from a single unseen image. We train on a dataset of realistic non-watertight scans of scenes. Our approach predicts a distance function, since these have shown promise in handling complex topologies and large spaces. We identify and analyze two key challenges for predicting such image conditioned distance functions that have prevented their success on real 3D scene data. First, we show that predicting a conventional scene distance from an image requires reasoning over a large receptive field. Second, we analytically show that the optimal output of the network trained to predict these distance functions does not obey all the distance function properties. We propose an alternate distance function, the Directed Ray Distance Function (DRDF), that tackles both challenges. We show that a deep network trained to predict DRDFs outperforms all other methods quantitatively and qualitatively on 3D reconstruction from a single image on Matterport3D, 3DFront, and ScanNet.
Comment: Updated illustrations for method section. Project page: https://nileshkulkarni.github.io/scene_drdf
Published: 2021

15. Recognizing Scenes from Novel Viewpoints

Author: Qian, Shengyi, Kirillov, Alexander, Ravi, Nikhila, Chaplot, Devendra Singh, Johnson, Justin, Fouhey, David F., and Gkioxari, Georgia
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories, all without access to the RGB images from those views. We pair 2D scene recognition with an implicit 3D representation and learn from multi-view 2D annotations of hundreds of scenes without any 3D supervision beyond camera poses. We experiment on challenging datasets and demonstrate our model's ability to jointly capture semantics and geometry of novel scenes with diverse layouts, object types and shapes.
Published: 2021

16. SynthIA: A Synthetic Inversion Approximation for the Stokes Vector Fusing SDO and Hinode into a Virtual Observatory

Author: Higgins, Richard E. L., Fouhey, David F., Antiochos, Spiro K., Barnes, Graham, Cheung, Mark C. M., Hoeksema, J. Todd, Leka, K. D., Liu, Yang, Schuck, Peter W., and Gombosi, Tamas I.
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Solar and Stellar Astrophysics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Both NASA's Solar Dynamics Observatory (SDO) and the JAXA/NASA Hinode mission include spectropolarimetric instruments designed to measure the photospheric magnetic field. SDO's Helioseismic and Magnetic Imager (HMI) emphasizes full-disk, high-cadence data acquisition with good spatial resolution, while Hinode's Solar Optical Telescope Spectro-Polarimeter (SOT-SP) focuses on high spatial resolution and spectral sampling at the cost of a limited field of view and slower temporal cadence. This work introduces a deep-learning system named SynthIA (Synthetic Inversion Approximation) that can enhance both missions by capturing the best of each instrument's characteristics. We use SynthIA to produce a new magnetogram data product, SynodeP (Synthetic Hinode Pipeline), that mimics magnetograms from the higher spectral resolution Hinode/SOT-SP pipeline, but is derived from full-disk, high-cadence, and lower spectral-resolution SDO/HMI Stokes observations. Results on held-out data show that SynodeP has good agreement with the Hinode/SOT-SP pipeline inversions, including magnetic fill fraction, which is not provided by the current SDO/HMI pipeline. SynodeP further shows a reduction in the magnitude of the 24-hour oscillations present in the SDO/HMI data. To demonstrate SynthIA's generality, we show the use of SDO/AIA data and subsets of the HMI data as inputs, which enables trade-offs between fidelity to the Hinode/SOT-SP inversions, number of observations used, and temporal artifacts. We discuss possible generalizations of SynthIA and its implications for space weather modeling. This work is part of the NASA Heliophysics DRIVE Science Center (SOLSTICE) at the University of Michigan under grant NASA 80NSSC20K0600E, and will be open-sourced.
Published: 2021

17. PixelSynth: Generating a 3D-Consistent Experience from a Single Image

Author: Rockwell, Chris, Fouhey, David F., and Johnson, Justin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in differentiable rendering and 3D reasoning have driven exciting results in novel view synthesis from a single image. Despite realistic results, methods are limited to relatively small view changes. In order to synthesize immersive scenes, models must also be able to extrapolate. We present an approach that fuses 3D reasoning with autoregressive modeling to outpaint large view changes in a 3D-consistent manner, enabling scene synthesis. We demonstrate considerable improvement in single image large-angle view synthesis results compared to a variety of methods and possible variants across simulated and real datasets. In addition, we show increased 3D consistency compared to alternative accumulation methods. Project website: https://crockwell.github.io/pixelsynth/
Comment: In ICCV 2021
Published: 2021

18. Collision Replay: What Does Bumping Into Things Tell You About Scene Geometry?

Author: Raistrick, Alexander, Kulkarni, Nilesh, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: What does bumping into things in a scene tell you about scene geometry? In this paper, we investigate the idea of learning from collisions. At the heart of our approach is the idea of collision replay, where we use examples of a collision to provide supervision for observations at a past frame. We use collision replay to train convolutional neural networks to predict a distribution over collision time from new images. This distribution conveys information about the navigational affordances (e.g., corridors vs open spaces) and, as we show, can be converted into the distance function for the scene geometry. We analyze this approach with an agent that has noisy actuation in a photorealistic simulator.
Published: 2021

19. Fast and Accurate Emulation of the SDO/HMI Stokes Inversion with Uncertainty Quantification

Author: Higgins, Richard E. L., Fouhey, David F., Zhang, Dichang, Antiochos, Spiro K., Barnes, Graham, Hoeksema, J. Todd, Leka, K. D., Liu, Yang, Schuck, Peter W., and Gombosi, Tamas I.
Subjects: Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Computer Vision and Pattern Recognition
Abstract: The Helioseismic and Magnetic Imager (HMI) onboard NASA's Solar Dynamics Observatory (SDO) produces estimates of the photospheric magnetic field which are a critical input to many space weather modelling and forecasting systems. The magnetogram products produced by HMI and its analysis pipeline are the result of a per-pixel optimization that estimates solar atmospheric parameters and minimizes disagreement between a synthesized and observed Stokes vector. In this paper, we introduce a deep learning-based approach that can emulate the existing HMI pipeline results two orders of magnitude faster than the current pipeline algorithms. Our system is a U-Net trained on input Stokes vectors and their accompanying optimization-based VFISV inversions. We demonstrate that our system, once trained, can produce high-fidelity estimates of the magnetic field and kinematic and thermodynamic parameters while also producing meaningful confidence intervals. We additionally show that despite penalizing only per-pixel loss terms, our system is able to faithfully reproduce known systematic oscillations in full-disk statistics produced by the pipeline. This emulation system could serve as an initialization for the full Stokes inversion or as an ultra-fast proxy inversion. This work is part of the NASA Heliophysics DRIVE Science Center (SOLSTICE) at the University of Michigan, under grant NASA 80NSSC20K0600E, and has been open sourced.
Published: 2021

20. Planar Surface Reconstruction from Sparse Views

Author: Jin, Linyi, Qian, Shengyi, Owens, Andrew, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses. While prior approaches have successfully created object-centric reconstructions of many scenes, they fail to exploit other structures, such as planes, which are typically the dominant components of indoor scenes. In this paper, we reconstruct planar surfaces from multiple views, while jointly estimating camera pose. Our experiments demonstrate that our method is able to advance the state of the art of reconstruction from sparse views, on challenging scenes from Matterport3D. Project site: https://jinlinyi.github.io/SparsePlanes/
Comment: Accepted to ICCV 2021 (Oral Presentation)
Published: 2021

21. Full-Body Awareness from Partial Observations

Author: Rockwell, Chris and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: There has been great progress in human 3D mesh recovery and great interest in learning about the world from consumer video data. Unfortunately, current methods for 3D human mesh recovery work rather poorly on consumer video data, since on the Internet, unusual camera viewpoints and aggressive truncations are the norm rather than a rarity. We study this problem and make a number of contributions to address it: (i) we propose a simple but highly effective self-training framework that adapts human 3D mesh recovery systems to consumer videos and demonstrate its application to two recent systems; (ii) we introduce evaluation protocols and keypoint annotations for 13K frames across four consumer video datasets for studying this task, including evaluations on out-of-image keypoints; and (iii) we show that our method substantially improves PCK and human-subject judgments compared to baselines, both on test videos from the dataset it was trained on, as well as on three other datasets without further adaptation. Project website: https://crockwell.github.io/partial_humans
Comment: In ECCV 2020
Published: 2020

22. Associative3D: Volumetric Reconstruction from Sparse Views

Author: Qian, Shengyi, Jin, Linyi, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper studies the problem of 3D volumetric reconstruction from two views of a scene with an unknown camera. While seemingly easy for humans, this problem poses many challenges for computers since it requires simultaneously reconstructing objects in the two views while also figuring out their relationship. We propose a new approach that estimates reconstructions, distributions over the camera/object and camera/camera transformations, as well as an inter-view object affinity matrix. This information is then jointly reasoned over to produce the most likely explanation of the scene. We train and test our approach on a dataset of indoor scenes, and rigorously evaluate the merits of our joint reasoning approach. Our experiments show that it is able to recover reasonable scenes from sparse views, although the problem remains challenging. Project site: https://jasonqsy.github.io/Associative3D
Comment: ECCV 2020
Published: 2020

23. Understanding Human Hands in Contact at Internet Scale

Author: Shan, Dandan, Geng, Jiaqi, Shu, Michelle, and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Hands are the central means by which humans manipulate their world, and being able to reliably extract hand state information from Internet videos of humans using their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction that includes hand location, side, contact state, and a box around the object in contact. To support this effort, we gather a large-scale dataset of hands in contact with objects, consisting of 131 days of footage as well as a 100K-frame annotated hand-contact video dataset. The model learned on this dataset can serve as a foundation for hand-contact understanding in videos. We quantitatively evaluate it both on its own and in service of predicting and learning from 3D meshes of human hands.
Comment: To appear at CVPR 2020 (Oral). Project and dataset webpage: http://fouheylab.eecs.umich.edu/~dandans/projects/100DOH/
Published: 2020

24. Novel Object Viewpoint Estimation through Reconstruction Alignment

Author: Banani, Mohamed El, Corso, Jason J., and Fouhey, David F.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The goal of this paper is to estimate the viewpoint for a novel object. Standard viewpoint estimation approaches generally fail on this task due to their reliance on a 3D model for alignment or large amounts of class-specific training data and their corresponding canonical pose. We overcome those limitations by learning a reconstruct and align approach. Our key insight is that although we do not have an explicit 3D model or a predefined canonical pose, we can still learn to estimate the object's shape in the viewer's frame and then use an image to provide our reference model or canonical pose. In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss; the second learns whether two instances of features are aligned. At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image. We evaluate our method on novel object viewpoint estimation by generalizing across different datasets, analyzing the impact of our different modules, and providing a qualitative analysis of the learned features to identify what representations are being learnt for alignment., Comment: To appear at CVPR 2020. Project page: https://mbanani.github.io/novelviewpoints/
Published: 2020

25. Articulation-aware Canonical Surface Mapping

Author: Kulkarni, Nilesh, Gupta, Abhinav, Fouhey, David F., and Tulsiani, Shubham
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, and we can obtain supervisory signal via enforcing consistency among the predictions. We present results across a diverse set of animal object categories, showing that our method can learn articulation and CSM prediction from image collections using only foreground mask labels for training. We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing the consistency with predicted CSM is similarly critical for learning meaningful articulation., Comment: To appear at CVPR 2020, project page https://nileshkulkarni.github.io/acsm/
Published: 2020

26. A Machine Learning Dataset Prepared From the NASA Solar Dynamics Observatory Mission

Author: Galvez, Richard, Fouhey, David F., Jin, Meng, Szenicer, Alexandre, Muñoz-Jaramillo, Andrés, Cheung, Mark C. M., Wright, Paul J., Bobra, Monica G., Liu, Yang, Mason, James, and Thomas, Rajat
Subjects: Astrophysics - Solar and Stellar Astrophysics, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Machine Learning
Abstract: In this paper we present a curated dataset from the NASA Solar Dynamics Observatory (SDO) mission in a format suitable for machine learning research. Beginning from level 1 scientific products we have processed various instrumental corrections, downsampled to manageable spatial and temporal resolutions, and synchronized observations spatially and temporally. We illustrate the use of this dataset with two example applications: forecasting future EVE irradiance from present EVE irradiance and translating HMI observations into AIA observations. For each application we provide metrics and baselines for future model comparison. We anticipate this curated dataset will facilitate machine learning research in heliophysics and the physical sciences generally, increasing the scientific return of the SDO mission. This work is a direct result of the 2018 NASA Frontier Development Laboratory Program. Please see the appendix for access to the dataset., Comment: Accepted to The Astrophysical Journal Supplement Series; 11 pages, 8 figures
Published: 2019

27. Directed Ray Distance Functions for 3D Scene Reconstruction

Author: Kulkarni, Nilesh, Johnson, Justin, and Fouhey, David F.
Editors: Avidan, Shai, Brostow, Gabriel, Cissé, Moustapha, Farinella, Giovanni Maria, and Hassner, Tal
Published: 2022

28. From Lifestyle Vlogs to Everyday Interactions

Author: Fouhey, David F., Kuo, Wei-cheng, Efros, Alexei A., and Malik, Jitendra
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels, and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start with a large collection of interaction-rich video data and then annotate and analyze it. We use Internet Lifestyle Vlogs as the source of surprisingly large and diverse interaction data. We show that by collecting the data first, we are able to achieve greater scale and far greater diversity in terms of actions and actors. Additionally, our data exposes biases built into common explicitly gathered data. We make sense of our data by analyzing the central component of interaction -- hands. We benchmark two tasks: identifying semantic object contact at the video level and non-semantic contact state at the frame level. We additionally demonstrate future prediction of hands., Comment: Project page at: http://people.eecs.berkeley.edu/~dfouhey/2017/VLOG/
Published: 2017

29. From Images to 3D Shape Attributes

Author: Fouhey, David F., Gupta, Abhinav, and Zisserman, Andrew
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Our goal in this paper is to investigate properties of 3D shape that can be determined from a single image. We define 3D shape attributes -- generic properties of the shape that capture curvature, contact and occupied space. Our first objective is to infer these 3D shape attributes from a single image. A second objective is to infer a 3D shape embedding -- a low dimensional vector representing the 3D shape. We study how the 3D shape attributes and embedding can be obtained from a single image by training a Convolutional Neural Network (CNN) for this task. We start with synthetic images so that the contribution of various cues and nuisance parameters can be controlled. We then turn to real images and introduce a large scale image dataset of sculptures containing 143K images covering 2197 works from 242 artists. For the CNN trained on the sculpture dataset we show the following: (i) which regions of the imaged sculpture are used by the CNN to infer the 3D shape attributes; (ii) that the shape embedding can be used to match previously unseen sculptures largely independent of viewpoint; and (iii) that the 3D attributes generalize to images of other (non-sculpture) object classes., Comment: Updated based on TPAMI reviews: title changed, sections reordered, moderate modifications throughout text
Published: 2016

30. Learning a Predictable and Generative Vector Representation for Objects

Author: Girdhar, Rohit, Fouhey, David F., Rodriguez, Mikel, and Gupta, Abhinav
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable. This enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval. Extensive experimental analysis demonstrates the usefulness and versatility of this embedding., Comment: To appear in ECCV 2016. Project webpage: rohitgirdhar.github.io/GenerativePredictableVoxels/
Published: 2016

31. PlaneFormers: From Sparse View Planes to 3D Reconstruction

Author: Agarwala, Samir, Jin, Linyi, Rockwell, Chris, and Fouhey, David F.
Published: 2022

32. Sound Localization by Self-supervised Time Delay Estimation

Author: Chen, Ziyang, Fouhey, David F., and Owens, Andrew
Published: 2022

33. In Defense of the Direct Perception of Affordances

Author: Fouhey, David F., Wang, Xiaolong, and Gupta, Abhinav
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The field of functional recognition or affordance estimation from images has seen a revival in recent years. As originally proposed by Gibson, the affordances of a scene were directly perceived from the ambient light: in other words, functional properties like sittable were estimated directly from incoming pixels. Recent work, however, has taken a mediated approach in which affordances are derived by first estimating semantics or geometry and then reasoning about the affordances. In a tribute to Gibson, this paper explores his theory of affordances as originally proposed. We propose two approaches for direct perception of affordances and show that they obtain good results and can outperform mediated approaches. We hope this paper can rekindle discussion around direct perception and its long-term implications.
Published: 2015

34. Designing Deep Networks for Surface Normal Estimation

Author: Wang, Xiaolong, Fouhey, David F., and Gupta, Abhinav
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In the past few years, convolutional neural networks (CNNs) have shown incredible promise for learning visual representations. In this paper, we use CNNs for the task of predicting surface normals from a single image. But what architecture should we use? We propose to build upon decades of work in 3D scene understanding to design a new CNN architecture for the task of surface normal estimation. We show that incorporating several constraints (man-made, Manhattan world) and meaningful intermediate representations (room layout, edge labels) in the architecture leads to state-of-the-art performance on surface normal estimation. We also show that our network is quite robust and achieves state-of-the-art results on other datasets without any fine-tuning.
Published: 2014

35. Associative3D: Volumetric Reconstruction from Sparse Views

Author: Qian, Shengyi, Jin, Linyi, and Fouhey, David F.
Published: 2020

36. Full-Body Awareness from Partial Observations

Author: Rockwell, Chris and Fouhey, David F.
Published: 2020

37. MOVES: Manipulated Objects in Video Enable Segmentation

Author: Higgins, Richard E. L. and Fouhey, David F.
Published: 2023

38. Perspective Fields for Single Image Camera Calibration

Author: Jin, Linyi, Zhang, Jianming, Hold-Geoffroy, Yannick, Wang, Oliver, Blackburn-Matzen, Kevin, Sticha, Matthew, and Fouhey, David F.
Published: 2023

39. Skeletal morphology of bird wings is determined by thermoregulatory demand for heat dissipation in warmer climates

Author: Weeks, Brian C., Harvey, Christina, Tobias, Joseph A., Sheard, Catherine, Zhou, Zhizhuo, and Fouhey, David F.
Published: 2023

40. Large-scale Spatial Cross-calibration of Hinode/SOT-SP and SDO/HMI

Author: Fouhey, David F., Higgins, Richard E. L., Antiochos, Spiro K., Barnes, Graham, DeRosa, Marc L., Hoeksema, J. Todd, Leka, K. D., Liu, Yang, Schuck, Peter W., and Gombosi, Tamas I.
Published: 2023

41. Object Recognition Robust to Imperfect Depth Data

Author: Fouhey, David F., Collet, Alvaro, Hebert, Martial, and Srinivasa, Siddhartha
Editors: Fusiello, Andrea, Murino, Vittorio, and Cucchiara, Rita
Published: 2012

42. Scene Semantics from Long-Term Observation of People

Author: Delaitre, Vincent, Fouhey, David F., Laptev, Ivan, Sivic, Josef, Gupta, Abhinav, and Efros, Alexei A.
Editors: Fitzgibbon, Andrew, Lazebnik, Svetlana, Perona, Pietro, Sato, Yoichi, and Schmid, Cordelia
Published: 2012

43. People Watching: Human Actions as a Cue for Single View Geometry

Author: Fouhey, David F., Delaitre, Vincent, Gupta, Abhinav, Efros, Alexei A., Laptev, Ivan, and Sivic, Josef
Editors: Fitzgibbon, Andrew, Lazebnik, Svetlana, Perona, Pietro, Sato, Yoichi, and Schmid, Cordelia
Published: 2012

44. The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

Author: Rockwell, Chris, Johnson, Justin, and Fouhey, David F.
Published: 2022

45. Understanding 3D Object Articulation in Internet Videos

Author: Qian, Shengyi, Jin, Linyi, Rockwell, Chris, Chen, Siyi, and Fouhey, David F.
Published: 2022

46. A deep neural network for high‐throughput measurement of functional traits on museum skeletal specimens

Author: Weeks, Brian C., Zhou, Zhizhuo, O'Brien, Bruce K., Darling, Rachel, Dean, Morgan, Dias, Tiffany, Hassena, Gemmechu, Zhang, Mingyu, and Fouhey, David F.
Published: 2022

47. SynthIA: A Synthetic Inversion Approximation for the Stokes Vector Fusing SDO and Hinode into a Virtual Observatory

Author: Higgins, Richard E. L., Fouhey, David F., Antiochos, Spiro K., Barnes, Graham, Cheung, Mark C. M., Hoeksema, J. Todd, Leka, K. D., Liu, Yang, Schuck, Peter W., and Gombosi, Tamas I.
Published: 2022

48. People Watching: Human Actions as a Cue for Single View Geometry

Author: Fouhey, David F., Delaitre, Vincent, Gupta, Abhinav, Efros, Alexei A., Laptev, Ivan, and Sivic, Josef
Published: 2014

49. Learning a Predictable and Generative Vector Representation for Objects

Author: Girdhar, Rohit, Fouhey, David F., Rodriguez, Mikel, and Gupta, Abhinav
Published: 2016

50. Author response for 'A deep neural network for high-throughput measurement of functional traits on museum skeletal specimens'

Author: Weeks, Brian C., Zhou, Zhizhuo, O'Brien, Bruce K., Darling, Rachel, Dean, Morgan, Dias, Tiffany, Hassena, Gemmechu, Zhang, Mingyu, and Fouhey, David F.
Published: 2022
