Author: "Duan, Yueqi" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Duan, Yueqi"' showing total 157 results


1. PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views

Author: Fei, Xin, Zheng, Wenzhao, Duan, Yueqi, Zhan, Wei, Tomizuka, Masayoshi, Keutzer, Kurt, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. In contrast, PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter (CGA) to adjust Gaussian distribution according to local geometry complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner module that refines Gaussian representations through direct image-Gaussian interactions. Our PixelGaussian can effectively reduce Gaussian redundancy as input views increase. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views., Comment: Code is available at: https://github.com/Barrybarry-Smith/PixelGaussian
Published: 2024

2. Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection

Author: Yan, Hongru, Zheng, Yu, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Skins wrapping around our bodies, leathers covering over the sofa, sheet metal coating the car - it suggests that objects are enclosed by a series of continuous surfaces, which provides us with informative geometry prior for objectness deduction. In this paper, we propose Gaussian-Det which leverages Gaussian Splatting as surface representation for multi-view based 3D object detection. Unlike existing monocular or NeRF-based methods which depict the objects via discrete positional data, Gaussian-Det models the objects in a continuous manner by formulating the input Gaussians as feature descriptors on a mass of partial surfaces. Furthermore, to address the numerous outliers inherently introduced by Gaussian splatting, we accordingly devise a Closure Inferring Module (CIM) for the comprehensive surface-based objectness deduction. CIM firstly estimates the probabilistic feature residuals for partial surfaces given the underdetermined nature of Gaussian Splatting, which are then coalesced into a holistic representation on the overall surface closure of the object proposal. In this way, the surface information Gaussian-Det exploits serves as the prior on the quality and reliability of objectness and the information basis of proposal refinement. Experiments on both synthetic and real-world datasets demonstrate that Gaussian-Det outperforms various existing approaches, in terms of both average precision and recall.
Published: 2024

3. OPONeRF: One-Point-One NeRF for Robust Neural Rendering

Author: Zheng, Yu, Duan, Yueqi, Zheng, Kangfu, Yan, Hongru, Lu, Jiwen, and Zhou, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a One-Point-One NeRF (OPONeRF) framework for robust scene rendering. Existing NeRFs are designed based on a key assumption that the target scene remains unchanged between the training and test time. However, small but unpredictable perturbations such as object movements, light changes and data contaminations broadly exist in real-life 3D scenes, which lead to significantly defective or failed rendering results even for the recent state-of-the-art generalizable methods. To address this, we propose a divide-and-conquer framework in OPONeRF that adaptively responds to local scene variations via personalizing appropriate point-wise parameters, instead of fitting a single set of NeRF parameters that are inactive to test-time unseen changes. Moreover, to explicitly capture the local uncertainty, we decompose the point representation into deterministic mapping and probabilistic inference. In this way, OPONeRF learns the sharable invariance and unsupervisedly models the unexpected scene variations between the training and testing scenes. To validate the effectiveness of the proposed method, we construct benchmarks from both realistic and synthetic data with diverse test-time perturbations including foreground motions, illumination variations and multi-modality noises, which are more challenging than conventional generalization and temporal reconstruction benchmarks. Experimental results show that our OPONeRF outperforms state-of-the-art NeRFs on various evaluation metrics through benchmark experiments and cross-scene evaluations. We further show the efficacy of the proposed method via experimenting on other existing generalization-based benchmarks and incorporating the idea of One-Point-One NeRF into other advanced baseline methods., Comment: Project page and dataset: https://yzheng97.github.io/OPONeRF/
Published: 2024

4. ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Author: Liu, Fangfu, Sun, Wenqiang, Wang, Hanyang, Wang, Yikai, Sun, Haowen, Ye, Junliang, Zhang, Jun, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency struggles to be accurately preserved in directly generated video frames from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability., Comment: Project page: https://liuff19.github.io/ReconX
Published: 2024

5. DreamCinema: Cinematic Transfer with Free Camera and 3D Character

Author: Chen, Weiliang, Liu, Fangfu, Wu, Diankun, Sun, Haowen, Song, Haixu, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Multimedia
Abstract: We are living in a flourishing era of digital media, where everyone has the potential to become a personal filmmaker. Current research on cinematic transfer empowers filmmakers to reproduce and manipulate the visual elements (e.g., cinematography and character behaviors) from classic shots. However, characters in the reimagined films still rely on manual crafting, which involves significant technical complexity and high costs, making it unattainable for ordinary users. Furthermore, their estimated cinematography lacks smoothness due to inadequate capturing of inter-frame motion and modeling of physical trajectories. Fortunately, the remarkable success of 2D and 3D AIGC has opened up the possibility of efficiently generating characters tailored to users' needs, diversifying cinematography. In this paper, we propose DreamCinema, a novel cinematic transfer framework that pioneers generative AI into the film production paradigm, aiming at facilitating user-friendly film creation. Specifically, we first extract cinematic elements (i.e., human and camera pose) and optimize the camera trajectory. Then, we apply a character generator to efficiently create 3D high-quality characters with a human structure prior. Finally, we develop a structure-guided motion transfer strategy to incorporate generated characters into film creation and transfer it via 3D graphics engines smoothly. Extensive experiments demonstrate the effectiveness of our method for creating high-quality films with free camera and 3D characters., Comment: Project page: https://liuff19.github.io/DreamCinema
Published: 2024

6. Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Author: Liu, Fangfu, Wang, Hanyang, Yao, Shunyu, Zhang, Shengjun, Zhou, Jie, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose Physics3D, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments., Comment: Project page: https://liuff19.github.io/Physics3D
Published: 2024

7. Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

Author: Wu, Kailu, Liu, Fangfu, Cai, Zhihan, Yan, Runjie, Wang, Hanyang, Hu, Yating, Duan, Yueqi, and Ma, Kaisheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning, I.2.10
Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from long per-case optimization time with inconsistent issues. Recent works address the problem and generate better 3D results either by finetuning a multi-view diffusion model or training a fast feed-forward model. However, they still lack intricate textures and complex geometries due to inconsistency and limited generated resolution. To simultaneously achieve high fidelity, consistency, and efficiency in single image-to-3D, we propose a novel framework Unique3D that includes a multi-view diffusion model with a corresponding normal diffusion model to generate multi-view images with their normal maps, a multi-level upscale process to progressively improve the resolution of generated orthographic multi-views, as well as an instant and consistent mesh reconstruction algorithm called ISOMER, which fully integrates the color and geometric priors into mesh results. Extensive experiments demonstrate that our Unique3D significantly outperforms other image-to-3D baselines in terms of geometric and textural details., Comment: Project page: https://wukailu.github.io/Unique3D
Published: 2024

8. Learning a Category-level Object Pose Estimator without Pose Annotations

Author: Tian, Fengrui, Liu, Yaoyao, Kortylewski, Adam, Duan, Yueqi, Du, Shaoyi, Yuille, Alan, and Wang, Angtian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to learn a category-level 3D object pose estimator without pose annotations. Instead of using manually annotated images, we leverage diffusion models (e.g., Zero-1-to-3) to generate a set of images under controlled pose differences and propose to learn our object pose estimator with those images. Directly using the original diffusion model leads to images with noisy poses and artifacts. To tackle this issue, firstly, we exploit an image encoder, which is learned from a specially designed contrastive pose learning, to filter the unreasonable details and extract image feature maps. Additionally, we propose a novel learning strategy that allows the model to learn object poses from those generated image sets without knowing the alignment of their canonical poses. Experimental results show that our method has the capability of category-level object pose estimation from a single shot setting (as pose definition), while significantly outperforming other state-of-the-art methods on the few-shot category-level object pose estimation benchmarks.
Published: 2024

9. Semantic Flow: Learning Semantic Field of Dynamic Scenes from Monocular Videos

Author: Tian, Fengrui, Duan, Yueqi, Wang, Angtian, Guo, Jianfei, and Du, Shaoyi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we pioneer Semantic Flow, a neural semantic representation of dynamic scenes from monocular videos. In contrast to previous NeRF methods that reconstruct dynamic scenes from the colors and volume densities of individual points, Semantic Flow learns semantics from continuous flows that contain rich 3D motion information. As there is a 2D-to-3D ambiguity problem in the viewing direction when extracting 3D flow features from 2D video frames, we consider the volume densities as opacity priors that describe the contributions of flow features to the semantics on the frames. More specifically, we first learn a flow network to predict flows in the dynamic scene, and propose a flow feature aggregation module to extract flow features from video frames. Then, we propose a flow attention module to extract motion information from flow features, which is followed by a semantic network to output semantic logits of flows. We integrate the logits with volume densities in the viewing direction to supervise the flow features with semantic labels on video frames. Experimental results show that our model is able to learn from multiple dynamic scenes and supports a series of new tasks such as instance-level scene editing, semantic completions, dynamic scene tracking and semantic adaption on novel scenes., Comment: Accepted by ICLR 2024, Codes are available at https://github.com/tianfr/Semantic-Flow/
Published: 2024

10. GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds

Author: Zhang, Shengjun, Fei, Xin, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Point clouds captured by different sensors such as RGB-D cameras and LiDAR possess non-negligible domain gaps. Most existing methods design different network architectures and train separately on point clouds from various sensors. Typically, point-based methods achieve outstanding performance on evenly distributed dense point clouds from RGB-D cameras, while voxel-based methods are more efficient for large-range sparse LiDAR point clouds. In this paper, we propose geometry-to-voxel auxiliary learning to enable voxel representations to access point-level geometric information, which supports better generalisation of the voxel-based backbone with additional interpretations of multi-sensor point clouds. Specifically, we construct hierarchical geometry pools generated by a voxel-guided dynamic point network, which efficiently provide auxiliary fine-grained geometric information adapted to different stages of voxel features. We conduct experiments on joint multi-sensor datasets to demonstrate the effectiveness of GeoAuxNet. Enjoying elaborate geometric information, our method outperforms other models collectively trained on multi-sensor datasets, and achieves competitive results with state-of-the-art experts on each single dataset., Comment: CVPR 2024
Published: 2024

11. DreamReward: Text-to-3D Generation with Human Preference

Author: Ye, Junliang, Liu, Fangfu, Li, Qixiu, Wang, Zhengyi, Wang, Yikai, Wang, Xinzhou, Duan, Yueqi, and Zhu, Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: 3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D -- the first general-purpose text-to-3D human preference reward model to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present the Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize the multi-view diffusion models with a redefined scorer. Grounded by theoretical proof and extensive experiment comparisons, our DreamReward successfully generates high-fidelity and 3D consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential for learning from human feedback to improve text-to-3D models., Comment: Project page: https://jamesyjl.github.io/DreamReward
Published: 2024

12. Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation

Author: Liu, Fangfu, Wang, Hanyang, Chen, Weiliang, Sun, Haowen, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Recent years have witnessed the strong power of 3D generation models, which offer a new level of creative flexibility by allowing users to guide the 3D content generation process through a single image or natural language. However, it remains challenging for existing 3D generation methods to create subject-driven 3D content across diverse prompts. In this paper, we introduce a novel 3D customization method, dubbed Make-Your-3D that can personalize high-fidelity and consistent 3D content from only a single image of a subject with text description within 5 minutes. Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject. Specifically, we design a co-evolution framework to reduce the variance of distributions, where each model undergoes a process of learning from the other through identity-aware optimization and subject-prior optimization, respectively. Extensive experiments demonstrate that our method can produce high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in subject image., Comment: Project page: https://liuff19.github.io/Make-Your-3D
Published: 2024

13. Memory-based Adapters for Online 3D Scene Perception

Author: Xu, Xiuwei, Xia, Chong, Wang, Ziwei, Zhao, Linqing, Duan, Yueqi, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we propose a new framework for online 3D scene perception. Conventional 3D scene perception methods are offline, i.e., take an already reconstructed 3D scene geometry as input, which is not applicable in robotic applications where the input data is streaming RGB-D videos rather than a complete 3D scene reconstructed from pre-collected RGB-D videos. To deal with online 3D scene perception tasks where data collection and perception should be performed simultaneously, the model should be able to process 3D scenes frame by frame and make use of the temporal information. To this end, we propose an adapter-based plug-and-play module for the backbone of 3D scene perception model, which constructs memory to cache and aggregate the extracted RGB-D features to empower offline models with temporal learning ability. Specifically, we propose a queued memory mechanism to cache the supporting point cloud and image features. Then we devise aggregation modules which directly perform on the memory and pass temporal information to current frame. We further propose 3D-to-2D adapter to enhance image features with strong global context. Our adapters can be easily inserted into mainstream offline architectures of different tasks and significantly boost their performance on online tasks. Extensive experiments on ScanNet and SceneNN datasets demonstrate our approach achieves leading performance on three 3D scene perception tasks compared with state-of-the-art online methods by simply finetuning existing offline models, without any model and task-specific designs., Comment: Accepted to CVPR24. Link: https://xuxw98.github.io/Online3D/
Published: 2024

14. OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments

Author: Zhang, Chubin, Yan, Juncheng, Wei, Yi, Li, Jiaxin, Liu, Li, Tang, Yansong, Duan, Yueqi, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Occupancy prediction reconstructs 3D structures of surrounding environments. It provides detailed information for autonomous driving planning and navigation. However, most existing methods heavily rely on the LiDAR point clouds to generate occupancy ground truth, which is not available in the vision-based system. In this paper, we propose an OccNeRF method for training occupancy networks without 3D supervision. Different from previous works which consider a bounded scene, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. The neural rendering is adopted to convert occupancy fields to multi-camera depth maps, supervised by multi-frame photometric consistency. Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model. Extensive experiments for both self-supervised depth estimation and 3D occupancy prediction tasks on nuScenes and SemanticKITTI datasets demonstrate the effectiveness of our method., Comment: Code: https://github.com/LinShan-Bin/OccNeRF
Published: 2023

15. Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior

Author: Liu, Fangfu, Wu, Diankun, Wei, Yi, Rao, Yongming, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable progress by utilizing 2D and 3D diffusion models. While 3D diffusion models ensure great multi-view consistency, their ability to generate high-quality and diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion models find a distillation approach that achieves excellent generalization and rich details without any 3D data. However, 2D lifting methods suffer from inherent view-agnostic ambiguity thereby leading to serious multi-face Janus issues, where text prompts fail to provide sufficient guidance to learn coherent 3D results. Instead of retraining a costly viewpoint-aware model, we study how to fully exploit easily accessible coarse 3D knowledge to enhance the prompts and guide 2D lifting optimization for refinement. In this paper, we propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously. Specifically, we design a pair of guiding strategies derived from the coarse 3D prior generated by the 3D diffusion model: a structural guidance for geometric fidelity and a semantic guidance for 3D coherence. Employing the two types of guidance, the 2D diffusion model enriches the 3D content with diversified and high-quality results. Extensive experiments show the superiority of our Sherpa3D over the state-of-the-art text-to-3D methods in terms of quality and 3D consistency., Comment: Project page: https://liuff19.github.io/Sherpa3D/
Published: 2023

16. OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Author: Zheng, Wenzhao, Chen, Weiliang, Huang, Yuanhui, Zhang, Borui, Duan, Yueqi, and Lu, Jiwen
Editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül
Series: Goos, Gerhard (Series Editor); Hartmanis, Juris (Founding Editor); Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti (Editorial Board Members)
Published: 2025

17. Discovering Dynamic Causal Space for DAG Structure Learning

Author: Liu, Fangfu, Ma, Wenchang, Zhang, An, Wang, Xiang, Duan, Yueqi, and Chua, Tat-Seng
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Discovering causal structure from purely observational data (i.e., causal discovery), aiming to identify causal relationships among variables, is a fundamental task in machine learning. The recent invention of differentiable score-based DAG learners is a crucial enabler, which reframes the combinatorial optimization problem into a differentiable optimization with a DAG constraint over directed graph space. Despite their great success, these cutting-edge DAG learners incorporate DAG-ness independent score functions to evaluate the directed graph candidates, lacking in considering graph structure. As a result, measuring the data fitness alone regardless of DAG-ness inevitably leads to discovering suboptimal DAGs and model vulnerabilities. Towards this end, we propose a dynamic causal space for DAG structure learning, coined CASPER, that integrates the graph structure into the score function as a new measure in the causal space to faithfully reflect the causal distance between estimated and ground truth DAG. CASPER revises the learning process as well as enhances the DAG structure learning via adaptive attention to DAG-ness. Grounded by empirical visualization, CASPER, as a space, satisfies a series of desired properties, such as structure awareness and noise robustness. Extensive experiments on both synthetic and real-world datasets clearly validate the superiority of our CASPER over the state-of-the-art causal discovery methods in terms of accuracy and robustness., Comment: Accepted by KDD 2023. Our codes are available at https://github.com/liuff19/CASPER
Published: 2023

18. Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

Author: Liu, Fangfu, Zhang, Chubin, Zheng, Yu, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we aim to learn a semantic radiance field from multiple scenes that is accurate, efficient and generalizable. While most existing NeRFs target the tasks of neural scene rendering, image synthesis and multi-view reconstruction, there are a few attempts such as Semantic-NeRF that explore learning high-level semantic understanding with the NeRF structure. However, Semantic-NeRF simultaneously learns color and semantic label from a single ray with multiple heads, where the single ray fails to provide rich semantic information. As a result, Semantic-NeRF relies on positional encoding and needs to train one specific model for each scene. To address this, we propose Semantic Ray (S-Ray) to fully exploit semantic information along the ray direction from its multi-view reprojections. As directly performing dense attention over multi-view reprojected rays would suffer from heavy computational cost, we design a Cross-Reprojection Attention module with consecutive intra-view radial and cross-view sparse attentions, which decomposes contextual information along reprojected rays and across multiple views and then collects dense connections by stacking the modules. Experiments show that our S-Ray is able to learn from multiple scenes, and it presents strong generalization ability to adapt to unseen scenes., Comment: Accepted by CVPR 2023. Project page: https://liuff19.github.io/S-Ray/
Published: 2023

19. Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Author: Li, Yichen, Mo, Kaichun, Duan, Yueqi, Wang, He, Zhang, Jiequan, Shao, Lin, Matusik, Wojciech, and Guibas, Leonidas
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Shape assembly composes complex shape geometries by arranging simple part geometries and has wide applications in autonomous robotic assembly and CAD modeling. Existing works focus on geometry reasoning and neglect the actual physical assembly process of matching and fitting joints, which are the contact surfaces connecting different parts. In this paper, we consider contacting joints for the task of multi-part assembly. A successful joint-optimized assembly needs to satisfy the bilateral objectives of shape structure and joint alignment. We propose a hierarchical graph learning approach composed of two levels of graph representation learning. The part graph takes part geometries as input to build the desired shape structure. The joint-level graph uses part joint information and focuses on matching and aligning joints. The two kinds of information are combined to achieve the bilateral objectives. Extensive experiments demonstrate that our method outperforms previous methods, achieving better shape structure and higher joint alignment accuracy.
Published: 2023

20. MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos

Author: Tian, Fengrui, Du, Shaoyi, and Duan, Yueqi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we target the problem of learning a generalizable dynamic radiance field from monocular videos. Different from most existing NeRF methods that are based on multiple views, monocular videos only contain one view at each timestamp, thereby suffering from ambiguity along the view direction in estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits the generalization ability. As a result, these methods have to train one independent model for each scene and suffer from heavy computational costs when applied to a growing number of monocular videos in real-world applications. To address this, we propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectory from temporal features with Neural ODE, which is followed by a flow-based feature aggregation module to obtain spatial features along the point trajectory. We jointly optimize temporal and spatial features in an end-to-end manner. Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation. Codes are available at https://github.com/tianfr/MonoNeRF., Comment: Accepted by ICCV 2023
Published: 2022

21. Diffusion-SDF: Text-to-Shape via Voxelized Diffusion

Author: Li, Muheng, Duan, Yueqi, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: With the rising industrial attention to 3D virtual modeling technology, generating novel 3D content based on specified conditions (e.g. text) has become a hot issue. In this paper, we propose a new generative 3D modeling framework called Diffusion-SDF for the challenging task of text-to-shape synthesis. Previous approaches lack flexibility in both 3D data representation and shape generation, thereby failing to generate highly diversified 3D shapes conforming to the given text descriptions. To address this, we propose an SDF autoencoder together with the Voxelized Diffusion model to learn and generate representations for voxelized signed distance fields (SDFs) of 3D shapes. Specifically, we design a novel UinU-Net architecture that implants a local-focused inner network inside the standard U-Net architecture, which enables better reconstruction of patch-independent SDF representations. We extend our approach to further text-to-shape tasks including text-conditioned shape completion and manipulation. Experimental results show that Diffusion-SDF generates both higher quality and more diversified 3D shapes that conform well to given text descriptions when compared to previous approaches. Code is available at: https://github.com/ttlmh/Diffusion-SDF, Comment: Accepted to CVPR 2023, project page: https://ttlmh.github.io/DiffusionSDF/
Published: 2022

22. Dynamics-aware Adversarial Attack of Adaptive Neural Networks

Author: Tao, An, Duan, Yueqi, Wang, Yingqi, Lu, Jiwen, and Zhou, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we investigate the dynamics-aware adversarial attack problem of adaptive neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed adaptive neural networks, which adaptively deactivate unnecessary execution units based on inputs to improve computational efficiency. It results in a serious issue of lagged gradient, making the learned attack at the current step ineffective due to the architecture change afterward. To address this issue, we propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient. More specifically, we reformulate the gradients to be aware of the potential dynamic changes of network architectures, so that the learned attack better "leads" the next step than the dynamics-unaware methods when network architecture changes dynamically. Extensive experiments on representative types of adaptive neural networks for both 2D images and 3D point clouds show that our LGM achieves impressive adversarial attack performance compared with the dynamic-unaware attack methods. Code is available at https://github.com/antao97/LGM., Comment: arXiv admin note: text overlap with arXiv:2112.09428
Published: 2022

23. SEFormer: Structure Embedding Transformer for 3D Object Detection

Author: Feng, Xiaoyu, Du, Heming, Duan, Yueqi, Liu, Yongpan, and Fan, Hehe
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Effectively preserving and encoding structure features from objects in irregular and sparse LiDAR points is a key challenge to 3D object detection on point clouds. Recently, Transformer has demonstrated promising performance on many 2D and even 3D vision tasks. Compared with the fixed and rigid convolution kernels, the self-attention mechanism in Transformer can adaptively exclude the unrelated or noisy points and is thus suitable for preserving the local spatial structure in irregular LiDAR point clouds. However, Transformer only performs a simple sum on the point features, based on the self-attention mechanism, and all the points share the same transformation for value. Such an isotropic operation lacks the ability to capture the direction-distance-oriented local structure, which is important for 3D object detection. In this work, we propose a Structure-Embedding transFormer (SEFormer), which can not only preserve local structure as a traditional Transformer does but also encode the local structure. Compared to the self-attention mechanism in traditional Transformer, SEFormer learns different feature transformations for value points based on the relative directions and distances to the query point. Then we propose a SEFormer based network for high-performance 3D object detection. Extensive experiments show that the proposed architecture can achieve SOTA results on Waymo Open Dataset, the largest 3D detection benchmark for autonomous driving. Specifically, SEFormer achieves 79.02% mAP, which is 1.2% higher than existing works. We will release the codes.
Published: 2022

24. Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

Author: Shen, Shuai, Li, Wanhua, Zhu, Zheng, Duan, Yueqi, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Talking head synthesis is an emerging technology with wide applications in film dubbing, virtual avatars and online education. Recent NeRF-based methods generate more natural talking videos, as they better capture the 3D structural information of faces. However, a specific model needs to be trained for each identity with a large dataset. In this paper, we propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly generalize to an unseen identity with little training data. Different from the existing NeRF-based methods which directly encode the 3D geometry and appearance of a specific person into the network, our DFRF conditions the face radiance field on 2D appearance images to learn the face prior. Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images. Additionally, for better modeling of the facial deformations, we propose a differentiable face warping module conditioned on audio signals to deform all reference images to the query space. Extensive experiments show that with only tens of seconds of training clip available, our proposed DFRF can synthesize natural and high-quality audio-driven talking head videos for novel identities with only 40k iterations. We highly recommend readers view our supplementary video for intuitive comparisons. Code is available at https://sstzal.github.io/DFRF/., Comment: Accepted by ECCV 2022. Project page: https://sstzal.github.io/DFRF/
Published: 2022

25. 6D Camera Relocalization in Visually Ambiguous Extreme Environments

Author: Zheng, Yang, Birdal, Tolga, Xia, Fei, Yang, Yanchao, Duan, Yueqi, and Guibas, Leonidas J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a novel method to reliably estimate the pose of a camera given a sequence of images acquired in extreme environments such as deep seas or extraterrestrial terrains. Data acquired under these challenging conditions are corrupted by textureless surfaces, image degradation, and presence of repetitive and highly ambiguous structures. When naively deployed, the state-of-the-art methods can fail in those scenarios as confirmed by our empirical analysis. In this paper, we attempt to make camera relocalization work in these extreme situations. To this end, we propose: (i) a hierarchical localization system, where we leverage temporal information and (ii) a novel environment-aware image enhancement method to boost the robustness and accuracy. Our extensive experimental results demonstrate superior performance in favor of our method under two extreme settings: localizing an autonomous underwater vehicle and localizing a planetary rover in a Mars-like desert. In addition, our method achieves comparable performance with state-of-the-art methods on the indoor benchmark (7-Scenes dataset) using only 20% training data.
Published: 2022

26. HyperDet3D: Learning a Scene-conditioned 3D Object Detector

Author: Zheng, Yu, Duan, Yueqi, Lu, Jiwen, Zhou, Jie, and Tian, Qi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A bathtub in a library, a sink in an office, a bed in a laundry room -- the counter-intuition suggests that scene provides important prior knowledge for 3D object detection, which instructs to eliminate the ambiguous detection of similar objects. In this paper, we propose HyperDet3D to explore scene-conditioned prior knowledge for 3D object detection. Existing methods strive for better representation of local elements and their relations without scene-conditioned knowledge, which may cause ambiguity merely based on the understanding of individual points and object candidates. Instead, HyperDet3D simultaneously learns scene-agnostic embeddings and scene-specific knowledge through scene-conditioned hypernetworks. More specifically, our HyperDet3D not only explores the sharable abstracts from various 3D scenes, but also adapts the detector to the given scene at test time. We propose a discriminative Multi-head Scene-specific Attention (MSA) module to dynamically control the layer parameters of the detector conditioned on the fusion of scene-conditioned knowledge. Our HyperDet3D achieves state-of-the-art results on the 3D object detection benchmark of the ScanNet and SUN RGB-D datasets. Moreover, through cross-dataset evaluation, we show the acquired scene-conditioned prior knowledge still takes effect when facing 3D scenes with domain gap., Comment: to be published on CVPR2022
Published: 2022

27. Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Author: Li, Muheng, Chen, Lei, Duan, Yueqi, Hu, Zhilan, Feng, Jianjiang, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Action recognition models have shown a promising capability to classify human actions in short video clips. In a real scenario, multiple correlated human actions commonly occur in particular orders, forming semantically meaningful human activities. Conventional action recognition approaches focus on analyzing single actions. However, they fail to fully reason about the contextual relations between adjacent actions, which provide potential temporal logic for understanding long videos. In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. More specifically, we reformulate the individual action labels as integrated text prompts for supervision, which bridge the gap between individual action semantics. The generated text prompts are paired with corresponding video clips, and together co-train the text encoder and the video encoder via a contrastive approach. The learned vision encoder has a stronger capability for ordinal-action-related downstream tasks, e.g. action segmentation and human activity recognition. We evaluate the performances of our approach on several video datasets: Georgia Tech Egocentric Activities (GTEA), 50Salads, and the Breakfast dataset. Br-Prompt achieves state-of-the-art on multiple benchmarks. Code is available at https://github.com/ttlmh/Bridge-Prompt, Comment: Accepted to CVPR 2022
Published: 2022

28. Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

Author: Tao, An, Duan, Yueqi, Wang, He, Wu, Ziyi, Ji, Pengliang, Sun, Haowen, Zhou, Jie, and Lu, Jiwen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we investigate the dynamics-aware adversarial attack problem in deep neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. However, this assumption does not hold for many recently proposed networks, e.g. 3D sparse convolution network, which contains input-dependent execution to improve computational efficiency. It results in a serious issue of lagged gradient, making the learned attack at the current step ineffective due to the architecture changes afterward. To address this issue, we propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient. More specifically, we re-formulate the gradients to be aware of the potential dynamic changes of network architectures, so that the learned attack better "leads" the next step than the dynamics-unaware methods when network architecture changes dynamically. Extensive experiments on various datasets show that our LGM achieves impressive performance on semantic segmentation and classification. Compared with the dynamics-unaware methods, LGM achieves about 20% lower mIoU on average on the ScanNet and S3DIS datasets. LGM also outperforms the recent point cloud attacks., Comment: We have improved the quality of this work and updated a new version to address the limitations of the proposed method
Published: 2021

29. Object Pursuit: Building a Space of Objects via Discriminative Weight Generation

Author: Pan, Chuanyu, Yang, Yanchao, Mo, Kaichun, Duan, Yueqi, and Guibas, Leonidas
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We propose a framework to continuously learn object-centric representations for visual learning and understanding. Existing object-centric representations either rely on supervisions that individualize objects in the scene, or perform unsupervised disentanglement that can hardly deal with complex scenes in the real world. To mitigate the annotation burden and relax the constraints on the statistical complexity of the data, our method leverages interactions to effectively sample diverse variations of an object and the corresponding training signals while learning the object-centric representations. Throughout learning, objects are streamed one by one in random order with unknown identities, and are associated with latent codes that can synthesize discriminative weights for each object through a convolutional hypernetwork. Moreover, re-identification of learned objects and forgetting prevention are employed to make the learning process efficient and robust. We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations. Furthermore, we demonstrate the capability of the proposed framework in learning representations that can improve label efficiency in downstream tasks. Our code and trained models are made publicly available at: https://github.com/pptrick/Object-Pursuit., Comment: 24 pages. This paper has been accepted by ICLR2022 (OpenReview: https://openreview.net/forum?id=lbauk6wK2-y)
Published: 2021

30. Vector Neurons: A General Framework for SO(3)-Equivariant Networks

Author: Deng, Congyue, Litany, Or, Duan, Yueqi, Poulenard, Adrien, Tagliasacchi, Andrea, and Guibas, Leonidas
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Invariance and equivariance to the rotation group have been widely discussed in the 3D deep learning community for pointclouds. Yet most proposed methods either use complex mathematical tools that may limit their accessibility, or are tied to specific input data types and network architectures. In this paper, we introduce a general framework built on top of what we call Vector Neuron representations for creating SO(3)-equivariant neural networks for pointcloud processing. Extending neurons from 1D scalars to 3D vectors, our vector neurons enable a simple mapping of SO(3) actions to latent spaces thereby providing a framework for building equivariance in common neural operations -- including linear layers, non-linearities, pooling, and normalizations. Due to their simplicity, vector neurons are versatile and, as we demonstrate, can be incorporated into diverse network architecture backbones, allowing them to process geometry inputs in arbitrary poses. Despite its simplicity, our method performs comparably well in accuracy and generalization with other more complex and specialized state-of-the-art methods on classification and segmentation tasks. We also show for the first time a rotation equivariant reconstruction network.
Published: 2021

31. CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds

Author: Weng, Yijia, Wang, He, Zhou, Qiang, Qin, Yuzhe, Duan, Yueqi, Fan, Qingnan, Chen, Baoquan, Su, Hao, and Guibas, Leonidas J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-part pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN, BMVC) at the fastest FPS ~12., Comment: ICCV 2021 (Oral). Project page: https://yijiaweng.github.io/CAPTRA/
Published: 2021

32. SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation

Author: Tao, An, Duan, Yueqi, Wei, Yi, Lu, Jiwen, and Zhou, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in the scene. However, such strong supervision suffers from large annotation costs, arousing the need to study efficient annotating. In this paper, we discover that the locations of instances matter for both instance and semantic 3D scene segmentation. By fully taking advantage of locations, we design a weakly-supervised point cloud segmentation method that only requires clicking on one point per instance to indicate its location for annotation. With over-segmentation for pre-processing, we extend these location annotations into segments as seg-level labels. We further design a segment grouping network (SegGroup) to generate point-level pseudo labels under seg-level labels by hierarchically grouping the unlabeled segments into the relevant nearby labeled segments, so that existing point-level supervised segmentation models can directly consume these pseudo labels for training. Experimental results show that our seg-level supervised method (SegGroup) achieves comparable results with the fully annotated point-level supervised methods. Moreover, it outperforms the recent weakly-supervised methods given a fixed annotation budget. Code is available at https://github.com/AnTao97/SegGroup.
Published: 2020

33. IF-Defense: 3D Adversarial Point Cloud Defense via Implicit Function based Restoration

Author: Wu, Ziyi, Duan, Yueqi, Wang, He, Fan, Qingnan, and Guibas, Leonidas J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Point cloud is an important 3D data representation widely used in many essential applications. Leveraging deep neural networks, recent works have shown great success in processing 3D point clouds. However, those deep neural networks are vulnerable to various 3D adversarial attacks, which can be summarized as two primary types: point perturbation that affects local point distribution, and surface distortion that causes dramatic changes in geometry. In this paper, we simultaneously address both the aforementioned attacks by learning to restore the clean point clouds from the attacked ones. More specifically, we propose an IF-Defense framework to directly optimize the coordinates of input points with geometry-aware and distribution-aware constraints. The former aims to recover the surface of point cloud through implicit function, while the latter encourages evenly-distributed points. Our experimental results show that IF-Defense achieves the state-of-the-art defense performance against existing 3D adversarial attacks on PointNet, PointNet++, DGCNN, PointConv and RS-CNN. For example, compared with previous methods, IF-Defense presents 20.02% improvement in classification accuracy against salient point dropping attack and 16.29% against LG-GAN attack on PointNet. Our code is available at https://github.com/Wuziyi616/IF-Defense., Comment: 17 pages, 8 figures. Update several experimental results compared with v2, e.g. cross dataset evaluation and baseline results of adversarial training
Published: 2020

34. Graph-Based Social Relation Reasoning

Author: Li, Wanhua, Duan, Yueqi, Lu, Jiwen, Feng, Jianjiang, and Zhou, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human beings are fundamentally sociable -- we generally organize our social lives in terms of relations with other people. Understanding social relations from an image has great potential for intelligent systems such as social chatbots and personal assistants. In this paper, we propose a simpler, faster, and more accurate method named graph relational reasoning network (GR2N) for social relation recognition. Different from existing methods which process all social relations on an image independently, our method considers the paradigm of jointly inferring the relations by constructing a social relation graph. Furthermore, the proposed GR2N constructs several virtual relation graphs to explicitly grasp the strong logical constraints among different types of social relations. Experimental results illustrate that our method generates a reasonable and consistent social relation graph and improves the performance in both accuracy and efficiency., Comment: ECCV 2020
Published: 2020

35. Curriculum DeepSDF

Author: Duan, Yueqi, Zhu, Haidong, Wang, He, Yi, Li, Nevatia, Ram, and Guibas, Leonidas J.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions. In this paper, we design a "shape curriculum" for learning continuous Signed Distance Function (SDF) on shapes, namely Curriculum DeepSDF. Inspired by how humans learn, Curriculum DeepSDF organizes the learning task in ascending order of difficulty according to the following two criteria: surface accuracy and sample difficulty. The former considers stringency in supervising with ground truth, while the latter regards the weights of hard training samples near complex geometry and fine structure. More specifically, Curriculum DeepSDF learns to reconstruct coarse shapes at first, and then gradually increases the accuracy and focuses more on complex local details. Experimental results show that a carefully-designed curriculum leads to significantly better shape reconstructions with the same training data, training epochs and network architecture as DeepSDF. We believe that the application of shape curricula can benefit the training process of a wide variety of 3D shape representation learning methods., Comment: ECCV 2020
Published: 2020

36. Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

Author: Shen, Shuai, primary, Li, Wanhua, additional, Zhu, Zheng, additional, Duan, Yueqi, additional, Zhou, Jie, additional, and Lu, Jiwen, additional
Published: 2022

37. Image Set Querying Based Localization

Author: Deng, Lei, Huang, Siyuan, Duan, Yueqi, Chen, Baohua, and Zhou, Jie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Conventional single image based localization methods usually fail to localize a querying image when there exist large variations between the querying image and the pre-built scene. To address this, we propose an image-set querying based localization approach. When the localization by a single image fails to work, the system will ask the user to capture more auxiliary images. First, a local 3D model is established for the querying image set. Then, the pose of the querying image set is estimated by solving a nonlinear optimization problem, which aims to match the local 3D model against the pre-built scene. Experiments have shown the effectiveness and feasibility of the proposed approach., Comment: VCIP2015, 4 pages
Published: 2015

38. Dynamics-aware Adversarial Attack of Adaptive Neural Networks

Author: Tao, An, primary, Duan, Yueqi, additional, Wang, Yingqi, additional, Lu, Jiwen, additional, and Zhou, Jie, additional
Published: 2024

39. Deep Variational Metric Learning

Author: Lin, Xudong, Duan, Yueqi, Dong, Qiyuan, Lu, Jiwen, Zhou, Jie, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Ferrari, Vittorio, editor, Hebert, Martial, editor, Sminchisescu, Cristian, editor, and Weiss, Yair, editor
Published: 2018

40. Curriculum DeepSDF

Author: Duan, Yueqi, primary, Zhu, Haidong, additional, Wang, He, additional, Yi, Li, additional, Nevatia, Ram, additional, and Guibas, Leonidas J., additional
Published: 2020

41. Graph-Based Social Relation Reasoning

Author: Li, Wanhua, primary, Duan, Yueqi, additional, Lu, Jiwen, additional, Feng, Jianjiang, additional, and Zhou, Jie, additional
Published: 2020

42. Discovering Dynamic Causal Space for DAG Structure Learning

Author: Liu, Fangfu, primary, Ma, Wenchang, additional, Zhang, An, additional, Wang, Xiang, additional, Duan, Yueqi, additional, and Chua, Tat-Seng, additional
Published: 2023

43. HOI-aware Adaptive Network for Weakly-supervised Action Segmentation

Author: Zhang, Runzhong, primary, Wang, Suchen, additional, Duan, Yueqi, additional, Tang, Yansong, additional, Zhang, Yue, additional, and Tan, Yap-Peng, additional
Published: 2023

44. SEFormer: Structure Embedding Transformer for 3D Object Detection

Author: Feng, Xiaoyu, primary, Du, Heming, additional, Fan, Hehe, additional, Duan, Yueqi, additional, and Liu, Yongpan, additional
Published: 2023

45. Diffusion-SDF: Text-to-Shape via Voxelized Diffusion

Author: Li, Muheng, primary, Duan, Yueqi, additional, Zhou, Jie, additional, and Lu, Jiwen, additional
Published: 2023

46. Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

Author: Liu, Fangfu, primary, Zhang, Chubin, additional, Zheng, Yu, additional, and Duan, Yueqi, additional
Published: 2023

47. OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Author: Zheng, Wenzhao, Chen, Weiliang, Huang, Yuanhui, Zhang, Borui, Duan, Yueqi, and Lu, Jiwen
Abstract: Understanding how the 3D scene evolves is vital for making decisions in autonomous driving. Most existing methods achieve this by predicting the movements of object boxes, which cannot capture more fine-grained scene information. In this paper, we explore a new framework of learning a world model, OccWorld, in the 3D Occupancy space to simultaneously predict the movement of the ego car and the evolution of the surrounding scenes. We propose to learn a world model based on 3D occupancy rather than 3D bounding boxes and segmentation maps for three reasons: 1) expressiveness. 3D occupancy can describe the more fine-grained 3D structure of the scene; 2) efficiency. 3D occupancy is more economical to obtain (e.g., from sparse LiDAR points). 3) versatility. 3D occupancy can adapt to both vision and LiDAR. To facilitate the modeling of the world evolution, we learn a reconstruction-based scene tokenizer on the 3D occupancy to obtain discrete scene tokens to describe the surrounding scenes. We then adopt a GPT-like spatial-temporal generative transformer to generate subsequent scene and ego tokens to decode the future occupancy and ego trajectory. Extensive experiments on the widely used nuScenes benchmark demonstrate the ability of OccWorld to effectively model the evolution of the driving scenes. OccWorld also produces competitive planning results without using instance and map supervision. Code: https://github.com/wzzheng/OccWorld., Comment: Code is available at: https://github.com/wzzheng/OccWorld
Published: 2023

48. Learning Dynamic Scene-Conditioned 3D Object Detectors

Author: Zheng, Yu, Duan, Yueqi, Li, Zongtai, Zhou, Jie, and Lu, Jiwen
Abstract: In this paper, we propose a dynamic 3D object detector named HyperDet3D, which is adaptively adjusted on the fly based on hyper scene-level knowledge. Existing methods strive for object-level representations of local elements and their relations without scene-level priors, and therefore suffer from ambiguity between similarly structured objects because they rely only on the understanding of individual points and object candidates. Instead, we design scene-conditioned hypernetworks to simultaneously learn scene-agnostic embeddings, which exploit sharable abstracts from various 3D scenes, and scene-specific knowledge, which adapts the 3D detector to the given scene at test time. As a result, the lower-level ambiguity in object representations can be addressed by hierarchical context in scene priors. However, since the upstream hypernetwork in HyperDet3D takes raw scenes as input, which contain noise and redundancy, it produces sub-optimal parameters for the 3D detector when constrained only by downstream detection losses. Based on the fact that the downstream 3D detection task can be factorized into object-level semantic classification and bounding box regression, we further propose HyperFormer3D by correspondingly designing their scene-level prior tasks in upstream hypernetworks, namely Semantic Occurrence and Objectness Localization. To this end, we design a transformer-based hypernetwork that translates the task-oriented scene priors into parameters of the downstream detector, which avoids the noise and redundancy of the scenes. Extensive experimental results on the ScanNet, SUN RGB-D and MatterPort3D datasets demonstrate the effectiveness of the proposed methods.
Published: 2024
Full Text: View/download PDF

49. Deep Variational Metric Learning

Author: Lin, Xudong, primary, Duan, Yueqi, additional, Dong, Qiyuan, additional, Lu, Jiwen, additional, and Zhou, Jie, additional
Published: 2018
Full Text: View/download PDF

50. Learning Deep Binary Descriptors via Bitwise Interaction Mining

Author: Wang, Ziwei, primary, Xiao, Han, additional, Duan, Yueqi, additional, Zhou, Jie, additional, and Lu, Jiwen, additional
Published: 2023
Full Text: View/download PDF
