Author: "Xu, Sirui" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Xu, Sirui"' showing total 20 results

1. DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image

Author: Wu, Qingxuan, Dou, Zhiyang, Xu, Sirui, Shimada, Soshi, Wang, Chen, Yu, Zhengming, Liu, Yuan, Lin, Cheng, Cao, Zeyu, Komura, Taku, Golyanik, Vladislav, Theobalt, Christian, Wang, Wenping, and Liu, Lingjie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
Comment: 23 pages, 9 figures, 3 tables
Published: 2024

2. InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction

Author: Xu, Sirui, Wang, Ziyin, Wang, Yu-Xiong, and Gui, Liang-Yan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers high-level control over interaction semantics, it cannot grasp the intricacies of low-level interaction dynamics. To overcome this issue, we further introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner. We apply InterDreamer to the BEHAVE and CHAIRS datasets, and our comprehensive experimental analysis demonstrates its capability to generate realistic and coherent interaction sequences that seamlessly align with the text directives.
Comment: Project Page: https://sirui-xu.github.io/InterDreamer/
Published: 2024

3. Appearance, Microstructure, and Bioactive Components of Bletilla striata Tuber as Affected by Different Drying Methods

Author: Li, Lihong, Zhang, Man, Lu, Chenfei, Xu, Sirui, Fu, Zhongdong, Lin, Ding, and Zheng, Ying
Published: 2024

4. InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

Author: Xu, Sirui, Li, Zhengyuan, Wang, Yu-Xiong, and Gui, Liang-Yan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject prior knowledge that the interactions under reference with respect to contact points follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
Comment: ICCV 2023; Project Page: https://sirui-xu.github.io/InterDiff/
Published: 2023

5. Stochastic Multi-Person 3D Motion Forecasting

Author: Xu, Sirui, Wang, Yu-Xiong, and Gui, Liang-Yan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper aims to deal with the ignored real-world complexities in prior work on human motion forecasting, emphasizing the social properties of multi-person motion, the diversity of motion and social interactions, and the complexity of articulated motion. To this end, we introduce a novel task of stochastic multi-person 3D motion forecasting. We propose a dual-level generative modeling framework that separately models independent individual motion at the local level and social interactions at the global level. Notably, this dual-level modeling mechanism can be achieved within a shared generative model, through introducing learnable latent codes that represent intents of future motion and switching the codes' modes of operation at different levels. Our framework is general; we instantiate it with different generative models, including generative adversarial networks and diffusion models, and various multi-person forecasting models. Extensive experiments on CMU-Mocap, MuPoTS-3D, and SoMoF benchmarks show that our approach produces diverse and accurate multi-person predictions, significantly outperforming the state of the art.
Comment: ICLR 2023 (Top 25% Paper); Project Page: https://sirui-xu.github.io/DuMMF
Published: 2023

6. Diverse Human Motion Prediction Guided by Multi-Level Spatial-Temporal Anchors

Author: Xu, Sirui, Wang, Yu-Xiong, and Gui, Liang-Yan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Predicting diverse human motions given a sequence of historical poses has received increasing attention. Despite rapid progress, existing work captures the multi-modal nature of human motions primarily through likelihood-based sampling, where the mode collapse has been widely observed. In this paper, we propose a simple yet effective approach that disentangles randomly sampled codes with a deterministic learnable component named anchors to promote sample precision and diversity. Anchors are further factorized into spatial anchors and temporal anchors, which provide attractively interpretable control over spatial-temporal disparity. In principle, our spatial-temporal anchor-based sampling (STARS) can be applied to different motion predictors. Here we propose an interaction-enhanced spatial-temporal graph convolutional network (IE-STGCN) that encodes prior knowledge of human motions (e.g., spatial locality), and incorporate the anchors into it. Extensive experiments demonstrate that our approach outperforms state of the art in both stochastic and deterministic prediction, suggesting it as a unified framework for modeling human motions. Our code and pretrained models are available at https://github.com/Sirui-Xu/STARS.
Comment: ECCV 2022 (Oral); Project Page: https://sirui-xu.github.io/STARS/
Published: 2023

7. Insight into saffron associated microbiota from different origins and explore the endophytes for enhancement of bioactive compounds

Author: Xu, Sirui, Hong, Liang, Wu, Tong, Liu, Xinting, Ding, Zihan, Liu, Li, Shao, Qingsong, Zheng, Ying, and Xing, Bingcong
Published: 2024

8. Visible-light-driven calcium alginate hydrogel encapsulating BiOBr0.75I0.25 for efficient removal of oxytetracycline from wastewater

Author: Zhao, Meihua, Xu, Sirui, Nkinahamira, François, Liao, Weiquan, Rong, Hongwei, Zhong, Siming, Zhou, Xiasong, Chen, Chunlian, and Chen, Shangchun
Published: 2024

9. The improvement of kinsenoside in wild-imitated cultivation Anoectochilus roxburghii associated with endophytic community

Author: Zheng, Ying, Li, Lihong, Liu, Xinting, Xu, Sirui, Sun, Xutong, Zhang, Zili, Guo, Haipeng, and Shao, Qingsong
Published: 2024

10. Dual-phase nanostructure of amorphous carbon and TaCB solid solution: Robust high-performance protective coating for marine equipment

Author: Dong, Chuanyao, Dai, Xuan, Lv, Tianshu, Li, Yiwei, Zhang, Wentao, Xu, Sirui, Wen, Mao, and Zhang, Kan
Published: 2023

11. Robust high-performance self-lubrication of nanostructured Mo-S-Cu-B film

Author: Pan, Jingjie, Sun, Weidong, Dong, Chuanyao, Gu, Xinlei, Xu, Sirui, and Zhang, Kan
Published: 2023

12. Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors

Author: Xu, Sirui, Wang, Yu-Xiong, Gui, Liang-Yan, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Avidan, Shai, editor, Brostow, Gabriel, editor, Cissé, Moustapha, editor, Farinella, Giovanni Maria, editor, and Hassner, Tal, editor
Published: 2022

13. Knowledge-Driven and Diffusion Model-Based Methods for Generating Historical Building Facades: A Case Study of Traditional Minnan Residences in China

Author: Xu, Sirui, primary, Zhang, Jiaxin, additional, and Yunqin, Li, additional
Published: 2024

14. Green GDP Calculation Methods and Practical Applications Based on an Environmental Perspective

Author: Wang, Yan, primary, Xue, Tong, additional, and Xu, Sirui, additional
Published: 2023

15. “Sandwich” wound dressing to reduce surgical site infections during sacrococcygeal surgery: A retrospective analysis

Author: Xu, Sirui, primary, Li, Song, additional, Yan, Fei, additional, Han, Shuang, additional, Lin, Shan, additional, Gu, Jiaao, additional, Yu, Zhange, additional, and Shao, Tuo, additional
Published: 2021

16. Spatial and Channel Attention Based Convolutional Neural Networks for Modeling Noisy Speech

Author: Xu, Sirui, primary and Fosler-Lussier, Eric, additional
Published: 2019

17. Application of Progressive Neural Networks for Multi-Stream WFST Combination in One-Pass Decoding

Author: Xu, Sirui, primary and Fosler-Lussier, Eric, additional
Published: 2018

18. Indoor mapping using gmapping on embedded system

Author: Lin, Qinjie, primary, Ke, Zhaowu, additional, Bi, Sheng, additional, Xu, Sirui, additional, Liang, Yuhong, additional, Hong, Fating, additional, and Feng, Liqian, additional
Published: 2017

19. A WFST Framework for Single-Pass Multi-Stream Decoding

Author: Xu, Sirui, primary and Fosler-Lussier, Eric, additional
Published: 2016

20. Approaches for Modeling Noisy Speech

Author: Xu, Sirui
Subjects: Computer Science
Abstract: In this dissertation, we present our work that focuses on improving noisy speech recognition. Although recent ASR research has achieved considerable improvement on clean data, the mismatch between the lab data and the noise environment in realistic speech situations is still a major challenge for further enhancing the recognition performance of speech applications. The variety of background noise types, the multi-speaker talking scenario as well as the distant speaking situation are just a few problems that need to be tackled. One of the common approaches for handling noisy speech recognition is to adopt speech enhancement methods such as denoising and beamforming to improve the quality of the speech audio before passing it to the downstream of the speech system. Our work instead focuses on the decoding and acoustic modeling phases of the speech pipeline, and it can still be used in conjunction with speech enhancement methods to potentially further improve the speech recognition systems. The first project in Chapter 2 proposes a WFST framework for single-pass multi-stream decoding. This work focuses on the decoding stage of the speech system, and our proposed framework for integrating disparate automatic speech recognition systems achieves one-pass decoding by using vector semirings to extend the traditional WFST. This framework offers flexibility in combining systems at different levels of the decoding pipeline, and our experiments showed that it achieved comparable performance as MBR-based combination while significantly reducing the computation time. The framework is also relatively memory-efficient due to the shared decoding structure between the streams. In Chapter 3, we integrate transfer learning and system combination techniques. We apply Progressive Neural Networks (ProgNets) on modeling noisy acoustic speech and then employ our WFST system to achieve system combination in the decoding phase. To take advantage of the ability of ProgNets in transferring knowledge between different domains or datasets, we sub-divided the data according to different noise conditions so that the trained models can share the information about the noisy data. In addition, the word-level combination in the multi-stream WFST framework further helps improve the performance, as it can make use of a longer range of acoustic information for the combination. Combining the two techniques, our experiments achieved considerable improvement over the baseline, which is a 7-layer DNN speaker independent system. We also compared the performance of our system with that of frame-level acoustic fusion techniques and observed reduction in WER. The work in Chapter 4 introduces spatial and channel attention into the modeling of noisy speech to suppress noise and emphasize informative acoustic features during the acoustic modeling process. The spatial attention mechanism is implemented as hourglass structures where the input features are first down sampled and then up sampled to generate attention maps, which are anticipated to assign higher weights to more important features and lower weights to noise features. At each block of the ResNet, CNN features are composed with the attention maps to learn to attend to the most salient acoustic features and suppress noises. On the other hand, the channel attention learns to attend to different channels according to their importance in feature maps. ResNet blocks with the spatial and channel attention modules can be easily stacked up to generate deeper networks. We experimented with the attended ResNet on noisy datasets and achieved promising WER reductions. We conclude the dissertation in Chapter 5 with a summary of contributions and a discussion on directions for future research.
Published: 2018
