Author: "Xue, Zihui" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Xue, Zihui" returned 25 results

1. Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Author: Chen, Changan, Peng, Puyuan, Baid, Ami, Xue, Zihui, Hsu, Wei-Ning, Harwath, David, and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Generating realistic audio for human actions is important for many applications, such as creating sound effects for films or virtual reality games. Existing approaches implicitly assume total correspondence between the video and audio during training, yet many sounds happen off-screen and have weak to no correspondence with the visuals -- resulting in uncontrolled ambient sounds or hallucinations at test time. We propose a novel ambient-aware audio generation model, AV-LDM. We devise a novel audio-conditioning mechanism to learn to disentangle foreground action sounds from the ambient background sounds in in-the-wild training videos. Given a novel silent video, our model uses retrieval-augmented generation to create audio that matches the visual content both semantically and temporally. We train and evaluate our model on two in-the-wild egocentric video datasets, Ego4D and EPIC-KITCHENS, and we introduce Ego4D-Sounds -- 1.2M curated clips with action-audio correspondence. Our model outperforms an array of existing methods, allows controllable generation of the ambient sound, and even shows promise for generalizing to computer graphics game clips. Overall, our approach is the first to focus video-to-audio generation faithfully on the observed visual content despite training from uncurated clips with natural background sounds.
Comment: Project page: https://vision.cs.utexas.edu/projects/action2sound. ECCV 2024 camera-ready version
Published: 2024

2. HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

Author: Xue, Zihui, Luo, Mi, Chen, Changan, and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We study the problem of precisely swapping objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when object swapping results in object shape or functionality changes. To bridge this gap, we present HOI-Swap, a novel diffusion-based video editing framework trained in a self-supervised manner. Designed in two stages, the first stage focuses on object swapping in a single frame with HOI awareness; the model learns to adjust the interaction patterns, such as the hand grasp, based on changes in the object's properties. The second stage extends the single-frame edit across the entire sequence; we achieve controllable motion alignment with the original video by: (1) warping a new sequence from the stage-I edited frame based on sampled motion points and (2) conditioning video generation on the warped sequence. Comprehensive qualitative and quantitative evaluations demonstrate that HOI-Swap significantly outperforms existing methods, delivering high-quality video edits with realistic HOIs.
Comment: Project website: https://vision.cs.utexas.edu/projects/HOI-Swap/
Published: 2024

3. Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Author: Luo, Mi, Xue, Zihui, Dimakis, Alex, and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective. To this end, we propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation, which explicitly encourages cross-view correspondence between exocentric and egocentric views, and a diffusion-based pixel-level hallucination, which incorporates a hand layout prior to enhance the fidelity of the generated egocentric view. To pave the way for future advancements in this field, we curate a comprehensive exo-to-ego cross-view translation benchmark. It consists of a diverse collection of synchronized ego-exo tabletop activity video pairs sourced from three public datasets: H2O, Aria Pilot, and Assembly101. The experimental results validate that Exo2Ego delivers photorealistic video results with clear hand manipulation details and outperforms several baselines in terms of both synthesis quality and generalization ability to new actions.
Comment: 22 pages
Published: 2024

4. Detours for Navigating Instructional Videos

Author: Ashutosh, Kumar, Xue, Zihui, Nagarajan, Tushar, and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce the video detours problem for navigating instructional videos. Given a source video and a natural language query asking to alter the how-to video's current path of execution in a certain way, the goal is to find a related "detour video" that satisfies the requested alteration. To address this challenge, we propose VidDetours, a novel video-language approach that learns to retrieve the targeted temporal segments from a large repository of how-to's using video-and-text conditioned queries. Furthermore, we devise a language-based pipeline that exploits how-to video narration text to create weakly supervised training data. We demonstrate our idea applied to the domain of how-to cooking videos, where a user can detour from their current recipe to find steps with alternate ingredients, tools, and techniques. Validating on a ground truth annotated dataset of 16K samples, we show our model's significant improvements over best available methods for video retrieval and question answering, with recall rates exceeding the state of the art by 35%.
Comment: CVPR 2024
Published: 2024

5. Learning Object State Changes in Videos: An Open-World Perspective

Author: Xue, Zihui, Ashutosh, Kumar, and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Object State Changes (OSCs) are pivotal for video understanding. While humans can effortlessly generalize OSC understanding from familiar to unknown objects, current approaches are confined to a closed vocabulary. Addressing this gap, we introduce a novel open-world formulation for the video OSC problem. The goal is to temporally localize the three stages of an OSC -- the object's initial state, its transitioning state, and its end state -- whether or not the object has been observed during training. Towards this end, we develop VidOSC, a holistic learning approach that: (1) leverages text and vision-language models for supervisory signals to obviate manually labeling OSC training data, and (2) abstracts fine-grained shared state representations from objects to enhance generalization. Furthermore, we present HowToChange, the first open-world benchmark for video OSC localization, which offers an order of magnitude increase in the label space and annotation volume compared to the best existing benchmark. Experimental results demonstrate the efficacy of our approach, in both traditional closed-world and open-world scenarios.
Comment: Accepted by CVPR 2024. Project website: https://vision.cs.utexas.edu/projects/VidOSC/
Published: 2023

6. Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Author: Luo, Mi, Xue, Zihui, Dimakis, Alex, Grauman, Kristen, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

7. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Author: Grauman, Kristen, Westbury, Andrew, Torresani, Lorenzo, Kitani, Kris, Malik, Jitendra, Afouras, Triantafyllos, Ashutosh, Kumar, Baiyya, Vijay, Bansal, Siddhant, Boote, Bikram, Byrne, Eugene, Chavis, Zach, Chen, Joya, Cheng, Feng, Chu, Fu-Jen, Crane, Sean, Dasgupta, Avijit, Dong, Jing, Escobar, Maria, Forigua, Cristhian, Gebreselasie, Abrham, Haresh, Sanjay, Huang, Jing, Islam, Md Mohaiminul, Jain, Suyog, Khirodkar, Rawal, Kukreja, Devansh, Liang, Kevin J, Liu, Jia-Wei, Majumder, Sagnik, Mao, Yongsen, Martin, Miguel, Mavroudi, Effrosyni, Nagarajan, Tushar, Ragusa, Francesco, Ramakrishnan, Santhosh Kumar, Seminara, Luigi, Somayazulu, Arjun, Song, Yale, Su, Shan, Xue, Zihui, Zhang, Edward, Zhang, Jinxu, Castillo, Angela, Chen, Changan, Fu, Xinzhu, Furuta, Ryosuke, Gonzalez, Cristina, Gupta, Prince, Hu, Jiabo, Huang, Yifei, Huang, Yiming, Khoo, Weslie, Kumar, Anush, Kuo, Robert, Lakhavani, Sach, Liu, Miao, Luo, Mi, Luo, Zhengyi, Meredith, Brighid, Miller, Austin, Oguntola, Oluwatumininu, Pan, Xiaqing, Peng, Penny, Pramanick, Shraman, Ramazanova, Merey, Ryan, Fiona, Shan, Wei, Somasundaram, Kiran, Song, Chenan, Southerland, Audrey, Tateno, Masatoshi, Wang, Huiyu, Wang, Yuchen, Yagi, Takuma, Yan, Mingfei, Yang, Xitong, Yu, Zecheng, Zha, Shengxin Cindy, Zhao, Chen, Zhao, Ziwei, Zhu, Zhifan, Zhuo, Jeff, Arbelaez, Pablo, Bertasius, Gedas, Crandall, David, Damen, Dima, Engel, Jakob, Farinella, Giovanni Maria, Furnari, Antonino, Ghanem, Bernard, Hoffman, Judy, Jawahar, C. V., Newcombe, Richard, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Savva, Manolis, Shi, Jianbo, Shou, Mike Zheng, and Wray, Michael
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/
Comment: Expanded manuscript (compared to arXiv v1 from Nov 2023 and CVPR 2024 paper from June 2024) for more comprehensive dataset and benchmark presentation, plus new results on v2 data release
Published: 2023

8. Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

Author: Xue, Zihui and Grauman, Kristen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The egocentric and exocentric viewpoints of a human activity look dramatically different, yet invariant representations to link them are essential for many potential applications in robotics and augmented reality. Prior work is limited to learning view-invariant features from paired synchronized viewpoints. We relax that strong data assumption and propose to learn fine-grained action features that are invariant to the viewpoints by aligning egocentric and exocentric videos in time, even when not captured simultaneously or in the same environment. To this end, we propose AE2, a self-supervised embedding approach with two key designs: (1) an object-centric encoder that explicitly focuses on regions corresponding to hands and active objects; (2) a contrastive-based alignment objective that leverages temporally reversed frames as negative samples. For evaluation, we establish a benchmark for fine-grained video understanding in the ego-exo context, comprising four datasets -- including an ego tennis forehand dataset we collected, along with dense per-frame labels we annotated for each dataset. On the four datasets, our AE2 method strongly outperforms prior work in a variety of fine-grained downstream tasks, both in regular and cross-view settings.
Comment: Project website: https://vision.cs.utexas.edu/projects/AlignEgoExo/
Published: 2023

9. Egocentric Video Task Translation @ Ego4D Challenge 2022

Author: Xue, Zihui, Song, Yale, Grauman, Kristen, and Torresani, Lorenzo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This technical report describes the EgoTask Translation approach that explores relations among a set of egocentric video tasks in the Ego4D challenge. To improve the primary task of interest, we propose to leverage existing models developed for other related tasks and design a task translator that learns to "translate" auxiliary task features to the primary task. With no modification to the baseline architectures, our proposed approach achieves competitive performance on two Ego4D challenges, ranking the 1st in the talking to me challenge and the 3rd in the PNR keyframe localization challenge.
Comment: The technical report of ECCV@2022 Ego4D challenge
Published: 2023

10. Egocentric Video Task Translation

Author: Xue, Zihui, Song, Yale, Grauman, Kristen, and Torresani, Lorenzo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulations, navigation in the space, or human-human interactions -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition. Demonstrating our model on a wide array of video tasks from Ego4D, we show its advantages over existing transfer paradigms and achieve top-ranked results on four of the Ego4D 2022 benchmark challenges.
Comment: Accepted by CVPR 2023 (Highlight). Project website: https://vision.cs.utexas.edu/projects/egot2/
Published: 2022

11. The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

Author: Xue, Zihui, Gao, Zhengqi, Ren, Sucheng, and Zhao, Hang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications. To achieve knowledge transfer across modalities, a pretrained network from one modality is adopted as the teacher to provide supervision signals to a student network learning from another modality. In contrast to the empirical success reported in prior works, the working mechanism of crossmodal KD remains a mystery. In this paper, we present a thorough understanding of crossmodal KD. We begin with two case studies and demonstrate that KD is not a universal cure in crossmodal knowledge transfer. We then present the modality Venn diagram to understand modality relationships and the modality focusing hypothesis revealing the decisive factor in the efficacy of crossmodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose failure cases, and point directions to improve crossmodal knowledge transfer in the future.
Comment: Accepted by ICLR 2023 (top-5%). The first three authors contribute equally. Project website: https://zihuixue.github.io/MFH/index.html
Published: 2022

12. Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

Author: Gao, Zhengqi, Ren, Sucheng, Xue, Zihui, Li, Siting, and Zhao, Hang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Multimodal fusion emerges as an appealing technique to improve model performances on many tasks. Nevertheless, the robustness of such fusion methods is rarely involved in the present literature. In this paper, we propose a training-free robust late-fusion method by exploiting conditional independence assumption and Jacobian regularization. Our key is to minimize the Frobenius norm of a Jacobian matrix, where the resulting optimization problem is relaxed to a tractable Sylvester equation. Furthermore, we provide a theoretical error bound of our method and some insights about the function of the extra modality. Several numerical experiments on AV-MNIST, RAVDESS, and VGGsound demonstrate the efficacy of our method under both adversarial attacks and random corruptions.
Published: 2022

13. Dynamic Multimodal Fusion

Author: Xue, Zihui and Marculescu, Radu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
Abstract: Deep multimodal learning has achieved great progress in recent years. However, current fusion approaches are static in nature, i.e., they process and fuse multimodal inputs with identical computation, without accounting for diverse computational demands of different multimodal data. In this work, we propose dynamic multimodal fusion (DynMM), a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference. To this end, we propose a gating function to provide modality-level or fusion-level decisions on-the-fly based on multimodal features and a resource-aware loss function that encourages computational efficiency. Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach. For instance, DynMM can reduce the computation costs by 46.5% with only a negligible accuracy loss (CMU-MOSEI sentiment analysis) and improve segmentation performance with over 21% savings in computation (NYU Depth V2 semantic segmentation) when compared with static fusion approaches. We believe our approach opens a new direction towards dynamic multimodal network design, with applications to a wide range of multimodal tasks.
Comment: Accepted by 6th Multi-Modal Learning and Applications Workshop (MULA), CVPR 2023. Code available at: https://github.com/zihuixue/DynMM
Published: 2022

14. SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Author: Xue, Zihui, Yang, Yuedong, Yang, Mengtian, and Marculescu, Radu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Graph Neural Networks (GNNs) have demonstrated a great potential in a variety of graph-based applications, such as recommender systems, drug discovery, and object recognition. Nevertheless, resource-efficient GNN learning is a rarely explored topic despite its many benefits for edge computing and Internet of Things (IoT) applications. To improve this state of affairs, this work proposes efficient subgraph-level training via resource-aware graph partitioning (SUGAR). SUGAR first partitions the initial graph into a set of disjoint subgraphs and then performs local training at the subgraph-level. We provide a theoretical analysis and conduct extensive experiments on five graph benchmarks to verify its efficacy in practice. Our results show that SUGAR can achieve up to 33 times runtime speedup and 3.8 times memory reduction on large-scale graphs. We believe SUGAR opens a new research direction towards developing GNN methods that are resource-efficient, hence suitable for IoT deployment.
Published: 2022

15. Co-advise: Cross Inductive Bias Distillation

Author: Ren, Sucheng, Gao, Zhengqi, Hua, Tianyu, Xue, Zihui, Tian, Yonglong, He, Shengfeng, and Zhao, Hang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Transformers recently are adapted from the community of natural language processing as a promising substitute of convolution-based neural networks for visual learning tasks. However, its supremacy degenerates given an insufficient amount of training data (e.g., ImageNet). To make it into practical utility, we propose a novel distillation-based method to train vision transformers. Unlike previous works, where merely heavy convolution-based teachers are provided, we introduce lightweight teachers with different architectural inductive biases (e.g., convolution and involution) to co-advise the student transformer. The key is that teachers with different inductive biases attain different knowledge despite that they are trained on the same dataset, and such different knowledge compounds and boosts the student's performance during distillation. Equipped with this cross inductive bias distillation method, our vision transformers (termed as CivT) outperform all previous transformers of the same architecture on ImageNet.
Published: 2021

16. What Makes Multi-modal Learning Better than Single (Provably)

Author: Huang, Yu, Du, Chenzhuang, Xue, Zihui, Chen, Xuanyao, Zhao, Hang, and Huang, Longbo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: The world provides us with data of multiple modalities. Intuitively, models fusing data from different modalities outperform their uni-modal counterparts, since more information is aggregated. Recently, joining the success of deep learning, there is an influential line of work on deep multi-modal learning, which has remarkable empirical results on various applications. However, theoretical justifications in this field are notably lacking. Can multi-modal learning provably perform better than uni-modal? In this paper, we answer this question under a most popular multi-modal fusion framework, which firstly encodes features from different modalities into a common latent space and seamlessly maps the latent representations into the task space. We prove that learning with multiple modalities achieves a smaller population risk than only using its subset of modalities. The main intuition is that the former has a more accurate estimate of the latent space representation. To the best of our knowledge, this is the first theoretical treatment to capture important qualitative phenomena observed in real multi-modal applications from the generalization perspective. Combining with experiment results, we show that multi-modal learning does possess an appealing formal guarantee.
Comment: Accepted to NeurIPS 2021
Published: 2021

17. On Feature Decorrelation in Self-Supervised Learning

Author: Hua, Tianyu, Wang, Wenxiao, Xue, Zihui, Ren, Sucheng, Wang, Yue, and Zhao, Hang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning
Abstract: In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations. A potential issue of this idea is the existence of completely collapsed solutions (i.e., constant features), which are typically avoided implicitly by carefully chosen implementation details. In this work, we study a relatively concise framework containing the most common components from recent approaches. We verify the existence of complete collapse and discover another reachable collapse pattern that is usually overlooked, namely dimensional collapse. We connect dimensional collapse with strong correlations between axes and consider such connection as a strong motivation for feature decorrelation (i.e., standardizing the covariance matrix). The gains from feature decorrelation are verified empirically to highlight the importance and the potential of this insight.
Comment: ICCV 2021 Oral. The first two authors contribute equally
Published: 2021

18. Multimodal Knowledge Expansion

Author: Xue, Zihui, Ren, Sucheng, Gao, Zhengqi, and Zhao, Hang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data. Since existing datasets and well-trained models are primarily unimodal, the modality gap between a unimodal network and unlabeled multimodal data poses an interesting problem: how to transfer a pre-trained unimodal network to perform the same task on unlabeled multimodal data? In this work, we propose multimodal knowledge expansion (MKE), a knowledge distillation-based framework to effectively utilize multimodal data without requiring labels. Opposite to traditional knowledge distillation, where the student is designed to be lightweight and inferior to the teacher, we observe that a multimodal student model consistently denoises pseudo labels and generalizes better than its teacher. Extensive experiments on four tasks and different modalities verify this finding. Furthermore, we connect the mechanism of MKE to semi-supervised learning and offer both empirical and theoretical explanations to understand the denoising capability of a multimodal student.
Comment: Accepted by ICCV 2021. Project website: https://tsinghua-mars-lab.github.io/MKE/
Published: 2021

19. Sampling Graphlets of Multi-layer Networks: A Restricted Random Walk Approach

Author: Jiao, Simiao, Xue, Zihui, Chen, Xiaowei, and Xu, Yuedong
Subjects: Computer Science - Social and Information Networks
Abstract: Graphlets are induced subgraph patterns that are crucial to the understanding of the structure and function of a large network. A lot of efforts have been devoted to calculating graphlet statistics where random walk based approaches are commonly used to access restricted graphs through the available application programming interfaces (APIs). However, most of them merely consider individual networks while overlooking the strong coupling between different networks. In this paper, we estimate the graphlet concentration in multi-layer networks with real-world applications. An inter-layer edge connects two nodes in different layers if they belong to the same person. The access to a multi-layer network is restrictive in the sense that the upper layer allows random walk sampling, whereas the nodes of lower layers can be accessed only through the inter-layer edges and only support random node or edge sampling. To cope with this new challenge, we define a suite of two-layer graphlets and propose a novel random walk sampling algorithm to estimate the proportion of all the 3-node graphlets. An analytical bound on the sampling steps is proved to guarantee the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimated accuracies of different graphlets when the sample size is split on different layers. Experimental evaluation on real-world and synthetic multi-layer networks demonstrates the accuracy and high efficiency of our unbiased estimators.
Comment: 14 pages
Published: 2020

20. Dynamic Multimodal Fusion

Author: Xue, Zihui, primary and Marculescu, Radu, additional
Published: 2023
Full Text: View/download PDF

21. SUGAR: Efficient Subgraph-level Training via Resource-aware Graph Partitioning

Author: Xue, Zihui, primary, Yang, Yuedong, additional, and Marculescu, Radu, additional
Published: 2023
Full Text: View/download PDF

22. Co-advise: Cross Inductive Bias Distillation

Author: Ren, Sucheng, primary, Gao, Zhengqi, additional, Hua, Tianyu, additional, Xue, Zihui, additional, Tian, Yonglong, additional, He, Shengfeng, additional, and Zhao, Hang, additional
Published: 2022
Full Text: View/download PDF

23. Multimodal Knowledge Expansion

Author: Xue, Zihui, primary, Ren, Sucheng, additional, Gao, Zhengqi, additional, and Zhao, Hang, additional
Published: 2021
Full Text: View/download PDF

24. On Feature Decorrelation in Self-Supervised Learning

Author: Hua, Tianyu, primary, Wang, Wenxiao, additional, Xue, Zihui, additional, Ren, Sucheng, additional, Wang, Yue, additional, and Zhao, Hang, additional
Published: 2021
Full Text: View/download PDF

25. Sampling Graphlets of Multiplex Networks: A Restricted Random Walk Approach

Author: Jiao, Simiao, primary, Xue, Zihui, additional, Chen, Xiaowei, additional, and Xu, Yuedong, additional
Published: 2021
Full Text: View/download PDF
