Author: "ZHENG, WEI" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "ZHENG, WEI" returned 38,758 results.

1. Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models

Author: Fu, Shenghao, Yan, Junkai, Yang, Qize, Wei, Xihan, Xie, Xiaohua, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent vision foundation models can extract universal representations and show impressive abilities in various tasks. However, their application to object detection is largely overlooked, especially without fine-tuning them. In this work, we show that frozen foundation models can be a versatile feature enhancer, even though they are not pre-trained for object detection. Specifically, we explore directly transferring the high-level image understanding of foundation models to detectors in the following two ways. First, the class token in foundation models provides an in-depth understanding of the complex scene, which facilitates decoding object queries in the detector's decoder by providing a compact context. Additionally, the patch tokens in foundation models can enrich the features in the detector's encoder by providing semantic details. Utilizing frozen foundation models as plug-and-play modules rather than the commonly used backbone can significantly enhance the detector's performance while preventing the problems caused by the architecture discrepancy between the detector's backbone and the foundation model. With such a novel paradigm, we boost the SOTA query-based detector DINO from 49.0% AP to 51.9% AP (+2.9% AP) and further to 53.8% AP (+4.8% AP) by integrating one or two foundation models respectively, on the COCO validation set after training for 12 epochs with R50 as the detector's backbone.
Comment: Accepted to NeurIPS 2024
Published: 2024

2. Real-to-Sim Grasp: Rethinking the Gap between Simulation and Real World in Grasp Detection

Author: Cai, Jia-Feng, Chen, Zibo, Wu, Xiao-Ming, Jiang, Jian-Jian, Wei, Yi-Lin, and Zheng, Wei-Shi
Subjects: Computer Science - Robotics
Abstract: For 6-DoF grasp detection, simulated data is expandable to train more powerful models, but it faces the challenge of the large gap between simulation and the real world. Previous works bridge this gap in a sim-to-real way. However, this way explicitly or implicitly forces the simulated data to adapt to the noisy real data when training grasp detectors, where the positional drift and structural distortion within the camera noise will harm grasp learning. In this work, we propose a Real-to-Sim framework for 6-DoF Grasp detection, named R2SGrasp, with the key insight of bridging this gap in a real-to-sim way, which directly bypasses the camera noise in grasp detector training through an inference-time real-to-sim adaptation. To achieve this real-to-sim adaptation, our R2SGrasp designs the Real-to-Sim Data Repairer (R2SRepairer) to mitigate the camera noise of real depth maps at the data level, and the Real-to-Sim Feature Enhancer (R2SEnhancer) to enhance real features with precise simulated geometric primitives at the feature level. To endow our framework with generalization ability, we cost-efficiently construct a large-scale simulated dataset to train our grasp detector, which includes 64,000 RGB-D images with 14.4 million grasp annotations. Extensive experiments show that R2SGrasp is powerful and that our real-to-sim perspective is effective. Real-world experiments further demonstrate the strong generalization ability of R2SGrasp. The project page is available at https://isee-laboratory.github.io/R2SGrasp.
Published: 2024

3. Professor X: Manipulating EEG BCI with Invisible and Robust Backdoor Attack

Author: Liu, Xuan-Hao, Song, Xinhao, He, Dexuan, Lu, Bao-Liang, and Zheng, Wei-Long
Subjects: Computer Science - Cryptography and Security, Computer Science - Human-Computer Interaction
Abstract: While electroencephalogram (EEG) based brain-computer interfaces (BCIs) have been widely used for medical diagnosis, health care, and device control, the safety of EEG BCIs has long been neglected. In this paper, we propose Professor X, an invisible and robust "mind-controller" that can arbitrarily manipulate the outputs of EEG BCIs through a backdoor attack, to alert the EEG community to the potential hazard. Existing EEG attacks mainly focus on single-target class attacks, and they either require involvement in the training stage of the target BCI or fail to maintain high stealthiness. Addressing these limitations, Professor X exploits a three-stage clean-label poisoning attack: 1) selecting one trigger for each class; 2) learning, with reinforcement learning, an optimal strategy of which EEG electrodes and frequencies to inject for each trigger; 3) generating poisoned samples for each class by injecting the corresponding trigger's frequencies into the poisoned data, linearly interpolating the spectral amplitudes of both according to the previously learned strategies. Experiments on datasets of three common EEG tasks demonstrate the effectiveness and robustness of Professor X, which also easily bypasses existing backdoor defenses.
Comment: 27 pages, 13 figures
Published: 2024

4. Super-Heisenberg scaling in a triple point criticality

Author: Cheng, Jia-Ming, Zhang, Yong-Chang, Zhou, Xiang-Fa, and Zhou, Zheng-Wei
Subjects: Quantum Physics
Abstract: We investigate quantum-enhanced metrology in a triple point criticality and discover that quantum criticality cannot always enhance measurement precision. We have developed suitable adiabatic evolution protocols approaching a final point around the triple point to effectively restrain excitations, which could accelerate the adiabatic evolutions and lead to an exponential super-Heisenberg scaling. This scaling behavior is quite valuable in practical parameter estimation experiments with limited coherence time. The super-Heisenberg scaling will degrade into a sub-Heisenberg scaling if the adopted adiabatic parameter modulations cannot reduce excitations and weaken the slowing-down effect. Additionally, a feasible experimental scheme is suggested to achieve the anticipated exponential super-Heisenberg scaling. Our findings strongly indicate that criticality-enhanced metrology can indeed significantly enhance measurement precision to a super-Heisenberg scaling when combining a triple point and beneficial parameter modulations in the adiabatic evolution, which will be conducive to the exploration of other super-Heisenberg scalings and their applications.
Published: 2024

5. Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM

Author: Lim, Zheng Wei, Gupta, Nitish, Yu, Honglin, and Cohn, Trevor
Subjects: Computer Science - Computation and Language
Abstract: Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a post-editing one, and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show that LLMs finetuned against Mufu-style prompts are robust to poor-quality auxiliary translation candidates, achieving performance superior to the NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining on average a 3.1 chrF improvement over the finetune-only baseline in low-resource translations.
Comment: 29 pages
Published: 2024

6. Topological inverse Anderson insulator

Author: Zuo, Zheng-Wei, Lin, Jing-Run, and Kang, Dawei
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Disordered Systems and Neural Networks
Abstract: A different type of topological phase, dubbed the topological inverse Anderson insulator, is proposed, which is characterized by disorder-induced extended bulk states arising from flat-band localization, together with topological edge states. Based on the topological invariant, the behaviors of the localization length of the zero-energy modes, and quantum transport, we identify its existence in several all-band-flat models with disordered potentials or hopping, including the $\pi$-flux Creutz ladder, the fully dimerized Su-Schrieffer-Heeger chain, and the $\pi$-flux diamond chain. Unlike the topological Anderson insulator, where disorder induces localization and exponential suppression of transport, disorder-assisted quantum ballistic coherent transport can appear in the topological inverse Anderson insulator. In addition, our proposal and results could be realized with current experimental techniques.
Comment: 9 pages, 5 figures
Published: 2024

7. Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization

Author: Du, Jia-Run, Lin, Kun-Yu, Meng, Jingke, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To address the zero-shot temporal action localization (ZSTAL) task, existing works develop models that are generalizable to detect and classify actions from unseen categories. They typically develop a category-agnostic action detector and combine it with the Contrastive Language-Image Pre-training (CLIP) model to solve ZSTAL. However, these methods suffer from incomplete action proposals generated for unseen categories, since they follow a frame-level prediction paradigm and require hand-crafted post-processing to generate action proposals. To address this problem, in this work, we propose a novel model named Generalizable Action Proposal generator (GAP), which can interface seamlessly with CLIP and generate action proposals in a holistic way. Our GAP is built in a query-based architecture and trained with a proposal-level objective, enabling it to estimate proposal completeness and eliminate the hand-crafted post-processing. Based on this architecture, we propose an Action-aware Discrimination loss to enhance the category-agnostic dynamic information of actions. Besides, we introduce a Static-Dynamic Rectifying module that incorporates the generalizable static information from CLIP to refine the predicted proposals, which improves proposal completeness in a generalizable manner. Our experiments show that our GAP achieves state-of-the-art performance on two challenging ZSTAL benchmarks, i.e., Thumos14 and ActivityNet1.3. Specifically, our model obtains significant performance improvement over previous works on the two benchmarks, i.e., +3.2% and +3.4% average mAP, respectively.
Comment: Accepted to ICPR 2024. Code is available at https://github.com/Run542968/GAP
Published: 2024

8. ParGo: Bridging Vision-Language with Partial and Global Views

Author: Wang, An-Lan, Shan, Bin, Shi, Wei, Lin, Kun-Yu, Fei, Xiang, Tang, Guozhi, Liao, Lei, Tang, Jingqun, Huang, Can, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This work presents ParGo, a novel Partial-Global projector designed to connect the vision and language modalities for Multimodal Large Language Models (MLLMs). Unlike previous works that rely on global attention-based projectors, our ParGo bridges the representation gap between the separately pre-trained vision encoders and the LLMs by integrating global and partial views, which alleviates the overemphasis on prominent regions. To facilitate the effective training of ParGo, we collect a large-scale detail-captioned image-text dataset named ParGoCap-1M-PT, consisting of 1 million images paired with high-quality captions. Extensive experiments on several MLLM benchmarks demonstrate the effectiveness of our ParGo, highlighting its superiority in aligning vision and language modalities. Compared to the conventional Q-Former projector, our ParGo achieves an improvement of 259.96 on the MME benchmark. Furthermore, our experiments reveal that ParGo significantly outperforms other projectors, particularly in tasks that emphasize detail perception ability.
Published: 2024

9. PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement

Author: Zhang, Delong, Peng, Yi-Xing, Wu, Xiao-Ming, Wu, Ancong, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Online person re-identification services face privacy breaches from potential data leakage and recovery attacks, exposing cloud-stored images to malicious attackers and triggering public concern. The privacy protection of pedestrian images is crucial. Previous privacy-preserving person re-identification methods are unable to resist recovery attacks and compromise accuracy. In this paper, we propose an iterative method (PixelFade) to optimize pedestrian images into noise-like images to resist recovery attacks. We first give an in-depth study of protected images from previous privacy methods, which reveals that the chaos of protected images can disrupt the learning of recovery models. Accordingly, we propose a Noise-guided Objective Function with the feature constraints of a specific authorization model, optimizing pedestrian images into normally distributed noise images while preserving their original identity information as per the authorization model. To solve the above non-convex optimization problem, we propose a heuristic optimization algorithm that alternately performs the Constraint Operation and the Partial Replacement Operation. This strategy not only ensures that original pixels are replaced with noise to protect privacy, but also guides the images towards an improved optimization direction to effectively preserve discriminative features. Extensive experiments demonstrate that our PixelFade outperforms previous methods in resisting recovery attacks and in Re-ID performance. The code is available at https://github.com/iSEE-Laboratory/PixelFade.
Comment: Accepted by ACMMM24
Published: 2024

10. Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation

Author: Tian, Huilin, Meng, Jingke, Zheng, Wei-Shi, Li, Yuan-Ming, Yan, Junkai, and Zhang, Yunong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Vision and Language Navigation (VLN) is a challenging task that requires agents to understand instructions and navigate to the destination in a visual environment. One of the key challenges in outdoor VLN is keeping track of which part of the instruction has been completed. To alleviate this problem, previous works mainly focus on grounding the natural language to the visual input, but neglect the crucial role of the agent's spatial position information in the grounding process. In this work, we first explore the substantial effect of spatial position locating on the grounding of outdoor VLN, drawing inspiration from human navigation. In real-world navigation scenarios, before planning a path to the destination, humans typically need to figure out their current location. This observation underscores the pivotal role of spatial localization in the navigation process. We therefore introduce a novel framework, Locating before Planning (Loc4Plan), designed to incorporate spatial perception for action planning in outdoor VLN tasks. The main idea behind Loc4Plan is to perform spatial localization before planning a decision action based on the corresponding guidance; the framework comprises a block-aware spatial locating (BAL) module and a spatial-aware action planning (SAP) module. Specifically, to help the agent perceive its spatial location in the environment, we propose to learn a position predictor that measures how far the agent is from the next intersection to reflect its position, which is achieved by the BAL module. After the locating process, we propose the SAP module to incorporate spatial information to ground the corresponding guidance and enhance the precision of action planning. Extensive experiments on the Touchdown and map2seq datasets show that the proposed Loc4Plan outperforms the SOTA methods.
Comment: arXiv admin note: text overlap with arXiv:2203.13838 by other authors
Published: 2024

11. SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

Author: Tan, Chaolei, Lin, Zihang, Pu, Junfu, Qi, Zhongang, Pei, Wei-Yi, Qu, Zhi, Wang, Yexin, Shan, Ying, Zheng, Wei-Shi, and Hu, Jian-Fang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders models from evolving toward stronger multimodal understanding capabilities. To address these limitations, we present a large-scale video grounding dataset named SynopGround, in which more than 2800 hours of videos are sourced from popular TV dramas and are paired with accurately localized human-written synopses. Each paragraph in the synopsis serves as a language query and is manually annotated with precise temporal boundaries in the long video. These paragraph queries are tightly correlated with each other and contain a wealth of abstract expressions summarizing video storylines and specific descriptions portraying event details, which enables the model to learn multimodal perception of more intricate concepts over longer context dependencies. Based on the dataset, we further introduce a more complex setting of video grounding dubbed Multi-Paragraph Video Grounding (MPVG), which takes as input multiple paragraphs and a long video and grounds each paragraph query to its temporal interval. In addition, we propose a novel Local-Global Multimodal Reasoner (LGMR) to explicitly model the local-global structures of long-term multimodal inputs for MPVG. Our method provides an effective baseline solution to the multi-paragraph video grounding problem. Extensive experiments verify the proposed model's effectiveness as well as its superiority in long-term multi-paragraph video grounding over prior state-of-the-art methods. Dataset and code are publicly available. Project page: https://synopground.github.io/.
Comment: Accepted to ACM MM 2024. Project page: https://synopground.github.io/
Published: 2024

12. Morphing median fin enhances untethered bionic robotic tuna's linear acceleration and turning maneuverability

Author: Huang, Hongbin, Lin, Zhonglu, Zheng, Wei, Zhang, Jinhu, Liu, Zhibin, Zhou, Wei, and Zhang, Yu
Subjects: Computer Science - Robotics, Physics - Biological Physics, Physics - Fluid Dynamics
Abstract: Median fins of fish-like swimmers play a crucial role in linear acceleration and maneuvering. However, few studies have focused on untethered robotic fish experiments. Imitating the behaviour of real tuna, we developed a free-swimming bionic tuna with a foldable dorsal fin. Erecting the dorsal fin under proper conditions can reduce head heave by 50%, enhance linear acceleration by 15.7%, increase turning angular velocity by 32.78%, and decrease turning radius by 33.13%. Conversely, erecting the dorsal fin increases the wetted surface area, resulting in decreased maximum speed and efficiency during the steady swimming phase. This finding partially explains why tuna erect their median fins during maneuvers or acceleration and fold them afterward to reduce drag. In addition, we verified that folding the median fins after acceleration does not significantly affect locomotion efficiency. This study supports the application of morphing median fins in undulating underwater robots and helps to further understand the impact of median fins on fish locomotion.
Comment: 7 pages, 5 figures
Published: 2024

13. Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection

Author: Mo, Qijie, Gao, Yipeng, Fu, Shenghao, Yan, Junkai, Wu, Ancong, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In incremental object detection, knowledge distillation has been proven to be an effective way to alleviate catastrophic forgetting. However, previous works focus on preserving the knowledge of old models, ignoring that images could simultaneously contain categories from past, present, and future stages. The co-occurrence of objects makes the optimization objectives inconsistent across different stages, since the definition of foreground objects differs across stages, which greatly limits the model's performance. To overcome this problem, we propose a method called "Bridge Past and Future" (BPF), which aligns models across stages, ensuring consistent optimization directions. In addition, we propose a novel Distillation with Future (DwF) loss, fully leveraging the background probability to mitigate the forgetting of old classes while ensuring a high level of adaptability in learning new classes. Extensive experiments are conducted on both Pascal VOC and MS COCO benchmarks. Without memory, BPF outperforms current state-of-the-art methods under various settings. The code is available at https://github.com/iSEE-Laboratory/BPF.
Comment: Accepted to ECCV 2024
Published: 2024

14. PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation

Author: Lu, Renjie, Meng, Jingke, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision and language navigation is a task that requires an agent to navigate according to a natural language instruction. Recent methods predict sub-goals on a constructed topological map at each step to enable long-term action planning. However, they suffer from high computational cost when attempting to support such high-level predictions with GCN-like models. In this work, we propose an alternative method that facilitates navigation planning by considering the alignment between instructions and directed fidelity trajectories, which refer to paths from the initial node to the candidate locations on a directed graph without detours. This planning strategy leads to an efficient model while achieving strong performance. Specifically, we introduce a directed graph to represent the explored area of the environment, emphasizing directionality. Then, we define the trajectory representation as a sequence of directed edge features, which are extracted from the panorama based on the corresponding orientation. Finally, we assess and compare the alignment between the instruction and different trajectories during navigation to determine the next navigation target. Our method outperforms the previous SOTA method BEVBert on the RxR dataset and is comparable on the R2R dataset while largely reducing the computational cost. Code is available at https://github.com/iSEE-Laboratory/VLN-PRET.
Published: 2024

15. Human-Centric Transformer for Domain Adaptive Action Recognition

Author: Lin, Kun-Yu, Zhou, Jiaming, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We study the domain adaptation task for action recognition, namely domain adaptive action recognition, which aims to effectively transfer action recognition power from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods tend to lose human cues and instead exploit the correlation between non-human contexts and associated actions for recognition, and contexts that are agnostic to actions reduce recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, and we investigate two aspects of human-centric action cues, namely human cues and human-context interaction cues. Accordingly, our proposed Human-Centric Transformer (HCTransformer) develops a decoupled human-centric learning paradigm to explicitly concentrate on human-centric action cues in domain-variant video feature learning. Our HCTransformer first conducts human-aware temporal modeling with a human encoder, aiming to avoid a loss of human cues during domain-invariant video feature learning. Then, with a Transformer-like architecture, HCTransformer exploits domain-invariant and action-correlated contexts with a context encoder, and further models the domain-invariant interaction between humans and action-correlated contexts. We conduct extensive experiments on three benchmarks, namely UCF-HMDB, Kinetics-NecDrone and EPIC-Kitchens-UDA, and the state-of-the-art performance demonstrates the effectiveness of our proposed HCTransformer.
Comment: Accepted by TPAMI
Published: 2024

16. An Economic Framework for 6-DoF Grasp Detection

Author: Wu, Xiao-Ming, Cai, Jia-Feng, Jiang, Jian-Jian, Zheng, Dian, Wei, Yi-Lin, and Zheng, Wei-Shi
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Robotic grasping in clutter is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection, aiming to economize the resource cost of training while maintaining effective grasp performance. To begin with, we discover that dense supervision is the bottleneck of current SOTA methods: it severely increases the overall training burden and makes training difficult to converge. To solve this problem, we first propose an economic supervision paradigm for efficient and effective grasping. This paradigm includes a well-designed supervision selection strategy, which selects key labels basically without ambiguity, and an economic pipeline that enables training after selection. Furthermore, benefiting from the economic supervision, we can focus on a specific grasp, and thus we devise a focal representation module, which comprises an interactive grasp head and a composite score estimation to generate the specific grasp more accurately. Combining these components, we propose the EconomicGrasp framework. Our extensive experiments show that EconomicGrasp surpasses the SOTA grasp method by about 3 AP on average, with extremely low resource cost: about 1/4 of the training time, 1/8 of the memory, and 1/30 of the storage. Our code is available at https://github.com/iSEE-Laboratory/EconomicGrasp.
Comment: 19 pages, 7 figures. Accepted to ECCV 2024
Published: 2024

17. Rethinking Few-shot Class-incremental Learning: Learning from Yourself

Author: Tang, Yu-Ming, Peng, Yi-Xing, Meng, Jingke, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Few-shot class-incremental learning (FSCIL) aims to learn sequential classes with limited samples in a few-shot fashion. Inherited from the classical class-incremental learning setting, the popular benchmark of FSCIL uses averaged accuracy (aAcc) and last-task averaged accuracy (lAcc) as the evaluation metrics. However, we reveal that such evaluation metrics may not provide adequate emphasis on novel-class performance, and the continual learning ability of FSCIL methods could be ignored under this benchmark. In this work, as a complement to existing metrics, we offer a new metric called generalized average accuracy (gAcc), which is designed to provide a more equitable evaluation by incorporating different perspectives of performance under the guidance of a parameter $\alpha$. We also present an overall metric in the form of the area under the curve (AUC) along $\alpha$. Under the guidance of gAcc, we release the potential of the intermediate features of vision transformers to boost novel-class performance. Taking information from intermediate layers, which are less class-specific and more generalizable, we manage to rectify the final features, leading to a more generalizable transformer-based FSCIL framework. Without complex network designs or cumbersome training procedures, our method outperforms existing FSCIL methods in aAcc and gAcc on three datasets. Code is available at https://github.com/iSEE-Laboratory/Revisting_FSCIL
Comment: Accepted to ECCV 2024
Published: 2024

18. Quantum Simulation with Gauge Fixing: from Ising Lattice Gauge Theory to Dynamical Flux Model

Author: Wang, Junsen, Sun, Xiangxiang, and Zheng, Wei
Subjects: Condensed Matter - Quantum Gases, High Energy Physics - Lattice, Quantum Physics
Abstract: Quantum simulation of synthetic dynamic gauge fields has attracted much attention in recent years. There are two traditional ways to simulate gauge theories. One is to directly simulate the full Hamiltonian of gauge theories with local gauge symmetries. The other is to engineer the projected Hamiltonian in one gauge subsector. In this work, we provide a third way towards the simulation of gauge theories based on gauge fixing. To demonstrate this concept, we fix the gauge of an Ising lattice gauge field coupled with spinless fermions on a ladder geometry. After the gauge fixing, this gauge theory is reduced to a simpler model, in which fermions hop on a ladder with a fluctuating dynamical $\mathbb{Z}_{2}$ flux. Then we show that this model can be realized via Floquet engineering in ultracold atomic gases. By analytical and numerical studies of this dynamical flux model, we deduce that there is a confinement-to-deconfinement phase transition in the original unfixed gauge theory. This work paves the way to quantum simulation of lattice gauge theories using the concept of gauge fixing, relevant to both condensed matter and high energy physics.
Comment: 12 pages, 9 figures
Published: 2024

19. Focused State Recognition Using EEG with Eye Movement-Assisted Annotation

Author: Li, Tian-Hua, Ma, Tian-Fang, Peng, Dan, Zheng, Wei-Long, and Lu, Bao-Liang
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: With the rapid advancement in machine learning, the recognition and analysis of brain activity based on EEG and eye movement signals have attained a high level of sophistication. Utilizing deep learning models for learning EEG and eye movement features proves effective in classifying brain activities. A focused state indicates intense concentration on a task or thought. Distinguishing focused and unfocused states can be achieved through eye movement behaviors, reflecting variations in brain activities. By calculating binocular focusing point disparity in eye movement signals and integrating relevant EEG features, we propose an annotation method for focused states. The resulting comprehensive dataset, derived from raw data processed through a bio-acquisition device, includes both EEG features and focused labels annotated by eye movements. Extensive training and testing on several deep learning models, particularly the Transformer, yielded a 90.16% accuracy on the subject-dependent experiments. The validity of this approach was demonstrated, with cross-subject experiments, key frequency band and brain region analyses confirming its generalizability and providing physiological explanations.
Published: 2024

20. EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

Author: Li, Yuan-Ming, Huang, Wei-Jin, Wang, An-Lan, Zeng, Ling-An, Meng, Jing-Ke, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main., Comment: Accepted by ECCV2024
Published: 2024

21. Emergent Universal Quench Dynamics in Randomly Interacting Spin Models

Author: Li, Yuchen, Zhou, Tian-Gang, Wu, Ze, Peng, Pai, Zhang, Shengyu, Fu, Riqiang, Zhang, Ren, Zheng, Wei, Zhang, Pengfei, Zhai, Hui, Peng, Xinhua, and Du, Jiangfeng
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Quantum Gases, Quantum Physics
Abstract: Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics., Comment: 10 pages, 4 figures; Supplementary Information 26 pages, 11 figures, 2 tables
Published: 2024

22. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

Author: Romero, David, Lyu, Chenyang, Wibowo, Haryo Akbarianto, Lynn, Teresa, Hamed, Injy, Kishore, Aditya Nanda, Mandal, Aishik, Dragonetti, Alina, Abzaliev, Artem, Tonja, Atnafu Lambebo, Balcha, Bontu Fufa, Whitehouse, Chenxi, Salamea, Christian, Velasco, Dan John, Adelani, David Ifeoluwa, Meur, David Le, Villa-Cueva, Emilio, Koto, Fajri, Farooqui, Fauzan, Belcavello, Frederico, Batnasan, Ganzorig, Vallejo, Gisela, Caulfield, Grainne, Ivetta, Guido, Song, Haiyue, Ademtew, Henok Biadglign, Maina, Hernán, Lovenia, Holy, Azime, Israel Abebe, Cruz, Jan Christian Blaise, Gala, Jay, Geng, Jiahui, Ortiz-Barajas, Jesus-German, Baek, Jinheon, Dunstan, Jocelyn, Alemany, Laura Alonso, Nagasinghe, Kumaranage Ravindu Yasas, Benotti, Luciana, D'Haro, Luis Fernando, Viridiano, Marcelo, Estecha-Garitagoitia, Marcos, Cabrera, Maria Camila Buitrago, Rodríguez-Cantelar, Mario, Jouitteau, Mélanie, Mihaylov, Mihail, Imam, Mohamed Fazli Mohamed, Adilazuarda, Muhammad Farid, Gochoo, Munkhjargal, Otgonbold, Munkh-Erdene, Etori, Naome, Niyomugisha, Olivier, Silva, Paula Mónica, Chitale, Pranjal, Dabre, Raj, Chevi, Rendi, Zhang, Ruochen, Diandaru, Ryandito, Cahyawijaya, Samuel, Góngora, Santiago, Jeong, Soyeong, Purkayastha, Sukannya, Kuribayashi, Tatsuki, Jayakumar, Thanmay, Torrent, Tiago Timponi, Ehsan, Toqeer, Araujo, Vladimir, Kementchedjhieva, Yova, Burzo, Zara, Lim, Zheng Wei, Yong, Zheng Xin, Ignat, Oana, Nwatu, Joan, Mihalcea, Rada, Solorio, Thamar, and Aji, Alham Fikri
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recent efforts have tried to increase the number of languages covered on VQA datasets, they still lack diversity in low-resource languages. More importantly, although these datasets often extend their linguistic range via translation or some other approaches, they usually keep images the same, resulting in narrow cultural representation. To address these limitations, we construct CVQA, a new Culturally-diverse multilingual Visual Question Answering benchmark, designed to cover a rich set of languages and cultures, where we engage native speakers and cultural experts in the data collection process. As a result, CVQA includes culturally-driven images and questions from across 28 countries on four continents, covering 26 languages with 11 scripts, providing a total of 9k questions. We then benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models. This benchmark can serve as a probing evaluation suite for assessing the cultural capability and bias of multimodal models and hopefully encourage more research efforts toward increasing cultural awareness and linguistic diversity in this field.
Published: 2024

23. Highly sensitive AuNCs@GSH/Ch-PtNPs metal nanoprobes for fluorescent and colorimetric dual-mode detection of ascorbic acid in drink

Author: Zheng, Wei
Subjects: Physics - Optics
Abstract: Fluorescence detection is a commonly used analytical method with the advantages of fast response, good selectivity and low destructiveness. However, fluorescence detection, as a single-mode method, has some limitations, such as background interference that affects the accuracy of the fluorescence signal, lack of visualization of the detection results, and low sensitivity for low-concentration samples. To overcome these shortcomings of single-mode fluorescence detection, we used a dual-mode fluorescence and colorimetric method to detect ascorbic acid (AA). The dual-mode detection of AA by fluorescence and colorimetry in the probe system enhances the specificity and accuracy of the detection. This bimodal method solved the problem of low detection sensitivity in the low concentration range of the analytes, was linear in the lower (0-50 μM) and higher (50-350 μM) concentration ranges, and had a low detection limit (0.034 μM). This glutathione-based gold cluster assay is characterized by simplicity, rapidity and accuracy, and provides a new way for the quantitative analysis of ascorbic acid. In addition, the method was validated in the determination of AA in beverages, where it showed high sensitivity and fast response time.
Published: 2024

24. Grasp as You Say: Language-guided Dexterous Grasp Generation

Author: Wei, Yi-Lin, Jiang, Jian-Jian, Xing, Chengyi, Tan, Xiantuo, Wu, Xiao-Ming, Li, Hao, Cutkosky, Mark, and Zheng, Wei-Shi
Subjects: Computer Science - Robotics
Abstract: This paper explores a novel task, "Dexterous Grasp as You Say" (DexGYS), enabling robots to perform dexterous grasping based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annotations along with flexible and fine-grained human language guidance. Our dataset construction is cost-efficient, with a carefully designed hand-object interaction retargeting strategy and an LLM-assisted language guidance annotation system. Equipped with this dataset, we introduce the DexGYSGrasp framework for generating dexterous grasps based on human language instructions, capable of producing grasps that are intent-aligned, high-quality and diverse. To achieve this capability, our framework decomposes the complex learning process into two manageable progressive objectives and introduces two components to realize them. The first component learns the grasp distribution, focusing on intention alignment and generation diversity. The second component refines the grasp quality while maintaining intention consistency. Extensive experiments are conducted on DexGYSNet and in real-world environments for validation., Comment: 9 pages, 7 figures
Published: 2024

25. Code Repair with LLMs gives an Exploration-Exploitation Tradeoff

Author: Tang, Hao, Hu, Keya, Zhou, Jin Peng, Zhong, Sicheng, Zheng, Wei-Long, Si, Xujie, and Ellis, Kevin
Subjects: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Programming Languages
Abstract: Iteratively improving and repairing source code with large language models (LLMs), known as refinement, has emerged as a popular way of generating programs that would be too complex to construct in one shot. Given a bank of test cases, together with a candidate program, an LLM can improve that program by being prompted with failed test cases. But it remains an open question how to best iteratively refine code, with prior work employing simple greedy or breadth-first strategies. We show here that refinement exposes an explore-exploit tradeoff: exploit by refining the program that passes the most test cases, or explore by refining a lesser considered program. We frame this as an arm-acquiring bandit problem, which we solve with Thompson Sampling. The resulting LLM-based program synthesis algorithm is broadly applicable: Across loop invariant synthesis, visual reasoning puzzles, and competition programming problems, we find that our new method can solve more problems using fewer language model calls.
Published: 2024

26. Superconducting and topological properties of compound Lu$_4$H$_7$N

Author: Liao, Zheng-Wei, Yi, Xin-Wei, You, Jing-Yang, Gu, Bo, and Su, Gang
Subjects: Condensed Matter - Superconductivity
Abstract: A recent experiment has reported a nitrogen-doped lutetium hydride achieving a remarkable Tc of 294 K at just 1 GPa, significantly reducing the pressure required for room-temperature superconductivity. However, subsequent experimental and theoretical investigations have encountered difficulties in replicating these results, leaving the structure of this Lu-H-N compound shrouded in uncertainty. Here, we propose a stable structure for Lu$_4$H$_7$N employing first-principles calculations. Our calculations reveal that Lu$_4$H$_7$N has a Tc of 1.044 K, which can be substantially enhanced to 11.721 K at 150 GPa, due to the increasing electron-phonon coupling (EPC). Notably, we delve into the nontrivial Z2 band topology of Lu$_4$H$_7$N, featuring discernible surface states near the Fermi level, and we explore its spin Hall conductivity characteristics. Furthermore, we find that electron doping can enhance the EPC strength and Tc of Lu$_4$H$_7$N; for example, the Lu$_4$H$_7$O structure we predict, which simulates electron doping of Lu$_4$H$_7$N, has an impressive Tc of 3.837 K. This work demonstrates the coexistence of superconducting and topological properties in a compound of the Lu-H-N system, which holds the promise of guiding the search for novel topological superconducting materials.
Published: 2024

27. Dexterous Grasp Transformer

Author: Xu, Guo-Hao, Wei, Yi-Lin, Zheng, Dian, Wu, Xiao-Ming, and Zheng, Wei-Shi
Subjects: Computer Science - Robotics
Abstract: In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses by processing the object point cloud with only one forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we identify that this set prediction paradigm encounters several optimization challenges in the field of dexterous grasping, resulting in restricted performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance the optimization stability during the training phase. Second, we introduce the adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during the testing phase. Experimental results on the DexGraspNet dataset demonstrate the capability of DGTR to predict dexterous grasp poses with both high quality and diversity. Notably, while keeping high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works in multiple metrics without any data pre-processing. Code is available at https://github.com/iSEE-Laboratory/DGTR., Comment: Accepted to CVPR 2024
Published: 2024

28. Partial confinement in a quantum-link simulator

Author: Tang, Zheng, Zhu, Fei, Luo, Yi-Fan, Zheng, Wei, and Chen, Li
Subjects: Condensed Matter - Quantum Gases, Condensed Matter - Strongly Correlated Electrons, Quantum Physics
Abstract: Confinement and deconfinement, captivating attributes of high-energy elementary particles, have recently garnered wide attention in quantum simulations based on cold atoms. Yet partial confinement, an intermediate state between confinement and deconfinement, remains underexplored. Partial confinement describes the phenomenon in which the confining behavior of charged particles depends on their relative positions. In this paper, we demonstrate that the spin-1 quantum link model provides an excellent platform for exploring partial confinement. We conduct a comprehensive investigation of the physics emerging from partial confinement in both equilibrium and non-equilibrium dynamics. Potential experimental setups using cold atoms are also discussed. Our work offers a simple and feasible route for the study of confinement-related physics in state-of-the-art artificial quantum systems subject to gauge symmetries.
Published: 2024

29. Topological Origin of Floquet Thermalization in Periodically Driven Many-body Systems

Author: Qi, Hao-Yue, Wu, Yue, and Zheng, Wei
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Quantum Gases, Condensed Matter - Statistical Mechanics
Abstract: Floquet engineering is a powerful manipulation method in modern quantum technology. However, unwanted heating is the main challenge of Floquet engineering, and therefore Floquet thermalization has attracted considerable attention recently. In this work, we investigate thermalization of periodically driven many-body systems through the lens of Krylov complexity, and find a topological origin of different thermalization behaviors. We demonstrate that if the topology of the Krylov chain is nontrivial, a periodically driven system will reach a state with finite temperature. When the Krylov chain is topologically trivial, the system will be heated to infinite temperature. We further show that prethermalization can be understood as the tunnelling process of a quasi-edge mode through the local gap on the Krylov chain. This picture provides a systematic method to obtain the effective prethermal Hamiltonian., Comment: 6 pages, 5 figures
Published: 2024

30. Single-View Scene Point Cloud Human Grasp Generation

Author: Wang, Yan-Kang, Xing, Chengyi, Wei, Yi-Lin, Wu, Xiao-Ming, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we explore a novel task of generating human grasps based on single-view scene point clouds, which more accurately mirrors the typical real-world situation of observing objects from a single viewpoint. Due to the incompleteness of object point clouds and the presence of numerous scene points, the generated hand is prone to penetrating into the invisible parts of the object and the model is easily affected by scene points. Thus, we introduce S2HGrasp, a framework composed of two key modules: the Global Perception module that globally perceives partial object point clouds, and the DiffuGrasp module designed to generate high-quality human grasps based on complex inputs that include scene points. Additionally, we introduce S2HGD dataset, which comprises approximately 99,000 single-object single-view scene point clouds of 1,668 unique objects, each annotated with one human grasp. Our extensive experiments demonstrate that S2HGrasp can not only generate natural human grasps regardless of scene points, but also effectively prevent penetration between the hand and invisible parts of the object. Moreover, our model showcases strong generalization capability when applied to unseen objects. Our code and dataset are available at https://github.com/iSEE-Laboratory/S2HGrasp.
Published: 2024

31. Adaptive Anomaly Detection Disruption Prediction Starting from First Discharge on Tokamak

Author: Ai, Xinkun, Zheng, Wei, Zhang, Ming, Ding, Yonghua, Chen, Dalong, Chen, Zhongyong, Guo, Bihao, Shen, Chengshuo, Wang, Nengchao, Yang, Zhoujun, Chen, Zhipeng, Pan, Yuan, Shen, Biao, and Xiao, Binjia
Subjects: Physics - Plasma Physics
Abstract: Plasma disruption presents a significant challenge in tokamak fusion, where it can cause severe damage and economic losses. Current disruption predictors mainly rely on data-driven methods, requiring extensive discharge data for training. However, future tokamaks require disruption prediction from the first shot, posing challenges of data scarcity during the early operation period. In this period, disruption prediction aims to support safe exploration of the operation range and to accumulate the data necessary to develop advanced prediction models. Thus, predictors must adapt to evolving plasma environments during this exploration phase. To address these issues, this study proposes a cross-tokamak adaptive deployment method using the Enhanced Convolutional Autoencoder Anomaly Detection (E-CAAD) predictor, enabling disruption prediction from the first shot of new devices. Experimental results indicate that the E-CAAD model trained on existing devices can effectively differentiate between disruption precursors and non-disruption samples on new devices, proving the feasibility of cross-device model transfer. Building upon this, adaptive learning from scratch and adaptive threshold adjustment strategies are proposed to achieve cross-device model transfer. The adaptive learning from scratch strategy enables the predictor to use scarce data during the early operation of the new device while rapidly adapting to changes in the operation environment. The adaptive threshold adjustment strategy addresses the challenge of selecting warning thresholds on new devices where a validation set is lacking, ensuring that the warning thresholds adapt to changes in the operation environment. Finally, experiments transferring the model from J-TEXT to EAST exhibit performance comparable to EAST models trained with ample data, achieving a TPR of 85.88% and an FPR of 6.15%, with a 20 ms reserved MGI system reaction time., Comment: 18 pages, 7 figures
Published: 2024

32. Topological states constructed by two different trivial quantum wires

Author: Lin, Jing-Run, Lv, Linxi, and Zuo, Zheng-Wei
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: The topological states of the two-leg and three-leg ladders formed by two trivial quantum wires with different lattice constants are theoretically investigated. For the two-leg ladder with symmetric nearest-neighbor intra-chain hopping, an inversion-symmetry topological insulator phase with two degenerate topological edge states appears. When the inversion symmetry is broken, topological insulators with one or two topological edge states of different energies, and topological metals with edge states embedded in the bulk states, can emerge depending on the filling factor. The topological origin of these states in the two-leg ladders lies in the topological properties of Chern insulators and Chern metals. According to the arrangement of the two trivial quantum wires, we construct two types of three-leg ladders. Each type of three-leg ladder can be divided into one trivial subspace and one topologically nontrivial subspace by a unitary transformation. The topologically nontrivial subspace corresponds to the effective two-leg ladder model. As the filling factor changes, the system can be in topological insulator or topological metal phases. These rich topological states in the two-leg and three-leg ladders could be confirmed with current experimental techniques.
Published: 2024

33. DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation

Author: Yan, Junkai, Gao, Yipeng, Yang, Qize, Wei, Xihan, Xie, Xuansong, Wu, Ancong, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Text-to-3D generation, which synthesizes 3D assets according to an overall text description, has progressed significantly. However, a challenge arises when specific appearances need to be customized at designated viewpoints while only the overall description is used to generate 3D objects. For instance, ambiguity easily occurs when producing a T-shirt with distinct patterns on its front and back using a single overall text guidance. In this work, we propose DreamView, a text-to-image approach enabling multi-view customization while maintaining overall consistency by adaptively injecting the view-specific and overall text guidance through a collaborative text guidance injection module, which can also be lifted to 3D generation via score distillation sampling. DreamView is trained with large-scale rendered multi-view images and their corresponding view-specific texts to learn to balance the separate content manipulation in each view and the global consistency of the overall object, achieving both customization and consistency. Consequently, DreamView empowers artists to design 3D objects creatively, fostering the creation of more innovative and diverse 3D assets. Code and model will be released at https://github.com/iSEE-Laboratory/DreamView., Comment: Accepted to ECCV 2024, camera ready version
Published: 2024

34. On the surface helium abundance of B-type hot subdwarf stars from the WD+MS channel of Type Ia supernovae

Author: Ji, Rui-Jie, Meng, Xiang-Cun, and Liu, Zheng-Wei
Subjects: Astrophysics - Solar and Stellar Astrophysics
Abstract: The origin of intermediate helium (He)-rich hot subdwarfs is still unclear. Previous studies have suggested that some surviving companions of Type Ia supernovae (SNe Ia) from the white dwarf + main-sequence (WD+MS) channel may contribute to the intermediate He-rich hot subdwarfs. However, those studies ignored the impact of atomic diffusion on the post-explosion evolution of surviving companion stars of SNe Ia, and therefore could not explain the observed surface He abundance of intermediate He-rich hot subdwarfs. In this work, taking atomic diffusion and stellar wind into account, we trace the surviving companions of SNe Ia from the WD+MS channel using the one-dimensional stellar evolution code \textsc{MESA} until they evolve into hot subdwarfs. We find that the surface He abundances of our surviving companion models during their core He-burning phases are in the range $-1 \lesssim {\rm log}(N_{\rm He}/N_{\rm H}) \lesssim 0$, which is consistent with those observed in intermediate He-rich hot subdwarfs. This seems to further support that surviving companions of SNe Ia in the WD+MS channel may form some intermediate He-rich hot subdwarfs., Comment: 10 pages, 5 figures
Published: 2024

35. Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment

Author: Xu, Angchi and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Weakly-supervised action segmentation is the task of learning to partition a long video into several action segments, where training videos are only accompanied by transcripts (ordered lists of actions). Most existing methods need to infer pseudo segmentations for training by serially aligning all frames with the transcript, which is time-consuming and hard to parallelize during training. In this work, we aim to escape from this inefficient alignment with massive but redundant frames, and instead directly localize a few action transitions for pseudo segmentation generation, where a transition refers to the change from an action segment to its next adjacent one in the transcript. As the true transitions are submerged in noisy boundaries due to intra-segment visual variation, we propose a novel Action-Transition-Aware Boundary Alignment (ATBA) framework to efficiently and effectively filter out noisy boundaries and detect transitions. In addition, to boost semantic learning when noise is inevitably present in the pseudo segmentation, we also introduce video-level losses to utilize the trusted video-level supervision. Extensive experiments show the effectiveness of our approach on both performance and training speed., Comment: Accepted to CVPR 2024
Published: 2024

36. Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

Author: Liang, Tianming, Tan, Chaolei, Xia, Beihao, Zheng, Wei-Shi, and Hu, Jian-Fang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question. As a result, existing works tend to directly treat all the unlabeled answers as negative labels, leading to limited ability for generalization. In this work, we introduce a simple yet effective ranking distillation framework (RADI) to mitigate this problem without additional manual annotation. RADI employs a teacher model trained with incomplete labels to generate rankings for potential answers, which contain rich knowledge about label priority as well as label-associated visual cues, thereby enriching the insufficient labeling information. To avoid overconfidence in the imperfect teacher model, we further present two robust and parameter-free ranking distillation approaches: a pairwise approach which introduces adaptive soft margins to dynamically refine the optimization constraints on various pairwise rankings, and a listwise approach which adopts sampling-based partial listwise learning to resist the bias in teacher ranking. Extensive experiments on five popular benchmarks consistently show that both our pairwise and listwise RADIs outperform state-of-the-art methods. Further analysis demonstrates the effectiveness of our methods on the insufficient labeling problem., Comment: Accepted to CVPR 2024
Published: 2024

37. A Control-Recoverable Added-Noise-based Privacy Scheme for LQ Control in Networked Control Systems

Author: Tang, Xuening, Cao, Xianghui, and Zheng, Wei Xing
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: As networked control systems continue to evolve, ensuring the privacy of sensitive data becomes an increasingly pressing concern, especially in situations where the controller is physically separated from the plant. In this paper, we propose a secure control scheme for computing linear quadratic control in a networked control system utilizing two networked controllers, a privacy encoder and a control restorer. Specifically, the encoder generates two state signals blurred with random noise and sends them to the controllers, while the restorer reconstructs the correct control signal. The proposed design effectively preserves the privacy of the control system's state without sacrificing the control performance. We theoretically quantify the privacy-preserving performance in terms of the state estimation error of the controllers and the disclosure probability. Moreover, we extend the proposed privacy-preserving scheme and evaluation method to cases where collusion between two controllers occurs. Finally, we verify the validity of our proposed scheme through simulations.
Published: 2024

38. Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

Author: Tan, Chaolei, Lai, Jianhuang, Zheng, Wei-Shi, and Hu, Jian-Fang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Video Paragraph Grounding (VPG) is an emerging task in video-language understanding, which aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video. However, existing VPG approaches are heavily reliant on a considerable number of temporal labels that are laborious and time-consuming to acquire. In this work, we introduce and explore Weakly-Supervised Video Paragraph Grounding (WSVPG) to eliminate the need for temporal annotations. Different from previous weakly-supervised grounding frameworks based on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel siamese learning framework that jointly learns the cross-modal feature alignment and temporal coordinate regression without timestamp labels to achieve concise one-stage localization for WSVPG. Specifically, we devise a Siamese Grounding TRansformer (SiamGTR) consisting of two weight-sharing branches for learning complementary supervision. An Augmentation Branch is utilized for directly regressing the temporal boundaries of a complete paragraph within a pseudo video, and an Inference Branch is designed to capture the order-guided feature correspondence for localizing multiple sentences in a normal video. We demonstrate by extensive experiments that our paradigm has superior practicability and flexibility to achieve efficient weakly-supervised or semi-supervised learning, outperforming state-of-the-art methods trained with the same or stronger supervision., Comment: Accepted to CVPR 2024. v2: fix a typo in figure 1
Published: 2024

39. Antiferromagnetic Ground State, Charge Density Waves and Oxygen Vacancies Induced Metal-Insulator Transition in Pressurized La$_{3}$Ni$_{2}$O$_{7}$

Author: Yi, Xin-Wei, Meng, Ying, Li, Jia-Wen, Liao, Zheng-Wei, You, Jing-Yang, Gu, Bo, and Su, Gang
Subjects: Condensed Matter - Superconductivity
Abstract: La$_{3}$Ni$_{2}$O$_{7}$ has recently attracted widespread interest due to its high-temperature superconductivity under pressure, accompanied by charge density wave (CDW) ordering and metal-insulator (MI) transitions in the phase diagram. Here, comprehensive calculations reveal that La$_{3}$Ni$_{2}$O$_{7}$ has an antiferromagnetic ground state under both low and high pressures, with strong Fermi surface nesting contributed by the flat band, which leads to phonon softening and electronic instabilities. We identify several stable CDW orders with oxygen octahedral distortions that can trigger the MI transitions. The estimated CDW transition temperature ($\approx$120 K) at ambient pressure agrees well with experimental results. In the presence of apical oxygen vacancies, we identify two distinct phases, namely half-distortion and full-distortion phases, whose competition can lead to a pressure-induced MI transition, in good agreement with experimental observations. In addition, we find that the electron-phonon coupling is too weak to account for superconductivity. These results appear to indicate an unconventional superconducting pairing mechanism mediated by antiferromagnetic fluctuations. We also present a phase diagram consistent with the experimental results. The present results not only explain the origins of the experimentally observed CDW and MI transitions but also provide insight into properties such as superconductivity and CDW, as well as the role of oxygen vacancies, in pressurized La$_{3}$Ni$_{2}$O$_{7}$.
Published: 2024

40. Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Author: Zheng, Dian, Wu, Xiao-Ming, Yang, Shuzhou, Zhang, Jian, Hu, Jian-Fang, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Universal image restoration is a practical and promising computer vision task for real-world applications. Its main challenge is handling different degradation distributions at once. Existing methods mainly use task-specific conditions (e.g., prompts) to guide the model to learn different distributions separately, an approach termed multi-partite mapping. However, this approach is not suitable for universal model learning because it ignores the information shared between tasks. In this work, we propose an advanced selective hourglass mapping strategy based on a diffusion model, termed DiffUIR. Two novel considerations make DiffUIR non-trivial. First, we equip the model with strong condition guidance to obtain an accurate generation direction for the diffusion model (selective). More importantly, DiffUIR naturally integrates a flexible shared distribution term (SDT) into the diffusion algorithm, which gradually maps different distributions into a shared one. In the reverse process, combined with the SDT and strong condition guidance, DiffUIR iteratively guides the shared distribution to the task-specific distribution with high image quality (hourglass). Without bells and whistles, and by modifying only the mapping strategy, we achieve state-of-the-art performance across five image restoration tasks and 22 benchmarks in both the universal setting and the zero-shot generalization setting. Surprisingly, even a lightweight model (only 0.89M parameters) achieves outstanding performance. The source code and pre-trained models are available at https://github.com/iSEE-Laboratory/DiffUIR. Comment: Accepted to CVPR2024
Published: 2024

41. A Versatile Framework for Multi-scene Person Re-identification

Author: Zheng, Wei-Shi, Yan, Junkai, and Peng, Yi-Xing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Person Re-identification (ReID) has been extensively developed over the past decade to learn the association between images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, many variants of ReID models have been developed to address challenges such as resolution change, clothing change, occlusion, and modality change. Despite the impressive performance of many ReID variants, each typically targets a single challenge and cannot be applied to the others. To the best of our knowledge, no existing ReID model can handle various ReID challenges at the same time. This work makes the first attempt at learning a versatile ReID model to solve this problem. Our main idea is a two-stage prompt-based twin modeling framework called VersReID. In the first stage, VersReID leverages scene labels to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank to adaptively solve ReID across different scenes, eliminating the need for scene labels at inference. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised ReID learning via a multi-scene prioris data augmentation (MPDA) strategy. Extensive experiments demonstrate that we can learn an effective and versatile ReID model that handles ReID tasks under multi-scene conditions, including general, low-resolution, clothing-change, occlusion, and cross-modality scenes, without manually assigning scene labels at inference. Code and models are available at https://github.com/iSEE-Laboratory/VersReID. Comment: To appear in TPAMI
Published: 2024

42. NECA: Neural Customizable Human Avatar

Author: Xiao, Junjin, Zhang, Qing, Xu, Zhan, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human avatars have become a novel type of 3D asset with various applications. Ideally, a human avatar should be fully customizable to accommodate different settings and environments. In this work, we introduce NECA, an approach capable of learning versatile human representations from monocular or sparse-view videos, enabling granular customization of aspects such as pose, shadow, shape, lighting, and texture. The core of our approach is to represent humans in complementary dual spaces and predict disentangled neural fields of geometry, albedo, and shadow, as well as external lighting, from which we derive realistic renderings with high-frequency details via volumetric rendering. Extensive experiments demonstrate the advantage of our method over state-of-the-art methods in photorealistic rendering and in various editing tasks such as novel pose synthesis and relighting. The code is available at https://github.com/iSEE-Laboratory/NECA. Comment: Accepted to CVPR 2024
Published: 2024

43. Broadband NIR photon upconversion generates NIR persistent luminescence for bioimaging

Author: Yang, Shuting, Qi, Bing, Sun, Mingzi, Dai, Wenjing, Miao, Ziyun, Zheng, Wei, Huang, Bolong, and Wang, Jie
Subjects: Physics - Optics, Physics - Chemical Physics
Abstract: Upconversion persistent luminescence (UCPL) phosphors that can be directly charged by near-infrared (NIR) light have gained considerable attention due to their promising applications ranging from photonics to biomedicine. However, current lanthanide-based UCPL phosphors show small absorption cross-sections and low upconversion charging efficiency. The development of UCPL phosphors is hindered by the lack of flexible upconversion charging pathways and by poor design flexibility. Herein, we report a new lattice defect-mediated broadband photon upconversion process and the accompanying NIR-to-NIR UCPL in Cr-doped zinc gallate nanoparticles. The zinc gallate nanoparticles can be directly activated by broadband NIR light in the 700-1000 nm range to produce persistent luminescence at about 700 nm, which can be readily enhanced by rationally tailoring the lattice defects in the phosphors. The proposed UCPL phosphors achieved a signal-to-background ratio of over 200 in bioimaging by efficiently avoiding interference from autofluorescence and light scattering. Our findings report lattice defect-mediated photon upconversion for the first time, significantly expanding the horizons for the flexible design of NIR-to-NIR UCPL phosphors toward broad applications.
Published: 2024

44. Quantum Many-body Scar Models in One Dimensional Spin Chains

Author: Wang, Jia-Wei, Zhou, Xiang-Fa, Guo, Guang-Can, and Zhou, Zheng-Wei
Subjects: Quantum Physics, Condensed Matter - Statistical Mechanics, Condensed Matter - Strongly Correlated Electrons
Abstract: The phenomenon of quantum many-body scars (QMBS) has received widespread attention in both theoretical and experimental physics in recent years due to its unique physical properties. In this paper, based on the $su(2)$ algebraic relations, we propose a general method for constructing scar models by combining simple modules. This allows us to investigate many-body scar phenomena in high-spin systems. We numerically verify the thermalization and non-integrability of this model and demonstrate the dynamical properties of the scar states. We also provide a theoretical analysis of the properties of these scar states. For the spin-$1$ case, we find that our 1D chain model reduces to the well-known PXP model [C. J. Turner et al., Phys. Rev. B 98, 155134 (2018)] under a special parameter condition. In addition, due to the continuous tunability of its parameters, our model also enables us to investigate transitions of QMBS from non-integrable to integrable systems. Comment: 12 pages, 7 figures
Published: 2024

45. Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

Author: Lin, Kun-Yu, Ding, Henghui, Zhou, Jiaming, Tang, Yu-Ming, Peng, Yi-Xing, Zhao, Zhilin, Loy, Chen Change, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneering works have proposed adapting the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work investigates an intriguing question: Can CLIP-based video learners effectively generalize to video domains they have not encountered during training? To answer it, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps. The evaluation shows that previous methods achieve limited action recognition performance in unseen video domains, revealing potential challenges of the cross-domain open-vocabulary action recognition task. In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method. Our key idea is to distinguish video representations from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains. Extensive experiments demonstrate the effectiveness of our method. The benchmark and code will be available at https://github.com/KunyuLin/XOV-Action/.
Published: 2024

46. Evaluating the effects of adrenalectomy and mineralocorticoid receptor antagonist on cardiac remodeling and diastolic function in patients with aldosterone-producing adenoma

Author: Chang, Yu-Ching, Wu, Xue-Ming, Chen, Tsung-Yan, Chen, Uei-Lin, Liao, Che-Wei, Lai, Tai-Shuan, Chang, Chin-Chen, Lee, Bo-Ching, Yang, Fang-Yu, Chen, Zheng-Wei, Chang, Yi-Yao, Chueh, Jeff S., Wu, Vin-Cent, Tsai, Cheng-Hsuan, Hung, Chi-Sheng, and Lin, Yen-Hung
Published: 2024

47. Development and application of an intelligent thermal state monitoring system for sintering machine tails based on CNN–LSTM hybrid neural networks

Author: Xiong, Da-lin, Zhang, Xin-yu, Yu, Zheng-wei, Zhang, Xue-feng, Long, Hong-ming, and Chen, Liang-jun
Published: 2024

48. Three dimensional-printed artificial disc replacement for single-level cervical spondylosis: a cohort study

Author: Zhang, Xiao-bo, Gao, Zilin, Yao, Xin, Xu, Zheng-wei, and Hao, Ding-jun
Published: 2024

49. Carbon monoxide-releasing molecule-3 ameliorates traumatic brain injury-induced cardiac dysfunctions via inhibition of pyroptosis and apoptosis

Author: Bai, Jing, Sun, Wen-Bo, Zheng, Wei-Chao, Wang, Xu-Peng, and Bai, Yang
Published: 2024

50. Semi-supervised image semantic segmentation method with semantic regions patching and uncertainty-guided loss

Author: Guo, Dinghao, Chen, Dali, Lin, Xin, Xue, Zheng, Zheng, Wei, and Li, Xianling
Published: 2024
