3,676 results for "Wang, Zhigang"
Search Results
2. COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models
- Author
-
Liu, Kehui, Tang, Zixin, Wang, Dong, Wang, Zhigang, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence - Abstract
Leveraging the powerful reasoning capabilities of large language models (LLMs), recent LLM-based robot task planning methods yield promising results. However, they mainly focus on single robots or multiple homogeneous robots performing simple tasks. In practice, complex long-horizon tasks often require collaboration among multiple heterogeneous robots, especially those with more complex action spaces, which makes such tasks more challenging. To this end, we propose COHERENT, a novel LLM-based task planning framework for collaboration of heterogeneous multi-robot systems including quadrotors, robotic dogs, and robotic arms. Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions to individual robots: a centralized task assigner proposes a plan that decomposes the complex task into subtasks and then assigns the subtasks to robot executors. Each robot executor selects a feasible action to implement its assigned subtask and reports self-reflection feedback to the task assigner for plan adjustment. The PEFA loop repeats until the task is completed. Moreover, we create a challenging heterogeneous multi-robot task planning benchmark encompassing 100 complex long-horizon tasks. The experimental results show that our work surpasses the previous methods by a large margin in terms of success rate and execution efficiency. The experimental videos, code, and benchmark are released at https://github.com/MrKeee/COHERENT., Comment: 7 pages, 5 figures. Submitted to IEEE International Conference on Robotics and Automation (ICRA), 2025
- Published
- 2024
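The Proposal-Execution-Feedback-Adjustment (PEFA) mechanism in the abstract above can be sketched as a plain control loop. The `assigner` and executor callables here are hypothetical stand-ins, not the authors' implementation:

```python
def pefa_loop(task, assigner, executors, max_rounds=10):
    """Proposal-Execution-Feedback-Adjustment as a control loop.

    `assigner(task, feedback)` returns (robot, subtask) pairs, or None
    once it judges the task complete; each executor runs its subtask
    and returns self-reflection feedback for the next round.
    """
    feedback = []
    for _ in range(max_rounds):
        proposal = assigner(task, feedback)        # propose + assign subtasks
        if proposal is None:                       # assigner declares task done
            return True
        feedback = [executors[robot](subtask)      # execute and self-reflect
                    for robot, subtask in proposal]
    return False  # round budget exhausted
```

The loop terminates either when the assigner, having digested the executors' feedback, issues no further proposal, or when the round budget runs out.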
3. Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
- Author
-
Gao, Xianqiang, Zhang, Pingrui, Qu, Delin, Wang, Dong, Wang, Zhigang, Ding, Yan, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics. Recent advances tackle this problem by learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the \textbf{M}ulti-\textbf{I}mage Guided Invariant-\textbf{F}eature-Aware 3D \textbf{A}ffordance \textbf{G}rounding (\textbf{MIFAG}) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (\textbf{IAM}) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (\textbf{ADM}) learns comprehensive point cloud representations that consider all affordance candidates in multiple images. In addition, we construct the Multi-Image and Point Affordance (\textbf{MIPA}) benchmark, on which our method outperforms existing state-of-the-art methods across various experimental comparisons. Project page: \url{https://goxq.github.io/mifag}
- Published
- 2024
4. Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection
- Author
-
Pang, Xincheng, Xia, Wenke, Wang, Zhigang, Zhao, Bin, Hu, Di, Wang, Dong, and Li, Xuelong
- Subjects
Computer Science - Robotics - Abstract
3D perception ability is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based input, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address these limitations, we propose a Depth Information Injection ($\bold{DI}^{\bold{2}}$) framework that leverages the RGB-Depth modality for policy fine-tuning, while relying solely on RGB images for robust and efficient deployment. Concretely, we introduce the Depth Completion Module (DCM) to extract spatial prior knowledge related to depth information and generate virtual depth information from RGB inputs to aid policy deployment. Further, we propose the Depth-Aware Codebook (DAC) to eliminate noise and reduce the cumulative error from depth prediction. In the inference phase, the framework employs RGB inputs and accurately predicted depth data to generate the manipulation action. We conduct experiments in simulated LIBERO environments and real-world scenarios, and the results show that our method effectively equips the pre-trained RGB-based policy with 3D perception ability for robotic manipulation. The website is released at https://gewu-lab.github.io/DepthHelps-IROS2024., Comment: accepted by IROS 2024
- Published
- 2024
5. Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration
- Author
-
Zhang, Ziran, Tang, Yuhang, Wang, Zhigang, Chen, Yueting, and Zhao, Bin
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition - Abstract
Infrared imaging and turbulence strength measurements are in widespread demand in many fields. This paper introduces a Physical Prior Guided Cooperative Learning (P2GCL) framework to jointly enhance atmospheric turbulence strength estimation and infrared image restoration. P2GCL involves a cyclic collaboration between two models: TMNet measures turbulence strength and outputs the refractive index structure constant (Cn2) as a physical prior, while TRNet restores the infrared image sequence based on Cn2 and feeds the restored images back to TMNet to boost measurement accuracy. A novel Cn2-guided frequency loss function and a physical constraint loss are introduced to align the training process with physical theories. Experiments demonstrate that P2GCL achieves the best performance for both turbulence strength estimation (improving Cn2 MAE by 0.0156, enhancing R2 by 0.1065) and image restoration (enhancing PSNR by 0.2775 dB), validating the significant impact of physical prior guided cooperative learning., Comment: 21
- Published
- 2024
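The cyclic collaboration in P2GCL can be sketched as a single cooperative step; `tmnet` and `trnet` are hypothetical stand-ins for the paper's two models:

```python
def cooperative_step(frames, tmnet, trnet):
    """One cycle of the cooperative scheme: estimate turbulence strength,
    restore with it as a physical prior, then re-estimate on the restored
    frames so the feedback refines the measurement."""
    cn2 = tmnet(frames)            # physical prior from the degraded input
    restored = trnet(frames, cn2)  # Cn2-guided restoration
    cn2_refined = tmnet(restored)  # restored frames boost measurement accuracy
    return restored, cn2_refined
```

In training, such a cycle would repeat so that each model's output improves the other's input, which is the essence of the cooperative learning loop described above.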
6. KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
- Author
-
Lu, Jingxian, Xia, Wenke, Wang, Dong, Wang, Zhigang, Zhao, Bin, Hu, Di, and Li, Xuelong
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence - Abstract
Online Imitation Learning methods struggle with the gap between an extensive online exploration space and limited expert trajectories, which hinders efficient exploration due to inaccurate task-aware reward estimation. Inspired by findings from cognitive neuroscience that task decomposition can facilitate cognitive processing for efficient learning, we hypothesize that an agent can estimate precise task-aware imitation rewards for efficient online exploration by decomposing the target task into the objectives of "what to do" and the mechanisms of "how to do". In this work, we introduce the hybrid Key-state guided Online Imitation (KOI) learning approach, which leverages the integration of semantic and motion key states as guidance for task-aware reward estimation. Initially, we utilize visual-language models to segment the expert trajectory into semantic key states, indicating the objectives of "what to do". Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the process of "how to do". By integrating a thorough grasp of both semantic and motion key states, we refine the trajectory-matching reward computation, encouraging task-aware exploration for efficient online imitation learning. Our experimental results show that our method is more sample-efficient in the Meta-World and LIBERO environments. We also conduct real-world robotic manipulation experiments to validate the practical applicability of KOI.
- Published
- 2024
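A minimal sketch of the hybrid key-state idea, assuming key states are plain feature vectors compared by Euclidean distance (both assumptions; the paper derives key states from vision-language models and optical flow):

```python
def key_state_reward(agent_state, semantic_keys, motion_keys,
                     w_sem=1.0, w_mot=0.5):
    """Hybrid key-state reward sketch: agent states close to the expert's
    semantic ("what to do") and motion ("how to do") key states score
    higher. The weights and distance metric are illustrative choices."""
    def min_dist(keys):
        # distance to the nearest key state of this type
        return min(sum((a - k) ** 2 for a, k in zip(agent_state, key)) ** 0.5
                   for key in keys)
    return -(w_sem * min_dist(semantic_keys) + w_mot * min_dist(motion_keys))
```

A trajectory-matching reward of this shape steers exploration toward states the expert actually visited, which is the task-aware signal the abstract argues for.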
7. Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)
- Author
-
Peng, Yanlong, Wang, Zhigang, Zhang, Yisheng, Zhang, Shengmin, Cai, Nan, Wu, Fan, and Chen, Ming
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence - Abstract
The efficient disassembly of end-of-life electric vehicle batteries (EOL-EVBs) is crucial for green manufacturing and sustainable development. The current pre-programmed disassembly conducted by the Autonomous Mobile Manipulator Robot (AMMR) struggles to meet disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disassembly AMMR (BEAM-1) system based on Neuro-Symbolic AI. It detects the environmental state by leveraging a combination of multi-sensors and neural predicates and then translates this information into a quasi-symbolic space. In real time, it identifies the optimal sequence of action primitives through LLM-heuristic tree search, ensuring high-precision execution of these primitives. Additionally, it employs positional speculative sampling using intuitive networks and achieves the disassembly of various bolt types with a meticulously designed end-effector. Importantly, BEAM-1 is a continuously learning embodied intelligence system capable of human-like subjective reasoning and intuition. Extensive real-scene experiments show that it can autonomously perceive, decide, and execute continuous bolt disassembly in multi-instance, multi-category, and complex situations, with a success rate of 98.78%. This research attempts to use Neuro-Symbolic AI to give robots genuine autonomous reasoning, planning, and learning capabilities. BEAM-1 revolutionizes battery disassembly, and its framework can be easily ported to other robotic systems for different application scenarios, providing a ground-breaking idea for the design and implementation of future embodied intelligent robotic systems.
- Published
- 2024
8. What's Next? Exploring Utilization, Challenges, and Future Directions of AI-Generated Image Tools in Graphic Design
- Author
-
Tang, Yuying, Ciancia, Mariana, Wang, Zhigang, and Gao, Ze
- Subjects
Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence - Abstract
Recent advancements in artificial intelligence, such as computer vision and deep learning, have led to the emergence of numerous generative AI platforms, particularly for image generation. However, the application of AI-generated image tools in graphic design has not been extensively explored. This study conducted semi-structured interviews with seven designers of varying experience levels to understand their current usage, challenges, and future functional needs for AI-generated image tools in graphic design. As our findings suggest, AI tools serve as creative partners in design, enhancing human creativity, offering strategic insights, and fostering team collaboration and communication. The findings provide guiding recommendations for the future development of AI-generated image tools, aimed at helping engineers optimize these tools to better meet the needs of graphic designers.
- Published
- 2024
9. Tunable Fano and Dicke resonant tunneling of double quantum dots sandwiched between topological insulators
- Author
-
Hong, Yuan, Fu, Zhen-Guo, Chen, Zhou-Wei-Yu, Chi, Feng, Wang, Zhigang, Zhang, Wei, and Zhang, Ping
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
We study resonant tunneling in double quantum dots (DQD) sandwiched between surfaces of the topological insulator (TI) Bi$_2$Te$_3$, which possess strong spin-orbit coupling (SOC) and $^{d}C_{3v}$ double group symmetry. Distinct from the spin-conserved case with two-dimensional electron gas (2DEG) electrodes, the conductance displays an asymmetric double-peak Fano-type lineshape rather than a Dicke-type lineshape in the zero-field case. Meanwhile, a Landau-Zener-like lineshape trajectory, identified as a signature of the competition effect, can be developed by increasing the strength of the interdot hopping. Furthermore, when an in-plane Zeeman field is applied, we find that a crossover between Fano-type and Dicke-type conductance lineshapes can be driven by tilting the field orientation. Moreover, the rotational symmetry of the system can also be revealed from the lineshape trajectory. Our findings contribute to a better understanding of resonant tunneling in the presence of electrode SOC and may be confirmed experimentally in the future., Comment: 6 pages, 4 figures
- Published
- 2024
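For readers unfamiliar with the two lineshapes contrasted in this abstract, the textbook forms can be compared numerically. These are the generic Fano and broad-plus-narrow-Lorentzian (Dicke-like) formulas, not the paper's conductance expressions:

```python
def fano(eps, q):
    """Fano lineshape (eps + q)^2 / (eps^2 + 1) in reduced energy eps:
    asymmetric, with a characteristic zero at eps = -q."""
    return (eps + q) ** 2 / (eps ** 2 + 1)

def dicke(eps, gamma_broad=1.0, gamma_narrow=0.05):
    """Dicke-type lineshape: a narrow Lorentzian superposed on a broad
    one, symmetric about the resonance (eps = 0)."""
    broad = gamma_broad ** 2 / (eps ** 2 + gamma_broad ** 2)
    narrow = gamma_narrow ** 2 / (eps ** 2 + gamma_narrow ** 2)
    return broad + narrow
```

The asymmetry (a zero on one side of the resonance for Fano) versus the symmetric narrow-on-broad structure (Dicke) is what distinguishes the two lineshape types discussed above.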
10. Exploring the Impact of AI-generated Image Tools on Professional and Non-professional Users in the Art and Design Fields
- Author
-
Tang, Yuying, Zhang, Ningning, Ciancia, Mariana, and Wang, Zhigang
- Subjects
Computer Science - Human-Computer Interaction - Abstract
The rapid proliferation of AI-generated image tools is transforming the art and design fields, challenging traditional notions of creativity and impacting both professional and non-professional users. For the purposes of this paper, we define 'professional users' as individuals who self-identified in our survey as 'artists,' 'designers,' 'filmmakers,' or 'art and design students,' and 'non-professional users' as individuals who self-identified as 'others.' This study explores how AI-generated image tools influence these different user groups. Through an online survey (N=380) comprising 173 professional users and 207 non-professional users, we examine differences in the utilization of AI tools, user satisfaction and challenges, applications in creative processes, perceptions and impacts, and acceptance levels. Our findings indicate persistent concerns about image quality, cost, and copyright issues. Additionally, the usage patterns of non-professional users suggest that AI tools have the potential to democratize creative processes, making art and design tasks more accessible to individuals without traditional expertise. This study provides insights into the needs of different user groups and offers recommendations for developing more user-centered AI tools, contributing to the broader discussion on the future of AI in the art and design fields.
- Published
- 2024
11. Deep Learning based Performance Testing for Analog Integrated Circuits
- Author
-
Cao, Jiawei, Guo, Chongtao, Li, Hao, Wang, Zhigang, Wang, Houjun, and Li, Geoffrey Ye
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the mapping from the response of the circuit under test (CUT) in each module to all specifications to be tested. Then, the required test modules are selected by solving a 0-1 integer programming problem. Finally, the predictions from the selected test modules are combined by a DNN to form the specification estimations. The simulation results validate the proposed approach in terms of testing accuracy and cost.
- Published
- 2024
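The module-selection step described above is a 0-1 program; a brute-force sketch over a toy coverage model (a hypothetical simplification, much cruder than the paper's accuracy-constrained formulation) shows the shape of the problem:

```python
from itertools import combinations

def select_modules(coverage, n_specs):
    """Smallest set of test modules whose predictions jointly cover all
    specifications. `coverage[i]` is the set of spec indices module i
    can estimate accurately; a stand-in for the 0-1 integer program."""
    modules = range(len(coverage))
    for k in range(1, len(coverage) + 1):          # try smaller sets first
        for subset in combinations(modules, k):
            covered = set().union(*(coverage[i] for i in subset))
            if covered >= set(range(n_specs)):
                return list(subset)
    return None  # no feasible selection
```

Real instances would use an ILP solver rather than enumeration, but the objective (minimize selected modules subject to covering every specification) is the same.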
12. SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
- Author
-
Zhang, Junjie, Bai, Chenjia, He, Haoran, Xia, Wenke, Wang, Zhigang, Zhao, Bin, Li, Xiu, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Robotics - Abstract
Acquiring a multi-task imitation policy in 3D manipulation poses challenges in terms of scene understanding and action prediction. Current methods employ both 3D representation and multi-view 2D representation to predict the poses of the robot's end-effector. However, they still require a considerable amount of high-quality robot trajectories, and suffer from limited generalization in unseen tasks and inefficient execution in long-horizon reasoning. In this paper, we propose SAM-E, a novel architecture for robot manipulation by leveraging a vision-foundation model for generalizable scene understanding and sequence imitation for long-term action reasoning. Specifically, we adopt Segment Anything (SAM) pre-trained on a huge number of images and promptable masks as the foundation model for extracting task-relevant features, and employ parameter-efficient fine-tuning on robot data for a better understanding of embodied scenarios. To address long-horizon reasoning, we develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass, notably enhancing execution efficiency. Experimental results from various instruction-following tasks demonstrate that SAM-E achieves superior performance with higher execution efficiency compared to the baselines, and also significantly improves generalization in few-shot adaptation to new tasks., Comment: ICML 2024. Project page: https://sam-embodied.github.io
- Published
- 2024
13. Towards free-response paradigm: a theory on decision-making in spiking neural networks
- Author
-
Zhu, Zhichao, Qi, Yang, Lu, Wenlian, Wang, Zhigang, Cao, Lu, and Feng, Jianfeng
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
The energy-efficient and brain-like information processing abilities of Spiking Neural Networks (SNNs) have attracted considerable attention, establishing them as a crucial element of brain-inspired computing. One prevalent challenge encountered by SNNs is the trade-off between inference speed and accuracy, which requires sufficient time to achieve the desired level of performance. Drawing inspiration from animal behavior experiments that demonstrate a connection between decision-making reaction times, task complexity, and confidence levels, this study seeks to apply these insights to SNNs. The focus is on understanding how SNNs make inferences, with a particular emphasis on untangling the interplay between signal and noise in decision-making processes. The proposed theoretical framework introduces a new optimization objective for SNN training, highlighting the importance of not only the accuracy of decisions but also the development of predictive confidence through learning from past experiences. Experimental results demonstrate that SNNs trained according to this framework exhibit improved confidence expression, leading to better decision-making outcomes. In addition, a strategy is introduced for efficient decision-making during inference, which allows SNNs to complete tasks more quickly and can use stopping times as indicators of decision confidence. By integrating neuroscience insights with neuromorphic computing, this study opens up new possibilities to explore the capabilities of SNNs and advance their application in complex decision-making scenarios., Comment: 27 pages, 6 figures, 3 tables
- Published
- 2024
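The idea of using stopping times as confidence indicators can be illustrated with a generic evidence accumulator (an illustrative model, not the paper's SNN formulation):

```python
def decide(evidence_stream, threshold=3.0):
    """Free-response decision sketch: accumulate noisy evidence per
    option until one total crosses a confidence threshold. The stopping
    time doubles as a confidence indicator: earlier stops mean the
    evidence was stronger."""
    totals = {}
    t = 0
    for t, (option, value) in enumerate(evidence_stream, start=1):
        totals[option] = totals.get(option, 0.0) + value
        if totals[option] >= threshold:
            return option, t               # confident: stopped early
    best = max(totals, key=totals.get) if totals else None
    return best, t                         # forced choice at the deadline
```

Easy inputs cross the threshold quickly (short reaction time, high confidence), while hard inputs run to the deadline, mirroring the reaction-time/confidence link the abstract draws from animal behavior experiments.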
14. Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
- Author
-
Tang, Yiwen, Zhang, Ray, Liu, Jiaming, Guo, Zoey, Wang, Dong, Wang, Zhigang, Zhao, Bin, Zhang, Shanghang, Gao, Peng, Li, Hongsheng, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in a wide range of scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantly, their frameworks are mainly designed for 2D models, lacking a general any-to-3D paradigm. In this paper, we introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding. Given a frozen transformer from any source modality, we propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality. This mechanism enables us to assign each 3D token a positional encoding paired with the pre-trained model, which avoids the 3D geometry loss caused by true projection and better motivates the transformer for 3D learning with 1D/2D positional priors. Then, within each transformer block, we insert an any-to-3D guided adapter module for parameter-efficient fine-tuning. The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, enabling the semantic adaptation of any-modality transformers. We conduct extensive experiments to showcase the effectiveness and efficiency of our method. Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point., Comment: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point
- Published
- 2024
15. HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
- Author
-
Jing, Linglin, Ding, Yiming, Gao, Yunpeng, Wang, Zhigang, Yan, Xu, Wang, Dong, Schaefer, Gerald, Fang, Hui, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and learning from noisy pseudo labels, especially when generated from a single source, may reinforce the errors. This drawback is also called confirmation bias in pseudo-labeling. In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels. In particular, we first employ a plain unsupervised domain adaptation framework as our baseline, which can generate a set of pseudo labels through self-training. Then, we incorporate offline event-to-image reconstruction into the framework, and obtain another set of pseudo labels by predicting segmentation maps on the reconstructed images. A noisy label learning strategy is designed to mix the two sets of pseudo labels and enhance the quality. Moreover, we propose a soft prototypical alignment module to further improve the consistency of target domain features. Extensive experiments show that our proposed method outperforms existing state-of-the-art methods by a large margin on the DSEC-Semantic dataset (+5.88% accuracy, +10.32% mIoU), which even surpasses several supervised methods.
- Published
- 2024
16. Towards Agile Robots: Intuitive Robot Position Speculation with Neural Networks
- Author
-
Peng, Yanlong, Wang, Zhigang, Zhang, Yisheng, Zhang, Shengmin, and Chen, Ming
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence - Abstract
Robot position speculation, which determines where the chassis should move, is one key step in controlling mobile manipulators. The target position must ensure the feasibility of chassis movement and manipulability, which traditional methods guarantee by randomized sampling and kinematic checking. Addressing the demands of agile robotics, this paper proposes a robot position speculation network (RPSN), a learning-based approach to enhance the agility of mobile manipulators. The RPSN incorporates a differentiable inverse kinematics algorithm and a neural network. Through end-to-end training, the RPSN can speculate positions with a high success rate. We apply the RPSN to mobile manipulators disassembling end-of-life electric vehicle batteries (EOL-EVBs). Extensive experiments on various simulated environments and physical mobile manipulators demonstrate that the probability of the initial position provided by RPSN being the ideal position is 96.67%. From the kinematic-constraint perspective, it generates an ideal position within 1.28 attempts on average, far fewer than the 31.04 attempts required by random sampling. Moreover, the proposed method demonstrates superior data efficiency over pure neural network approaches. The proposed RPSN enables the robot to quickly infer feasible target positions by intuition. This work moves towards building agile robots that can act as swiftly as humans.
- Published
- 2024
17. LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction
- Author
-
Wang, Zhigang, Yang, Hangyu, Wang, Ning, Xu, Chuanfei, Nie, Jie, Wei, Zhiqiang, Gu, Yu, and Yu, Ge
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence - Abstract
In the last decade, Convolutional Neural Networks with multi-layer architectures have advanced rapidly. However, training such complex networks is very space-consuming, since a lot of intermediate data is preserved across layers, especially when processing high-dimensional inputs with a large batch size. That poses great challenges to the limited memory capacity of current accelerators (e.g., GPUs). Existing efforts mitigate this bottleneck through external auxiliary solutions with additional hardware costs, or internal modifications with a potential accuracy penalty. In contrast, our analysis reveals that computations within and across layers exhibit weak spatial-temporal dependency and even complete independence. That inspires us to break the traditional layer-by-layer (column) dataflow rule: operations are instead re-organized into rows throughout all convolution layers. This lightweight design allows a majority of intermediate data to be discarded without any loss of accuracy. We particularly study the weak dependency between two consecutive rows. For the resulting skewed memory consumption, we give two solutions suited to different scenarios. Evaluations on two representative networks confirm the effectiveness. We also validate that our middle dataflow optimization can be smoothly adopted by existing works for better memory reduction.
- Published
- 2024
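The row-centric dataflow rests on the observation that one output row depends only on a small halo of input rows, so intermediate rows outside that halo need not stay in memory. A minimal sketch of the dependency arithmetic, assuming stacked convolutions of kernel height `k` with stride 1 (an illustrative simplification of the paper's analysis):

```python
def conv_rows_needed(out_row, n_layers, k=3):
    """Input rows that output row `out_row` of the last of `n_layers`
    stacked convolutions (kernel height k, stride 1) depends on.
    Each layer widens the dependency halo by k // 2 rows on each side."""
    halo = n_layers * (k // 2)
    return range(out_row - halo, out_row + halo + 1)
```

Because the halo grows only linearly with depth, producing outputs row by row touches a narrow band of each layer's activations instead of whole feature maps, which is the memory saving the row-centric (versus column, i.e. layer-by-layer) dataflow exploits.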
18. Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
- Author
-
Wang, Zhigang, Zhang, Xu, Wang, Ning, Xu, Chuanfei, Nie, Jie, Wei, Zhiqiang, Gu, Yu, and Yu, Ge
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Transformer-based models are becoming deeper and larger. For better scalability, a common training solution in industry is to split billions of parameters (tensors) into many tasks and run them across homogeneous accelerators (e.g., GPUs). However, such a dedicated compute cluster is prohibitively expensive for academia and moderate-sized companies. An economical alternative is to aggregate existing heterogeneous devices and share resources among multiple tenants. Nevertheless, static hardware configurations and dynamic resource contention inevitably cause straggling tasks, which heavily slow down overall training efficiency. Existing works offer contributions mainly tailored to traditional data parallelism; they cannot work well for the new tensor parallelism due to its strict communication and correctness constraints. In this paper we first present ZERO-resizing, a novel dynamic workload balancing technique without any data migration. We tune workloads in real time by temporarily resizing matrices involved in core tensor-related computations. We particularly design data imputation and priority selection policies to respectively satisfy the consistency constraint required by normal training and reduce the accuracy loss. We also give a lightweight data migration technique without loss of accuracy, to cope with heavy heterogeneity. Our final SEMI-migration solution is built on top of these two techniques and can adaptively distinguish their respective balancing missions, achieving an overall success in efficiency and accuracy. Extensive experiments on the representative Colossal-AI platform validate the effectiveness of our proposals., Comment: 13 pages
- Published
- 2024
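The matrix-resizing idea can be illustrated with a toy example: a straggling device temporarily shrinks the inner dimension of a matrix product and imputes the dropped contribution by rescaling, trading a little accuracy for less work. This is a deliberate simplification of the paper's technique, not its actual imputation or priority policies:

```python
import numpy as np

def resized_matmul(a, b, keep_ratio):
    """Toy ZERO-resizing step: keep only `keep_ratio` of the shared
    inner dimension of a @ b, then impute the dropped terms by
    rescaling the partial sum. Work shrinks with keep_ratio."""
    k = a.shape[1]
    keep = max(1, int(k * keep_ratio))
    partial = a[:, :keep] @ b[:keep, :]   # cheaper partial product
    return partial * (k / keep)           # impute dropped terms by rescaling
```

The rescaling is exact only when the dropped columns contribute like the kept ones (as with the uniform matrices below); in general it introduces a small error, which is why the paper pairs resizing with imputation and priority-selection policies.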
19. On recognition of the direct squares of the simple groups with abelian Sylow 2-subgroups
- Author
-
Li, Tao, Moghaddamfar, A. R., Vasil'ev, Andrey V., and Wang, Zhigang
- Subjects
Mathematics - Group Theory, 20D60, 20D06 - Abstract
The spectrum of a group is the set of orders of its elements. Finite groups with the same spectra as the direct squares of the finite simple groups with abelian Sylow 2-subgroups are considered. It is proved that the direct square $J_1\times J_1$ of the sporadic Janko group $J_1$ and the direct squares ${^2}G_2(q)\times{^2}G_2(q)$ of the simple small Ree groups ${^2}G_2(q)$ are uniquely characterized by their spectra in the class of finite groups, while for the direct square $PSL_2(q)\times PSL_2(q)$ of a 2-dimensional simple linear group $PSL_2(q)$, there are always infinitely many groups (even solvable groups) with the same spectra.
- Published
- 2023
20. X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer
- Author
-
Jing, Linglin, Xue, Ying, Yan, Xu, Zheng, Chaoda, Wang, Dong, Zhang, Ruimao, Wang, Zhigang, Fang, Hui, Zhao, Bin, and Li, Zhen
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The field of 4D point cloud understanding is developing rapidly, with the goal of analyzing dynamic 3D point cloud sequences. However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds makes it difficult to align temporal information within video sequences. To address these issues, we propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer. This framework enhances 4D scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining. Specifically, the framework is designed with a dual-branch architecture, consisting of a 4D point cloud transformer and a Gradient-aware Image Transformer (GIT). During training, we employ multiple knowledge transfer techniques, including temporal consistency losses and masked self-attention, to strengthen the knowledge transfer between modalities. This leads to enhanced performance during inference using single-modal 4D point cloud inputs. Extensive experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks, including action recognition, action segmentation and semantic segmentation. The results achieve 1st places, i.e., 85.3% (+7.9%) accuracy and 47.3% (+5.0%) mIoU for 4D action segmentation and semantic segmentation, on the HOI4D challenge\footnote{\url{http://www.hoi4d.top/}.}, outperforming the previous state-of-the-art by a large margin. We release the code at https://github.com/jinglinglingling/X4D
- Published
- 2023
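The cross-modal transfer objective described in the abstract can be sketched in miniature: a per-frame feature-distillation term (point branch mimics image branch) plus a temporal-consistency term aligning frame-to-frame feature *changes*. The weights and the plain MSE form below are illustrative assumptions, not the paper's exact losses.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def transfer_loss(point_feats, image_feats, w_kd=1.0, w_tc=0.5):
    """Toy cross-modal transfer objective over per-frame feature vectors:
    distillation (kd) pulls point features toward image features; the
    temporal-consistency term (tc) matches their frame-to-frame deltas."""
    kd = sum(mse(p, g) for p, g in zip(point_feats, image_feats)) / len(point_feats)
    tc = 0.0
    for t in range(1, len(point_feats)):
        dp = [a - b for a, b in zip(point_feats[t], point_feats[t - 1])]
        dg = [a - b for a, b in zip(image_feats[t], image_feats[t - 1])]
        tc += mse(dp, dg)
    tc /= max(1, len(point_feats) - 1)
    return w_kd * kd + w_tc * tc
```

At inference only the point branch is kept, so the loss is used purely during training.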
21. Calibration-free quantitative phase imaging in multi-core fiber endoscopes using end-to-end deep learning
- Author
-
Sun, Jiawei, Zhao, Bin, Wang, Dong, Wang, Zhigang, Zhang, Jie, Koukourakis, Nektarios, Czarske, Juergen W., and Li, Xuelong
- Subjects
Physics - Optics ,Computer Science - Artificial Intelligence ,Physics - Biological Physics ,Physics - Computational Physics - Abstract
Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness. However, the computational demands of conventional iterative phase retrieval algorithms have limited their real-time imaging potential. We demonstrate a learning-based MCF phase imaging method that significantly reduces the phase reconstruction time to 5.5 ms, enabling video-rate imaging at 181 fps. Moreover, we introduce an innovative optical system that automatically generated the first open-source dataset tailored for MCF phase imaging, comprising 50,176 paired speckle and phase images. Our trained deep neural network (DNN) demonstrates robust phase reconstruction performance in experiments with a mean fidelity of up to 99.8\%. Such an efficient fiber phase imaging approach can broaden the applications of QPI in hard-to-reach areas., Comment: 5 pages, 5 figures
- Published
- 2023
22. GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
- Author
-
Yan, Chi, Qu, Delin, Xu, Dan, Zhao, Bin, Wang, Zhigang, Wang, Dong, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we introduce \textbf{GS-SLAM}, which is the first to utilize a 3D Gaussian representation in a Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Specifically, we propose an adaptive expansion strategy that adds new 3D Gaussians or deletes noisy ones in order to efficiently reconstruct newly observed scene geometry and improve the mapping of previously observed areas. This strategy is essential for extending the 3D Gaussian representation to reconstruct whole scenes rather than synthesizing static objects as in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize the camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica and TUM-RGBD datasets. Project page: https://gs-slam.github.io/., Comment: Accepted to CVPR 2024(highlight). Project Page: https://gs-slam.github.io/
- Published
- 2023
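The adaptive expansion strategy (delete noisy Gaussians, add new ones for newly observed geometry) can be sketched as set maintenance over a point cloud. The thresholds, the dict layout, and the spawn opacity below are illustrative assumptions, not GS-SLAM's actual parameters.

```python
def adapt_gaussians(gaussians, observations, opacity_min=0.05, cover_radius=0.1):
    """Toy adaptive expansion: drop near-transparent (noisy) Gaussians, then
    spawn a new Gaussian at each observed 3D point that no surviving Gaussian
    lies close to. Each Gaussian is a dict with 'pos' (xyz tuple) and 'opacity'."""
    kept = [g for g in gaussians if g["opacity"] >= opacity_min]

    def covered(p):
        return any(
            sum((a - b) ** 2 for a, b in zip(p, g["pos"])) ** 0.5 < cover_radius
            for g in kept
        )

    for p in observations:
        if not covered(p):
            kept.append({"pos": p, "opacity": 0.5})  # init opacity: an assumption
    return kept
```

Run once per mapped frame, this keeps the representation growing with the scene instead of staying fixed to one object.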
23. Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
- Author
-
Xia, Wenke, Wang, Dong, Pang, Xincheng, Wang, Zhigang, Zhao, Bin, Hu, Di, and Li, Xuelong
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation; however, due to the prohibitive costs of real-world data collection and precise object simulation, it remains challenging for these works to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of this research focuses on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of an object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulation tasks. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact locations. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thoughts prompting method. Our evaluation spanned 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows powerful zero-shot capability for 8 unseen articulated object categories. Moreover, real-world experiments on 7 different object categories prove our framework's adaptability in practical scenarios. Code is released at https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main., Comment: Accepted by ICRA 2024
- Published
- 2023
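The "unified kinematic knowledge parser" idea (serialize joints and contact locations into text an LLM can be prompted with) can be sketched as a dict-to-string function. The field names and the example cabinet are hypothetical, chosen only to illustrate the shape of such a description.

```python
def kinematic_prompt(obj):
    """Toy unified kinematic description: serialize an articulated object's
    joints and contact points into a textual prompt. Field names here are
    illustrative, not the paper's actual schema."""
    lines = [f"object: {obj['name']}"]
    for j in obj["joints"]:
        lines.append(
            f"joint: {j['name']} | type: {j['type']} | "
            f"axis: {j['axis']} | contact: {j['contact']}"
        )
    return "\n".join(lines)

# Hypothetical articulated object: a cabinet with one revolute door hinge.
cabinet = {
    "name": "cabinet",
    "joints": [
        {"name": "door_hinge", "type": "revolute",
         "axis": (0, 0, 1), "contact": (0.4, 0.0, 0.5)},
    ],
}
```

A planner prompt would then concatenate this description with task instructions and request waypoints.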
24. AI Nushu: An Exploration of Language Emergence in Sisterhood -Through the Lens of Computational Linguistics
- Author
-
Sun, Yuqian, Tang, Yuying, Gao, Ze, Pan, Zhijun, Xu, Chuyan, Chen, Yurou, Qian, Kejiang, Wang, Zhigang, Braud, Tristan, Lee, Chang Hee, and Asadipour, Ali
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,14J60 (Primary) 14F05, 14J26 (Secondary) ,F.2.2 ,I.2.7 - Abstract
This paper presents "AI Nushu," an emerging language system inspired by Nushu (women's scripts), the unique language created and used exclusively by ancient Chinese women who were thought to be illiterate under a patriarchal society. In this interactive installation, two artificial intelligence (AI) agents are trained on the Chinese dictionary and the Nushu corpus. By continually observing their environment and communicating, these agents collaborate towards creating a standard writing system to encode Chinese. It offers an artistic interpretation of the creation of a non-western script from a computational linguistics perspective, integrating AI technology with Chinese cultural heritage and a feminist viewpoint., Comment: Accepted for publication at SIGGRAPH Asia 2023
- Published
- 2023
25. Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
- Author
-
Tang, Yiwen, Zhang, Ray, Guo, Zoey, Wang, Dong, Wang, Zhigang, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaptation cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques have been proposed for language and 2D image pre-trained models. However, specialized PEFT methods for 3D pre-trained models are still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention mechanism to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code is released at https://github.com/Ivan-Tang-3D/Point-PEFT., Comment: The specialized PEFT framework for 3D pre-trained models, which achieves competitive performance to full fine-tuning, and significantly reduces the computational resources. Project page: https://github.com/Ivan-Tang-3D/Point-PEFT
- Published
- 2023
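The PEFT bookkeeping described above (freeze the backbone, tune only the added modules, report the trainable fraction) can be sketched with plain dicts. The module names, sizes, and prefixes below are hypothetical; only the freeze-and-count pattern is the point.

```python
def trainable_ratio(params):
    """params: dict name -> (num_elements, trainable_flag).
    Returns the fraction of parameters that are trainable."""
    total = sum(n for n, _ in params.values())
    tuned = sum(n for n, t in params.values() if t)
    return tuned / total

def freeze_backbone(params, peft_prefixes=("prompt.", "adapter.")):
    """Freeze everything except modules whose names carry a PEFT prefix
    (the prefixes are illustrative, not the paper's actual module names)."""
    return {name: (n, name.startswith(peft_prefixes))
            for name, (n, _) in params.items()}

# Hypothetical model: a 95k-parameter backbone plus 5k of PEFT modules.
model = {"backbone.layer1": (95_000, True),
         "prompt.tokens": (3_000, True),
         "adapter.geo": (2_000, True)}
peft = freeze_backbone(model)
```

With these toy numbers the trainable fraction comes out to 5%, mirroring the figure quoted in the abstract.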
26. Affordance-Driven Next-Best-View Planning for Robotic Grasping
- Author
-
Zhang, Xuechao, Wang, Dong, Han, Sun, Li, Weichuang, Zhao, Bin, Wang, Zhigang, Duan, Xiaoming, Fang, Chongrong, Li, Xuelong, and He, Jianping
- Subjects
Computer Science - Robotics - Abstract
Grasping occluded objects in cluttered environments is an essential component of complex robotic manipulation tasks. In this paper, we introduce an AffordanCE-driven Next-Best-View planning policy (ACE-NBV) that tries to find a feasible grasp for the target object by continuously observing the scene from new viewpoints. This policy is motivated by the observation that the grasp affordance of an occluded object can be better measured when the view direction is the same as the grasp view. Specifically, our method leverages the paradigm of novel view imagery to predict grasp affordances under previously unobserved views, and selects the next observation view based on the highest imagined grasp quality of the target object. The experimental results in simulation and on a real robot demonstrate the effectiveness of the proposed affordance-driven next-best-view planning policy. Project page: https://sszxc.net/ace-nbv/., Comment: Conference on Robot Learning (CoRL) 2023
- Published
- 2023
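The view-selection step reduces to an argmax over candidate views scored by an imagined grasp quality. A minimal sketch, where the quality model is a caller-supplied stand-in for the paper's novel-view imagery network:

```python
def next_best_view(candidate_views, imagine_grasp_quality):
    """Pick the candidate view whose *imagined* grasp quality for the target
    is highest. imagine_grasp_quality stands in for the learned affordance
    predictor and is supplied by the caller."""
    return max(candidate_views, key=imagine_grasp_quality)

# Hypothetical candidate views as (azimuth, elevation) pairs, with a toy
# quality model that favors more elevated views.
views = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.4)]
best = next_best_view(views, lambda v: v[1])
```

In the full policy this argmax would be interleaved with moving the camera and re-predicting affordances until a feasible grasp is found.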
27. High-energy nitrogen rings stabilized by superatom properties
- Author
-
Gong, Zhen, Wang, Rui, Yu, Famin, Wan, Chenxi, Yang, Xinrui, and Wang, Zhigang
- Subjects
Physics - Chemical Physics ,Physics - Atomic Physics ,Physics - Computational Physics - Abstract
How to stabilize nitrogen-rich high-energy-density molecules under conventional conditions is particularly important for the energy storage and conversion of such systems and has attracted extensive attention. In this work, our theoretical study showed for the first time that the stabilization mechanism of the nitrogen ring conforms to superatomic properties at the atomic level. This occurs because the stabilized anionic nitrogen rings generally show planar high symmetry and the injected electrons occupy the superatomic molecular orbitals (SAMOs) of the nitrogen rings. According to these results, we identified the typical stabilized anionic nitrogen ring structures $N_6^{4-}$, $N_5^-$ and $N_4^{2-}$, whose superatomic electronic configurations are $1S^2 1P^4 1D^4 1F^2 2S^2 1P^2 1F^2 1D^4 2P^4 1G^4 1F^4$, $1S^2 1P^4 1D^4 1P^2 2S^2 1F^4 1D^4 2P^4$ and $1S^2 1P^4 1D^2 1P^2 1D^2 2S^2 2P^4 1D^4$, respectively. On this basis, we further designed a pathway to stabilize nitrogen rings by introducing metal atoms as electron donors to form neutral $ThN_6$, $LiN_5$ and $MgN_4$ structures, thereby replacing the anionization of the systems. Our study highlights the importance of developing nitrogen-rich energetic materials from the perspective of superatoms., Comment: 6 pages, 3 figures
- Published
- 2023
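The electron configurations quoted in the abstract can be sanity-checked: assuming 5 valence electrons per nitrogen atom, the orbital occupancies should sum to the valence electron count of each anion (e.g. 34 for $N_6^{4-}$). The parsing below (single-digit occupancies, space-separated terms) is our reading of the configurations, not a format from the paper.

```python
def occupancies(config):
    """Parse a superatomic configuration like '1S2 1P4 1D4' into a list of
    orbital occupancies (occupancies are single digits in these configurations)."""
    return [int(term[-1]) for term in config.split()]

N_VALENCE = 5  # valence electrons per nitrogen atom

def electrons(n_atoms, charge):
    """Valence electron count of an N_n ring carrying the given charge."""
    return n_atoms * N_VALENCE - charge

rings = {
    # (atoms, charge): configuration as reported in the abstract
    (6, -4): "1S2 1P4 1D4 1F2 2S2 1P2 1F2 1D4 2P4 1G4 1F4",
    (5, -1): "1S2 1P4 1D4 1P2 2S2 1F4 1D4 2P4",
    (4, -2): "1S2 1P4 1D2 1P2 1D2 2S2 2P4 1D4",
}
checks = {k: sum(occupancies(cfg)) == electrons(*k) for k, cfg in rings.items()}
```

All three configurations balance, which supports the superscript reconstruction of the garbled formulas.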
28. The Compatibility between the Pangu Weather Forecasting Model and Meteorological Operational Data
- Author
-
Cheng, Wencong, Yan, Yan, Xia, Jiangjiang, Liu, Qi, Qu, Chang, and Wang, Zhigang
- Subjects
Computer Science - Machine Learning ,Physics - Atmospheric and Oceanic Physics - Abstract
Recently, multiple data-driven models based on machine learning for weather forecasting have emerged. These models are highly competitive in terms of accuracy compared to traditional numerical weather prediction (NWP) systems. In particular, the Pangu-Weather model, which is open source for non-commercial use, has been validated for its forecasting performance by the European Centre for Medium-Range Weather Forecasts (ECMWF) and has recently been published in the journal "Nature". In this paper, we evaluate the compatibility of the Pangu-Weather model with several commonly used NWP operational analyses through case studies. The results indicate that the Pangu-Weather model is compatible with different operational analyses from various NWP systems as the model initial conditions, and it exhibits a relatively stable forecasting capability. Furthermore, we have verified that improving the quality of global or local initial conditions significantly contributes to enhancing the forecasting performance of the Pangu-Weather model.
- Published
- 2023
29. Sequential Flipping: A Donor-Acceptor Exchange Mechanism in Water Trimer
- Author
-
Yang, Xinrui, Liu, Rui, Xu, Ruiqi, and Wang, Zhigang
- Subjects
Physics - Chemical Physics ,Physics - Atomic and Molecular Clusters ,Physics - Computational Physics - Abstract
The donor-acceptor exchange (DAE) is a significant hydrogen bond network rearrangement (HBNR) mechanism because it can change the direction of hydrogen bonds. In this work, we report a new DAE mechanism found in the water trimer that is realized by sequential flipping (SF) of all molecules rather than the well-known proton transfer (PT) process. Meanwhile, the SF process has a much smaller potential barrier (0.262 eV) than the previously predicted collective rotation process (about 1.7 eV), implying that the SF process is the main flipping process that can lead to DAE. Importantly, high-precision ab initio calculations show that SF-DAE causes the water ring to show a clear chiral difference from PT-DAE, which brings the prospect of distinguishing the two confusing processes based on circular dichroism spectra. Reaction rate analysis including quantum tunneling indicates an obvious temperature-dependent competition between the SF and PT processes: the SF process dominates above 65 K, while the PT process dominates below 65 K. Therefore, in most cases, the contribution to DAE mainly comes from the flipping process rather than the PT process as previously thought. Our work enriches the understanding of the DAE mechanism in the water trimer and provides a long-sought piece of the HBNR jigsaw., Comment: 9 pages, 4 figures
- Published
- 2023
30. Toward stochastic neural computing
- Author
-
Qi, Yang, Zhu, Zhichao, Wei, Yiming, Cao, Lu, Wang, Zhigang, Zhang, Jie, Lu, Wenlian, and Feng, Jianfeng
- Subjects
Computer Science - Neural and Evolutionary Computing ,Physics - Biological Physics ,Quantitative Biology - Neurons and Cognition - Abstract
The highly irregular spiking activity of cortical neurons and behavioral variability suggest that the brain could operate in a fundamentally probabilistic way. Mimicking how the brain implements and learns probabilistic computation could be a key to developing machine intelligence that can think more like humans. In this work, we propose a theory of stochastic neural computing (SNC) in which streams of noisy inputs are transformed and processed through populations of nonlinearly coupled spiking neurons. To account for the propagation of correlated neural variability, we derive from first principles a moment embedding for spiking neural networks (SNNs). This leads to a new class of deep learning models called the moment neural network (MNN), which naturally generalizes rate-based neural networks to second order. As the MNN faithfully captures the stationary statistics of spiking neural activity, it can serve as a powerful proxy for training SNNs with zero free parameters. Through joint manipulation of mean firing rate and noise correlations in a task-driven way, the model is able to learn inference tasks while simultaneously minimizing prediction uncertainty, resulting in enhanced inference speed. We further demonstrate the application of our method to Intel's Loihi neuromorphic hardware. The proposed theory of SNC may open up new opportunities for developing machine intelligence capable of computing uncertainty and for designing unconventional computing architectures.
- Published
- 2023
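The idea of generalizing a network "to second order" can be illustrated on a single linear layer: propagate a mean vector and a variance vector through y = Wx + b. This sketch drops off-diagonal correlations for brevity, whereas the MNN in the abstract tracks full correlations.

```python
def moment_linear(W, b, mu, var):
    """Propagate the first two moments of a random input x through
    y = W x + b, treating input components as independent (diagonal
    covariance only, a simplification of the moment embedding)."""
    mu_out = [sum(wij * mj for wij, mj in zip(row, mu)) + bi
              for row, bi in zip(W, b)]
    # Var(sum_j w_ij x_j) = sum_j w_ij^2 Var(x_j) under independence.
    var_out = [sum(wij ** 2 * vj for wij, vj in zip(row, var))
               for row in W]
    return mu_out, var_out

mu_y, var_y = moment_linear([[1.0, 2.0]], [0.5], [1.0, 1.0], [0.1, 0.1])
```

Stacking such moment layers (with moment versions of the nonlinearities) is what turns a rate network into a second-order one.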
31. Potentiating dual-directional immunometabolic regulation with nanomedicine to enhance anti-tumor immunotherapy following incomplete photothermal ablation
- Author
-
Jiang, Qinqin, Qiao, Bin, Zheng, Jun, Song, Weixiang, Zhang, Nan, Xu, Jie, Liu, Jia, Zhong, Yixin, Zhang, Qin, Liu, Weiwei, You, Lanlan, Wu, Nianhong, Liu, Yun, Li, Pan, Ran, Haitao, Wang, Zhigang, and Guo, Dajing
- Published
- 2024
32. The mitochondria-targeted Kaempferol nanoparticle ameliorates severe acute pancreatitis
- Author
-
Wen, E, Cao, Yi, He, Shiwen, Zhang, Yuezhou, You, Lanlan, Wang, Tingqiu, Wang, Zhigang, He, Jun, and Feng, Yi
- Published
- 2024
33. Causes of conflicts in standardization alliances related to the Belt and Road Initiative
- Author
-
Chen, Xiuwen, Zhou, Qing, and Wang, Zhigang
- Published
- 2024
34. Association between joint physical activity and healthy dietary patterns and hypertension in US adults: cross-sectional NHANES study
- Author
-
Zhu, Yanzhou and Wang, Zhigang
- Published
- 2024
35. Smart responsive Fe/Mn nanovaccine triggers liver cancer immunotherapy via pyroptosis and pyroptosis-boosted cGAS-STING activation
- Author
-
Du, Qianying, Luo, Ying, Xu, Lian, Du, Chier, Zhang, Wenli, Xu, Jie, Liu, Yun, Liu, Bo, Chen, Sijin, Wang, Yi, Wang, Zhigang, Ran, Haitao, Wang, Junrui, and Guo, Dajing
- Published
- 2024
36. Chromosome-level genome assembly of humpback grouper using PacBio HiFi reads and Hi-C technologies
- Author
-
Liu, Jinxiang, Sun, Huibang, Tang, Lei, Wang, Yujue, Wang, Zhigang, Mao, Yunxiang, Huang, Hai, and Zhang, Quanqi
- Published
- 2024
37. Utilizing transcriptomics and metabolomics to unravel key genes and metabolites of maize seedlings in response to drought stress
- Author
-
Li, Yipu, Su, Zhijun, Lin, Yanan, Xu, Zhenghan, Bao, Haizhu, Wang, Fugui, Liu, Jian, Hu, Shuping, Wang, Zhigang, Yu, Xiaofang, and Gao, Julin
- Published
- 2024
38. Shaping immune landscape of colorectal cancer by cholesterol metabolites
- Author
-
Bai, Yibing, Li, Tongzhou, Wang, Qinshu, You, Weiqiang, Yang, Haochen, Xu, Xintian, Li, Ziyi, Zhang, Yu, Yan, Chengsong, Yang, Lei, Qiu, Jiaqian, Liu, Yuanhua, Chen, Shiyang, Wang, Dongfang, Huang, Binlu, Liu, Kexin, Song, Bao-Liang, Wang, Zhuozhong, Li, Kang, Liu, Xin, Wang, Guangchuan, Yang, Weiwei, Chen, Jianfeng, Hao, Pei, Zhang, Zemin, Wang, Zhigang, Zhu, Zheng-Jiang, and Xu, Chenqi
- Published
- 2024
39. One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
- Author
-
Li, Weichuang, Zhang, Longhao, Wang, Dong, Zhao, Bin, Wang, Zhigang, Chen, Mulin, Zhang, Bang, Wang, Zhongjian, Bo, Liefeng, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image. Most pioneering methods rely primarily on 2D representations and thus will inevitably suffer from face distortion when large head rotations are encountered. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression is not so desirable, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity and free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF decomposes the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former comprises the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity from two aspects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression preciseness, we propose a lightweight deformation module that explicitly decouples the pose and expression to enable precise expression modeling. Extensive experiments demonstrate that our proposed approach can generate better results than previous works. Project page: https://www.waytron.net/hidenerf/, Comment: Accepted by CVPR 2023
- Published
- 2023
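The two-field decomposition (deformation field warps a query point into canonical space, appearance field is evaluated there) can be sketched as function composition. Both fields below are hypothetical stand-ins: a rigid shift plays the deformation network, and a radial density plays the appearance network.

```python
def query_dynamic_scene(x, driving, deform, canonical):
    """Toy two-field query: the deformation field (conditioned on the driving
    pose/expression) maps x into canonical space, where the canonical
    appearance field is evaluated."""
    x_canonical = deform(x, driving)
    return canonical(x_canonical)

# Hypothetical fields: a rigid shift as "deformation", a radial falloff as "appearance".
deform = lambda x, d: tuple(xi - di for xi, di in zip(x, d))
canonical = lambda x: max(0.0, 1.0 - sum(v * v for v in x))
density = query_dynamic_scene((0.3, 0.1), (0.3, 0.1), deform, canonical)
```

Rendering then integrates such queries along camera rays, exactly as in a standard NeRF pipeline.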
40. Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
- Author
-
Qu, Delin, Lao, Yizhen, Wang, Zhigang, Wang, Dong, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer from two main drawbacks. Firstly, they face challenges in estimating the accurate correction field due to the uniform velocity assumption, leading to significant image correction errors under complex motion. Secondly, the drastic occlusion in dynamic scenes prevents current solutions from achieving better image quality because of the inherent difficulties in aligning and aggregating multiple frames. To tackle these challenges, we model the curvilinear trajectory of pixels analytically and propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Besides, to reconstruct high-quality occlusion frames in dynamic scenes, we present a 3D video architecture that effectively Aligns and Aggregates multi-frame context, namely, RSA2-Net. We evaluate our method across a broad range of cameras and video sequences, demonstrating its significant superiority. Specifically, our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 in PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively. Code is available at https://github.com/DelinQu/qrsc., Comment: accepted at ICCV 2023
- Published
- 2023
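The quadratic motion model that replaces the uniform-velocity assumption can be written down directly: a pixel on scanline `row` is exposed `t = row * readout_per_row` after the first row, so its displacement includes an acceleration term. The readout time and motion values below are illustrative, not calibrated camera parameters.

```python
def qrs_displacement(row, v, a, readout_per_row=1e-4):
    """Quadratic rolling-shutter motion model: displacement of a pixel on
    scanline `row`, exposed t seconds after the first row, under constant
    velocity v plus acceleration a. The uniform-velocity baseline keeps
    only the v*t term."""
    t = row * readout_per_row
    return v * t + 0.5 * a * t ** 2
```

The correction field then warps each scanline back by its own displacement, so later rows (larger t) get larger, curvature-aware corrections.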
41. ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
- Author
-
Guo, Zoey, Tang, Yiwen, Zhang, Ray, Wang, Dong, Wang, Zhigang, Zhao, Bin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view discrepancy issue in 3D visual grounding. However, existing methods normally neglect the view cues embedded in the text modality and fail to weigh the relative importance of different views. In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities. For the text branch, ViewRefer leverages the diverse linguistic knowledge of large-scale language models, e.g., GPT, to expand a single grounding text to multiple geometry-consistent descriptions. Meanwhile, in the 3D modality, a transformer fusion module with inter-view attention is introduced to boost the interaction of objects across views. On top of that, we further present a set of learnable multi-view prototypes, which memorize scene-agnostic knowledge for different views, and enhance the framework from two perspectives: a view-guided attention module for more robust text features, and a view-guided scoring strategy during the final prediction. With our designed paradigm, ViewRefer achieves superior performance on three benchmarks and surpasses the second-best by +2.8%, +1.5%, and +1.35% on Sr3D, Nr3D, and ScanRefer. Code is released at https://github.com/Ivan-Tang-3D/ViewRefer3D., Comment: Accepted by ICCV 2023. Code is released at https://github.com/Ivan-Tang-3D/ViewRefer3D
- Published
- 2023
42. Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
- Author
-
Wang, Yihao, Wang, Zhigang, Zhao, Bin, Wang, Dong, Chen, Mulin, and Li, Xuelong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Non-line-of-sight (NLOS) tracking has drawn increasing attention in recent years, due to its ability to detect object motion out of sight. Most previous works on NLOS tracking rely on active illumination, e.g., laser, and suffer from high cost and elaborate experimental conditions. Besides, these techniques are still far from practical application due to oversimplified settings. In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security. To excavate imperceptible changes in videos of the relay wall, we introduce difference frames as an essential carrier of temporal-local motion messages. In addition, we propose PAC-Net, which consists of alternating propagation and calibration, making it capable of leveraging both dynamic and static messages on a frame-level granularity. To evaluate the proposed method, we build and publish the first dynamic passive NLOS tracking dataset, NLOS-Track, which fills the vacuum of realistic NLOS datasets. NLOS-Track contains thousands of NLOS video clips and corresponding trajectories. Both real-shot and synthetic data are included. Our codes and dataset are available at https://againstentropy.github.io/NLOS-Track/., Comment: CVPR 2023 camera-ready version. Codes and dataset are available at https://againstentropy.github.io/NLOS-Track/
- Published
- 2023
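The "difference frames" that PAC-Net uses as motion carriers are simply per-pixel changes between consecutive frames of the relay-wall video. A minimal sketch with frames represented as 2D lists of intensities:

```python
def difference_frames(frames):
    """Per-pixel differences between consecutive frames, the temporal-local
    motion signal that feeds the tracking network. frames: list of 2D lists."""
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        diffs.append([[c - p for c, p in zip(cur_row, prev_row)]
                      for cur_row, prev_row in zip(cur, prev)])
    return diffs
```

For N input frames this yields N-1 difference frames; in the paper these carry the imperceptible changes that static frames alone would hide.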
43. Fully Self-Supervised Depth Estimation from Defocus Clue
- Author
-
Si, Haozhe, Zhao, Bin, Wang, Dong, Gao, Yunpeng, Chen, Mulin, Wang, Zhigang, and Li, Xuelong
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Depth-from-defocus (DFD), modeling the relationship between depth and defocus pattern in images, has demonstrated promising performance in depth estimation. Recently, several self-supervised works have tried to overcome the difficulties in acquiring accurate depth ground-truth. However, they depend on all-in-focus (AIF) images, which cannot be captured in real-world scenarios. This limitation discourages the application of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. We show that our framework circumvents the need for depth and AIF image ground-truth, and produces superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world. In particular, we propose (i) a more realistic setting for DFD tasks, where no depth or AIF image ground-truth is available; (ii) a novel self-supervision framework that provides reliable predictions of depth and AIF image under this challenging setting. The proposed framework uses a neural model to predict the depth and AIF image, and utilizes an optical model to validate and refine the prediction. We verify our framework on three benchmark datasets with rendered focal stacks and real focal stacks. Qualitative and quantitative evaluations show that our method provides a strong baseline for self-supervised DFD tasks., Comment: CVPR 2023 camera-ready version. The code is released at https://github.com/Ehzoahis/DEReD
- Published
- 2023
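The optical model behind depth-from-defocus is the thin-lens circle of confusion: blur grows as an object moves away from the focus distance, which is the cue the framework's optical model inverts. This is the standard textbook formula, not the paper's learned model; distances are in meters.

```python
def coc_diameter(obj_dist, focus_dist, focal_len, f_number):
    """Thin-lens circle-of-confusion diameter for an object at obj_dist when
    the lens (focal length focal_len, aperture focal_len/f_number) is
    focused at focus_dist. Zero when the object sits exactly in focus."""
    aperture = focal_len / f_number
    return (aperture * abs(obj_dist - focus_dist) / obj_dist
            * focal_len / (focus_dist - focal_len))
```

Evaluating this across the focus distances of a focal stack predicts how blurry each depth hypothesis should look, which is what lets the optical model validate the network's depth prediction.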
44. The impact of RCEP on agricultural trade among members:From the perspectives of tariff concession and trade facilitation
- Author
-
LIU Ziming, WANG Zhigang
- Subjects
regional comprehensive economic partnership (rcep) ,agricultural trade ,trade facilitation ,tariff concession ,gtap model ,Environmental sciences ,GE1-350 ,Biology (General) ,QH301-705.5 - Abstract
[Objective] The Regional Comprehensive Economic Partnership (RCEP) is another major opening-up achievement after China joined the WTO. This paper analyzes the impact of RCEP's entry into force on the agricultural trade of member countries (regions) (hereinafter referred to as members) from the dual perspectives of tariff concessions and trade facilitation, based on theory and evidence. [Methods] First, the mechanisms by which tariff concessions and trade facilitation affect the agricultural trade and welfare of RCEP members were analyzed through a trade theory model and illustrations of heterogeneous firms. Then, the scale of tariff concessions was measured based on the tariff schedules, and the degree of trade facilitation was quantified by combining customs clearance times with tariff-equivalent trade-time data. Finally, in a GTAP model incorporating trade time costs, the impact of RCEP tariff concessions and trade facilitation on agricultural trade among members was simulated using GTAP version 10 data, and the two effects were compared. [Results] (1) Ten years after RCEP comes into effect, it will significantly promote the economic growth of all members. Compared with the baseline scenario, the GDP of ASEAN, South Korea, New Zealand, Japan, Australia and China will increase by 1.849%, 0.873%, 0.469%, 0.467%, 0.400% and 0.396%, respectively. In China, for example, trade facilitation contributes roughly twice as much to GDP growth (0.265%) as tariff reduction (0.121%). (2) RCEP significantly promotes members' trade in their advantageous agricultural products: for example, China's exports of fruits and vegetables, aquatic products, and pig and poultry meat grow by 13.83%, 8.31% and 7.32%, while its imports of fruits and vegetables, beef and mutton, and edible fats grow by 16.16%, 10.37% and 8.89%. (3) Trade facilitation promotes the agricultural trade of RCEP members, especially in perishable agricultural products (fruits and vegetables, meat products, edible fats). (4) Imports of agricultural products from non-members will be diverted into the region, and members' exports of agricultural products without comparative advantages to the region will decline. [Conclusion] Before the implementation of RCEP, bilateral tariffs among members, except between China and Japan and between Japan and South Korea, were already relatively low, so the room for tariff concessions is limited. Trade facilitation promotes agricultural trade more effectively. This paper helps in grasping the opportunities and risks RCEP brings to members' agriculture and in assessing the substitute role of trade facilitation in the context of limited tariff reduction space.
- Published
- 2024
45. Microstructure and mechanical properties of Sc2O3-Y2O3 co-doped ZrO2 ceramic materials for thermal barrier coating
- Author
-
WANG Zhigang, LIU Rengqian, XIE Min, ZHANG Yonghe, WANG Xuanli, SONG Xiwen, CHANG Zhendong, LIU Delin, and MU Rende
- Subjects
thermal barrier coating ,rare earth doping ,zirconia ,scandia ,microstructure ,mechanical property ,Materials of engineering and construction. Mechanics of materials ,TA401-492 - Abstract
Partially stabilized zirconia (7YSZ) with a mass fraction of 7±1% yttrium oxide is a widely used ceramic material for thermal barrier coatings. However, its phase stability, sintering resistance, and mechanical properties degrade during long-term operation above 1200 ℃. A new type of Sc2O3-Y2O3 co-doped ZrO2 thermal barrier coating ceramic material was proposed, and 7.5%Sc2O3-x%Y2O3-(92.5-x)%ZrO2 (x = 0, 0.1, 0.2, 0.3; molar fraction) ceramics were prepared by the solid-state reaction method. The effects of Y2O3 doping on the microstructure, phase evolution, and mechanical properties (including Vickers hardness, fracture toughness, elastic modulus and three-point bending strength) were explored by XRD, SEM, and other testing methods. The results show that the relative density of Sc2O3-Y2O3 doped ZrO2 ceramics sintered at 1450 ℃ for 3.5 h exceeds 97%, and the phase structure consists of the tetragonal phase. Compared with 6-8YSZ ceramics, Sc2O3-Y2O3 doped ZrO2 ceramics exhibit similar Vickers hardness (13-14 GPa), fracture toughness (6.5-7.0 MPa·m^(1/2)), elastic modulus (211-214 GPa) and three-point bending strength (520-850 MPa). The fracture mechanism shows a mixture of transgranular and intergranular fracture modes, in which transgranular fracture is dominant. This ceramic can be explored as a potential thermal barrier coating material for high-temperature applications.
- Published
- 2024
- Full Text
- View/download PDF
46. Paradoxical constitutive law between donor O-H bond length and its stretching frequency in water dimer
- Author
-
Liu, Rui, Yang, Xinrui, Yu, Famin, and Wang, Zhigang
- Subjects
Physics - Chemical Physics ,Physics - Atomic and Molecular Clusters ,Physics - Computational Physics - Abstract
The constitutive laws of hydrogen bonds (H-bonds) are central to understanding microphysical processes that cannot be observed precisely, especially their structural properties. Previous experience with water H-bonding showed that as the intermolecular O...O distance shortens, the O-H stretching frequency redshifts; an elongated O-H bond length is then empirically inferred, which is described as the constitutive law under the cooperative effect. Here, using the high-precision CCSD(T) method, we report a violation of this conventional constitutive law in the water dimer. As the O...O distance varies from stretched by 0.06 angstrom to contracted by 0.15 angstrom relative to the equilibrium position, the donor O-H bond length decreases from 0.9724 to 0.9717 angstrom while the O-H stretching frequency redshifts from 3715 to 3708 cm-1. Our work highlights that the O-H bond length decreases simultaneously with its stretching frequency, which is clearly inconsistent with the previously recognized constitutive law., Comment: 10 pages, 2 figures
- Published
- 2022
- Full Text
- View/download PDF
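The reversal reported in the abstract above can be checked directly from the two endpoint values it quotes. The snippet below is only an illustration of the sign argument using those published numbers, not a reproduction of the CCSD(T) calculation:

```python
# Endpoint values quoted in the abstract (CCSD(T), water dimer):
# shift of the O...O distance from equilibrium (angstrom),
# donor O-H bond length (angstrom), O-H stretching frequency (cm^-1).
points = [
    {"d_OO_shift": +0.06, "r_OH": 0.9724, "freq": 3715},
    {"d_OO_shift": -0.15, "r_OH": 0.9717, "freq": 3708},
]

# Conventional constitutive law: a redshift (frequency decrease) should
# accompany an *elongated* O-H bond, i.e. the two changes anticorrelate.
dr = points[1]["r_OH"] - points[0]["r_OH"]  # -0.0007 angstrom (shorter)
df = points[1]["freq"] - points[0]["freq"]  # -7 cm^-1 (redshifted)

# Here both quantities change in the same direction, so the product of
# the changes is positive -- the violation the paper reports.
print(dr * df > 0)  # True: bond shortens while the mode still redshifts
```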
47. Meteorological Satellite Images Prediction Based on Deep Multi-scales Extrapolation Fusion
- Author
-
Huang, Fang, Cheng, Wencong, Wang, PanFeng, Wang, ZhiGang, and He, HongHong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Meteorological satellite imagery is critical for meteorologists, and the data have played an important role in monitoring and analyzing weather and climate change. However, satellite imagery is observational data and suffers a significant time delay when transmitted back to Earth, so accurate prediction of meteorological satellite images is important, especially nowcasting up to 2 hours ahead. In recent years there has been growing interest in deep-learning nowcasting for weather radar images. Compared with the radar-image prediction problem, the main challenge for meteorological satellite images is the large observation areas and therefore the large sizes of the observation products. Here we present a deep multi-scale extrapolation fusion method to address the nowcasting of meteorological satellite images. First, we downsample the original large satellite image dataset to several datasets at smaller resolutions, then apply a deep spatiotemporal sequence prediction method to generate multi-scale prediction images at the different resolutions separately. Second, we fuse the multi-scale predictions into target prediction images at the original size with a conditional generative adversarial network. Experiments on FY-4A meteorological satellite data show that the proposed method generates realistic prediction images that capture the evolution of weather systems in detail. We believe the general idea of this work can be applied to other spatiotemporal sequence prediction tasks with large spatial sizes.
- Published
- 2022
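The two-stage pipeline described in the abstract above (downsample, predict per scale, fuse back to the original size) can be sketched as follows. The per-scale persistence forecast and the mean fusion are placeholder assumptions standing in for the paper's deep spatiotemporal sequence model and conditional GAN:

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2-D field by an integer factor (simple stand-in
    for the paper's downsampling step)."""
    h, w = img.shape
    return img[:h - h % factor, :w - w % factor] \
        .reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling back toward the original grid."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def predict_one_scale(frames):
    """Placeholder per-scale extrapolator: persistence forecast (last
    frame). The paper uses a deep spatiotemporal sequence model here."""
    return frames[-1]

def multiscale_forecast(frames, factors=(1, 2, 4)):
    """Predict at several resolutions, then fuse by averaging the
    upsampled predictions (the paper fuses with a conditional GAN)."""
    h, w = frames[0].shape
    preds = []
    for f in factors:
        coarse = [downsample(x, f) for x in frames]
        preds.append(upsample(predict_one_scale(coarse), f)[:h, :w])
    return np.mean(preds, axis=0)

# Toy 8x8 "satellite" sequence in place of real FY-4A imagery
rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(4)]
forecast = multiscale_forecast(frames)
print(forecast.shape)  # (8, 8)
```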
48. Finite Solvable Groups in Which the σ-Quasinormality of Subgroups is a Transitive Relation
- Author
-
Wang, Zhigang, Guo, Wenbin, Safonova, I. N., and Skiba, A. N.
- Published
- 2023
- Full Text
- View/download PDF
49. UFO: Unified Feature Optimization
- Author
-
Xi, Teng, Sun, Yifan, Yu, Deli, Li, Bi, Peng, Nan, Zhang, Gang, Zhang, Xinyu, Wang, Zhigang, Chen, Jinwen, Wang, Jian, Liu, Lufei, Feng, Haocheng, Han, Junyu, Liu, Jingtuo, Ding, Errui, and Wang, Jingdong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models under real-world and large-scale scenarios, which requires a collection of multiple AI functions. UFO aims to benefit each single task with large-scale pretraining on all tasks. Compared with the well-known foundation model, UFO has two different points of emphasis, i.e., relatively smaller model size and NO adaptation cost: 1) UFO squeezes a wide range of tasks into a moderate-sized unified model in a multi-task learning manner and further trims the model size when transferred to down-stream tasks. 2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the trimmed model dedicated to one or more already-seen tasks. With these two characteristics, UFO provides great convenience for flexible deployment while maintaining the benefits of large-scale pretraining. A key merit of UFO is that the trimming process not only reduces the model size and inference consumption but can even improve the accuracy on certain tasks. Specifically, UFO considers the multi-task training and brings a two-fold impact on the unified model: some closely related tasks have mutual benefits, while other tasks conflict with each other. UFO manages to reduce the conflicts and preserve the mutual benefits through a novel Network Architecture Search (NAS) method. Experiments on a wide range of deep representation learning tasks (i.e., face recognition, person re-identification, vehicle re-identification, and product retrieval) show that the model trimmed from UFO achieves higher accuracy than its single-task-trained counterpart yet has a smaller model size, validating the concept of UFO. Besides, UFO also supported the release of a 17-billion-parameter computer vision (CV) foundation model, the largest CV model in the industry., Comment: Accepted in ECCV 2022
- Published
- 2022
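The "unified model, then trim for deployment" idea in the abstract above can be illustrated with a toy shared-backbone model. Everything here (the linear backbone, the per-task heads, the task names) is a hypothetical stand-in; the actual work trims via a Network Architecture Search method rather than simple head dropping:

```python
import numpy as np

rng = np.random.default_rng(0)

class UnifiedModel:
    """Toy unified multi-task model: one shared backbone feeding one
    linear head per task (hypothetical stand-in for UFO's design)."""

    def __init__(self, dim, tasks):
        self.backbone = rng.standard_normal((dim, dim))
        self.heads = {t: rng.standard_normal((dim, 1)) for t in tasks}

    def forward(self, x, task):
        # Shared features, then the task-specific head.
        return np.tanh(x @ self.backbone) @ self.heads[task]

    def trim(self, keep_tasks):
        """Deployment-time trimming: keep only the heads of the
        already-seen tasks needed, shrinking the deployed model."""
        self.heads = {t: w for t, w in self.heads.items() if t in keep_tasks}
        return self

model = UnifiedModel(dim=8, tasks=["face", "reid", "retrieval"])
x = rng.standard_normal((1, 8))
y = model.forward(x, "face")

deployed = model.trim(["face"])
print(sorted(deployed.heads))  # ['face']
```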
50. Learning Symbolic Operators: A Neurosymbolic Solution for Autonomous Disassembly of Electric Vehicle Battery
- Author
-
Du, Yidong, Wang, Wenshuo, Wang, Zhigang, Yang, Hua, Wang, Haitao, Cai, Yinghao, and Chen, Ming
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence - Abstract
The boom in electric vehicles demands efficient battery disassembly for environment-friendly recycling. Currently, battery disassembly is still primarily done by humans, possibly assisted by robots, owing to the unstructured environment and high uncertainties. It is highly desirable to design autonomous solutions that improve work efficiency and lower human risks in high-voltage and toxic environments. This paper proposes a novel neurosymbolic method, which augments the traditional Variational Autoencoder (VAE) model to learn symbolic operators from raw sensory inputs and their relationships. The symbolic operators comprise a probabilistic state symbol grounding model and a state transition matrix that predicts the state after each execution, enabling autonomous task and motion planning. Finally, the method's feasibility is verified through test results.
- Published
- 2022
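A state transition matrix of the kind the abstract above describes can be sketched with a toy disassembly step. The states, the "unscrew_cover" operator, and the probabilities are all hypothetical; the paper grounds the belief with a VAE-based model rather than the fixed vector used here:

```python
import numpy as np

# Hypothetical discrete states of a battery-disassembly scene and a
# learned state transition matrix: row = current state, column = state
# after executing the (assumed) operator "unscrew_cover".
states = ["cover_on", "cover_loose", "cover_off"]
T_unscrew = np.array([
    [0.1, 0.8, 0.1],   # cover_on    -> mostly cover_loose
    [0.0, 0.2, 0.8],   # cover_loose -> mostly cover_off
    [0.0, 0.0, 1.0],   # cover_off stays cover_off
])

def predict_next(belief, T):
    """Propagate a probabilistic state belief through one execution,
    as a planner would before committing to the next motion."""
    return belief @ T

# Grounded belief from raw sensing (placeholder for the VAE-based model)
belief = np.array([1.0, 0.0, 0.0])          # certain the cover is on
after_one = predict_next(belief, T_unscrew)
after_two = predict_next(after_one, T_unscrew)
print(states[int(np.argmax(after_two))])    # prints "cover_off"
```

After two executions the belief mass concentrates on the goal state, which is exactly the information a task planner needs to decide whether to re-execute or move on.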