15,845 results for "Zheng, Jun"
Search Results
2. Nonreciprocal interaction and entanglement between two superconducting qubits
- Author
-
Ren, Yu-Meng, Pan, Xue-Feng, Yao, Xiao-Yu, Huo, Xiao-Wen, Zheng, Jun-Cong, Hei, Xin-Lei, Qiao, Yi-Fan, and Li, Peng-Bo
- Subjects
Quantum Physics - Abstract
Nonreciprocal interaction between two spatially separated subsystems plays a crucial role in signal processing and quantum networks. Here, we propose an efficient scheme to achieve nonreciprocal interaction and entanglement between two qubits by combining coherent and dissipative couplings in a superconducting platform, where two coherently coupled transmon qubits simultaneously interact with a transmission line waveguide. The coherent interaction between the transmon qubits can be achieved via capacitive coupling or via an intermediary cavity mode, while the dissipative interaction is induced by the transmission line via reservoir engineering. Owing to the high tunability of superconducting qubits, their positions along the transmission line can be adjusted to tune the dissipative coupling, enabling the tailoring of reciprocal and nonreciprocal interactions between the qubits. A fully nonreciprocal interaction can be achieved when the separation between the two qubits is $(4n+3)\lambda_{0}/4$, where $n$ is an integer and $\lambda_{0}$ is the photon wavelength. This nonreciprocal interaction enables the generation of nonreciprocal entanglement between the two transmon qubits. Furthermore, applying a drive field to one of the qubits can stabilize the system into a nonreciprocal steady-state entangled state. Remarkably, the nonreciprocal interaction in this work does not rely on nonlinearity or complex configurations, which broadens its potential applications in designing nonreciprocal quantum devices, processing quantum information, and building quantum networks., Comment: 11 pages, 7 figures
- Published
- 2024
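The separation condition quoted in the abstract above is easy to tabulate. Below is a small illustrative sketch (not code from the paper) that lists the first few qubit separations satisfying the fully nonreciprocal condition $d = (4n+3)\lambda_{0}/4$; the function name is my own:

```python
# Illustrative sketch, not from the paper: tabulate qubit separations d that
# satisfy the fully nonreciprocal condition d = (4n + 3) * lambda0 / 4.

def nonreciprocal_separation(n: int, lambda0: float = 1.0) -> float:
    """Return the separation d = (4n + 3) * lambda0 / 4 for integer n >= 0."""
    return (4 * n + 3) * lambda0 / 4

# First few separations in units of the photon wavelength (lambda0 = 1):
separations = [nonreciprocal_separation(n) for n in range(4)]
print(separations)  # [0.75, 1.75, 2.75, 3.75]
```

Per the abstract, these odd-quarter-wavelength separations are exactly where the interaction between the two qubits becomes fully nonreciprocal.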
3. Improved Video VAE for Latent Video Diffusion Model
- Author
-
Wu, Pingyu, Zhu, Kai, Liu, Yu, Zhao, Liming, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Variational Autoencoders (VAEs) aim to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora and other latent video diffusion generation models. While most existing video VAEs inflate a pretrained image VAE into a 3D causal structure for temporal-spatial compression, this paper presents two surprising findings: (1) initialization from a well-trained image VAE with the same latent dimensions suppresses the improvement of subsequent temporal compression capabilities; (2) the adoption of causal reasoning leads to unequal information interactions and unbalanced performance between frames. To alleviate these problems, we propose a keyframe-based temporal compression (KTC) architecture and a group causal convolution (GCConv) module to further improve video VAE (IV-VAE). Specifically, the KTC architecture divides the latent space into two branches, in which one half completely inherits the compression prior of keyframes from a lower-dimension image VAE while the other half performs temporal compression through 3D group causal convolution, reducing temporal-spatial conflicts and accelerating the convergence of the video VAE. The GCConv in the above 3D half uses standard convolution within each frame group to ensure inter-frame equivalence, and employs causal logical padding between groups to maintain flexibility in processing videos with variable frame counts. Extensive experiments on five benchmarks demonstrate the SOTA video reconstruction and generation capabilities of the proposed IV-VAE (https://wpy1999.github.io/IV-VAE/).
- Published
- 2024
4. EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting
- Author
-
Liao, Bohao, Zhai, Wei, Wan, Zengyu, Zhang, Tianzhu, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene reconstruction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Additionally, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. Our project page is https://lbh666.github.io/ef-3dgs/., Comment: Project Page: https://lbh666.github.io/ef-3dgs/
- Published
- 2024
5. Visual-Geometric Collaborative Guidance for Affordance Learning
- Author
-
Luo, Hongchen, Zhai, Wei, Wang, Jiao, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions. Prevailing affordance learning algorithms often adopt the label assignment paradigm and presume that there is a unique relationship between functional region and affordance label, yielding poor performance when adapting to unseen environments with large appearance variations. In this paper, we propose to leverage interactive affinity for affordance learning, i.e., extracting interactive affinity from human-object interaction and transferring it to non-interactive objects. Interactive affinity, which represents the contacts between different parts of the human body and local regions of the target object, can provide inherent cues of interconnectivity between humans and objects, thereby reducing the ambiguity of the perceived action possibilities. To this end, we propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues to excavate interactive affinity from human-object interactions jointly. Besides, a contact-driven affordance learning (CAL) dataset is constructed by collecting and labeling over 55,047 images. Experimental results demonstrate that our method outperforms the representative models regarding objective metrics and visual quality. Project: https://github.com/lhc1224/VCR-Net.
- Published
- 2024
6. ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
- Author
-
Li, Jiawei, Zhang, Fanrui, Zhu, Jiaying, Sun, Esther, Zhang, Qiang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Multimodal Large Language Models (MLLMs), such as GPT-4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and merely provide a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the extraction of precise forgery mask information from input images and facilitates pixel-level understanding of tampering artifacts. The Mask-Aware Forgery Extractor consists of a Forgery Localization Expert (FL-Expert) and a Mask Encoder, where the FL-Expert is augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, allowing for effective capture of multi-scale fine-grained forgery details. To enhance its performance, we implement a three-stage training strategy, supported by our designed Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets, which align vision-language modalities and improve forgery detection and instruction-following capabilities. Extensive experiments demonstrate the effectiveness of the proposed method., Comment: 16 pages, 14 figures
- Published
- 2024
7. MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
- Author
-
Yang, Jian, Yin, Dacheng, Zhou, Yizhou, Rao, Fengyun, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advancements in multi-modal large language models have propelled the development of joint probabilistic models capable of both image understanding and generation. However, we have identified that recent methods inevitably suffer from loss of image information during the understanding task, due to either image discretization or diffusion denoising steps. To address this issue, we propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike the discretization line of methods, MMAR takes in continuous-valued image tokens to avoid information loss. Differing from diffusion-based approaches, we disentangle the diffusion process from the auto-regressive backbone model by employing a light-weight diffusion head on top of each auto-regressed image patch embedding. In this way, when the model transitions from image generation to understanding through text generation, the backbone model's hidden representation of the image is not limited to the last denoising step. To successfully train our method, we also propose a theoretically proven technique that addresses the numerical stability issue and a training strategy that balances the generation and understanding task goals. Through extensive evaluations on 18 image understanding benchmarks, MMAR demonstrates superior performance over other joint multi-modal models, matching the method that employs a pretrained CLIP vision encoder, while being able to generate high-quality images at the same time. We also show that our method is scalable with larger data and model size.
- Published
- 2024
8. LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
- Author
-
Wu, Wei, Zheng, Kecheng, Ma, Shuailei, Lu, Fan, Guo, Yuxin, Zhang, Yifei, Chen, Wei, Guo, Qingpei, Shen, Yujun, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Understanding long text is in great demand in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason for this issue is that the training images are usually paired with short captions, leaving certain tokens easily overshadowed by salient tokens. To address this problem, our initial attempt is to relabel the data with long captions; however, directly learning from them may lead to performance degradation in understanding short text (e.g., in the image classification task). Then, after incorporating corner tokens to aggregate diverse textual information, we manage to help the model catch up to its original level of short text understanding while greatly enhancing its capability of long text understanding. We further look into whether the model can continuously benefit from longer captions and notice a clear trade-off between performance and efficiency. Finally, we validate the effectiveness of our approach using a self-constructed large-scale dataset, which consists of 100M long-caption-oriented text-image pairs. Our method demonstrates superior performance in long-text-image retrieval tasks. The project page is available at https://wuw2019.github.io/lot-lip.
- Published
- 2024
9. Ferrovalley Physics in Stacked Bilayer Altermagnetic Systems
- Author
-
Li, Yun-Qin, Zhang, Yu-Ke, Lu, Xin-Le, Shao, Ya-Ping, Bao, Zhi-qiang, Zheng, Jun-Ding, Tong, Wen-Yi, and Duan, Chun-Gang
- Subjects
Condensed Matter - Materials Science - Abstract
As an emerging magnetic phase, altermagnets with compensated magnetic order and non-relativistic spin-splitting have attracted widespread attention. Currently, strain engineering is considered an effective method for inducing valley polarization in altermagnets; however, achieving controllable switching of valley polarization is extremely challenging. Herein, combining a tight-binding model with first-principles calculations, we propose that interlayer sliding can be used to successfully induce and effectively manipulate large valley polarization in altermagnets. Using the Fe2MX4 (M = Mo, W; X = S, Se, or Te) family as examples, we predict that sliding-induced ferrovalley states in such systems can exhibit many unique properties, including linear optical dichroism that is independent of spin-orbit coupling, and the anomalous valley Hall effect. These findings reveal correlations among the spin, valley, layer, and optical degrees of freedom that make altermagnets attractive in spintronics, valleytronics, and even their crossing areas.
- Published
- 2024
10. Dissipation-accelerated entanglement generation
- Author
-
Zheng, Xiao-Wei, Zheng, Jun-Cong, Pan, Xue-Feng, Lin, Li-Hua, Han, Pei-Rong, and Li, Peng-Bo
- Subjects
Quantum Physics - Abstract
Dissipation is usually considered a negative factor for observing quantum effects and for harnessing them for quantum technologies. Here we propose a scheme for speeding up the generation of quantum entanglement between two coupled qubits by introducing a strong dissipation channel to one of these qubits. The maximal entanglement is conditionally established by evenly distributing a single excitation between these two qubits. When the excitation is initially held by the dissipative qubit, the dissipation accelerates the excitation re-distribution process for the quantum state trajectory without quantum jumps. Our results show that the time needed to conditionally attain the maximal entanglement decreases monotonically as the dissipative rate is increased. We further show that this scheme can be generalized to accelerate the production of the W state in the three-qubit system, where one non-Hermitian (NH) qubit is symmetrically coupled to two Hermitian qubits., Comment: 2 figures
- Published
- 2024
11. Grounding 3D Scene Affordance From Egocentric Interactions
- Author
-
Liu, Cuiyu, Zhai, Wei, Yang, Yuhang, Luo, Hongchen, Liang, Sen, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Grounding 3D scene affordance aims to locate interactive regions in 3D environments, which is crucial for embodied agents to interact intelligently with their surroundings. Most existing approaches achieve this by mapping semantics to 3D instances based on static geometric structure and visual appearance. This passive strategy limits the agent's ability to actively perceive and engage with the environment, making it reliant on predefined semantic instructions. In contrast, humans develop complex interaction skills by observing and imitating how others interact with their surroundings. To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction. This task faces the challenges of spatial complexity and alignment complexity across multiple sources. To address these challenges, we propose the Egocentric Interaction-driven 3D Scene Affordance Grounding (Ego-SAG) framework, which utilizes interaction intent to guide the model in focusing on interaction-relevant sub-regions and aligns affordance features from different sources through a bidirectional query decoder mechanism. Furthermore, we introduce the Egocentric Video-3D Scene Affordance Dataset (VSAD), covering a wide range of common interaction types and diverse 3D environments to support this task. Extensive experiments on VSAD validate both the feasibility of the proposed task and the effectiveness of our approach.
- Published
- 2024
12. DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
- Author
-
Huang, Yukun, Wang, Jianan, Zeng, Ailing, Zha, Zheng-Jun, Zhang, Lei, and Liu, Xihui
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics ,Computer Science - Machine Learning - Abstract
Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challenging. In this work, we present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Skeleton-guided Score Distillation and Hybrid 3D Gaussian Avatar representation. Specifically, the proposed skeleton-guided score distillation integrates skeleton controls from 3D human templates into 2D diffusion models, enhancing the consistency of SDS supervision in terms of view and human pose. This facilitates the generation of high-quality avatars, mitigating issues such as multiple faces, extra limbs, and blurring. The proposed hybrid 3D Gaussian avatar representation builds on the efficient 3D Gaussians, combining neural implicit fields and parameterized 3D meshes to enable real-time rendering, stable SDS optimization, and expressive animation. Extensive experiments demonstrate that DreamWaltz-G is highly effective in generating and animating 3D avatars, outperforming existing methods in both visual quality and animation expressiveness. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition., Comment: Project page: https://yukun-huang.github.io/DreamWaltz-G/
- Published
- 2024
13. Finite-time input-to-state stability for infinite-dimensional systems
- Author
-
Sun, Xiaorong, Zheng, Jun, and Zhu, Guchuan
- Subjects
Mathematics - Optimization and Control ,Electrical Engineering and Systems Science - Systems and Control - Abstract
In this paper, we extend the notion of finite-time input-to-state stability (FTISS) for finite-dimensional systems to infinite-dimensional systems. More specifically, we first prove an FTISS Lyapunov theorem for a class of infinite-dimensional systems, namely, that the existence of an FTISS Lyapunov functional (FTISS-LF) implies the FTISS of the system, and then provide a sufficient condition for ensuring the existence of an FTISS-LF for a class of abstract infinite-dimensional systems under the framework of compact semigroup theory and Hilbert spaces. As an application of the FTISS Lyapunov theorem, we verify the FTISS for a class of parabolic PDEs involving sublinear terms and distributed in-domain disturbances. Since the nonlinear terms of the corresponding abstract system are not Lipschitz continuous, the well-posedness is proved based on the application of compact semigroup theory, and the FTISS is assessed by using the Lyapunov method with the aid of an interpolation inequality. Numerical simulations are conducted to confirm the theoretical results.
- Published
- 2024
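For orientation, the finite-dimensional notion that the abstract above extends can be sketched as follows (a standard formulation, not quoted from the paper): a system $\dot{x} = f(x, u)$ is FTISS if there exist $\beta \in \mathcal{KL}$ and $\gamma \in \mathcal{K}$ such that
\[
\|x(t)\| \le \beta\big(\|x_0\|, t\big) + \gamma\big(\|u\|_{L^\infty}\big), \quad t \ge 0,
\]
where, beyond the usual ISS estimate, $\beta(r, t) = 0$ for all $t \ge T(r)$ for some finite settling-time function $T$; in the disturbance-free case ($u \equiv 0$) the state thus reaches the origin in finite time.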
14. QMambaBSR: Burst Image Super-Resolution with Query State Space Model
- Author
-
Di, Xin, Peng, Long, Xia, Peizhe, Li, Wenbo, Pei, Renjing, Cao, Yang, Wang, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In burst SR, the key challenge lies in extracting sub-pixel details complementary to the base frame's content while simultaneously suppressing high-frequency noise. Existing methods attempt to extract sub-pixels by modeling inter-frame relationships frame by frame while overlooking the mutual correlations among multiple current frames and neglecting intra-frame interactions, leading to inaccurate and noisy sub-pixels for base frame super-resolution. Further, existing methods mainly employ static upsampling with fixed parameters to improve spatial resolution for all scenes, failing to perceive the sub-pixel distribution differences across multiple frames and to balance the fusion weights of different frames, resulting in over-smoothed details and artifacts. To address these limitations, we introduce a novel Query Mamba Burst Super-Resolution (QMambaBSR) network, which incorporates a Query State Space Model (QSSM) and an Adaptive Up-sampling module (AdaUp). Specifically, based on the observation that sub-pixels have a consistent spatial distribution while random noise is inconsistently distributed, a novel QSSM is proposed to efficiently extract sub-pixels through inter-frame querying and intra-frame scanning while mitigating noise interference in a single step. Moreover, AdaUp is designed to dynamically adjust the upsampling kernel based on the spatial distribution of multi-frame sub-pixel information in the different burst scenes, thereby facilitating the reconstruction of the spatial arrangement of high-resolution details. Extensive experiments on four popular synthetic and real-world benchmarks demonstrate that our method achieves a new state-of-the-art performance.
- Published
- 2024
15. Synthetic monopole with half-integer magnetic charge in Bose-Einstein condensates
- Author
-
Chen, Xi-Yu, Jiang, Lijia, Bai, Wen-Kai, Yang, Tao, and Zheng, Jun-Hui
- Subjects
Condensed Matter - Quantum Gases ,Nonlinear Sciences - Pattern Formation and Solitons ,Quantum Physics - Abstract
We propose a scheme to create monopoles with half-integer magnetic charges in a spinful cold atom system. With a minimal monopole in the center, we derive the ground-state single-vortex wave function on the sphere and develop the vortex's kinematic equation in the presence of an external electromagnetic field. The vortex's trajectory is generally depicted by the precession of the system. We further formulate the inter-vortex interaction and build up a theory of multi-vortex dynamics in high-charge monopole systems. We predict the vortices' trajectories in the bi-vortex system and identify stable vortex (line) patterns in multi-vortex systems. Our study provides deep insights into the properties of magnetic monopoles and vortices and paves the way for experimental verification., Comment: 6+2+3 pages, 4+1 figures, 1 table
- Published
- 2024
16. Input-to-state stabilization of $1$-D parabolic equations with Dirichlet boundary disturbances under boundary fixed-time control
- Author
-
Zheng, Jun and Zhu, Guchuan
- Subjects
Mathematics - Optimization and Control ,Mathematics - Analysis of PDEs - Abstract
This paper addresses the problem of stabilization of $1$-D parabolic equations with destabilizing terms and Dirichlet boundary disturbances. By using the method of backstepping and the technique of splitting, a boundary feedback controller is designed to ensure the input-to-state stability (ISS) of the closed-loop system with Dirichlet boundary disturbances, while preserving fixed-time stability (FTS) of the corresponding disturbance-free system, for which the fixed time is either determined by the Riemann zeta function or freely prescribed. To overcome the difficulty brought by Dirichlet boundary disturbances, the ISS and FTS properties of the involved systems are assessed by applying the generalized Lyapunov method. Numerical simulations are conducted to illustrate the effectiveness of the proposed scheme of control design.
- Published
- 2024
17. Downstream-Pretext Domain Knowledge Traceback for Active Learning
- Author
-
Zhang, Beichen, Li, Liang, Zha, Zheng-Jun, Luo, Jiebo, and Huang, Qingming
- Subjects
Computer Science - Machine Learning - Abstract
Active learning (AL) is designed to construct a high-quality labeled dataset by iteratively selecting the most informative samples. Such sampling heavily relies on data representation, while pre-training has recently become popular for robust feature learning. However, as pre-training utilizes low-level pretext tasks that lack annotation, directly using pre-trained representations in AL is inadequate for determining the sampling score. To address this problem, we propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance for selecting diverse and instructive samples near the decision boundary. DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator. The diversity indicator constructs two feature spaces based on the pre-training pretext model and the downstream knowledge from annotation, by which it locates the neighbors of unlabeled data from the downstream space in the pretext space to explore the interaction of samples. With this mechanism, DOKT unifies the data relations of low-level and high-level representations to estimate traceback diversity. Next, in the uncertainty estimator, domain mixing is designed to apply perceptual perturbations to unlabeled samples with similar visual patches in the pretext space. The divergence of the perturbed samples is then measured to estimate the domain uncertainty. As a result, DOKT selects the most diverse and important samples based on these two modules. Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods and generalizes well to various application scenarios such as semantic segmentation and image captioning.
- Published
- 2024
18. Unconventional Spin-Orbit Torques from Sputtered MoTe2 Films
- Author
-
Li, Shuchen, Gibbons, Jonathan, Chyczewski, Stasiu, Liu, Zetai, Ni, Hsu-Chih, Qian, Jiangchao, Zuo, Jian-Min, Zheng, Jun-Fei, Zhu, Wenjuan, and Hoffmann, Axel
- Subjects
Condensed Matter - Materials Science ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Materials with strong spin-orbit coupling and low crystalline symmetry are promising for generating large unconventional spin-orbit torques (SOTs), such as in-plane field-like (FL) torques and out-of-plane damping-like (DL) torques, which can effectively manipulate and deterministically switch an out-of-plane magnetization without the need for additional external in-plane magnetic fields. Here, we report SOTs generated by magnetron-sputtered 1T' MoTe2/Permalloy (Py; Ni80Fe20)/MgO heterostructures using both spin-torque ferromagnetic resonance (ST-FMR) and second harmonic Hall measurements. We observed unconventional FL and DL torques in our samples due to spins polarized normal to the interface of MoTe2 and Py layers, and studied the influence of crystallographic order and MoTe2 layer thickness on the SOTs. By comparing the Raman spectra of 1T' MoTe2 samples prepared in different ways, we found a tensile strain in sputtered MoTe2 films, which might further enhance the generation of unconventional torques by reducing the symmetry of 1T' MoTe2.
- Published
- 2024
19. FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing
- Author
-
Du, Zhibo, Peng, Long, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered challenge of efficiency for demoiréing methods. To balance the network speed and quality of results, we propose a Fully Connected enCoder-deCoder based Demoiréing Network (FC3DNet). FC3DNet utilizes features with multiple scales in each stage of the decoder for comprehensive information, which contains long-range patterns as well as various local moiré styles, both of which are crucial aspects in demoiréing. Besides, to make full use of multiple features, we design a Multi-Feature Multi-Attention Fusion (MFMAF) module to weigh the importance of each feature and compress them for efficiency. These designs enable our network to achieve performance comparable to state-of-the-art (SOTA) methods in real-world datasets while utilizing only a fraction of the parameters, FLOPs, and runtime., Comment: Accepted by ICIP2024
- Published
- 2024
20. Input-to-State Stabilization of 1-D Parabolic PDEs under Output Feedback Control
- Author
-
Bi, Yongchun, Zheng, Jun, and Zhu, Guchuan
- Subjects
Mathematics - Optimization and Control ,Mathematics - Analysis of PDEs - Abstract
This paper addresses the problem of input-to-state stabilization for a class of parabolic equations with time-varying coefficients, as well as Dirichlet and Robin boundary disturbances. By using time-invariant kernel functions, which can reduce the complexity in control design and implementation, an observer-based output feedback controller is designed via backstepping. By using the generalized Lyapunov method, which can be used to handle Dirichlet boundary terms, the input-to-state stability of the closed-loop system under output feedback control, as well as the state estimation error system, is established in the spatial $L^\infty$-norm. Numerical simulations are conducted to confirm the theoretical results and to illustrate the effectiveness of the proposed control scheme.
- Published
- 2024
21. DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
- Author
-
Xu, Senyan, Sun, Zhijing, Zhu, Jiaying, Zhu, Yurui, Fu, Xueyang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high-dynamic-range, and low-latency environments, such as smartphones and wearable devices. Despite its potential, the lack of an image signal processing (ISP) pipeline specifically designed for HybridEVS poses a significant challenge. To address this challenge, in this study, we propose a coarse-to-fine framework named DemosaicFormer which comprises coarse demosaicing and pixel correction. The coarse demosaicing network is designed to produce a preliminary high-quality estimate of the RGB image from the HybridEVS raw data, while the pixel correction network enhances the performance of image restoration and mitigates the impact of defective pixels. Our key innovation is the design of a Multi-Scale Gating Module (MSGM) applying the integration of cross-scale features, which allows feature information to flow between different scales. Additionally, the adoption of progressive training and data augmentation strategies further improves the model's robustness and effectiveness. Experimental results show superior performance against existing methods both quantitatively and visually, and our DemosaicFormer achieves the best performance in terms of all the evaluation metrics in the MIPI 2024 challenge on Demosaic for HybridEVS Camera. The code is available at https://github.com/QUEAHREN/DemosaicFormer.
- Published
- 2024
22. Towards Realistic Data Generation for Real-World Super-Resolution
- Author
-
Peng, Long, Li, Wenbo, Pei, Renjing, Ren, Jingjing, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physics-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producing large-scale, realistic, and diverse data simultaneously. In this paper, we introduce the Realistic Decoupled Data Generator (RealDGen), an unsupervised data generation framework designed for real-world super-resolution. We meticulously develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model to create realistic low-resolution images from unpaired real LR and HR images. Extensive experiments demonstrate that RealDGen excels at generating large-scale, high-quality paired data that mirrors real-world degradations, significantly advancing the performance of popular SR models on various real-world benchmarks.
- Published
- 2024
23. Context-aware Difference Distilling for Multi-change Captioning
- Author
-
Tu, Yunbin, Li, Liang, Su, Li, Zha, Zheng-Jun, Yan, Chenggang, and Huang, Qingming
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognitive ability to reason about an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. Given an image pair, CARD first decouples context features that aggregate all similar/dissimilar semantics, termed common/difference context features. Then, consistency and independence constraints are designed to guarantee the alignment/discrepancy of the common/difference context features. Further, the common context features guide the model to mine locally unchanged features, which are subtracted from the pair to distill local difference features. Next, the difference context features augment the local difference features to ensure that all changes are distilled. In this way, we obtain an omni-representation of all changes, which is translated into linguistic sentences by a transformer decoder. Extensive experiments on three public datasets show that CARD performs favourably against state-of-the-art methods. The code is available at https://github.com/tuyunbin/CARD., Comment: Accepted by ACL 2024 main conference (long paper)
- Published
- 2024
24. FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
- Author
-
Li, Dong, Liu, Yidi, Fu, Xueyang, Xu, Senyan, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Image deraining aims to remove rain streaks from rainy images and restore clear backgrounds. Some recent research employing the Fourier transform has proved effective for image deraining, since the Fourier spectrum acts as an effective frequency prior for capturing rain streaks. However, although the low- and high-frequency components of images are interdependent, these Fourier-based methods rarely exploit the correlation between different frequencies to couple their learning procedures, limiting the full utilization of frequency information for image deraining. Meanwhile, the recently emerged Mamba technique has demonstrated its effectiveness and efficiency for modeling correlations in various domains (e.g., spatial, temporal), and we argue that introducing Mamba into the as-yet-unexplored Fourier space to correlate different frequencies would help improve image deraining. This motivates us to propose a new framework, termed FourierMamba, which performs image deraining with Mamba in the Fourier space. Owing to the unique arrangement of frequency orders in Fourier space, the core of FourierMamba lies in the scanning encoding of different frequencies, where the low-to-high frequency ordering manifests differently in the spatial dimension (not arranged along an axis) and the channel dimension (arranged along an axis). Therefore, we design FourierMamba to correlate Fourier-space information in the spatial and channel dimensions with distinct designs. Specifically, in the spatial-dimension Fourier space, we introduce zigzag coding to scan the frequencies and rearrange them from low to high, thereby orderly correlating the connections between frequencies; in the channel-dimension Fourier space, where the frequencies are already arranged along an axis, we can directly use Mamba to perform frequency correlation and improve the channel information representation.
- Published
- 2024
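The low-to-high frequency scan order described above can be illustrated with a minimal sketch of our own; here a simple radial ordering stands in for the paper's zigzag coding, and all names and sizes are illustrative:

```python
import numpy as np

def lowhigh_scan_order(h, w):
    """Flat indices of an h*w FFT-shifted spectrum, sorted from low to
    high spatial frequency (radial distance from the DC term)."""
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    return np.argsort(radius, axis=None, kind="stable")

img = np.random.rand(8, 8)
spec = np.fft.fftshift(np.fft.fft2(img))      # center the DC term
order = lowhigh_scan_order(*spec.shape)
seq = spec.ravel()[order]                     # 1-D sequence: DC first, highest frequency last
assert order[0] == (8 // 2) * 8 + 8 // 2      # the DC term (center after fftshift) leads the scan
```

Since the reordering is a pure permutation, `seq[np.argsort(order)]` restores the original flat layout after the sequence has been processed.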
25. VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
- Author
-
Zheng, Jun, Zhao, Fuwei, Xu, Youjiang, Dong, Xin, and Liang, Xiaodan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video try-on stands as a promising area for its tremendous real-world potential. Prior works are limited to transferring product clothing images onto person videos with simple poses and backgrounds, while underperforming on casually captured videos. Recently, Sora revealed the scalability of Diffusion Transformer (DiT) in generating lifelike videos featuring real-world scenarios. Inspired by this, we explore and propose the first DiT-based video try-on framework for practical in-the-wild applications, named VITON-DiT. Specifically, VITON-DiT consists of a garment extractor, a Spatial-Temporal denoising DiT, and an identity preservation ControlNet. To faithfully recover the clothing details, the extracted garment features are fused with the self-attention outputs of the denoising DiT and the ControlNet. We also introduce novel random selection strategies during training and an Interpolated Auto-Regressive (IAR) technique at inference to facilitate long video generation. Unlike existing attempts that require the laborious and restrictive construction of a paired training dataset, severely limiting their scalability, VITON-DiT alleviates this by relying solely on unpaired human dance videos and a carefully designed multi-stage training strategy. Furthermore, we curate a challenging benchmark dataset to evaluate the performance of casual video try-on. Extensive experiments demonstrate the superiority of VITON-DiT in generating spatio-temporal consistent try-on results for in-the-wild videos with complicated human poses., Comment: Project Page: https://zhengjun-ai.github.io/viton-dit-page/
- Published
- 2024
26. EgoChoir: Capturing 3D Human-Object Interaction Regions from Egocentric Views
- Author
-
Yang, Yuhang, Zhai, Wei, Wang, Chengfeng, Yu, Chengjun, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Understanding egocentric human-object interaction (HOI) is a fundamental aspect of human-centric perception, facilitating applications like AR/VR and embodied AI. For egocentric HOI, in addition to perceiving semantics, e.g., ''what'' interaction is occurring, capturing ''where'' the interaction specifically manifests in 3D space is also crucial, linking perception and operation. Existing methods primarily leverage observations of HOI to capture interaction regions from an exocentric view. However, incomplete observations of interacting parties in the egocentric view introduce ambiguity between visual observations and interaction contents, impairing their efficacy. From the egocentric view, humans integrate the visual cortex, cerebellum, and brain to internalize their intentions and interaction concepts of objects, allowing them to pre-formulate interactions and act even when interaction regions are out of sight. In light of this, we propose harmonizing the visual appearance, head motion, and 3D object to excavate the object interaction concept and subject intention, jointly inferring 3D human contact and object affordance from egocentric videos. To achieve this, we present EgoChoir, which links object structures with the interaction contexts inherent in appearance and head motion to reveal object affordance, further utilizing it to model human contact. Additionally, gradient modulation is employed to adopt appropriate clues for capturing interaction regions across various egocentric scenarios. Moreover, 3D contact and affordance are annotated for egocentric videos collected from Ego-Exo4D and GIMO to support the task. Extensive experiments on them demonstrate the effectiveness and superiority of EgoChoir., Comment: NeurIPS2024, project: https://yyvhang.github.io/EgoChoir/
- Published
- 2024
27. ViViD: Video Virtual Try-on using Diffusion Models
- Author
-
Fang, Zixun, Zhai, Wei, Su, Aimin, Song, Hongliang, Zhu, Kai, Wang, Mao, Chen, Yu, Liu, Zhiheng, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying image-based try-on techniques to the video domain in a frame-wise manner causes temporally inconsistent outcomes, while previous video-based try-on solutions produce only low-visual-quality, blurry results. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture, and insert hierarchical Temporal Modules into the text-to-image Stable Diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset for video virtual try-on, the largest to date, with the most diverse garment types and the highest resolution. Extensive experiments demonstrate that our approach yields satisfactory video try-on results. The dataset, code, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.
- Published
- 2024
28. Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution
- Author
-
Peng, Long, Cao, Yang, Pei, Renjing, Li, Wenbo, Guo, Jiaming, Fu, Xueyang, Wang, Yang, and Zha, Zheng-Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have made impressive progress in detail recovery, they still fall short when addressing regions with complex gradient arrangements, owing to their intensity-based linear-weighting feature extraction. Moreover, the stochastic artifacts introduced by degradation cues during the imaging process in real LR images increase the disorder of the overall image details, further complicating the perception of the intrinsic gradient arrangement. To address these challenges, we introduce kernel-wise differential operations within the convolutional kernel and develop several learnable directional gradient convolutions. These convolutions are integrated in parallel with a novel linear weighting mechanism to form an Adaptive Directional Gradient Convolution (DGConv), which adaptively weights and fuses the basic directional gradients to improve the perception of gradient arrangements in both regular and irregular textures. Coupled with DGConv, we further devise a novel equivalent parameter fusion method that preserves its rich representational capability while keeping computational costs consistent with a single vanilla convolution (VConv), enabling DGConv to improve the performance of existing super-resolution networks without incurring additional computational expense. To better leverage the superiority of DGConv, we further develop an Adaptive Information Interaction Block (AIIBlock) to balance the enhancement of texture and contrast while investigating their interdependencies, culminating in DGPNet, a Real-SR network built through simple stacking. Comparative results with 15 SOTA methods across three public datasets underscore the effectiveness and efficiency of our proposed approach.
- Published
- 2024
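The equivalent parameter fusion described above rests on the linearity of convolution: a weighted sum of parallel directional-gradient branches equals a single convolution with the weighted sum of their kernels. A minimal numpy sketch (the kernels and weights here are illustrative, not the paper's learned ones):

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D cross-correlation with a 3x3 kernel."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i+3, j:j+3] * k)
    return out

# Hypothetical directional gradient kernels: horizontal, vertical, diagonal differences.
kh = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], float)
kv = kh.T
kd = np.array([[-1, 0, 0], [0, 0, 0], [0, 0, 1]], float)
w = np.array([0.5, 0.3, 0.2])            # made-up fusion weights

x = np.random.rand(6, 6)
branch_sum = w[0]*conv2d_valid(x, kh) + w[1]*conv2d_valid(x, kv) + w[2]*conv2d_valid(x, kd)
fused_kernel = w[0]*kh + w[1]*kv + w[2]*kd   # one VConv-equivalent kernel
assert np.allclose(branch_sum, conv2d_valid(x, fused_kernel))
```

This is why the fused operator costs no more than a single vanilla convolution at inference: the parallel branches collapse into one kernel once the weights are fixed.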
29. MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
- Author
-
Wu, Yaqi, Fan, Zhihao, Chu, Xiaofeng, Ren, Jimmy S., Li, Xiaoming, Yue, Zongsheng, Li, Chongyi, Zhou, Shangcheng, Feng, Ruicheng, Dai, Yuekun, Yang, Peiqing, Loy, Chen Change, Xu, Senyan, Sun, Zhijing, Zhu, Jiaying, Zhu, Yurui, Fu, Xueyang, Zha, Zheng-Jun, Cao, Jun, Li, Cheng, Chen, Shu, Ma, Liang, Zhou, Shiyang, Zeng, Haijin, Feng, Kai, Chen, Yongyong, Su, Jingyong, Guan, Xianyu, Yu, Hongyuan, Wan, Cheng, Lin, Jiamin, Han, Binnan, Zou, Yajun, Wu, Zhuoyuan, Huang, Yuan, Yu, Yongsheng, Zhang, Daoan, Li, Jizhe, Yin, Xuanwu, Zuo, Kunlong, Lu, Yunfan, Xu, Yijie, Ma, Wenzong, Guo, Weiyu, Xiong, Hui, Yu, Wei, Luo, Bingchun, Nathan, Sabari, and Kansal, Priya
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views between industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge, including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Demosaic for HybridEVS Camera track of MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Demosaic for HybridEVS Camera. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/., Comment: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/
- Published
- 2024
30. Strong Damping-Like Torques in Wafer-Scale MoTe${}_2$ Grown by MOCVD
- Author
-
Chyczewski, Stasiu Thomas, Lee, Hanwool, Li, Shuchen, Eladl, Marwan, Zheng, Jun-Fei, Hoffmann, Axel, and Zhu, Wenjuan
- Subjects
Condensed Matter - Materials Science - Abstract
The scalable synthesis of strong spin-orbit coupling (SOC) materials such as 1T${}^\prime$-phase MoTe${}_2$ is crucial for spintronics development. Here, we demonstrate wafer-scale growth of 1T${}^\prime$ MoTe${}_2$ using metal-organic chemical vapor deposition (MOCVD) with sputtered Mo and (C${}_4$H${}_9$)${}_2$Te. The synthesized films show uniform coverage across the entire sample surface. By adjusting the growth parameters, a synthesis process capable of producing mixed-phase 1T${}^\prime$ and 2H MoTe${}_2$ films was achieved. Notably, the developed process is compatible with back-end-of-line (BEOL) applications. The strong spin-orbit coupling of the grown 1T${}^\prime$ MoTe${}_2$ films was demonstrated through spin-torque ferromagnetic resonance (ST-FMR) measurements conducted on a 1T${}^\prime$ MoTe${}_2$/permalloy bilayer RF waveguide. These measurements revealed a significant damping-like torque in the wafer-scale 1T${}^\prime$ MoTe${}_2$ film and indicated high spin-charge conversion efficiency. The BEOL-compatible process and potent spin-orbit torque demonstrate promise for advanced device applications., Comment: 13 pages total, 5 figures. To be submitted
- Published
- 2024
31. MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
- Author
-
Wang, Zhong, Wan, Zengyu, Han, Han, Liao, Bohao, Wu, Yuliang, Zhai, Wei, Cao, Yang, and Zha, Zheng-jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Event-based eye tracking has shown great promise thanks to the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information in response to the variability of eye movements. Specifically, we propose the MambaPupil network, which consists of a multi-layer convolutional encoder that extracts features from the event representations, a bidirectional Gated Recurrent Unit (GRU), and a Linear Time-Varying State Space Module (LTV-SSM) that selectively captures contextual correlation from the forward and backward temporal relationships. Furthermore, Bina-rep is utilized as a compact event representation, and a tailor-made data augmentation, called Event-Cutout, is proposed to enhance the model's robustness by applying spatial random masking to the event image. Evaluation on the ThreeET-plus benchmark shows the superior performance of MambaPupil, which secured 1st place in the CVPR'2024 AIS Event-based Eye Tracking challenge., Comment: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see https://www.kaggle.com/competitions/event-based-eye-tracking-ais2024
- Published
- 2024
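The Event-Cutout augmentation described above, spatial random masking applied to the event image, can be sketched in a few lines; the mask size, shapes, and function name here are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def event_cutout(event_img, mask_size=8):
    """Zero out one random spatial patch of an event frame (H, W [, C])."""
    h, w = event_img.shape[:2]
    top = rng.integers(0, h - mask_size + 1)
    left = rng.integers(0, w - mask_size + 1)
    out = event_img.copy()
    out[top:top + mask_size, left:left + mask_size] = 0   # drop events in the patch
    return out

frame = np.ones((32, 32))            # toy event frame: one "event" per pixel
aug = event_cutout(frame)
assert aug.sum() == 32 * 32 - 8 * 8  # exactly one 8x8 patch was removed
```

Training on such partially masked frames forces the model to rely on temporal context rather than any single spatial region.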
32. Event-Based Eye Tracking. AIS 2024 Challenge Survey
- Author
-
Wang, Zuowen, Gao, Chang, Wu, Zongwei, Conde, Marcos V., Timofte, Radu, Liu, Shih-Chii, Chen, Qinyu, Zha, Zheng-jun, Zhai, Wei, Han, Han, Liao, Bohao, Wu, Yuliang, Wan, Zengyu, Wang, Zhong, Cao, Yang, Tan, Ganchao, Chen, Jinze, Pei, Yan Ru, Brüers, Sasskia, Crouzet, Sébastien, McLelland, Douglas, Coenen, Oliver, Zhang, Baoheng, Gao, Yizhao, Li, Jingyuan, So, Hayden Kwok-Hay, Bich, Philippe, Boretti, Chiara, Prono, Luciano, Lică, Mircea, Dinucu-Jianu, David, Grîu, Cătălin, Lin, Xiaopeng, Ren, Hongwei, Cheng, Bojun, Zhang, Xinan, Vial, Valentin, Yezzi, Anthony, and Tsai, James
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research., Comment: Qinyu Chen is the corresponding author
- Published
- 2024
33. Superfluid Oscillator Circuit with Quantum Current Regulator
- Author
-
Yang, Xue, Bai, Wenkai, Jiao, Chen, Liu, Wu-Ming, Zheng, Jun-Hui, and Yang, Tao
- Subjects
Quantum Physics ,Condensed Matter - Quantum Gases - Abstract
We examine the properties of atomic current in a superfluid oscillating circuit consisting of a mesoscopic channel that connects two reservoirs of a Bose-Einstein condensate. We investigate the presence of a critical current in the channel and examine how the amplitude of the oscillations in the number imbalance between the two reservoirs varies with system parameters. In addition to highlighting that the dissipative resistance stems from the formation of vortex pairs, we also illustrate the role of these vortex pairs as a quantum current regulator. The dissipation strength varies discretely with the number imbalance, corresponding to the emergence of vortex pairs in the system. Our findings indicate that the circuit exhibits characteristics of both voltage-limiting and current-limiting mechanisms. To model the damping behavior of the atomic superfluid circuit, we develop an equivalent LC oscillator circuit with a quantum current regulator., Comment: 6 figures
- Published
- 2024
- Full Text
- View/download PDF
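A rough sketch of the equivalent damped LC oscillator mentioned above, in our own notation rather than the authors' equations: with the number imbalance $z$ playing the role of the capacitor charge,

```latex
\ddot{z} + \frac{R}{L}\,\dot{z} + \frac{1}{LC}\,z = 0, \qquad \omega_0 = \frac{1}{\sqrt{LC}},
```

where the inductance $L$ models the channel's kinetic inertia, the capacitance $C$ the reservoirs' compressibility, and the resistance $R$, supplied by vortex-pair formation, switches on only above a threshold imbalance, acting as the quantum current regulator.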
34. Electromagnetic chirality-induced negative refraction with the same amplitude and anti-phase of the two chirality coefficients
- Author
-
Zhao, Shun-Cai, Liu, Zheng-Dong, Zheng, Jun, and Li, Gen
- Subjects
Quantum Physics - Abstract
We suggest a scheme for electromagnetic chirality-induced negative refraction utilizing magneto-electric cross coupling in a four-level atomic system. Negative refraction can be achieved when the two chirality coefficients have the same amplitude but opposite phase, without requiring the simultaneous presence of an electric-dipole and a magnetic-dipole transition near the same transition frequency. Nor does it require the electric permittivity and magnetic permeability to be simultaneously negative., Comment: 8 pages, 4 figures
- Published
- 2024
- Full Text
- View/download PDF
35. Hierarchical Information Enhancement Network for Cascade Prediction in Social Networks
- Author
-
Zhang, Fanrui, Liu, Jiawei, Zhang, Qiang, Zhu, Xiaoling, and Zha, Zheng-Jun
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Artificial Intelligence - Abstract
Understanding information cascades in networks is a fundamental issue in numerous applications. Current research often samples cascade information into several independent paths or subgraphs to learn a simple cascade representation. However, these approaches fail to exploit the hierarchical semantic associations between different modalities, limiting their predictive performance. In this work, we propose a novel Hierarchical Information Enhancement Network (HIENet) for cascade prediction. Our approach integrates the fundamental cascade sequence, user social graphs, and sub-cascade graphs into a unified framework. Specifically, HIENet utilizes DeepWalk to sample cascade information into a series of sequences. It then gathers path information between users to extract the social relationships of propagators. Additionally, we employ a time-stamped graph convolutional network to aggregate sub-cascade graph information effectively. Ultimately, we introduce a Multi-modal Cascade Transformer to powerfully fuse these clues, providing a comprehensive understanding of the cascading process. Extensive experiments have demonstrated the effectiveness of the proposed method., Comment: 7 pages, 2 figures
- Published
- 2024
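HIENet's first step above samples cascade information into sequences with DeepWalk. A minimal sketch of DeepWalk-style sampling, uniform random walks over an adjacency list (the graph, walk length, and walk counts here are illustrative):

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """DeepWalk-style uniform random walks over an adjacency dict."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                if not nbrs:           # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Toy cascade graph: node -> list of neighbors.
adj = {0: [1, 2], 1: [0], 2: [0, 1]}
walks = random_walks(adj)
assert len(walks) == 6                 # 3 nodes x 2 walks per node
```

In the full method these sequences would be fed to a sequence encoder; here they merely illustrate how a graph is flattened into paths.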
36. Multi-perspective Memory Enhanced Network for Identifying Key Nodes in Social Networks
- Author
-
Zhang, Qiang, Liu, Jiawei, Zhang, Fanrui, Zhu, Xiaoling, and Zha, Zheng-Jun
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Artificial Intelligence - Abstract
Identifying key nodes in social networks plays a crucial role in the timely blocking of false information. Existing key node identification methods usually consider node influence only from the propagation-structure perspective and generalize poorly to unknown scenarios. In this paper, we propose a novel Multi-perspective Memory Enhanced Network (MMEN) for identifying key nodes in social networks, which mines key nodes from multiple perspectives and utilizes memory networks to store historical information. Specifically, MMEN first constructs two propagation networks from the perspectives of user attributes and propagation structure, and updates node feature representations using graph attention networks. Meanwhile, a memory network is employed to store information on similar subgraphs, enhancing the model's generalization in unknown scenarios. Finally, MMEN applies adaptive weights to combine the node influence of the two propagation networks and select the ultimate key nodes. Extensive experiments demonstrate that our method significantly outperforms previous methods., Comment: 7 pages, 1 figure
- Published
- 2024
37. VisualCritic: Making LMMs Perceive Visual Quality Like Humans
- Author
-
Huang, Zhipeng, Zhang, Zhizheng, Lu, Yiting, Zha, Zheng-Jun, Chen, Zhibo, and Guo, Baining
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
At present, large multimodal models (LMMs) have exhibited impressive generalization capabilities in understanding and generating visual signals. However, they still lack sufficient capability to perceive low-level visual quality akin to human perception. Can LMMs achieve this and show the same degree of generalization in this regard? If so, not only could the versatility of LMMs be further enhanced, but the challenge of poor cross-dataset performance in the field of visual quality assessment could also be addressed. In this paper, we explore this question and provide the answer "Yes!". As a result of this initial exploration, we present VisualCritic, the first LMM for broad-spectrum image subjective quality assessment. VisualCritic can be used across diverse data right out of the box, without any of the dataset-specific adaptation operations required by conventional specialist models. As an instruction-following LMM, VisualCritic enables new capabilities of (1) quantitatively measuring the perceptual quality of given images in terms of their Mean Opinion Score (MOS), noisiness, colorfulness, sharpness, and other numerical indicators, (2) qualitatively evaluating visual quality and providing explainable descriptions, and (3) discerning whether a given image is AI-generated or photographic. Extensive experiments demonstrate the efficacy of VisualCritic by comparing it with other open-source LMMs and conventional specialist models on both AI-generated and photographic images.
- Published
- 2024
38. RelationVLM: Making Large Vision-Language Models Understand Visual Relations
- Author
-
Huang, Zhipeng, Zhang, Zhizheng, Zha, Zheng-Jun, Lu, Yan, and Guo, Baining
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual contents and ground text to them. Nonetheless, current LVLMs still struggle to precisely understand visual relations due to the lack of relevant data. In this work, we present RelationVLM, a large vision-language model capable of comprehending various levels and types of relations, whether across multiple images or within a video. Specifically, we devise a multi-stage relation-aware training scheme and a series of corresponding data configuration strategies to bestow RelationVLM with the capabilities of understanding semantic relations, temporal associations, and geometric transforms. Extensive case studies and quantitative evaluations show that RelationVLM has strong capability in understanding such relations and exhibits an impressive emergent in-context capability to reason from few-shot examples. This work fosters the advancement of LVLMs by enabling them to support a wider range of downstream applications toward artificial general intelligence.
- Published
- 2024
39. Dielectric response in twisted MoS2 bilayer facilitated by spin-orbit coupling effect
- Author
-
Shen, Yu-Hao, Zheng, Jun-Ding, Tong, Wen-Yi, Bao, Zhi-Qiang, Wan, Xian-Gang, and Duan, Chun-Gang
- Subjects
Condensed Matter - Materials Science - Abstract
Twisted van der Waals bilayers offer ideal two-dimensional (2D) platforms for exploring the intricate interplay between the spin and charge degrees of freedom of electrons. By investigating twisted MoS2 bilayers featuring two distinct stackings with identical commensurate supercell sizes, we reveal an unusual dielectric response behavior inherent to this system. Our first-principles calculations demonstrate that applying an out-of-plane electric field produces different electronic polarization responses for the two stackings. Further analysis makes apparent that this dielectric response comes from the planar charge redistribution associated with the spin-orbit coupling (SOC) effect. The underlying mechanism is that the external electric field tends to modify the internal pseudo-spin texture \sigma, subsequently generating an out-of-plane (pseudo-)spin current j_s \propto \sigma \times B_R in response to an in-plane pseudomagnetic field B_R through Rashba SOC. The generated j_s is opposite for the two distinct stackings, resulting in opposite in-plane electric susceptibilities. As a consequence, through magnetoelectric coupling within this nonmagnetic system, opposite tendencies to redistribute charge arise, ultimately leading to an amplified or suppressed dielectric response.
- Published
- 2024
40. Quantum valley Hall states in low-buckled counterparts of graphene bilayer
- Author
-
Shen, Yu-Hao, Zheng, Jun-Ding, Tong, Wen-Yi, Bao, Zhi-Qiang, Wan, Xian-Gang, and Duan, Chun-Gang
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Materials Science - Abstract
Owing to the low-buckled structure of each layer in a bilayer of graphene counterparts, inversion symmetry (P-symmetry) is broken for the stacking in which both the A and B sublattices of the top layer are aligned with those of the bottom layer. When spin-orbit coupling (SOC) is considered, a nontrivial topological gap opens in each monolayer, realizing the quantum spin Hall effect (QSHE). As long as time-reversal symmetry (T-symmetry) is preserved, the gapless edge states remain robust in each individual layer, even for a bilayer lacking PT symmetry. Based on this platform, our tight-binding (TB) model calculations show that the system exhibits the quantum valley Hall effect (QVHE) when a layer-resolved Rashba SOC is introduced that induces band inversion at each K valley of the hexagonal Brillouin zone (BZ). The topological transition arises because the valley Chern number Cv = CK - CK' switches from 0 to 2, which characterizes the nontrivial QVHE phase transited from two coupled Z2 topological insulators. We also point out that the layer-resolved Rashba SOC can be introduced equivalently by twisting the two van der Waals-coupled layers. TB calculations show that the K bands invert in the corresponding mini BZ when the two layers are twisted by a small angle. Our findings advance potential applications in device design for topological valleytronics and twistronics.
- Published
- 2024
41. Event-based Asynchronous HDR Imaging by Temporal Incident Light Modulation
- Author
-
Wu, Yuliang, Tan, Ganchao, Chen, Jinze, Zhai, Wei, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Dynamic Range (DR) is a pivotal characteristic of imaging systems. Current frame-based cameras struggle to achieve high dynamic range imaging due to the conflict between globally uniform exposure and spatially variant scene illumination. In this paper, we propose AsynHDR, a pixel-asynchronous HDR imaging system, based on key insights into the challenges of HDR imaging and the unique event-generating mechanism of Dynamic Vision Sensors (DVS). Our proposed AsynHDR system integrates the DVS with a set of LCD panels. The LCD panels modulate the irradiance incident upon the DVS by altering their transparency, thereby triggering pixel-independent event streams. The HDR image is subsequently decoded from the event streams through our temporal-weighted algorithm. Experiments on a standard test platform and in several challenging scenes have verified the feasibility of the system for the HDR imaging task.
- Published
- 2024
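The decoding principle described in the abstract above — modulating incident light so that event timing encodes irradiance — can be illustrated with a toy model. This sketch assumes a hypothetical linear transmittance ramp tr(t) = t/T and a fixed trigger threshold, so brighter pixels fire earlier and irradiance is recovered from the event timestamp; it is not the paper's actual temporal-weighted algorithm, and all names here are invented for illustration.

```python
import numpy as np

def simulate_event_times(irradiance, theta=1.0, T=1.0, n_steps=10000):
    """Toy model: an LCD transmittance ramp tr(t) = t/T sits in front of
    each pixel; a pixel emits an event at the first time t* where
    irradiance * tr(t*) >= theta.  Brighter pixels fire earlier."""
    t = np.linspace(T / n_steps, T, n_steps)
    tr = t / T
    crossed = irradiance[:, None] * tr[None, :] >= theta
    idx = np.argmax(crossed, axis=1)           # first crossing per pixel
    return np.where(crossed.any(axis=1), t[idx], np.inf)

def decode_irradiance(t_star, theta=1.0, T=1.0):
    """Invert the ramp: E ~= theta / tr(t*) = theta * T / t*."""
    return theta * T / t_star
```

The recoverable dynamic range in this toy model is set by the timing resolution: a pixel firing after only a few time steps is decoded with correspondingly coarse relative accuracy, which is one reason event timestamps (rather than a fixed frame clock) are attractive for HDR.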
42. Digitization of Astronomical Photographic Plate of China and Astrometric Measurement of Single-exposure Plates
- Author
-
Shang, Zheng-Jun, Yu, Yong, Wang, Liang-Liang, Yang, Mei-Ting, Yang, Jing, Shen, Shi-Yin, Liu, Min, Xu, Quan-Feng, Cui, Chen-Zhou, Fan, Dong-Wei, Tang, Zheng-Hong, and Zhao, Jian-Hai
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Astrophysics of Galaxies - Abstract
From the mid-19th century to the end of the 20th century, photographic plates served as the primary detectors for astronomical observations. Astronomical photographic observations in China began in 1901, and over the following century a total of approximately 30,000 astronomical photographic plates were captured. These historical plates play an irreplaceable role in conducting long-term, time-domain astronomical research. To preserve and exploit these valuable original astronomical observational data, Shanghai Astronomical Observatory organized the transportation of plates taken at night at various stations across the country to the Sheshan Plate Archive for centralized preservation. For the first time, statistics on the plate information were compiled. On this basis, the plates were cleaned and digitally scanned, and digitized images were ultimately acquired for 29,314 plates. In this study, using Gaia DR2 as the reference star catalog, astrometric processing was successfully carried out on 15,696 single-exposure plates, including object extraction, stellar identification, and plate model computation. As a result, for long-focal-length telescopes, such as the 40cm double-tube refractor and the 1.56m reflector at the Shanghai Astronomical Observatory and the 1m reflector at the Yunnan Astronomical Observatory, the astrometric accuracy obtained for the plates is approximately 0.1" to 0.3". The astrometric accuracy for medium and short focal length telescopes ranges from 0.3" to 1.0". The relevant data for this batch of plates, including the digitized images and the stellar catalogs extracted from them, are archived and released by the National Astronomical Data Center. Users can access and download plate data based on keywords such as station, telescope, observation year, and observed celestial coordinates., Comment: Accepted for Research in Astronomy and Astrophysics, 17 pages, 14 figures, 6 tables.
Database, https://nadc.china-vo.org/res/r100742/
- Published
- 2024
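The "plate model computation" step mentioned in the abstract above classically means fitting a linear (6-constant) plate solution that maps measured pixel coordinates to standard coordinates derived from a reference catalog. A minimal least-squares sketch of such a fit is shown below; the function names and the affine (6-constant) choice are assumptions for illustration, since the paper may use higher-order models for some telescopes.

```python
import numpy as np

def fit_plate_model(xy, std):
    """Fit a 6-constant (affine) plate model by linear least squares:
        xi  = a*x + b*y + c
        eta = d*x + e*y + f
    xy  : (N, 2) measured pixel coordinates (x, y)
    std : (N, 2) standard coordinates (xi, eta) from the reference catalog
    Returns the coefficient triples (a, b, c) and (d, e, f)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    coef_xi, *_ = np.linalg.lstsq(A, std[:, 0], rcond=None)
    coef_eta, *_ = np.linalg.lstsq(A, std[:, 1], rcond=None)
    return coef_xi, coef_eta

def apply_plate_model(xy, coef_xi, coef_eta):
    """Map pixel coordinates to standard coordinates with a fitted model."""
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones_like(xy[:, 0])])
    return np.column_stack([A @ coef_xi, A @ coef_eta])
```

In practice the residuals of such a fit, converted to arcseconds via the plate scale, are what quantities like the quoted 0.1"–0.3" astrometric accuracies summarize.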
43. SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
- Author
-
Liu, Hongjian, Xie, Qingsong, Deng, Zhijie, Chen, Chen, Tang, Shixiang, Fu, Fueyang, Zha, Zheng-jun, and Lu, Haonan
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The iterative sampling procedure employed by diffusion models (DMs) often leads to significant inference latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-quality generations can be achieved with just 1-2 sampling steps, and further improvements can be obtained by adding additional steps. In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation solver-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher. SCott is augmented with elaborate strategies to control the noise strength and sampling process of the SDE solver. An adversarial loss is further incorporated to strengthen sample quality at very few sampling steps. Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID (Fréchet Inception Distance) of 22.1, surpassing that (23.4) of the 1-step InstaFlow (Liu et al., 2023) and matching that of 4-step UFOGen (Xue et al., 2023b). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation (Luo et al., 2023a), with up to 16% improvement in a qualified metric. The code and checkpoints are coming soon., Comment: 22 pages, 16 figures
- Published
- 2024
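The abstract above contrasts ODE-solver-based consistency distillation with SDE solvers. As a generic reminder of what an SDE solver step looks like (Euler–Maruyama, not the specific solver or noise schedule used by SCott), a minimal sketch:

```python
import numpy as np

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Generic Euler-Maruyama integrator for dx = f(x, t) dt + g(t) dW.
    Unlike an ODE step, each step injects fresh Gaussian noise dW,
    scaled by sqrt(dt).  Illustrative only; not SCott's solver."""
    dt = (t1 - t0) / n_steps
    x, t = np.asarray(x0, dtype=float).copy(), t0
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x
```

For example, for the Ornstein-Uhlenbeck process dx = -x dt + 0.1 dW started at x0 = 1, individual paths are noisy, but the path mean over [0, 1] decays toward exp(-1) - the kind of stochastic sampling trajectory that an SDE-based distillation target exposes to the student.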
44. Premix–spray biomineralization method for anti-disintegration improvement of granite residual soil
- Author
-
Lai, Han-Jiang, Ding, Xing-Zhi, Cui, Ming-Juan, Zhou, Yan-Jun, Zheng, Jun-Jie, and Chen, Zhi-Bo
- Published
- 2024
- Full Text
- View/download PDF
45. Design and dispensing volume estimation of a single piezo-driven dispenser with a rigid–flexible combined transmission mechanism
- Author
-
Wu, Min, Zheng, Jun-Jie, Zhao, Run-Mao, Chen, Jian-Neng, Wang, Qi-Cheng, Wei, Yi-Kun, and Pan, Shao-Fei
- Published
- 2024
- Full Text
- View/download PDF
46. Free strain consolidation of soft ground improved by stone columns under time-dependent loading considering smear effects
- Author
-
Liu, Yang, Wu, Peichen, Yin, Jian-Hua, and Zheng, Jun-Jie
- Published
- 2024
- Full Text
- View/download PDF
47. Towards Generalized UAV Object Detection: A Novel Perspective from Frequency Domain Disentanglement
- Author
-
Wang, Kunyu, Fu, Xueyang, Ge, Chengjie, Cao, Chengzhi, and Zha, Zheng-Jun
- Published
- 2024
- Full Text
- View/download PDF
48. Exert Diversity and Mitigate Bias: Domain Generalizable Person Re-identification with a Comprehensive Benchmark
- Author
-
Hu, Bingyu, Liu, Jiawei, Zheng, Yufei, Zheng, Kecheng, and Zha, Zheng-Jun
- Published
- 2024
- Full Text
- View/download PDF
49. Ethanol Extract of Anacyclus pyrethrum Root Ameliorates Cough-Variant Asthma Through the TLR4/NF-κB Pathway and Wnt/β-Catenin Pathway
- Author
-
Zheng, Jun, Yang, Hao, Liu, Changjiang, Zhang, Rui, Yibulayimu, Nadire, and Jin, Xiaoyue
- Published
- 2024
- Full Text
- View/download PDF
50. Berberine Inhibits Ferroptosis and Stabilizes Atherosclerotic Plaque through NRF2/SLC7A11/GPX4 Pathway
- Author
-
Wang, Ting-ting, Yu, Li-li, Zheng, Jun-meng, Han, Xin-yi, Jin, Bo-yuan, Hua, Cheng-jun, Chen, Yu-shan, Shang, Sha-sha, Liang, Ya-zhou, and Wang, Jian-ru
- Published
- 2024
- Full Text
- View/download PDF