54,827 results on '"WANG, QIAN"'
Search Results
2. ZIF-90 treats fungal keratitis by promoting macrophage apoptosis and inhibiting inflammatory response
- Author
Fu, Xueyun, Lin, Jing, Wang, Qian, Zhang, Lina, Wang, Ziyi, Chi, Menghui, Li, Daohao, Zhao, Guiqiu, and Li, Cui
- Subjects
Quantitative Biology - Subcellular Processes
- Abstract
Fungal keratitis is a severe vision-threatening corneal infection whose prognosis is influenced by fungal virulence and the host's immune defense mechanisms. The immune system, through its regulation of the inflammatory response, ensures that cells and tissues can effectively activate defense mechanisms in response to infection and injury. However, there is still a lack of effective drugs that attenuate fungal virulence while relieving the inflammatory response caused by fungal keratitis. Finding effective treatments for these problems is therefore particularly important. We synthesized ZIF-90 by water-based synthesis and characterized it by SEM, XRD, and other methods. In vitro experiments included CCK-8 and ELISA assays. These evaluations verified the disruptive effects of ZIF-90 on Aspergillus fumigatus spore adhesion, morphology, and cell membrane, as well as the effect of ZIF-90 on apoptosis. In addition, to investigate whether the metal ion zinc and the organic ligand imidazole are essential factors in ZIF-90, we investigated the in vitro antimicrobial and anti-inflammatory effects of ZIF-8, ZIF-67, and MOF-74 (Zn) by MIC and ELISA experiments. ZIF-90 has therapeutic effects on fungal keratitis and can disrupt protective structures of Aspergillus fumigatus, such as the cell wall. In addition, ZIF-90 can prevent an excessive inflammatory response by promoting apoptosis of inflammatory cells. The results demonstrated that both zinc ions and imidazole possess antimicrobial and anti-inflammatory effects. Moreover, ZIF-90 exhibited better biocompatibility than ZIF-8, ZIF-67, and MOF-74 (Zn). ZIF-90 has anti-inflammatory and antifungal effects and favorable biocompatibility, and has great potential for the treatment of fungal keratitis.
- Published
- 2024
3. Age of Information-Oriented Probabilistic Link Scheduling for Device-to-Device Networks
- Author
Wang, Lixin, Wang, Qian, Chen, He, and Zhou, Shidong
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
This paper focuses on optimizing the long-term average age of information (AoI) in device-to-device (D2D) networks through age-aware link scheduling. The problem is naturally formulated as a Markov decision process (MDP). However, finding the optimal policy for the formulated MDP in its original form is challenging due to the intertwined AoI dynamics of all D2D links. To address this, we propose an age-aware stationary randomized policy that determines the probability of scheduling each link in each time slot based on the AoI of all links and the statistical channel state information among all transceivers. By employing the Lyapunov optimization framework, our policy aims to minimize the Lyapunov drift in every time slot. Nonetheless, this per-slot minimization problem is nonconvex due to cross-link interference in D2D networks, posing significant challenges for real-time decision-making. After analyzing the permutation equivariance property of the optimal solutions to the per-slot problem, we apply a message passing neural network (MPNN), a type of graph neural network that also exhibits permutation equivariance, to optimize the per-slot problem in an unsupervised manner. Simulation results demonstrate the superior performance of the proposed age-aware stationary randomized policy over baselines and validate the scalability of our method. Comment: 8 pages, 7 figures, accepted by IEEE WiOpt24
- Published
- 2024
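For readers less familiar with age of information, the Python sketch below illustrates per-link AoI dynamics under a stationary randomized policy. It is a simplified model, not the paper's method: each link is scheduled independently with a fixed probability `p[i]`, and a scheduled transmission succeeds with a fixed probability `q[i]`, which stands in for the channel and cross-link interference effects the paper models explicitly. The function name and parameters are hypothetical.

```python
import random


def simulate_aoi(p, q, num_slots, seed=0):
    """Return the time-averaged AoI of each link in a toy model (assumption).

    p[i]: probability that link i is scheduled in a slot (stationary randomized policy).
    q[i]: probability that a scheduled transmission on link i succeeds,
          a stand-in for channel and interference effects.
    """
    rng = random.Random(seed)
    n = len(p)
    age = [1] * n        # current AoI of each link
    total = [0.0] * n    # running sum of AoI per link
    for _ in range(num_slots):
        for i in range(n):
            delivered = rng.random() < p[i] and rng.random() < q[i]
            # AoI resets to 1 after a successful delivery, otherwise it grows by 1.
            age[i] = 1 if delivered else age[i] + 1
            total[i] += age[i]
    return [t / num_slots for t in total]


if __name__ == "__main__":
    print(simulate_aoi([0.5, 0.25], [1.0, 1.0], num_slots=100_000))
```

In this toy model a link delivered with per-slot probability s = p[i] * q[i] has a long-run average AoI of 1/s, so the two links above approach averages of about 2 and 4. The paper's policy instead sets the scheduling probabilities each slot from the AoI of all links and statistical channel state information.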
4. Lead-free Hybrid Perovskite: An Efficient Room Temperature Spin Generator via Large Interfacial Rashba effect
- Author
Han, Lei, Wang, Qian, Lu, Ying, Tao, Sheng, Zhu, Wenxuan, Feng, Xiaoyu, Liang, Shixuan, Bai, Hua, Chen, Chong, Wang, Kai, Yang, Zhou, Fan, Xiaolong, Song, Cheng, and Pan, Feng
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics, Physics - Chemical Physics
- Abstract
Two-dimensional (2D) hybrid organic-inorganic perovskites (HOIPs) demonstrate great potential for developing flexible and wearable spintronic devices by serving as spin sources via the bulk Rashba effect (BRE). However, the practical application of BRE in 2D HOIPs faces major challenges, particularly the toxicity of lead, which is crucial for achieving large spin-orbit coupling, and the limited number of 2D HOIP candidates that meet specific symmetry-breaking requirements. To overcome these obstacles, we design a strategy to exploit the interfacial Rashba effect (IRE) of the lead-free 2D HOIP (C6H5CH2CH2NH3)2CuCl4 (PEA-CuCl), which acts as an efficient spin generator at room temperature. The IRE of PEA-CuCl originates from the large orbital hybridization at the interface between PEA-CuCl and adjacent ferromagnetic layers. Spin-torque ferromagnetic resonance measurements further quantify a large Rashba effective field of 14.04 Oe per 10^11 A m^-2, surpassing those of lead-based HOIPs and traditional all-inorganic heterojunctions with noble metals. Our lead-free 2D HOIP PEA-CuCl, which harnesses a large IRE for spin generation, is efficient, nontoxic, and economical, offering great promise for future flexible and wearable spintronic devices.
- Published
- 2024
5. Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning
- Author
Wang, Qian, Gao, Yuchen, Tang, Zhenheng, Luo, Bingqiao, and He, Bingsheng
- Subjects
Computer Science - Multiagent Systems
- Abstract
While many studies show that more advanced LLMs perform better on tasks such as math and coding, we notice that in cryptocurrency trading, stronger LLMs often perform worse than weaker ones. To study how this counterintuitive phenomenon arises, we examine the reasoning processes LLMs follow when making trading decisions. We find that separating the reasoning process into factual and subjective components can lead to higher profits. Building on this insight, we introduce a multi-agent framework, FS-ReasoningAgent, which enables LLMs to recognize and learn from both factual and subjective reasoning. Extensive experiments demonstrate that this framework enhances LLM trading performance in cryptocurrency markets. Additionally, an ablation study reveals that relying on subjective news tends to generate higher returns in bull markets, whereas focusing on factual information yields better results in bear markets. Our code and data are available at https://anonymous.4open.science/r/FS-ReasoningAgent-B55F/.
- Published
- 2024
6. REHRSeg: Unleashing the Power of Self-Supervised Super-Resolution for Resource-Efficient 3D MRI Segmentation
- Author
Song, Zhiyun, Zhao, Yinjie, Li, Xiaomin, Fei, Manman, Zhao, Xiangyu, Liu, Mengjun, Chen, Cunjian, Yeh, Chung-Hsing, Wang, Qian, Zheng, Guoyan, Ai, Songtao, and Zhang, Lichi
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
High-resolution (HR) 3D magnetic resonance imaging (MRI) can provide detailed anatomical structural information, enabling precise segmentation of regions of interest (ROIs) for various medical image analysis tasks. Because of the high demands on acquisition devices, collecting HR images with their annotations is generally impractical in clinical scenarios. Consequently, segmentation results based on low-resolution (LR) images with large slice thickness are often unsatisfactory for subsequent tasks. In this paper, we propose a novel Resource-Efficient High-Resolution Segmentation framework (REHRSeg) to address these challenges in real-world applications; it achieves HR segmentation while using only LR images as input. REHRSeg leverages self-supervised super-resolution (self-SR) to provide pseudo supervision, so the LR annotated images generated by 2D scanning protocols, which are relatively easy to acquire, can be used directly for model training. Our contributions to making self-SR effective for enhancing segmentation are threefold: (1) We mitigate the data scarcity problem in the medical field by using pseudo-data to train the segmentation model. (2) We design an uncertainty-aware super-resolution (UASR) head in self-SR to raise awareness of segmentation uncertainty, which commonly appears at ROI boundaries. (3) We align the spatial features for self-SR and segmentation through structural knowledge distillation to better capture region correlations. Experimental results demonstrate that REHRSeg achieves high-quality HR segmentation without intensive supervision, while also significantly improving the baseline performance for LR segmentation.
- Published
- 2024
7. Denoising Variational Autoencoder as a Feature Reduction Pipeline for the diagnosis of Autism based on Resting-state fMRI
- Author
Zheng, Xinyuan, Ravid, Orren, Barry, Robert A. J., Kim, Yoojean, Wang, Qian, Kim, Young-geun, Zhu, Xi, and He, Xiaofu
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Machine Learning, Statistics - Applications, J.3, I.4.9, I.4.10
- Abstract
Autism spectrum disorders (ASDs) are developmental conditions characterized by restricted interests and difficulties in communication. The complexity of ASD has resulted in a lack of objective diagnostic biomarkers. Deep learning methods have gained recognition for addressing these challenges in neuroimaging analysis, but finding and interpreting such diagnostic biomarkers remains computationally challenging. We propose an ASD feature reduction pipeline using resting-state fMRI (rs-fMRI). We used Ncuts parcellations and the Power atlas to extract functional connectivity data, resulting in over 30 thousand features. The pipeline then uses a denoising variational autoencoder (DVAE) to compress the connectivities into 5 latent Gaussian distributions, providing a low-dimensional representation of the data. To test the method, we used the latent features extracted by the DVAE to classify ASD with traditional classifiers such as a support vector machine (SVM) on a large multi-site dataset. After site harmonization, the 95% confidence interval for the prediction accuracy of the SVM using the extracted latent distributions is [0.63, 0.76]. Without the DVAE, the prediction accuracy is 0.70, which falls within this interval. This implies that the model successfully encodes the diagnostic information in rs-fMRI data into 5 Gaussian distributions (10 features) without sacrificing prediction performance. The runtime for training the DVAE and obtaining classification results from its extracted latent features (37 minutes) was 7 times shorter than that of training classifiers directly on the raw connectivity matrices (5-6 hours). Our findings also suggest that the Power atlas provides more effective brain connectivity insights for diagnosing ASD than Ncuts parcellations. The encoded features can be used to aid the diagnosis and interpretation of the disease.
- Published
- 2024
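The feature-reduction idea can be illustrated with a minimal NumPy sketch of the encoder side of a denoising VAE: inputs are corrupted with Gaussian noise during training, mapped to the means and log-variances of 5 latent Gaussians, and those 10 numbers per subject become classifier features. The single linear encoder layer, the noise level, the use of log-variance (rather than variance) as a feature, the toy input size, and the random untrained weights are all assumptions for illustration; the decoder, training loop, and site harmonization are omitted.

```python
import numpy as np

LATENT_DIM = 5  # five latent Gaussian distributions, as in the abstract


def encode(x, w_mu, w_logvar, noise_std=0.0, rng=None):
    """Map (optionally corrupted) inputs to latent means and log-variances.

    A denoising VAE corrupts its inputs during training (noise_std > 0);
    here clean inputs (noise_std = 0) are used for feature extraction.
    """
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        x = x + noise_std * rng.standard_normal(x.shape)
    return x @ w_mu, x @ w_logvar


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, logvar):
    """Per-sample KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)


def latent_features(mu, logvar):
    """Concatenate 5 means and 5 log-variances into 10 classifier features."""
    return np.concatenate([mu, logvar], axis=1)


rng = np.random.default_rng(0)
n_subjects, n_edges = 8, 300  # toy size; the real pipeline has >30,000 connectivity features
x = rng.standard_normal((n_subjects, n_edges))
w_mu = 0.01 * rng.standard_normal((n_edges, LATENT_DIM))      # untrained, random weights
w_logvar = 0.01 * rng.standard_normal((n_edges, LATENT_DIM))

mu_train, logvar_train = encode(x, w_mu, w_logvar, noise_std=0.1, rng=rng)
z = sample_latent(mu_train, logvar_train, rng)        # would feed the decoder during training
kl = kl_to_standard_normal(mu_train, logvar_train)    # regularization term of the VAE loss

mu, logvar = encode(x, w_mu, w_logvar)                # clean inputs for feature extraction
features = latent_features(mu, logvar)                # shape (8, 10), input for an SVM
```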
8. Zeolitic Imidazolate Framework-8 offers an anti-inflammatory and antifungal method in the treatment of Aspergillus fungus keratitis in vitro and in vivo
- Author
Fu, Xueyun, Tian, Xue, Lin, Jing, Wang, Qian, Gu, Lingwen, Wang, Ziyi, Chi, Menghui, Yu, Bing, Feng, Zhuhui, Liu, Wenyao, Zhang, Lina, Li, Cui, and Zhao, Guiqiu
- Subjects
Quantitative Biology - Tissues and Organs
- Abstract
Background: Fungal keratitis is a serious blinding eye disease. Traditional drugs used to treat fungal keratitis commonly have the disadvantages of low bioavailability, poor dispersion, and limited permeability. Purpose: To develop a new method for the treatment of fungal keratitis with improved bioavailability, dispersion, and permeability. Methods: Zeolitic Imidazolate Framework-8 (ZIF-8) was formed from zinc ions and 2-methylimidazole linked by coordination bonds and characterized by scanning electron microscopy (SEM), X-ray diffraction (XRD), and zeta potential. The safety of ZIF-8 on HCECs and RAW 264.7 cells was assessed with the Cell Counting Kit-8 (CCK-8). The anti-inflammatory effects of ZIF-8 on RAW 264.7 cells were evaluated by quantitative real-time PCR (qPCR) and enzyme-linked immunosorbent assay (ELISA). Clinical scores and colony-forming units (CFU) were also assessed. In vivo, treatment with ZIF-8 reduced corneal fungal load and mitigated neutrophil infiltration in fungal keratitis, which effectively reduced the severity of keratitis in mice and alleviated the infiltration of inflammatory factors in the mouse cornea. In addition, ZIF-8 reduced the inflammatory response by downregulating the expression of the pro-inflammatory cytokines TNF-α, IL-6, and IL-1β after Aspergillus fumigatus infection in vivo and in vitro. Conclusion: ZIF-8 has a significant anti-inflammatory and antifungal effect, which provides a new solution for the treatment of fungal keratitis. Comment: 25 pages, 8 figures, this paper has been received by the International Journal of Nanomedicine
- Published
- 2024
9. Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation
- Author
Wang, Yulin, Xiong, Honglin, Sun, Kaicong, Bai, Shuwei, Dai, Ling, Ding, Zhongxiang, Liu, Jiameng, Wang, Qian, Liu, Qian, and Shen, Dinggang
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. However, because of limited access to MRI scanners and their lengthy acquisition times, multimodal MR images are not commonly available. Current MR image synthesis approaches are typically trained on independent datasets for specific tasks, leading to suboptimal performance when applied to novel datasets and tasks. Here, we present TUMSyn, a Text-guided Universal MR image Synthesis generalist model, which can flexibly generate brain MR images with the required imaging metadata from routinely acquired scans, guided by text prompts. To ensure TUMSyn's image synthesis precision, versatility, and generalizability, we first construct a brain MR database comprising 31,407 3D images with 7 MRI modalities from 13 centers. We then pre-train an MRI-specific text encoder using contrastive learning to effectively control MR image synthesis based on text prompts. Extensive experiments on diverse datasets and physician assessments indicate that TUMSyn can generate clinically meaningful MR images with specified imaging metadata in supervised and zero-shot scenarios. Therefore, TUMSyn can be used along with acquired MR scan(s) to facilitate large-scale MRI-based screening and diagnosis of brain diseases. Comment: 23 pages, 9 figures
- Published
- 2024
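The abstract does not specify the contrastive objective used to pre-train the text encoder. As an assumption, a common choice is a CLIP-style symmetric InfoNCE loss between paired embeddings; the NumPy sketch below computes that loss for a batch in which row i of the text embeddings is paired with row i of the image embeddings. The function names and the temperature value are illustrative.

```python
import numpy as np


def _log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb, image_emb, temperature=0.07):
    """CLIP-style loss: row i of text_emb is the positive pair of row i of image_emb."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = (t @ v.T) / temperature       # scaled cosine similarities
    idx = np.arange(len(t))
    loss_t2v = -_log_softmax(logits, axis=1)[idx, idx].mean()  # each text vs. all images
    loss_v2t = -_log_softmax(logits, axis=0)[idx, idx].mean()  # each image vs. all texts
    return 0.5 * (loss_t2v + loss_v2t)


rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 16))
aligned = symmetric_info_nce(emb, emb)          # correct pairing
shuffled = symmetric_info_nce(emb, emb[::-1])   # every pair mismatched
```

Minimizing this loss pulls each text embedding toward its paired image embedding and away from the other images in the batch, so correctly paired batches score lower than mismatched ones; whether TUMSyn uses exactly this form is not stated in the abstract.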
10. Advancing Video Quality Assessment for AIGC
- Author
Yue, Xinli, Sun, Jianhui, Kong, Han, Yao, Liangchao, Wang, Tianyi, Li, Lei, Rao, Fengyun, Lv, Jing, Xia, Fan, Deng, Yuetang, Wang, Qian, and Zhao, Lingchen
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In recent years, AI generative models have made remarkable progress across various domains, including text, image, and video generation. However, quality assessment for text-to-video generation is still in its infancy, and existing evaluation frameworks fall short of those for natural videos. Current video quality assessment (VQA) methods primarily focus on evaluating the overall quality of natural videos and fail to adequately account for the substantial quality discrepancies between frames in generated videos. To address this issue, we propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies. Additionally, we introduce the S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities. Experimental results demonstrate that our method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state of the art by 3.1% in terms of PLCC. Comment: 5 pages, 1 figure
- Published
- 2024
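The abstract does not say how the two loss terms are combined. The NumPy sketch below shows one plausible reading, offered as an assumption: a weighted sum of mean absolute error on predicted quality scores and cross-entropy on those scores discretized into quality bins. The weighting `alpha`, the binning scheme, and all function names are hypothetical, and the sketch does not model the per-frame structure the paper targets.

```python
import numpy as np


def mean_absolute_error(pred, target):
    return float(np.mean(np.abs(pred - target)))


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (N, C), labels (N,) integer classes."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def quality_bins(scores, num_bins, lo=0.0, hi=1.0):
    """Discretize continuous quality scores in [lo, hi] into num_bins classes."""
    idx = np.floor((scores - lo) / (hi - lo) * num_bins).astype(int)
    return np.clip(idx, 0, num_bins - 1)


def combined_loss(pred_scores, logits, target_scores, alpha=0.5):
    """Hypothetical weighted sum: alpha * MAE + (1 - alpha) * cross-entropy."""
    labels = quality_bins(target_scores, num_bins=logits.shape[1])
    mae = mean_absolute_error(pred_scores, target_scores)
    ce = cross_entropy(logits, labels)
    return alpha * mae + (1.0 - alpha) * ce


target = np.array([0.1, 0.55, 0.9])
pred = np.array([0.2, 0.5, 0.9])
logits = np.zeros((3, 4))  # uninformative scores over 4 quality bins
loss = combined_loss(pred, logits, target, alpha=0.5)
```

The MAE term penalizes score deviations directly, while the cross-entropy term treats coarse quality levels as classes; `alpha` trades the two off.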
11. Effective and Evasive Fuzz Testing-Driven Jailbreaking Attacks against LLMs
- Author
Gong, Xueluan, Li, Mingzhe, Zhang, Yilin, Ran, Fengyuan, Chen, Chen, Chen, Yanjiao, Wang, Qian, and Lam, Kwok-Yan
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
- Abstract
Large Language Models (LLMs) have excelled in various tasks but are still vulnerable to jailbreaking attacks, where attackers create jailbreak prompts to mislead the model into producing harmful or offensive content. Current jailbreak methods either rely heavily on manually crafted templates, which pose challenges in scalability and adaptability, or struggle to generate semantically coherent prompts, making them easy to detect. Additionally, most existing approaches involve lengthy prompts, leading to higher query costs. In this paper, to remedy these challenges, we introduce a novel automated, black-box jailbreaking attack framework that adapts the black-box fuzz testing approach with a series of customized designs. Instead of relying on manually crafted templates, our method starts with an empty seed pool, removing the need to search for any related jailbreaking templates. We also develop three novel question-dependent mutation strategies using an LLM helper to generate prompts that maintain semantic coherence while significantly reducing their length. Additionally, we implement a two-level judge module to accurately detect genuinely successful jailbreaks. We evaluated our method on 7 representative LLMs and compared it with 5 state-of-the-art jailbreaking attack strategies. For proprietary LLM APIs such as GPT-3.5 Turbo, GPT-4, and Gemini-Pro, our method achieves attack success rates of over 90%, 80%, and 74%, respectively, exceeding existing baselines by more than 60%. Additionally, our method maintains high semantic coherence while significantly reducing the length of jailbreak prompts. When targeting GPT-4, our method can achieve an attack success rate of over 78% even with 100 tokens. Moreover, our method demonstrates transferability and is robust to state-of-the-art defenses. We will open-source our code upon publication.
- Published
- 2024
12. Revisiting Video Quality Assessment from the Perspective of Generalization
- Author
Yue, Xinli, Sun, Jianhui, Yao, Liangchao, Xia, Fan, Deng, Yuetang, Wang, Tianyi, Li, Lei, Rao, Fengyun, Lv, Jing, Wang, Qian, and Zhao, Lingchen
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The increasing popularity of short video platforms such as YouTube Shorts, TikTok, and Kwai has led to a surge in User-Generated Content (UGC), which presents significant challenges for the generalization performance of Video Quality Assessment (VQA) tasks. These challenges not only affect performance on test sets but also impact the ability to generalize across different datasets. While prior research has primarily focused on enhancing feature extractors, sampling methods, and network branches, it has largely overlooked the generalization capabilities of VQA tasks. In this work, we reevaluate the VQA task from a generalization standpoint. We begin by analyzing the weight loss landscape of VQA models, identifying a strong correlation between this landscape and the generalization gap. We then investigate various techniques to regularize the weight loss landscape. Our results reveal that adversarial weight perturbations can effectively smooth this landscape, significantly improving generalization performance, with cross-dataset generalization and fine-tuning performance enhanced by up to 1.8% and 3%, respectively. Through extensive experiments across various VQA methods and datasets, we validate the effectiveness of our approach. Furthermore, by leveraging our insights, we achieve state-of-the-art performance in Image Quality Assessment (IQA) tasks. Our code is available at https://github.com/XinliYue/VQA-Generalization. Comment: 13 pages, 4 figures
- Published
- 2024
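The abstract attributes the gains to adversarial weight perturbations that smooth the weight loss landscape. Below is a minimal NumPy sketch of that general idea on a toy least-squares model: compute the gradient, move the weights a small distance (proportional to the weight norm) in the loss-increasing direction, and then update the original weights with the gradient taken at the perturbed point. The toy model, `gamma`, the learning rate, and the single whole-vector perturbation (rather than per-layer perturbations) are assumptions; this is not the authors' VQA training code.

```python
import numpy as np


def mse_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient with respect to w."""
    residual = X @ w - y
    return float(np.mean(residual**2)), 2.0 * X.T @ residual / len(y)


def awp_step(w, X, y, lr=0.1, gamma=0.01):
    """One gradient step taken at adversarially perturbed weights."""
    _, grad = mse_and_grad(w, X, y)
    grad_norm = np.linalg.norm(grad)
    # Perturb along the loss-increasing direction, sized relative to ||w||.
    v = gamma * np.linalg.norm(w) * grad / grad_norm if grad_norm > 0 else np.zeros_like(w)
    _, grad_adv = mse_and_grad(w + v, X, y)
    # Apply the gradient from the perturbed point to the unperturbed weights.
    return w - lr * grad_adv


rng = np.random.default_rng(0)
X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = 0.1 * rng.standard_normal(3)
initial_loss, _ = mse_and_grad(w, X, y)
for _ in range(300):
    w = awp_step(w, X, y)
final_loss, _ = mse_and_grad(w, X, y)
```

Because each update uses the gradient at a nearby worst-case point, the intuition is that regions where small weight changes raise the loss sharply are penalized relative to flat ones; the toy problem has a single minimum, so it only demonstrates the mechanics of the update.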
13. ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification
- Author
Qu, Zuomin, Lu, Wei, Luo, Xiangyang, Wang, Qian, and Cao, Xiaochun
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
The misuse of deep learning-based facial manipulation poses a potential threat to civil rights. To prevent this fraud at its source, proactive defense techniques have been proposed that disrupt the manipulation process by adding invisible adversarial perturbations to images, making the forged output unconvincing to observers. However, because their disruption of the output is non-directional, the identity information of the person in the image may be retained, leading to stigmatization of the individual. In this paper, we propose a novel universal framework for combating facial manipulation, called ID-Guard. Specifically, this framework requires only a single forward pass of an encoder-decoder network to generate a cross-model universal adversarial perturbation corresponding to a specific facial image. To ensure anonymity in manipulated facial images, a novel Identity Destruction Module (IDM) is introduced to specifically destroy the identifiable information in forged faces. Additionally, we optimize the generated perturbations by treating the disruption of different facial manipulations as a multi-task learning problem, and we design a dynamic weighting strategy to improve cross-model performance. The proposed framework achieves strong results in defending against multiple widely used facial manipulations, effectively distorting the identifiable regions in the manipulated facial images. In addition, our experiments show that ID-Guard enables disrupted images to evade face inpainting and open-source image recognition systems.
- Published
- 2024
14. One-Shot Learning for Pose-Guided Person Image Synthesis in the Wild
- Author
Fan, Dongqi, Chen, Tao, Wang, Mingjie, Ma, Rui, Tang, Qiang, Yi, Zili, Wang, Qian, and Chang, Liang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Current Pose-Guided Person Image Synthesis (PGPIS) methods depend heavily on large amounts of labeled triplet data to train the generator in a supervised manner. However, they often falter when applied to in-the-wild samples, primarily due to the distribution gap between the training datasets and real-world test samples. While some researchers aim to enhance model generalizability through sophisticated training procedures, advanced architectures, or more diverse datasets, we adopt the test-time fine-tuning paradigm to customize a pre-trained Text2Image (T2I) model. However, naively applying test-time tuning results in inconsistencies in facial identities and appearance attributes. To address this, we introduce a Visual Consistency Module (VCM), which enhances appearance consistency by combining the face, text, and image embeddings. Our approach, named OnePoseTrans, requires only a single source image to generate high-quality pose transfer results, offering greater stability than state-of-the-art data-driven methods. For each test case, OnePoseTrans customizes a model in around 48 seconds on an NVIDIA V100 GPU.
- Published
- 2024
15. Finite time quantum-classical correspondence in quantum chaotic systems
- Author
Wang, Qian and Robnik, Marko
- Subjects
Quantum Physics
- Abstract
Although the importance of the quantum-classical correspondence has been recognized in numerous studies of quantum chaos, whether it still holds for finite-time dynamics is less well understood. We address this question by performing a detailed analysis of how the quantum chaotic measure relates to the chaoticity of finite-time classical trajectories. A good correspondence between them is revealed in both time-dependent and many-body systems. In particular, we show that the dependence of the quantum chaotic measure on the chaoticity of finite-time trajectories can be well captured by a system-independent function. This strongly implies the universal validity of the finite-time quantum-classical correspondence. Our findings provide a deeper understanding of the quantum-classical correspondence and highlight the role of time in studying quantum ergodicity. Comment: 13 pages, 6 figures
- Published
- 2024
16. Evaluating Post-Quantum Cryptography on Embedded Systems: A Performance Analysis
- Author
Dong, Ben and Wang, Qian
- Subjects
Computer Science - Cryptography and Security
- Abstract
The National Institute of Standards and Technology (NIST) has finalized the selection of post-quantum cryptographic (PQC) algorithms for use in the era of quantum computing. Despite their integration into the TLS protocol for key establishment and signature generation, there has been limited study profiling these newly standardized algorithms in resource-constrained communication systems. In this work, we integrate PQC into both TLS servers and clients built on embedded systems. Additionally, we compare the performance overhead of PQC pairs with that of currently used non-PQC schemes.
- Published
- 2024
17. Binding of the three-hadron $DD^{*}K$ system from the lattice effective field theory
- Author
Zhang, Zhenyu, Hu, Xin-Yue, He, Guangzhao, Liu, Jun, Shi, Jia-Ai, Lu, Bing-Nan, and Wang, Qian
- Subjects
High Energy Physics - Phenomenology
- Abstract
We employ nuclear lattice effective field theory (NLEFT), an efficient tool for nuclear ab initio calculations, to solve asymmetric multi-hadron systems. We take the $DD^*K$ three-body system as an illustration to demonstrate the capability of the method. Here the two-body chiral interactions between $D$, $D^*$, and $K$ are regulated with a soft lattice regulator and calibrated with the binding energies of the $T_{cc}^+$, $D^{*}_{s0}(2317)$, and $D_{s1}(2460)$ molecular states. We then calculate the three-body binding energy using NLEFT and analyze the systematic uncertainties due to finite-volume effects, the sliding cutoff, and the leading-order three-body forces. Even when the three-body interaction is repulsive (even infinitely repulsive), the three-body system unambiguously has a bound state, with an energy no higher than the $D_{s1}(2460)D$ threshold. To check the renormalization group invariance of our framework, we extract the first excited state. We find that, with the ground state fixed, the first excited states obtained with various cutoffs coincide as the cubic box size increases. In addition, the standard angular momentum and parity projection technique is implemented to determine the quantum numbers of the ground and excited states. We find that both are S-wave states with quantum numbers $J^{P}=1^-$. Because the three-body state contains two charm quarks, it should be easier to detect at the Large Hadron Collider. Comment: 17 pages, 6 figures
- Published
- 2024
18. OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
- Author
Chen, Liuhan, Li, Zongjian, Lin, Bin, Zhu, Bin, Wang, Qian, Yuan, Shenghai, Zhou, Xing, Cheng, Xinhua, and Yuan, Li
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
The variational autoencoder (VAE), which compresses videos into latent representations, is a crucial upstream component of latent video diffusion models (LVDMs). For the same reconstruction quality, the more thoroughly the VAE compresses videos, the more efficient the LVDM is. However, most LVDMs use a 2D image VAE, which compresses videos only in the spatial dimension and leaves the temporal dimension uncompressed. How to perform temporal compression of videos in a VAE to obtain more concise latent representations while ensuring accurate reconstruction is seldom explored. To fill this gap, we propose an omni-dimension compression VAE, named OD-VAE, which compresses videos both temporally and spatially. Although OD-VAE's stronger compression makes video reconstruction more challenging, our careful design still allows it to achieve high reconstruction accuracy. To obtain a better trade-off between video reconstruction quality and compression speed, four variants of OD-VAE are introduced and analyzed. In addition, a novel tail initialization is designed to train OD-VAE more efficiently, and a novel inference strategy is proposed to enable OD-VAE to handle videos of arbitrary length with limited GPU memory. Comprehensive experiments on video reconstruction and LVDM-based video generation demonstrate the effectiveness and efficiency of our proposed methods. Comment: https://github.com/PKU-YuanGroup/Open-Sora-Plan
- Published
- 2024
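A quick arithmetic sketch shows why adding temporal compression matters: for a fixed spatial factor, compressing time by a further factor shrinks the latent the diffusion model must process by that same factor. The factors below (8x spatial, 4x temporal) and the 16-frame 256x256 clip are hypothetical, channel counts are ignored, and OD-VAE's actual configurations may differ.

```python
def latent_shape(frames, height, width, t_factor, s_factor):
    """Return the latent (T, H, W) after temporal and spatial downsampling."""
    if frames % t_factor or height % s_factor or width % s_factor:
        raise ValueError("dimensions must be divisible by the compression factors")
    return (frames // t_factor, height // s_factor, width // s_factor)


def num_elements(shape):
    t, h, w = shape
    return t * h * w


clip = (16, 256, 256)  # hypothetical 16-frame, 256x256 video
image_vae = latent_shape(*clip, t_factor=1, s_factor=8)  # 2D image VAE: spatial only
omni_vae = latent_shape(*clip, t_factor=4, s_factor=8)   # temporal + spatial (hypothetical 4x)
print(image_vae, omni_vae, num_elements(image_vae) // num_elements(omni_vae))
# prints: (16, 32, 32) (4, 32, 32) 4
```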
19. The effects of negative pressure on the gene expression of motility related proteins in boar spermatozoa during liquid storage at 17°C
- Author
Li, Yanbing, Li, Jingchun, Zhang, Qun, Wang, Qian, Guo, Minghui, Li, Qi, and Wei, Guosheng
- Published
- 2021
20. DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
- Author
Wang, Xinyu and Wang, Qian
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Speech recognition is the technology that enables machines to interpret and process human speech, converting spoken language into text or commands. This technology is essential for applications such as virtual assistants, transcription services, and communication tools. Audio-visual speech recognition (AVSR) models enhance traditional speech recognition, particularly in noisy environments, by incorporating visual modalities such as lip movements and facial expressions. While traditional AVSR models with numerous parameters trained on large-scale datasets can achieve remarkable accuracy, often surpassing human performance, they also come with high training costs and deployment challenges. To address these issues, we introduce an efficient AVSR model that reduces the number of parameters through the integration of a Dual Conformer Interaction Module (DCIM). In addition, we propose a pre-training method that further optimizes model performance by selectively updating parameters, leading to significant improvements in efficiency. Unlike conventional models that require the system to independently learn the hierarchical relationship between audio and visual modalities, our approach incorporates this distinction directly into the model architecture. This design enhances both efficiency and performance, resulting in a more practical and effective solution for AVSR tasks. Comment: Submitted to ICASSP 2025
- Published
- 2024
21. Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers
- Author
Wang, Qian, Bu, Zhaoyang, Mao, Jiaxuan, Zhu, Wenyu, Zhao, Jingya, Du, Wei, Shi, Guochao, Zhou, Min, Chen, Si, and Qu, Jieming
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or deep models of moderate scales. In addition, the developed approaches are trained and evaluated on small-scale data due to the difficulty of curating and annotating clinical data at scale. To address these issues in prior works, we create a unified framework to evaluate various deep models from lightweight Convolutional Neural Networks (e.g., ResNet18) to modern vision transformers and compare their performance in respiratory disease classification. Based on the observations from such an extensive empirical study, we propose a novel approach to cough-based disease classification based on both self-supervised and supervised learning on a large-scale cough data set. Experimental results demonstrate that our proposed approach consistently outperforms prior art on two benchmark datasets for COVID-19 diagnosis and a proprietary dataset for COPD/non-COPD classification with an AUROC of 92.5%.
- Published
- 2024
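The AUROC reported above summarizes how well a classifier's scores separate the two classes. As a minimal sketch (not the authors' evaluation code), it can be computed as the probability that a randomly chosen positive recording receives a higher score than a randomly chosen negative one, with ties counting one half. The labels and scores below are hypothetical toy values.

```python
from itertools import product


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    Tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for COPD (1) vs non-COPD (0) recordings.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
# 8 of the 9 positive/negative pairs are ranked correctly.
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of examples and only makes the definition concrete; for real evaluations a library routine such as scikit-learn's `roc_auc_score` computes the same quantity from a sorted ranking.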
22. The pole structures of the $X(1840)/X(1835)$ and the $X(1880)$
- Author
-
Niu, Peng-Yu, Zhang, Zhen-Yu, Li, Yi-Yao, Wang, Qian, and Zhao, Qiang
- Subjects
High Energy Physics - Phenomenology - Abstract
Whether the $N\bar{N}$ interaction could form a state or not is a long-standing question that predates even the observation of the $p\bar{p}$ threshold enhancement in 2003. The recent high-statistics measurement in the $J/\psi \to \gamma 3(\pi^+\pi^-)$ channel provides a good opportunity to probe the nature of the peak structures around the $p\bar{p}$ threshold in various processes. By constructing the $N\bar{N}$ interaction respecting chiral symmetry, we extract the pole positions by fitting the $p\bar{p}$ and $3(\pi^+\pi^-)$ invariant mass distributions of the $J/\psi \to \gamma p \bar p$ and $J/\psi \to \gamma 3(\pi^+\pi^-)$ processes. The threshold enhancement in the $p\bar{p}$ invariant mass distribution arises from the pole on the third Riemann sheet, which couples more strongly to the isospin triplet channel. The broader structure in the $3(\pi^+\pi^-)$ invariant mass comes from the pole on the physical Riemann sheet, which couples more strongly to the isospin singlet channel. Furthermore, the large compositeness indicates that there should exist a $p\bar{p}$ resonance based on the current experimental data. In addition, we also see a clear threshold enhancement in the $n\bar{n}$ channel, though not as significant as that in the $p\bar{p}$ channel, which can be compared with future experimental measurements., Comment: 16 pages, 8 figures
- Published
- 2024
23. MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems
- Author
-
Wang, Qian, Wang, Tianyu, Li, Qinbin, Liang, Jingsheng, and He, Bingsheng
- Subjects
Computer Science - Multiagent Systems - Abstract
With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent cooperation. To address the above limitations, we propose MegaAgent, a practical framework designed for autonomous cooperation in large-scale LLM Agent systems. MegaAgent leverages the autonomy of agents to dynamically generate agents based on task requirements, incorporating features such as automatically dividing tasks, systematic planning and monitoring of agent activities, and managing concurrent operations. In addition, MegaAgent is designed with a hierarchical structure and employs system-level parallelism to enhance performance and boost communication. We demonstrate the effectiveness of MegaAgent through Gobang game development, showing that it outperforms popular LLM-MA systems; and national policy simulation, demonstrating its high autonomy and potential to rapidly scale up to 590 agents while ensuring effective cooperation among them. Our results indicate that MegaAgent is the first autonomous large-scale LLM-MA system with no pre-defined SOPs, high effectiveness and scalability, paving the way for further research in this field. Our code is at https://anonymous.4open.science/r/MegaAgent-81F3.
- Published
- 2024
24. Barbie: Text to Barbie-Style 3D Avatars
- Author
-
Sun, Xiaokun, Zhang, Zhenyu, Tai, Ying, Wang, Qian, Tang, Hao, Yi, Zili, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in text-guided 3D avatar generation have made substantial progress by distilling knowledge from diffusion models. Despite the plausible generated appearance, existing methods cannot achieve fine-grained disentanglement or high-fidelity modeling between inner body and outfit. In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories. Instead of relying on a holistic model, Barbie achieves fine-grained disentanglement on avatars by semantic-aligned separated models for human body and outfits. These disentangled 3D representations are then optimized by different expert models to guarantee the domain-specific fidelity. To balance geometry diversity and reasonableness, we propose a series of losses for template-preserving and human-prior evolving. The final avatar is enhanced by unified texture refinement for superior texture consistency. Extensive experiments demonstrate that Barbie outperforms existing methods in both dressed human and outfit generation, supporting flexible apparel combination and animation. The code will be released for research purposes. Our project page is: https://xiaokunsun.github.io/Barbie.github.io/., Comment: 9 pages, 7 figures, Project page: https://xiaokunsun.github.io/Barbie.github.io/
- Published
- 2024
25. IDRetracor: Towards Visual Forensics Against Malicious Face Swapping
- Author
-
Cheng, Jikang, Ai, Jiaxin, Han, Zhen, Liang, Chao, Zou, Qin, Wang, Zhongyuan, and Wang, Qian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The face swapping technique based on deepfake methods poses significant social risks to personal identity security. While numerous deepfake detection methods have been proposed as countermeasures against malicious face swapping, they can only output binary labels (Fake/Real) for distinguishing fake content without reliable and traceable evidence. To achieve visual forensics and target face attribution, we propose a novel task named face retracing, which considers retracing the original target face from the given fake one via inverse mapping. Toward this goal, we propose an IDRetracor that can retrace arbitrary original target identities from fake faces generated by multiple face swapping methods. Specifically, we first adopt a mapping resolver to perceive the possible solution space of the original target face for the inverse mappings. Then, we propose mapping-aware convolutions to retrace the original target face from the fake one. Such convolutions contain multiple kernels that can be combined under the control of the mapping resolver to tackle different face swapping mappings dynamically. Extensive experiments demonstrate that the IDRetracor exhibits promising retracing performance from both quantitative and qualitative perspectives.
- Published
- 2024
26. Valence Quark Distributions in Pions: Insights from Tsallis Entropy
- Author
-
Chen, Jingxuan, Wang, Xiaopeng, Cai, Yanbing, Chen, Xurong, and Wang, Qian
- Subjects
High Energy Physics - Phenomenology - Abstract
We investigate the valence quark distributions of pions at a low initial scale ($Q^2_0$) through the application of Tsallis entropy, a non-extensive measure adept at encapsulating long-range correlations among internal constituents. Utilizing the maximum entropy approach, we derive the valence quark distributions at elevated resolution scales via a modified DGLAP equation, which integrates GLR-MQ-ZRS corrections for the $Q^2$ evolution. Our findings indicate that the resulting $Q^2$-dependent valence quark distributions yield an optimal fit to experimental data with an inferred value of $q = 0.91$, which deviates from unity. This deviation highlights the significant role that correlations among valence quarks play in shaping our understanding of pion internal structure. Additionally, our computations of the first three moments of pion quark distributions at $Q^2 = 4 \, \mathrm{GeV}^2$ are consistent with alternative theoretical models, thereby reinforcing the importance of incorporating valence quark correlations within this analytical framework., Comment: 11 pages, 3 figures
- Published
- 2024
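For reference, the standard Tsallis entropy of a normalized distribution $f(x)$ on $x \in [0,1]$ is given below. The abstract does not state the exact functional or constraints the authors maximize, so identifying $f$ with a valence quark distribution here is an assumption. The limit shows why a fitted $q = 0.91$ signals a departure from the Boltzmann-Gibbs-Shannon case, which is recovered at $q = 1$.

$$
S_q[f] = \frac{1 - \int_0^1 f(x)^q \, dx}{q - 1}, \qquad \int_0^1 f(x) \, dx = 1,
$$

$$
\lim_{q \to 1} S_q[f] = -\int_0^1 f(x) \ln f(x) \, dx,
$$

since $f^q = f \, e^{(q-1)\ln f} \approx f \, [1 + (q-1)\ln f]$ as $q \to 1$, and the normalization cancels the leading term.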
27. Multifractality and excited-state quantum phase transition in ferromagnetic spin-$1$ Bose-Einstein condensates
- Author
-
Niu, Zhen-Xia and Wang, Qian
- Subjects
Condensed Matter - Quantum Gases ,Quantum Physics - Abstract
Multifractality of quantum states plays an important role in understanding numerous complex phenomena observed in different branches of physics. The multifractal properties of the eigenstates allow for characterizing various phase transitions. In this work, we perform a thorough analysis of the impacts of an excited-state quantum phase transition (ESQPT) on the fractal behavior of both static and dynamical wavefunctions in a ferromagnetic spin-$1$ Bose-Einstein condensate (BEC). By studying the features of the fractal dimensions, we show how the multifractality of eigenstates and time-evolved states is affected by the presence of the ESQPT. Specifically, the underlying ESQPT leads to a strong localization effect, which in turn enables us to use this localization as an indicator of the ESQPT. We verify the ability of the fractal dimensions to probe the occurrence of the ESQPT through a detailed scaling analysis. We also discuss how the ESQPT manifests itself in the fractal dimensions of the long-time averaged state. Our findings further confirm that multifractal analysis is a powerful tool for studying phase transitions in quantum many-body systems and also hint at a potential application of ESQPTs in the burgeoning field of state preparation engineering., Comment: 16 pages, 11 figures
- Published
- 2024
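The fractal dimensions mentioned above are commonly defined through the Rényi entropies of a state's probabilities in a chosen basis, $D_q = S_q / \ln N$. The sketch below implements that standard definition; it is not the authors' code, and the basis (for example a Fock basis of dimension $N$) and the toy states are hypothetical. $D_q \approx 1$ indicates an extended state and $D_q \approx 0$ a localized one, which is how a localization effect shows up in these quantities.

```python
import math


def fractal_dimension(amplitudes, q):
    """Generalized fractal dimension D_q = S_q / ln N of a state.

    amplitudes: expansion coefficients c_n in a (hypothetical) N-dim basis.
    S_q is the Renyi entropy of p_n = |c_n|^2; the q = 1 case uses the
    Shannon limit. D_q = 1 for a uniformly spread state, 0 for a fully
    localized one.
    """
    n = len(amplitudes)
    if n < 2:
        raise ValueError("basis dimension must be at least 2")
    probs = [abs(c) ** 2 for c in amplitudes]
    total = sum(probs)
    probs = [p / total for p in probs if p > 0]  # normalize, drop empty components
    if q == 1:
        s_q = -sum(p * math.log(p) for p in probs)  # Shannon limit
    else:
        s_q = math.log(sum(p ** q for p in probs)) / (1 - q)
    return s_q / math.log(n)


uniform = [0.5, 0.5, 0.5, 0.5]  # extended toy state, N = 4
print(round(fractal_dimension(uniform, 2), 6))  # → 1.0
```

Sweeping $q$ and comparing $D_q$ across system sizes is the usual way to distinguish ergodic, localized, and multifractal states; a finite-size scaling analysis like the one described in the abstract would repeat this for increasing $N$.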
28. Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
- Author
-
Wu, Junyan, Lu, Wei, Luo, Xiangyang, Yang, Rui, Wang, Qian, and Cao, Xiaochun
- Subjects
Computer Science - Multimedia ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing ,68T07, 68T10 ,I.2 ,I.5 - Abstract
Recently, a novel form of audio partial forgery has posed challenges to audio forensics, requiring advanced countermeasures to detect subtle forgery manipulations within long-duration audio. However, existing countermeasures still serve a classification purpose and fail to perform meaningful analysis of the start and end timestamps of partial forgery segments. To address this challenge, we introduce a novel coarse-to-fine proposal refinement framework (CFPRF) that incorporates a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. Specifically, the FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. The PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN. To learn robust discriminative features, we devise a difference-aware feature learning (DAFL) module guided by contrastive representation learning to enlarge the sensitive differences between different frames induced by minor manipulations. We further design a boundary-aware feature enhancement (BAFE) module to capture the contextual information of multiple transition boundaries and guide the interaction between boundary information and temporal features via a cross-attention mechanism. Extensive experiments show that our CFPRF achieves state-of-the-art performance on various datasets, including LAV-DF, ASVS2019PS, and HAD., Comment: 9 pages, 3 figures. This paper has been accepted for ACM MM 2024
- Published
- 2024
29. On global dynamics of $3$-D irrotational compressible fluids
- Author
-
Wang, Qian
- Subjects
Mathematics - Analysis of PDEs - Abstract
We consider global-in-time evolution of irrotational, isentropic, compressible Euler flow in $3$-D, for a broad class of $H^4$ classical Cauchy data without assuming symmetry, prescribed on an annulus surrounded by a constant state in the exterior. By giving a sufficient expansion condition on the initial data and using the nonlinear structure of the compressible Euler equations, we show that the decay rate of the first order transversal derivative of the normalized density is better than that of the same derivative of a free wave, provided that the perturbation arising from the tangential derivatives can be properly controlled for all $t$ by using a bootstrap argument. Building on this critical analysis, we construct global exterior solutions in $H^4$ for the broad class of data, with a rather general subclass forming rarefaction at null infinity. Our result does not require smallness on the transversal derivatives of classical data, thus applies to data with a total energy of any size.
- Published
- 2024
30. Evaluating the evolution and inter-individual variability of infant functional module development from 0 to 5 years old
- Author
-
Bian, Lingbin, Wang, Nizhuan, Li, Yuanning, Razi, Adeel, Wang, Qian, Zhang, Han, Shen, Dinggang, and Consortium, the UNC/UMN Baby Connectome Project
- Subjects
Quantitative Biology - Neurons and Cognition ,Statistics - Computation - Abstract
The segregation and integration of infant brain networks undergo tremendous changes due to the rapid development of brain function and organization. Traditional methods for estimating brain modularity usually rely on group-averaged functional connectivity (FC), often overlooking individual variability. To address this, we introduce a novel approach utilizing Bayesian modeling to analyze the dynamic development of functional modules in infants over time. This method retains inter-individual variability and, in comparison to conventional group averaging techniques, more effectively detects modules, taking into account the stationarity of module evolution. Furthermore, we explore gender differences in module development under awake and sleep conditions by assessing modular similarities. Our results show that female infants demonstrate more distinct modular structures between these two conditions, possibly implying relatively quiet and restful sleep compared with male infants.
- Published
- 2024
31. A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication
- Author
-
Deng, Jingyi, Lin, Chenhao, Zhao, Zhengyu, Liu, Shuai, Wang, Qian, and Shen, Chao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Deep generative models have demonstrated impressive performance in various computer vision applications, including image synthesis, video generation, and medical analysis. Despite their significant advancements, these models may be used for malicious purposes, such as misinformation, deception, and copyright violation. In this paper, we provide a systematic and timely review of research efforts on defenses against AI-generated visual media, covering detection, disruption, and authentication. We review existing methods and summarize the mainstream defense-related tasks within a unified passive and proactive framework. Moreover, we survey the derivative tasks concerning the trustworthiness of defenses, such as their robustness and fairness. For each task, we formulate its general pipeline and propose a taxonomy based on methodological strategies that are uniformly applicable to the primary subtasks. Additionally, we summarize the commonly used evaluation datasets, criteria, and metrics. Finally, by analyzing the reviewed studies, we provide insights into current research challenges and suggest possible directions for future research.
- Published
- 2024
32. Momentum and kinetic energy transport in supersonic particle-laden turbulent boundary layers
- Author
-
Yu, Ming, Du, Yibin, Wang, Qian, Dong, Siwei, and Yuan, Xianxu
- Subjects
Physics - Fluid Dynamics - Abstract
In the present study, we conduct direct numerical simulations of two-way force-coupled particle-laden compressible turbulent boundary layers at a free-stream Mach number of 2.0 for the purpose of examining the effects of particles on the transport of momentum and kinetic energy. By analyzing turbulent databases with various particle Stokes numbers and mass loadings, we observe that the presence of particles suppresses turbulent fluctuations and can even laminarize flow under high mass loading conditions. This is reflected by the wider and more coherent near-wall velocity streaks, reduced Reynolds stresses, and diminished contributions to skin friction and turbulent kinetic energy production. Additionally, the particle feedback force becomes more dominant in turbulent production near the wall and at small scales as mass loadings increase, which is found to be caused by the residual velocity fluctuations from particles swept down from the outer region. Furthermore, we find that particle dissipation, resulting from the relative velocity between the fluid and particles, accounts for less than 1% of mean kinetic energy viscous dissipation and less than 10% of turbulent kinetic energy dissipation in the case with the highest mass loading. This suggests a modest impact on the internal energy variation of the fluid if two-way heat coupling is introduced. An elevated mean temperature is found in the near-wall region and is ascribed to the influence of the particle feedback force and reduced turbulent diffusion in high mass loading cases., Comment: 31 pages, 14 figures
- Published
- 2024
33. A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading
- Author
-
Li, Yuan, Luo, Bingqiao, Wang, Qian, Chen, Nuo, Liu, Xu, and He, Bingsheng
- Subjects
Quantitative Finance - Trading and Market Microstructure ,Computer Science - Social and Information Networks - Abstract
The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge the gap by developing an LLM-based trading agent, CryptoTrade, which uniquely combines the analysis of on-chain and off-chain data. This approach leverages the transparency and immutability of on-chain data, as well as the timeliness and influence of off-chain signals, providing a comprehensive overview of the cryptocurrency market. CryptoTrade incorporates a reflective mechanism specifically engineered to refine its daily trading decisions by analyzing the outcomes of prior trading decisions. This research makes two significant contributions. Firstly, it broadens the applicability of LLMs to the domain of cryptocurrency trading. Secondly, it establishes a benchmark for cryptocurrency trading strategies. Through extensive experiments, CryptoTrade has demonstrated superior performance in maximizing returns compared to traditional trading strategies and time-series baselines across various cryptocurrencies and market conditions. Our code and data are available at https://anonymous.4open.science/r/CryptoTrade-Public-92FC/.
- Published
- 2024
34. Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
- Author
-
Fang, Zheng, Wang, Tao, Zhao, Lingchen, Zhang, Shenyi, Li, Bowen, Ge, Yunjie, Li, Qi, Shen, Chao, and Wang, Qian
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. Through a comprehensive review and categorization of modern ASR technologies, we first meticulously select surrogate ASRs of diverse types to generate adversarial examples. Following this, ZQ-Attack initializes the adversarial perturbation with a scaled target command audio, rendering it relatively imperceptible while maintaining effectiveness. Subsequently, to achieve high transferability of adversarial perturbations, we propose a sequential ensemble optimization algorithm, which iteratively optimizes the adversarial perturbation on each surrogate model, leveraging collaborative information from other models. We conduct extensive experiments to evaluate ZQ-Attack. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91dB on 4 online speech recognition services, and attains an average SRoA of 100% and SNR of 19.67dB on 16 open-source ASRs. For commercial intelligent voice control devices, ZQ-Attack also achieves a 100% SRoA with an average SNR of 15.77dB in the over-the-air setting., Comment: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024
- Published
- 2024
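The ZQ-Attack abstract quantifies imperceptibility as an SNR in dB. A common definition, assumed here because the abstract does not give the paper's exact formula, is $10 \log_{10}(P_x / P_\delta)$, where $P_x$ is the power of the benign audio and $P_\delta$ the power of the added perturbation. The minimal sketch below uses hypothetical toy samples.

```python
import math


def snr_db(signal, perturbation):
    """SNR in dB of a benign signal relative to an additive perturbation.

    Assumes the common definition 10 * log10(P_signal / P_perturbation),
    with power taken as the sum of squared samples of equal-length arrays.
    """
    if len(signal) != len(perturbation):
        raise ValueError("signal and perturbation must have the same length")
    p_signal = sum(s * s for s in signal)
    p_noise = sum(d * d for d in perturbation)
    if p_noise == 0:
        return math.inf  # no perturbation at all
    return 10.0 * math.log10(p_signal / p_noise)


# Toy example: a perturbation with 1/10 the amplitude of the signal.
signal = [0.5, -0.4, 0.3, -0.2]
perturbation = [0.05, -0.04, 0.03, -0.02]
print(round(snr_db(signal, perturbation), 2))  # → 20.0
```

Under this definition a higher SNR means a quieter perturbation: 21.91 dB corresponds to a perturbation power of about $10^{-2.191} \approx 0.64\%$ of the signal power.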
35. Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station
- Author
-
Jin, Hai, Mao, Junjie, Chen, Liubiao, Chen, Naihui, Cui, Wei, Gao, Bo, Li, Jinjin, Li, Xinfeng, Liu, Jiejia, Quan, Jia, Jiang, Chunyang, Wang, Guole, Wang, Le, Wang, Qian, Wang, Sifan, Xiao, Aimin, and Zhang, Shuo
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system., Comment: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics
- Published
- 2024
36. MaIL: Improving Imitation Learning with Mamba
- Author
-
Jia, Xiaogang, Wang, Qian, Donat, Atalay, Xing, Bowen, Li, Ge, Zhou, Hongyi, Celik, Onur, Blessing, Denis, Lioutikov, Rudolf, and Neumann, Gerhard
- Subjects
Computer Science - Machine Learning ,Computer Science - Robotics - Abstract
This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability to handle human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the drawback of large models that complicate effective training. While state space models (SSMs) have been known for their efficiency, they were not able to match the performance of Transformers. Mamba significantly improves the performance of SSMs and rivals Transformers, positioning it as an appealing alternative for IL policies. MaIL leverages Mamba as a backbone and introduces a formalism that allows using Mamba in the encoder-decoder structure. This formalism makes it a versatile architecture that can be used as a standalone policy or as part of a more advanced architecture, such as a diffuser in the diffusion process. Extensive evaluations on the LIBERO IL benchmark and three real robot experiments show that MaIL: i) outperforms Transformers in all LIBERO tasks, ii) achieves good performance even with small datasets, iii) is able to effectively process multi-modal sensory inputs, and iv) is more robust to input noise compared to Transformers.
- Published
- 2024
37. Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning
- Author
-
Wang, Xin, Song, Zhiyun, Zhu, Yitao, Wang, Sheng, Zhang, Lichi, Shen, Dinggang, and Wang, Qian
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
In clinical practice, 2D magnetic resonance (MR) sequences are widely adopted. While individual 2D slices can be stacked to form a 3D volume, the relatively large slice spacing can pose challenges for both image visualization and subsequent analysis tasks, which often require isotropic voxel spacing. To reduce slice spacing, deep-learning-based super-resolution techniques are widely investigated. However, most current solutions require a substantial number of paired high-resolution and low-resolution images for supervised training, which are typically unavailable in real-world scenarios. In this work, we propose a self-supervised super-resolution framework for inter-slice super-resolution of MR images. Our framework first pre-trains on a video dataset, as the temporal correlation of videos is found to be beneficial for modeling the spatial relation among MR slices. Then, we use a public high-quality MR dataset to fine-tune our pre-trained model, making it more aware of medical data. Finally, given a target dataset at hand, we utilize self-supervised fine-tuning to further ensure our model works well with user-specific super-resolution tasks. The proposed method demonstrates superior performance compared to other self-supervised methods and also holds the potential to benefit various downstream applications., Comment: ISBI 2024
- Published
- 2024
38. ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving
- Author
-
Ma, Chen, Wang, Ningfei, Zhao, Zhengyu, Wang, Qian, Chen, Qi Alfred, and Shen, Chao
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances robustness by compensating for object detection errors and requiring consistent object detection results across multiple frames before influencing tracking results and driving decisions. Thus, MOT makes attacks on object detection alone less effective. To attack such robust AD visual perception, a digital hijacking attack has been proposed to cause dangerous driving scenarios. However, this attack has limited effectiveness. In this paper, we introduce a novel physical-world adversarial patch attack, ControlLoc, designed to exploit hijacking vulnerabilities in the entire AD visual perception pipeline. ControlLoc utilizes a two-stage process: initially identifying the optimal location for the adversarial patch, and subsequently generating the patch that can modify the perceived location and shape of objects at the optimal location. Extensive evaluations demonstrate the superior performance of ControlLoc, achieving an impressive average attack success rate of around 98.1% across various AD visual perceptions and datasets, which is four times as effective as the existing hijacking attack. The effectiveness of ControlLoc is further validated in physical-world conditions, including real vehicle tests under different conditions such as outdoor light conditions with an average attack success rate of 77.5%. AD system-level impact assessments are also included, such as vehicle collision, using industry-grade AD systems and production-grade AD simulators with an average vehicle collision rate and unnecessary emergency stop rate of 81.3%.
- Published
- 2024
39. The Dates of the Discovery of the First Peking Man Fossil Teeth
- Author
-
Wang, Qian, Sun, Li, and Ebbestad, Jan Ove R.
- Published
- 2018
40. Expression of dehydroshikimate dehydratase in poplar induces transcriptional and metabolic changes in the phenylpropanoid pathway
- Author
-
Turumtay, Emine Akyuz, Turumtay, Halbay, Tian, Yang, Lin, Chien-Yuan, Chai, Yen Ning, Louie, Katherine B, Chen, Yan, Lipzen, Anna, Harwood, Thomas, Kumar, Kavitha Satish, Bowen, Benjamin P, Wang, Qian, Mansfield, Shawn D, Blow, Matthew J, Petzold, Christopher J, Northen, Trent R, Mortimer, Jenny C, Scheller, Henrik V, and Eudes, Aymerick
- Subjects
Biological Sciences ,Industrial Biotechnology ,Genetics ,Biotechnology ,2.1 Biological and endogenous factors ,Populus ,Lignin ,Hydro-Lyases ,Gene Expression Regulation ,Plant ,Plants ,Genetically Modified ,Plant Proteins ,Xylem ,Aromatics ,bioenergy ,cell wall ,lignin ,metabolomics ,RNA-seq ,systems biology ,Plant Biology ,Crop and Pasture Production ,Plant Biology & Botany ,Crop and pasture production ,Biochemistry and cell biology ,Plant biology - Abstract
Modification of lignin in feedstocks via genetic engineering aims to reduce biomass recalcitrance to facilitate efficient conversion processes. These improvements can be achieved by expressing exogenous enzymes that interfere with native biosynthetic pathways responsible for the production of the lignin precursors. In planta expression of a bacterial 3-dehydroshikimate dehydratase in poplar trees reduced lignin content and altered the monomer composition, which enabled higher yields of sugars after cell wall polysaccharide hydrolysis. Understanding how plants respond to such genetic modifications at the transcriptional and metabolic levels is needed to facilitate further improvement and field deployment. In this work, we acquired fundamental knowledge on lignin-modified poplar expressing 3-dehydroshikimate dehydratase using RNA-seq and metabolomics. The data clearly demonstrate that changes in gene expression and metabolite abundance can occur in a strict spatiotemporal fashion, revealing tissue-specific responses in the xylem, phloem, or periderm. In the poplar line that exhibited the strongest reduction in lignin, we found that 3% of the transcripts had altered expression levels and ~19% of the detected metabolites had differential abundance in the xylem from older stems. The changes affected predominantly the shikimate and phenylpropanoid pathways as well as secondary cell wall metabolism, and resulted in significant accumulation of hydroxybenzoates derived from protocatechuate and salicylate.
- Published
- 2024
41. Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
- Author
-
Wang, Qian, Eldesokey, Abdelrahman, Mendiratta, Mohit, Zhan, Fangneng, Kortylewski, Adam, Theobalt, Christian, and Wonka, Peter
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction employs diffusion models for downstream vision tasks by exploiting their deep understanding of image semantics. Yet most of these approaches have focused on image-related tasks such as semantic correspondence and segmentation, with less emphasis on video tasks such as VSS. In principle, diffusion-based image semantic segmentation approaches could be applied to videos frame by frame. However, we find their performance on videos to be subpar because they do not model the temporal information inherent in video data. To address this, we introduce a framework tailored for VSS based on pre-trained image and video diffusion models. We propose building a scene context model based on the diffusion features, where the model is autoregressively updated to adapt to scene changes. This context model predicts per-frame coarse segmentation maps that are temporally consistent. To refine these maps further, we propose a correspondence-based refinement strategy that aggregates predictions temporally, resulting in more confident predictions. Finally, we introduce a masked modulation approach that upsamples the coarse maps to full resolution with high quality. Experiments show that our approach significantly outperforms existing zero-shot image semantic segmentation approaches on various VSS benchmarks without any training or fine-tuning. Moreover, it rivals supervised VSS approaches on the VSPW dataset despite not being explicitly trained for VSS., Comment: Project webpage: https://qianwangx.github.io/VidSeg_diffusion/
- Published
- 2024
42. Detecting Adversarial Data via Perturbation Forgery
- Author
-
Wang, Qian, Li, Chen, Luo, Yuchen, Ling, Hefei, Li, Ping, Chen, Jiazhong, Huang, Shijuan, and Yu, Ning
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods perform well at detecting gradient-based adversarial attacks, new attacks based on generative models, which have imbalanced and anisotropic noise patterns, evade detection. Worse still, existing techniques either require access to attack data before a defense is deployed or incur a significant inference-time cost, rendering them impractical against newly emerging attacks that defenders have not seen. In this paper, we explore the proximity relationship between adversarial noise distributions and demonstrate the existence of an open covering for them. By learning to distinguish this open covering from the distribution of natural data, we can develop a detector with strong generalization capabilities against all types of adversarial attacks. Based on this insight, we heuristically propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks, while remaining agnostic to any specific models. Comprehensive experiments conducted on multiple general and facial datasets, with a wide spectrum of attacks, validate the strong generalization of our method.
- Published
- 2024
43. Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling
- Author
-
Qiu, Chunlin, Duan, Yiheng, Zhao, Lingchen, and Wang, Qian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Transfer-based attacks craft adversarial examples using a white-box surrogate model to compromise various black-box target models, posing significant threats to many real-world applications. However, existing transfer attacks suffer from either weak transferability or expensive computation. To bridge this gap, we propose a novel sample-based attack, named neighborhood conditional sampling (NCS), which achieves high transferability with lightweight computation. Inspired by the observation that flat maxima result in better transferability, NCS is formulated as a max-min bi-level optimization problem that seeks adversarial regions with high expected adversarial loss and small standard deviations. Because the inner minimization problem is computationally intensive to solve and affects overall transferability, we propose a momentum-based previous gradient inversion approximation (PGIA) method that effectively solves the inner problem without additional computation cost. In addition, we prove that two recently proposed attacks, which achieve flat maxima for better transferability, are in fact special cases of NCS under particular conditions. Extensive experiments demonstrate that NCS efficiently generates highly transferable adversarial examples, surpassing the current best method in transferability while requiring only 50% of its computational cost. Additionally, NCS can be seamlessly integrated with other methods to further enhance transferability., Comment: Under review
- Published
- 2024
44. Pre-Trained Vision-Language Models as Partial Annotators
- Author
-
Wang, Qian-Wei, Xie, Yuqiu, Zhang, Letian, Liu, Zimo, and Xia, Shu-Tao
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Pre-trained vision-language models learn from massive data to model unified representations of images and natural language, which can be widely applied to downstream machine learning tasks. Beyond zero-shot inference, practitioners typically adapt pre-trained models to the requirements of downstream tasks using methods such as few-shot or parameter-efficient fine-tuning and knowledge distillation. However, annotating samples is laborious, whereas large numbers of unlabeled samples are easy to obtain. In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for applying pre-trained models and experiment on image classification tasks. Specifically, based on CLIP, we annotate image samples with multiple prompt templates to obtain multiple candidate labels that form a noisy partial label dataset, and design a collaborative consistency regularization algorithm to learn from it. Our method simultaneously trains two neural networks, which collaboratively purify training labels for each other and obtain pseudo-labels for self-training, while adopting prototypical similarity alignment and noisy supervised contrastive learning to optimize model representation. In experiments, our method achieves performance far beyond zero-shot inference without introducing additional label information, outperforms other weakly supervised learning and few-shot fine-tuning methods, and yields smaller deployed models. Our code is available at: https://anonymous.4open.science/r/Co-Reg-8CF9.
- Published
- 2024
45. Gaze-DETR: Using Expert Gaze to Reduce False Positives in Vulvovaginal Candidiasis Screening
- Author
-
Kong, Yan, Wang, Sheng, Cai, Jiangdong, Zhao, Zihao, Shen, Zhenrong, Li, Yonghao, Fei, Manman, and Wang, Qian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Accurate detection of vulvovaginal candidiasis is critical for women's health, yet the sparse distribution and visually ambiguous characteristics of the pathogen pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveal that areas that garner sustained attention, yet are not marked by experts after deliberation, often align with the false positives of neural networks. Leveraging this finding, we introduce Gaze-DETR, a pioneering method that integrates gaze data to improve neural network precision by reducing false positives. Gaze-DETR incorporates a universal gaze-guided warm-up protocol applicable across various detection methods and a gaze-guided rectification strategy specifically designed for DETR-based models. Our comprehensive tests confirm that Gaze-DETR surpasses existing leading methods, showing remarkable improvements in detection accuracy and generalizability., Comment: MICCAI-2024 early accept. Our code is available at https://github.com/YanKong0408/Gaze-DETR
- Published
- 2024
46. Mass spectra of strange double charm pentaquarks with strangeness $S=-1$
- Author
-
Yang, Zi-Yan, Wang, Qian, and Chen, Wei
- Subjects
High Energy Physics - Phenomenology - Abstract
The observation of the $T_{c\bar{s}}(2900)$ indicates the potential existence of strange double charm pentaquarks based on heavy antidiquark symmetry. We systematically study the mass spectra of strange double charm pentaquarks with strangeness $S=-1$ in both molecular and compact structures for quantum numbers $J^{P}=1/2^{-}$, $3/2^{-}$, $5/2^{-}$. By constructing the interpolating currents, the mass spectra can be extracted from the two-point correlation functions in the framework of the QCD sum rule method. In the molecular picture, we find that $\Xi_c^+D^{\ast +}$, $\Xi_c^{'+}D^{\ast +}$, $\Xi_{c}^{\ast +}D^{\ast +}$, $\Xi_{cc}^{\ast ++}K^{\ast 0}$ and $\Omega_{cc}^{\ast ++}\rho^0$ may form molecular strange double charm pentaquarks. In both pictures, the masses of the $J^P=1/2^-$ and $3/2^-$ pentaquarks lie within the $4.2-4.6~\mathrm{GeV}$ and $4.2-4.5~\mathrm{GeV}$ regions, respectively. Because all of them lie above the thresholds of their strong decay channels, they are expected to be broad states, which makes them challenging to detect experimentally. In contrast, the mass of the $J^P=5/2^-$ strange double charm pentaquark is $4.3~\mathrm{GeV}$, below the threshold of its strong decay channel. This makes it a narrow state that should be easier to identify experimentally. The most promising observation channel is its semileptonic decay to a double charm baryon. We therefore strongly suggest that experiments search for $J^P=5/2^-$ strange double charm pentaquarks as a first step., Comment: 15 pages, 8 figures
- Published
- 2024
47. Preliminary Design of Detector Assembly for DIXE
- Author
-
Liu, Jiejia, Wang, Sifan, Jin, Hai, Wang, Qian, and Cui, Wei
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
Diffuse X-ray Explorer (DIXE) is a proposed X-ray spectroscopic survey experiment for the China Space Station. Its detector assembly (DA) contains the transition edge sensor (TES) microcalorimeter and readout electronics based on the superconducting quantum interference device (SQUID) on the cold stage. The cold stage is thermally connected to the adiabatic demagnetization refrigerator (ADR) stage, and a Kevlar suspension is used to stabilize it and isolate it from the 4 K environment. TES and SQUID are both sensitive to magnetic fields, so a hybrid shielding structure consisting of an outer Cryoperm shield and an inner niobium shield is used to attenuate the magnetic field. In addition, IR/optical/UV photons can produce shot noise and thus degrade the energy resolution of the TES microcalorimeter. A blocking filter assembly is designed to minimize these effects. It consists of five filters mounted at different temperature stages, which, through absorption and multiple reflections between filters, reduce the probability of IR/optical/UV photons reaching the detector. This paper describes the preliminary design of the detector assembly and its optimization., Comment: 13 pages, 6 figures. Submitted version, the full version is published by Journal of Low Temperature Physics
- Published
- 2024
48. Interlayer couplings in homobilayer structures of MSi2X4 (M = Mo/W, X = N/P/As)
- Author
-
Wang, Qian, Zhang, Na, and Yu, Hongyi
- Subjects
Condensed Matter - Materials Science ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
We investigated the interlayer coupling effect in homobilayer structures of MSi2X4 with M = Mo/W and X = N/P/As. By combining first-principles calculations with analytical formulations, we obtain the equilibrium interlayer distance, layer energy difference and interlayer hopping strength for all six MSi2X4 materials; these quantities are found to be insensitive to the type of M atom but to differ significantly between X = N and X = P/As. In homobilayers with twist angles close to 0°, the interlayer charge redistribution introduces a stacking-dependent interlayer electrostatic potential with a magnitude reaching 0.1 eV in MSi2N4, suggesting that MSi2N4 can be an excellent candidate for studying sliding ferroelectricity. The interlayer hopping strengths are found to be as large as several tens of meV at the valence band maxima K and Γ, and ~1 meV at the conduction band edge K. The resultant layer hybridizations vary over a wide range under different stacking registries, which can be used to simulate honeycomb lattice models with both trivial and non-trivial band topologies.
- Published
- 2024
49. MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures
- Author
-
Yang, Han, Hu, Chenxi, Zhou, Yichi, Liu, Xixian, Shi, Yu, Li, Jielan, Li, Guanzhi, Chen, Zekun, Chen, Shuizhou, Zeni, Claudio, Horton, Matthew, Pinsler, Robert, Fowler, Andrew, Zügner, Daniel, Xie, Tian, Smith, Jake, Sun, Lixin, Wang, Qian, Kong, Lingyu, Liu, Chang, Hao, Hongxia, and Lu, Ziheng
- Subjects
Condensed Matter - Materials Science - Abstract
Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computations, for efficient atomistic simulations at first-principles level and accurate prediction of broad material properties across the periodic table, spanning temperatures from 0 to 5000 K and pressures up to 1000 GPa. Out of the box, the model serves as a machine learning force field and shows remarkable capabilities not only in predicting ground-state material structures and energetics, but also in simulating their behavior under realistic temperatures and pressures, with up to a ten-fold improvement in precision over the prior best-in-class model. This enables MatterSim to compute materials' lattice dynamics, mechanical and thermodynamic properties, and beyond, to an accuracy comparable with first-principles methods. Specifically, MatterSim predicts Gibbs free energies for a wide range of inorganic solids with near-first-principles accuracy and achieves a 15 meV/atom resolution for temperatures up to 1000 K compared with experiments. This opens an opportunity to predict experimental phase diagrams of materials at minimal computational cost. Moreover, MatterSim also serves as a platform for continuous learning and customization by integrating domain-specific data. The model can be fine-tuned for atomistic simulations at a desired level of theory or for direct structure-to-property predictions, achieving high data efficiency with a reduction in data requirements of up to 97%.
- Published
- 2024
50. Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting
- Author
-
Xie, Tianyidan, Ma, Rui, Wang, Qian, Ye, Xiaoqian, Liu, Feixuan, Tai, Ying, Zhang, Zhenyu, and Yi, Zili
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics - Abstract
Recent advancements in image inpainting, particularly through diffusion modeling, have yielded promising outcomes. However, when tested in scenarios that complete images based on foreground objects, current methods that inpaint an image in an end-to-end manner encounter challenges such as "over-imagination", inconsistency between foreground and background, and limited diversity. In response, we introduce Anywhere, a pioneering multi-agent framework designed to address these issues. Anywhere uses a sophisticated pipeline comprising various agents, such as a visual language model (VLM), a large language model (LLM), and image generation models. This framework consists of three principal components: the prompt generation module, the image generation module, and the outcome analyzer. The prompt generation module conducts a semantic analysis of the input foreground image, leveraging the VLM to predict relevant language descriptions and the LLM to recommend optimal language prompts. In the image generation module, we employ a text-guided canny-to-image generation model to create a template image based on the edge map of the foreground image and the language prompts, and an image refiner to produce the outcome by blending the input foreground and the template image. The outcome analyzer employs the VLM to evaluate image content rationality, aesthetic score, and foreground-background relevance, triggering prompt and image regeneration as needed. Extensive experiments demonstrate that Anywhere excels in foreground-conditioned image inpainting, mitigating "over-imagination", resolving foreground-background discrepancies, and enhancing diversity. It elevates foreground-conditioned image inpainting to produce more reliable and diverse results., Comment: 16 pages, 9 figures, project page: https://anywheremultiagent.github.io
- Published
- 2024
Catalog
Discovery Service for Jio Institute Digital Library