134,675 results on '"Fei, P."'
Search Results
2. Observation of non-Hermitian Dirac cones
- Author
-
Xie, Xinrong, Ma, Fei, Rui, W. B., Dong, Zhaozhen, Du, Yulin, Xie, Wentao, Zhao, Y. X., Chen, Hongsheng, Gao, Fei, and Xue, Haoran
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics; Physics - Applied Physics
- Abstract
Relativistic quasiparticle excitations arising from band degeneracies in crystals not only offer exciting chances to test hypotheses in particle physics but also play crucial roles in the transport and topological properties of materials and metamaterials. Quasiparticles are commonly described by low-energy Hamiltonians that are Hermitian, while non-Hermiticity is usually considered detrimental to quasiparticle physics. In this work, we show that such an assumption of Hermiticity can be lifted to bring quasiparticles into non-Hermitian systems. We propose a concrete lattice model containing two non-Hermitian Dirac cones, with one hosting amplifying Dirac quasiparticles and the other hosting decaying ones. The lifetime contrast between the Dirac cones at the two valleys imposes an ultra-strong valley selection rule not seen in any Hermitian systems: only one valley can survive in the long time limit regardless of the excitation, lattice shape and other details. This property leads to an effective parity anomaly with a single Dirac cone and offers a simple way to generate vortex states in the massive case. The non-Hermitian feature of the bulk Dirac cones can also be generalized to the boundary, giving rise to valley kink states with valley-locked lifetimes. This makes the kink states effectively unidirectional and more resistant against inter-valley scattering. All these phenomena are experimentally demonstrated in a non-Hermitian electric circuit lattice.
- Published
- 2024
3. Positivity-preserving truncated Euler and Milstein methods for financial SDEs with super-linear coefficients
- Author
-
Deng, Shounian, Fei, Chen, Fei, Weiyin, and Mao, Xuerong
- Subjects
Mathematics - Numerical Analysis
- Abstract
In this paper, we propose two variants of the positivity-preserving schemes, namely the truncated Euler-Maruyama (EM) method and the truncated Milstein scheme, applied to stochastic differential equations (SDEs) with positive solutions and super-linear coefficients. Under some regularity and integrability assumptions we derive the optimal strong convergence rates of the two schemes. Moreover, we demonstrate the flexibility of our approaches by applying the truncated methods to approximate SDEs with super-linear coefficients (3/2 and Aït-Sahalia models) directly and also with sub-linear coefficients (CIR model) indirectly. Numerical experiments are provided to verify the effectiveness of the theoretical results.
- Published
- 2024
4. GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting
- Author
-
Guo, Zixuan, Xie, Yifan, Xie, Weijing, Huang, Peng, Ma, Fei, and Yu, Fei Richard
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence
- Abstract
Dense colored point clouds enhance visual perception and are of significant value in various robotic applications. However, existing learning-based point cloud upsampling methods are constrained by computational resources and batch processing strategies, which often require subdividing point clouds into smaller patches, leading to distortions that degrade perceptual quality. To address this challenge, we propose a novel 2D-3D hybrid colored point cloud upsampling framework (GaussianPU) based on 3D Gaussian Splatting (3DGS) for robotic perception. This approach leverages 3DGS to bridge 3D point clouds with their 2D rendered images in robot vision systems. A dual scale rendered image restoration network transforms sparse point cloud renderings into dense representations, which are then input into 3DGS along with precise robot camera poses and interpolated sparse point clouds to reconstruct dense 3D point clouds. We have made a series of enhancements to the vanilla 3DGS, enabling precise control over the number of points and significantly boosting the quality of the upsampled point cloud for robotic scene understanding. Our framework supports processing entire point clouds on a single consumer-grade GPU, such as the NVIDIA GeForce RTX 3090, eliminating the need for segmentation and thus producing high-quality, dense colored point clouds with millions of points for robot navigation and manipulation tasks. Extensive experimental results on generating million-level point cloud data validate the effectiveness of our method, substantially improving the quality of colored point clouds and demonstrating significant potential for applications involving large-scale point clouds in autonomous robotics and human-robot interaction scenarios., Comment: 7 pages, 5 figures
- Published
- 2024
5. On the Structure of Second Jacobian Ideals
- Author
-
Ye, Fei
- Subjects
Mathematics - Algebraic Geometry; 14B05, 14E15, 14J17, 32S05, 32S25
- Abstract
We show that the second Jacobian ideal of a hypersurface can be decomposed such that a power of the Jacobian ideal becomes a factor. As an application of the decomposition, we present an elementary proof establishing that the second Nash blow-up algebra of a hypersurface singularity is a contact invariant., Comment: 13 pages
- Published
- 2024
6. On Temporal Decay of Compressible Hookean Viscoelastic Fluids with Relatively Large Elasticity Coefficient
- Author
-
Fu, Shengbin, Huang, Wenting, and Jiang, Fei
- Subjects
Mathematics - Analysis of PDEs; Mathematics - Dynamical Systems
- Abstract
Recently, Jiang--Jiang (J. Differential Equations 282, 2021) showed the existence of unique strong solutions in a spatially periodic domain (denoted by $\mathbb{T}^3$), whenever the elasticity coefficient is larger than the initial velocity perturbation of the rest state. Motivated by Jiang--Jiang's result, we revisit the Cauchy problem of the compressible viscoelastic fluids in Lagrangian coordinates. Employing an energy method with temporal weights and an additional asymptotic stability condition of initial density in Lagrangian coordinates, we extend Jiang--Jiang's result with exponential decay-in-time in $\mathbb{T}^3$ to the one with algebraic decay-in-time in the whole space $\mathbb{R}^3$. Thanks to the algebraic decay of solutions established by the energy method with temporal weights, we can further use the spectral analysis to improve the temporal decay rate of solutions. In particular, we find that the $k$-th order spatial derivatives of both the density and deformation perturbations converge to zero in $L^2(\mathbb{R}^3)$ at a rate of $(1+t)^{-\frac{3}{4}-\frac{k+1}{2}}$, which is faster than the decay rate $(1+t)^{-\frac{3}{4}-\frac{k}{2}}$ obtained by Hu--Wu (SIAM J. Math. Anal. 45, 2013) for $k=0$ and $1$. In addition, it is well known that the decay rate $(1+t)^{-\frac{3}{4}-\frac{k}{2}}$ of the density perturbation is optimal in the compressible Navier--Stokes equations (A. Matsumura, T. Nishida, Proc. Jpn. Acad. Ser-A. 55, 1979). Therefore, our faster temporal decay rates indicate that the elasticity accelerates the decay of the density perturbation after the rest state of a compressible viscoelastic fluid is perturbed.
- Published
- 2024
7. The 1st Workshop on Human-Centered Recommender Systems
- Author
-
Zhang, Kaike, Wu, Yunfan, Lyu, Yougang, Su, Du, Ge, Yingqiang, Liu, Shuchang, Cao, Qi, Ren, Zhaochun, and Sun, Fei
- Subjects
Computer Science - Information Retrieval
- Abstract
Recommender systems are quintessential applications of human-computer interaction. Widely utilized in daily life, they offer significant convenience but also present numerous challenges, such as the information cocoon effect, privacy concerns, fairness issues, and more. Consequently, this workshop aims to provide a platform for researchers to explore the development of Human-Centered Recommender Systems (HCRS). HCRS refers to the creation of recommender systems that prioritize human needs, values, and capabilities at the core of their design and operation. In this workshop, topics will include, but are not limited to, robustness, privacy, transparency, fairness, diversity, accountability, ethical considerations, and user-friendly design. We hope to engage in discussions on how to implement and enhance these properties in recommender systems. Additionally, participants will explore diverse evaluation methods, including innovative metrics that capture user satisfaction and trust. This workshop seeks to foster a collaborative environment for researchers to share insights and advance the field toward more ethical, user-centric, and socially responsible recommender systems., Comment: Workshop at TheWebConf 2025
- Published
- 2024
8. LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues
- Author
-
Lin, Yalan, Ma, Yingwei, Cao, Rongyu, Li, Binhua, Huang, Fei, Gu, Xiaodong, and Li, Yongbin
- Subjects
Computer Science - Software Engineering; Computer Science - Artificial Intelligence
- Abstract
Reproducing buggy code is the first and crucially important step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve the problem. While numerous approaches have been proposed for this task, they primarily address common, widespread errors and struggle to adapt to unique, evolving errors specific to individual code repositories. To fill this gap, we propose EvoCoder, a multi-agent continuous learning framework for issue code reproduction. EvoCoder adopts a reflection mechanism that allows the LLM to continuously learn from previously resolved problems and dynamically refine its strategies to new emerging challenges. To prevent experience bloating, EvoCoder introduces a novel hierarchical experience pool that enables the model to adaptively update common and repo-specific experiences. Our experimental results show a 20% improvement in issue reproduction rates over existing SOTA methods. Furthermore, integrating our reproduction mechanism significantly boosts the overall accuracy of the existing issue-resolving pipeline.
- Published
- 2024
9. Mining double-line spectroscopic candidates in the LAMOST medium-resolution spectroscopic survey using human-AI hybrid method
- Author
-
Li, Shan-shan, Li, Chun-qian, Li, Chang-hua, Fan, Dong-wei, Xu, Yun-fei, Mi, Lin-ying, Cui, Chen-zhou, and Shi, Jian-rong
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics; Astrophysics - Astrophysics of Galaxies; Astrophysics - Solar and Stellar Astrophysics
- Abstract
We utilize a hybrid approach that integrates the traditional cross-correlation function (CCF) and machine learning to detect spectroscopic multi-systems, specifically focusing on double-line spectroscopic binary (SB2). Based on the ninth data release (DR9) of the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), which includes a medium-resolution survey (MRS) containing 29,920,588 spectra, we identify 27,164 double-line and 3124 triple-line spectra, corresponding to 7096 SB2 candidates and 1903 triple-line spectroscopic binary (SB3) candidates, respectively, representing about 1% of the selection dataset from LAMOST-MRS DR9. Notably, 70.1% of the SB2 candidates and 89.6% of the SB3 candidates are newly identified. Compared to using only the traditional CCF technique, our method significantly improves the efficiency of detecting SB2 and saves time on visual inspections by a factor of four., Comment: 18 pages, 11 figures, accepted by ApJS, Data available via China-VO PaperData repository
- Published
- 2024
10. Surface topological quantum criticality: Conformal manifolds and Discrete Strong Coupling Fixed Points
- Author
-
Vijayan, Saran and Zhou, Fei
- Subjects
Condensed Matter - Strongly Correlated Electrons; Condensed Matter - Superconductivity
- Abstract
In this article, we study quantum critical phenomena in surfaces of symmetry-protected topological matter, i.e. surface topological quantum criticality. A generic phase boundary of gapless surfaces in a symmetry-protected state shall be a co-dimension one manifold in an interaction parameter space of dimension $D_p$ (where $p$ refers to the parameter space) where the value of $D_p$ further depends on bulk topologies. In the context of fermionic topological insulators that we focus on, $D_p$ depends on the number of half-Dirac cones $\mathcal{N}$. We construct such manifolds explicitly for a few interaction parameter spaces with various $D_p$ values. Most importantly, we further illustrate that in cases with $D_p=3$ and $6$, there are sub-manifolds of fixed points that dictate the universalities of surface topological quantum criticality. These infrared stable manifolds are associated with emergent symmetries in the renormalization-group-equation flow naturally appearing in the loop expansion. Unlike in the usual order-disorder quantum critical phenomena, typically governed by an isolated Wilson-Fisher fixed point, we find in the one-loop approximation surface topological quantum criticalities are naturally captured by conformal manifolds where the number of marginal operators uniquely determines their co-dimensions. Isolated strong coupling fixed points also appear, usually as the endpoints in the phase boundary of surface topological quantum phases. However, their extreme infrared instabilities along multiple directions suggest that they shall be related to multi-critical surface topological quantum critical phenomena rather than generic surface topological quantum criticality. We also discuss and classify higher-loop symmetry-breaking effects, which can either distort the conformal manifolds or further break the conformal manifolds down to a few distinct fixed points., Comment: 32 pages, 11 figures, 12 tables
- Published
- 2024
11. Adjoint-based online learning of two-layer quasi-geostrophic baroclinic turbulence
- Author
-
Yan, Fei Er, Frezat, Hugo, Sommer, Julien Le, Mak, Julian, and Otness, Karl
- Subjects
Physics - Atmospheric and Oceanic Physics; Computer Science - Machine Learning; Physics - Fluid Dynamics
- Abstract
For reasons of computational constraint, most global ocean circulation models used for Earth System Modeling still rely on parameterizations of sub-grid processes, and limitations in these parameterizations affect the modeled ocean circulation and impact on predictive skill. An increasingly popular approach is to leverage machine learning approaches for parameterizations, regressing for a map between the resolved state and missing feedbacks in a fluid system as a supervised learning task. However, the learning is often performed in an 'offline' fashion, without involving the underlying fluid dynamical model during the training stage. Here, we explore the 'online' approach that involves the fluid dynamical model during the training stage for the learning of baroclinic turbulence and its parameterization, with reference to ocean eddy parameterization. Two online approaches are considered: a full adjoint-based online approach, related to traditional adjoint optimization approaches that require a 'differentiable' dynamical model, and an approximately online approach that approximates the adjoint calculation and does not require a differentiable dynamical model. The online approaches are found to be generally more skillful and numerically stable than offline approaches. Other details relating to online training, such as window size, machine learning model set up and designs of the loss functions are detailed to aid in further explorations of the online training methodology for Earth System Modeling., Comment: 25 pages, 1 table, 8 figures
- Published
- 2024
12. Dynamic Trajectory and Power Control in Ultra-Dense UAV Networks: A Mean-Field Reinforcement Learning Approach
- Author
-
Song, Fei, Wang, Zhe, Li, Jun, Shi, Long, Chen, Wen, and Jin, Shi
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
In ultra-dense unmanned aerial vehicle (UAV) networks, it is challenging to coordinate the resource allocation and interference management among large-scale UAVs, for providing flexible and efficient service coverage to the ground users (GUs). In this paper, we propose a learning-based resource allocation scheme in an ultra-dense UAV communication network, where the GUs' service demands are time-varying with unknown distributions. We formulate the non-cooperative game among multiple co-channel UAVs as a stochastic game, where each UAV jointly optimizes its trajectory, user association, and downlink power control to maximize the expectation of its locally cumulative energy efficiency under the interference and energy constraints. To cope with the scalability issue in a large-scale network, we further formulate the problem as a mean-field game (MFG), which simplifies the interactions among the UAVs into a two-player game between a representative UAV and a mean-field. We prove the existence and uniqueness of the equilibrium for the MFG, and propose a model-free mean-field reinforcement learning algorithm named maximum entropy mean-field deep Q network (ME-MFDQN) to solve the mean-field equilibrium in both fully and partially observable scenarios. The simulation results reveal that the proposed algorithm improves the energy efficiency compared with the benchmark algorithms. Moreover, the performance can be further enhanced if the GUs' service demands exhibit higher temporal correlation or if the UAVs have wider observation capabilities over their nearby GUs.
- Published
- 2024
13. Federated Continual Learning for Edge-AI: A Comprehensive Survey
- Author
-
Wang, Zi, Wu, Fei, Yu, Feng, Zhou, Yurui, Hu, Jia, and Min, Geyong
- Subjects
Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Networking and Internet Architecture
- Abstract
Edge-AI, the convergence of edge computing and artificial intelligence (AI), has become a promising paradigm that enables the deployment of advanced AI models at the network edge, close to users. In Edge-AI, federated continual learning (FCL) has emerged as an imperative framework, which fuses knowledge from different clients while preserving data privacy and retaining knowledge from previous tasks as it learns new ones. By so doing, FCL aims to ensure stable and reliable performance of learning models in dynamic and distributed environments. In this survey, we thoroughly review the state-of-the-art research and present the first comprehensive survey of FCL for Edge-AI. We categorize FCL methods based on three task characteristics: federated class continual learning, federated domain continual learning, and federated task continual learning. For each category, an in-depth investigation and review of the representative methods are provided, covering background, challenges, problem formalisation, solutions, and limitations. Besides, existing real-world applications empowered by FCL are reviewed, indicating the current progress and potential of FCL in diverse application domains. Furthermore, we discuss and highlight several prospective research directions of FCL such as algorithm-hardware co-design for FCL and FCL with foundation models, which could provide insights into the future development and practical deployment of FCL in the era of Edge-AI.
- Published
- 2024
14. Persistent but weak magnetic field at Moon's midlife revealed by Chang'e-5 basalt
- Author
-
Cai, Shuhui, Qin, Huafeng, Wang, Huapei, Deng, Chenglong, Yang, Saihong, Xu, Ya, Zhang, Chi, Tang, Xu, Gu, Lixin, Li, Xiaoguang, Shen, Zhongshan, Zhang, Min, He, Kuang, Qi, Kaixian, Fan, Yunchang, Dong, Liang, Hou, Yifei, Shi, Pingyuan, Liu, Shuangchi, Su, Fei, Chen, Yi, Li, Qiuli, Li, Jinhua, Mitchell, Ross N., He, Huaiyu, Li, Chunlai, Pan, Yongxin, and Zhu, Rixiang
- Subjects
Physics - Geophysics
- Abstract
The evolution of the lunar magnetic field can reveal the Moon's interior structure, thermal history, and surface environment. The mid-to-late stage evolution of the lunar magnetic field is poorly constrained, and thus the existence of a long-lived lunar dynamo remains controversial. The Chang'e-5 mission returned the heretofore youngest mare basalts from Oceanus Procellarum uniquely positioned at mid-latitude. We recovered weak paleointensities of 2-4 µT from the Chang'e-5 basalt clasts at 2 billion years ago, attesting to the longevity of a lunar dynamo until at least the Moon's midlife. This paleomagnetic result implies the existence of thermal convection in the lunar deep interior at the lunar mid-stage which may have supplied mantle heat flux for the young volcanism.
- Published
- 2024
15. Fact-Level Confidence Calibration and Self-Correction
- Author
-
Yuan, Yige, Xu, Bingbing, Tan, Hexiang, Sun, Fei, Xiao, Teng, Li, Wei, Shen, Huawei, and Cheng, Xueqi
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
Confidence calibration in LLMs, i.e., aligning their self-assessed confidence with the actual accuracy of their responses, enables them to self-evaluate the correctness of their outputs. However, current calibration methods for LLMs typically estimate two scalars to represent overall response confidence and correctness, which is inadequate for long-form generation where the response includes multiple atomic facts and may be partially confident and correct. These methods also overlook the relevance of each fact to the query. To address these challenges, we propose a Fact-Level Calibration framework that operates at a finer granularity, calibrating confidence to relevance-weighted correctness at the fact level. Furthermore, comprehensive analysis under the framework inspired the development of Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones. Extensive experiments across four datasets and six models demonstrate that ConFix effectively mitigates hallucinations without requiring external knowledge sources such as retrieval systems., Comment: Code is available at https://github.com/yuanyige/fact-calibration
- Published
- 2024
16. Mutual Information-oriented ISAC Beamforming Design under Statistical CSI
- Author
-
Xu, Shanfeng, Cheng, Yanshuo, Wang, Siqiang, Wang, Xinyi, Zheng, Zhong, and Fei, Zesong
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
Existing integrated sensing and communication (ISAC) beamforming schemes were mostly designed under perfect instantaneous channel state information (CSI), limiting their use in practical dynamic environments. In this paper, we study the beamforming design for multiple-input multiple-output (MIMO) ISAC systems based on statistical CSI, with the weighted mutual information (MI) comprising sensing and communication perspectives adopted as the performance metric. In particular, the operator-valued free probability theory is utilized to derive the closed-form expression for the weighted MI under statistical CSI. Subsequently, an efficient projected gradient ascent (PGA) algorithm is proposed to optimize the transmit beamforming matrix with the aim of maximizing the weighted MI. Numerical results validate that the derived closed-form expression matches well with the Monte Carlo simulation results and the proposed optimization algorithm is able to improve the weighted MI significantly. We also illustrate the trade-off between sensing and communication MI., Comment: 14 pages, 5 figures, submitted to IEEE journal for possible publication
- Published
- 2024
17. Identifying the Galactic Substructures in 5D Space Using All-sky RR Lyrae Stars in Gaia DR3
- Author
-
Sun, Shenglan, Wang, Fei, Zhang, Huawei, Xue, Xiang-Xiang, Huang, Yang, Zhang, Ruizhi, Rix, Hans-Walter, Li, Xinyi, Liu, Gaochao, Zhang, Lan, Yang, Chengqun, and Zhang, Shuo
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
Motivated by the vast gap between photometric and spectroscopic data volumes, there is great potential in using 5D kinematic information to identify and study substructures of the Milky Way. We identify substructures in the Galactic halo using 46,575 RR Lyrae stars (RRLs) from Gaia DR3 with the photometric metallicities and distances newly estimated by Li et al. (2023). Assuming a Gaussian prior distribution of radial velocity, we calculate the orbital distribution characterized by the integrals of motion for each RRL based on its 3D positions, proper motions and corresponding errors, and then apply the friends-of-friends algorithm to identify groups moving along similar orbits. We have identified several known substructures, including Sagittarius (Sgr) Stream, Hercules-Aquila Cloud (HAC), Virgo Overdensity (VOD), Gaia-Enceladus-Sausage (GES), Orphan-Chenab stream, Cetus-Palca, Helmi Streams, Sequoia, Wukong and Large Magellanic Cloud (LMC) leading arm, along with 18 unknown groups. Our findings indicate that HAC and VOD have kinematic and chemical properties remarkably similar to GES, with most HAC and VOD members exhibiting eccentricity as high as GES, suggesting that they may share a common origin with GES. The ability to identify the low mass and spatially dispersed substructures further demonstrates the potential of our method, which breaks the limit of spectroscopic survey and is competent to probe the substructures in the whole Galaxy. Finally, we have also identified 18 unknown groups with good spatial clustering and proper motion consistency, suggesting more excavation of Milky Way substructures in the future with only 5D data., Comment: 23 pages, 19 figures, 4 tables, accepted for publication in ApJ, version before language edition
- Published
- 2024
18. Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
- Author
-
Luo, Yongdong, Zheng, Xiawu, Yang, Xiao, Li, Guilin, Lin, Haojia, Huang, Jinfa, Ji, Jiayi, Chao, Fei, Luo, Jiebo, and Ji, Rongrong
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract
Existing large video-language models (LVLMs) struggle to comprehend long videos correctly due to limited context. To address this problem, fine-tuning long-context LVLMs and employing GPT-based agents have emerged as promising solutions. However, fine-tuning LVLMs would require extensive high-quality data and substantial GPU resources, while GPT-based agents would rely on proprietary models (e.g., GPT-4o). In this paper, we propose Video Retrieval-Augmented Generation (Video-RAG), a training-free and cost-effective pipeline that employs visually-aligned auxiliary texts to help facilitate cross-modality alignment while providing additional information beyond the visual content. Specifically, we leverage open-source external tools to extract visually-aligned information from pure video data (e.g., audio, optical character, and object detection), and incorporate the extracted information into an existing LVLM as auxiliary texts, alongside video frames and queries, in a plug-and-play manner. Our Video-RAG offers several key advantages: (i) lightweight with low computing overhead due to single-turn retrieval; (ii) easy implementation and compatibility with any LVLM; and (iii) significant, consistent performance gains across long video understanding benchmarks, including Video-MME, MLVU, and LongVideoBench. Notably, our model demonstrates superior performance over proprietary models like Gemini-1.5-Pro and GPT-4o when utilized with a 72B model., Comment: 10 pages, 6 figures
- Published
- 2024
19. Large-scale cross-modality pretrained model enhances cardiovascular state estimation and cardiomyopathy detection from electrocardiograms: An AI system development and multi-center validation study
- Author
-
Ding, Zhengyao, Hu, Yujian, Xu, Youyao, Zhao, Chengchen, Li, Ziyu, Mao, Yiheng, Li, Haitao, Li, Qian, Wang, Jing, Chen, Yue, Chen, Mengjia, Wang, Longbo, Chu, Xuesen, Pan, Weichao, Liu, Ziyi, Wu, Fei, Zhang, Hongkun, Chen, Ting, and Huang, Zhengxing
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
- Abstract
Cardiovascular diseases (CVDs) present significant challenges for early and accurate diagnosis. While cardiac magnetic resonance imaging (CMR) is the gold standard for assessing cardiac function and diagnosing CVDs, its high cost and technical complexity limit accessibility. In contrast, electrocardiography (ECG) offers promise for large-scale early screening. This study introduces CardiacNets, an innovative model that enhances ECG analysis by leveraging the diagnostic strengths of CMR through cross-modal contrastive learning and generative pretraining. CardiacNets serves two primary functions: (1) it evaluates detailed cardiac function indicators and screens for potential CVDs, including coronary artery disease, cardiomyopathy, pericarditis, heart failure and pulmonary hypertension, using ECG input; and (2) it enhances interpretability by generating high-quality CMR images from ECG data. We train and validate the proposed CardiacNets on two large-scale public datasets (the UK Biobank with 41,519 individuals and the MIMIC-IV-ECG comprising 501,172 samples) as well as three private datasets (FAHZU with 410 individuals, SAHZU with 464 individuals, and QPH with 338 individuals), and the findings demonstrate that CardiacNets consistently outperforms traditional ECG-only models, substantially improving screening accuracy. Furthermore, the generated CMR images provide valuable diagnostic support for physicians of all experience levels. This proof-of-concept study highlights how ECG can facilitate cross-modal insights into cardiac function assessment, paving the way for enhanced CVD screening and diagnosis at a population level., Comment: 23 pages, 8 figures
- Published
- 2024
20. GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning
- Author
-
Liu, Yuze, Liu, Tingjie, Zhang, Tiehua, Xia, Youhua, Wang, Jinze, Shen, Zhishu, Jin, Jiong, and Yu, Fei Richard
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) have demonstrated impressive success in a wide range of natural language processing (NLP) tasks due to their extensive general knowledge of the world. Recent works discovered that the performance of LLMs is heavily dependent on the input prompt. However, prompt engineering is usually done manually in a trial-and-error fashion, which can be labor-intensive and challenging in order to find the optimal prompts. To address these problems and unleash the utmost potential of LLMs, we propose a novel LLMs-agnostic framework for prompt optimization, namely GRL-Prompt, which aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner. To provide structured action/state representation for optimizing prompts, we construct a knowledge graph (KG) that better encodes the correlation between the user query and candidate in-context examples. Furthermore, a policy network is formulated to generate the optimal action by selecting a set of in-context examples in a rewardable order to construct the prompt. Additionally, the embedding-based reward shaping is utilized to stabilize the RL training process. The experimental results show that GRL-Prompt outperforms recent state-of-the-art methods, achieving an average increase of 0.10 in ROUGE-1, 0.07 in ROUGE-2, 0.07 in ROUGE-L, and 0.05 in BLEU.
- Published
- 2024
21. IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose
- Author
-
Ren, Fei, Ren, Chao, and Lyu, Tianyi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
This study proposes the IoT-Enhanced Pose Optimization Network (IE-PONet) for high-precision 3D pose estimation and motion optimization of track and field athletes. IE-PONet integrates C3D for spatiotemporal feature extraction, OpenPose for real-time keypoint detection, and Bayesian optimization for hyperparameter tuning. Experimental results on NTURGB+D and FineGYM datasets demonstrate superior performance, with AP\(^p50\) scores of 90.5 and 91.0, and mAP scores of 74.3 and 74.0, respectively. Ablation studies confirm the essential roles of each module in enhancing model accuracy. IE-PONet provides a robust tool for athletic performance analysis and optimization, offering precise technical insights for training and injury prevention. Future work will focus on further model optimization, multimodal data integration, and developing real-time feedback mechanisms to enhance practical applications., Comment: 17 pages
- Published
- 2024
22. Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
- Author
-
Ding, Ruyi, Zhou, Tong, Su, Lili, Ding, Aidong Adam, Xu, Xiaolin, and Fei, Yunsi
- Subjects
Computer Science - Cryptography and Security - Abstract
Adapting pre-trained deep learning models to customized tasks has become a popular choice for developers to cope with limited computational resources and data volume. More specifically, probing--training a downstream head on a pre-trained encoder--has been widely adopted in transfer learning, which helps to prevent overfitting and catastrophic forgetting. However, such generalizability of pre-trained encoders raises concerns about the potential misuse of probing for harmful intentions, such as discriminatory speculation and warfare applications. In this work, we introduce EncoderLock, a novel applicability authorization method designed to protect pre-trained encoders from malicious probing, i.e., yielding poor performance on specified prohibited domains while maintaining their utility in authorized ones. Achieving this balance is challenging because of the opposite optimization objectives and the variety of downstream heads that adversaries can utilize adaptively. To address these challenges, EncoderLock employs two techniques: domain-aware weight selection and updating to restrict applications on prohibited domains/tasks, and a self-challenging training scheme that iteratively strengthens resistance against any potential downstream classifiers that adversaries may apply. Moreover, recognizing the potential lack of data from prohibited domains in practical scenarios, we introduce three EncoderLock variants with different levels of data accessibility: supervised (prohibited domain data with labels), unsupervised (prohibited domain data without labels), and zero-shot (no data or labels available). We verify EncoderLock's effectiveness and practicality with a real-world pre-trained Vision Transformer (ViT) encoder from Facebook. These results underscore the valuable contributions EncoderLock brings to the development of responsible AI., Comment: Accepted by Network and Distributed System Security (NDSS) Symposium 2025
- Published
- 2024
23. Robotic transcatheter tricuspid valve replacement with hybrid enhanced intelligence: a new paradigm and first-in-vivo study
- Author
-
Wang, Shuangyi, Lin, Haichuan, Xie, Yiping, Wang, Ziqi, Chen, Dong, Tan, Longyue, Hou, Xilong, Chen, Chen, Zhou, Xiao-Hu, Lin, Shengtao, Pan, Fei, So, Kent Chak-Yu, and Hou, Zeng-Guang
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Transcatheter tricuspid valve replacement (TTVR) is the latest treatment for tricuspid regurgitation and is in the early stages of clinical adoption. Intelligent robotic approaches are expected to overcome the challenges of surgical manipulation and widespread dissemination, but systems and protocols with high clinical utility have not yet been reported. In this study, we propose a complete solution that includes a passive stabilizer, robotic drive, detachable delivery catheter and valve manipulation mechanism. Working towards autonomy, a hybrid augmented intelligence approach based on reinforcement learning, Monte Carlo probabilistic maps and human-robot co-piloted control was introduced. Systematic tests in phantom and first-in-vivo animal experiments were performed to verify that the system design met the clinical requirement. Furthermore, the experimental results confirmed the advantages of co-piloted control over conventional master-slave control in terms of time efficiency, control efficiency, autonomy and stability of operation. In conclusion, this study provides a comprehensive pathway for robotic TTVR and, to our knowledge, completes the first animal study that not only successfully demonstrates the application of hybrid enhanced intelligence in interventional robotics, but also provides a solution with high application value for a cutting-edge procedure.
- Published
- 2024
24. SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image
- Author
-
Wang, Zixu, Yang, Hao, Guo, Yu, and Wang, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Snapshot Compressive Imaging (SCI) offers a possibility for capturing information in high-speed dynamic scenes, requiring efficient reconstruction methods to recover scene information. Despite promising results, current deep learning-based and NeRF-based reconstruction methods face challenges: 1) deep learning-based reconstruction methods struggle to maintain 3D structural consistency within scenes, and 2) NeRF-based reconstruction methods still face limitations in handling dynamic scenes. To address these challenges, we propose SCIGS, a variant of 3DGS, and develop a primitive-level transformation network that utilizes camera pose stamps and Gaussian primitive coordinates as embedding vectors. This approach removes the need for camera poses in vanilla 3DGS and enhances multi-view 3D structural consistency in dynamic scenes by utilizing transformed primitives. Additionally, a high-frequency filter is introduced to eliminate the artifacts generated during the transformation. The proposed SCIGS is the first to reconstruct a 3D explicit scene from a single compressed image, extending its application to dynamic 3D scenes. Experiments on both static and dynamic scenes demonstrate that SCIGS not only enhances SCI decoding but also outperforms current state-of-the-art methods in reconstructing dynamic 3D scenes from a single compressed image. The code will be made available upon publication.
- Published
- 2024
25. ADV2E: Bridging the Gap Between Analogue Circuit and Discrete Frames in the Video-to-Events Simulator
- Author
-
Jiang, Xiao, Zhou, Fei, and Lin, Jiongzhi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
Event cameras operate fundamentally differently from traditional Active Pixel Sensor (APS) cameras, offering significant advantages. Recent research has developed simulators to convert video frames into events, addressing the shortage of real event datasets. Current simulators primarily focus on the logical behavior of event cameras. However, the fundamental analogue properties of pixel circuits are seldom considered in simulator design. The gap between analogue pixel circuit and discrete video frames causes the degeneration of synthetic events, particularly in high-contrast scenes. In this paper, we propose a novel method of generating reliable event data based on a detailed analysis of the pixel circuitry in event cameras. We incorporate the analogue properties of event camera pixel circuits into the simulator design: (1) analogue filtering of signals from light intensity to events, and (2) a cutoff frequency that is independent of video frame rate. Experimental results on two relevant tasks, including semantic segmentation and image reconstruction, validate the reliability of simulated event data, even in high-contrast scenes. This demonstrates that deep neural networks exhibit strong generalization from simulated to real event data, confirming that the synthetic events generated by the proposed method are both realistic and well-suited for effective training., Comment: 10 pages, 6 figures
- Published
- 2024
26. Continuous Speculative Decoding for Autoregressive Image Generation
- Author
-
Wang, Zili, Zhang, Robert, Ding, Kun, Yang, Qi, Li, Fei, and Xiang, Shiming
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Continuous-valued Autoregressive (AR) image generation models have demonstrated notable superiority over their discrete-token counterparts, showcasing considerable reconstruction quality and higher generation fidelity. However, the computational demands of the autoregressive framework result in significant inference overhead. While speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to continuous-valued visual autoregressive models remains unexplored. This work generalizes the speculative decoding algorithm from discrete tokens to continuous space. By analyzing the intrinsic properties of output distribution, we establish a tailored acceptance criterion for the diffusion distributions prevalent in such models. To overcome the inconsistency that occurred in speculative decoding output distributions, we introduce denoising trajectory alignment and token pre-filling methods. Additionally, we identify the hard-to-sample distribution in the rejection phase. To mitigate this issue, we propose a meticulous acceptance-rejection sampling method with a proper upper bound, thereby circumventing complex integration. Experimental results show that our continuous speculative decoding achieves a remarkable $2.33\times$ speed-up on off-the-shelf models while maintaining the output distribution. Codes will be available at https://github.com/MarkXCloud/CSpD
- Published
- 2024
27. Measuring coherence via Kirkwood-Dirac nonclassicality with respect to mutually unbiased bases
- Author
-
Liu, Yan, Guo, Zhihua, Ma, Zhihao, and Fei, Shao-Ming
- Subjects
Quantum Physics - Abstract
The Kirkwood-Dirac distribution, serving as an informationally complete representation of a quantum state, has recently garnered increasing attention. We investigate the Kirkwood-Dirac classicality with respect to mutually unbiased bases. For prime dimensional Hilbert spaces, we demonstrate that quantum states which exhibit Kirkwood-Dirac classicality for two distinct sets of mutually unbiased bases $A$, $B$ and $A$, $B'$ must necessarily be incoherent with respect to $A$. We subsequently introduce a coherence monotone based on Kirkwood-Dirac nonclassicality with respect to mutually unbiased bases. Additionally, we establish that this coherence monotone can be expressed through weak values, suggesting that quantum coherence can be utilized to detect anomalous weak values., Comment: 9 pages, 3 figures
- Published
- 2024
28. On the Incorporation of Stability Constraints into Sequential Operational Scheduling
- Author
-
Xu, Wangkun, Chu, Zhongda, Capitanescu, Florin, and Teng, Fei
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
With the increasing penetration of Inverter-Based Resources (IBRs), power system stability constraints must be incorporated into the operational framework, transforming it into stability-constrained optimization. Currently, there exist parallel research efforts on developing the stability constraints within DC power flow-based unit commitment (UC) and AC Optimal Power Flow (OPF). However, few studies discuss how including such constraints can interact with each other and eventually impact grid stability. In this context, this work simulates a realistic power system decision making framework and provides a thorough analysis on the necessity of incorporating frequency nadir and small signal stability constraints into these sequentially connected two operation stages. The simulation results demonstrate that including both stability constraints in the UC is essential to maintain power system stability, while the inclusion in AC OPF can further improve the stability index.
- Published
- 2024
29. Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning
- Author
-
Yan, Xudong, Feng, Songhe, Zhang, Yang, Yang, Jian, Lin, Yueguan, and Fei, Haojun
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Compositional zero-shot learning (CZSL) aims to recognize novel compositions of attributes and objects learned from seen compositions. Previous works disentangle attribute and object by extracting shared and exclusive parts between image pairs sharing the same attribute (object), as well as aligning them with pretrained word embeddings to improve unseen attribute-object recognition. Despite the significant achievements of existing efforts, they are hampered by three limitations: (1) the efficacy of disentanglement is compromised due to the influence of the background and the intricate entanglement of attribute with object in the same parts; (2) existing word embeddings fail to capture complex multimodal semantic information; (3) overconfidence exhibited by existing models in seen compositions hinders their generalization to novel compositions. Being aware of these, we propose a novel framework named Multimodal Large Language Model (MLLM) embeddings and attribute smoothing guided disentanglement (TRIDENT) for CZSL. First, we leverage feature adaptive aggregation modules to mitigate the impact of background, and utilize learnable condition masks to capture multigranularity features for disentanglement. Then, the last hidden states of MLLM are employed as word embeddings for their superior representation capabilities. Moreover, we propose attribute smoothing with auxiliary attributes generated by Large Language Model (LLM) for seen compositions, addressing the issue of overconfidence by encouraging the model to learn more attributes in one given composition. Extensive experiments demonstrate that TRIDENT achieves state-of-the-art performance on three benchmarks.
- Published
- 2024
30. Quantum Coherence: A Fundamental Resource for Establishing Genuine Multipartite Correlations
- Author
-
Wang, Zong, Guo, Zhihua, Chen, Zhihua, Li, Ming, Zhou, Zihang, Zhang, Chengjie, Fei, Shao-Ming, and Ma, Zhihao
- Subjects
Quantum Physics - Abstract
We establish the profound equivalence between measures of genuine multipartite entanglement (GME) and their corresponding coherence measures. Initially, we construct two distinct classes of measures for genuine multipartite entanglement utilizing real symmetric concave functions and the convex roof technique. We then demonstrate that all coherence measures for any qudit states, defined through the convex roof approach, are identical to our two classes of GME measures of the states combined with an incoherent ancilla under a unitary incoherent operation. This relationship implies that genuine multipartite entanglement can be generated from the coherence inherent in an initial state through the unitary incoherent operations. Furthermore, we explore the interplay between coherence and other forms of genuine quantum correlations, specifically genuine multipartite steering and genuine multipartite nonlocality. In the instance of special three-qubit X-states (the only nonzero elements of an X-state are diagonal or antidiagonal when written in an orthonormal basis), we find that genuine multipartite steering and nonlocality are present if and only if the coherence exists in the corresponding qubit states.
- Published
- 2024
31. Experimental probe of band structures of bilayer valley photonic crystals
- Author
-
Guo, Xiang-Fei, Liu, Jian-Wei, Chen, Hong-Xiang, Shi, Fu-Long, Chen, Xiao-Dong, and Dong, Jian-Wen
- Subjects
Physics - Optics - Abstract
Research on two-dimensional van der Waals materials has demonstrated that the layer degree of freedom can significantly alter the physical properties of materials due to the substantial modification of bulk bands. Inspired by this concept, layered photonic systems have been proposed and realized, revealing novel phenomena absent in their monolayer counterparts. In this work, we experimentally investigate the band structures of bilayer valley photonic crystals. Two typical structures with different stacking configurations are experimentally imaged via the near-field scanning technology, exhibiting distinct bulk band structures. Furthermore, different topological edge modes induced by distinct topology are observed, revealing that the layer degree of freedom can be regarded as a pseudospin and offer further capabilities for controlling the flow of light. Our work not only elucidates the evolution of band structures from monolayer to bilayer topological systems but also provides an experimental platform for the further exploration of bilayer topological insulators., Comment: 20 pages, 5 figures
- Published
- 2024
32. An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
- Author
-
Li, Jingyu, Chiu, Aemon Yat Fei, and Lee, Tan
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
Language mismatch is among the most common and challenging domain mismatches in deploying speaker verification (SV) systems. Adversarial reprogramming has shown promising results in cross-language adaptation for SV. The reprogramming is implemented by padding learnable parameters on the two sides of input speech signals. In this paper, we investigate the relationship between the number of padded parameters and the performance of the reprogrammed models. Sufficient experiments are conducted with different scales of SV models and datasets. The results demonstrate that reprogramming consistently improves the performance of cross-language SV, while the improvement is saturated or even degraded when using larger padding lengths. The performance is mainly determined by the capacity of the original SV models instead of the number of padded parameters. The SV models with larger scales have higher upper bounds in performance and can endure longer padding without performance degradation., Comment: Accepted by ISCSLP 2024
- Published
- 2024
33. Extremal Maximal Entanglement
- Author
-
Zhang, Wanchen, Ning, Yu, Shi, Fei, and Zhang, Xiande
- Subjects
Quantum Physics - Abstract
A pure multipartite quantum state is called absolutely maximally entangled if all reductions of no more than half of the parties are maximally mixed. However, an $n$-qubit absolutely maximally entangled state only exists when $n$ equals $2$, $3$, $5$, and $6$. A natural question arises when it does not exist: which $n$-qubit pure state has the largest number of maximally mixed $\lfloor n/2 \rfloor$-party reductions? Denote this number by $Qex(n)$. It was shown that $Qex(4)=4$ in [Higuchi et al. Phys. Lett. A (2000)] and $Qex(7)=32$ in [Huber et al. Phys. Rev. Lett. (2017)]. In this paper, we give a general upper bound of $Qex(n)$ by linking it to the well-known Turán problem in graph theory, and provide lower bounds by constructive and probabilistic methods. In particular, we show that $Qex(8)=56$, which is the third known value for this problem.
- Published
- 2024
34. CCIS-Diff: A Generative Model with Stable Diffusion Prior for Controlled Colonoscopy Image Synthesis
- Author
-
Xie, Yifan, Wang, Jingge, Feng, Tao, Ma, Fei, and Li, Yang
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Colonoscopy is crucial for identifying adenomatous polyps and preventing colorectal cancer. However, developing robust models for polyp detection is made challenging by the limited size and accessibility of existing colonoscopy datasets. While previous efforts have attempted to synthesize colonoscopy images, current methods suffer from instability and insufficient data diversity. Moreover, these approaches lack precise control over the generation process, resulting in images that fail to meet clinical quality standards. To address these challenges, we propose CCIS-DIFF, a Controlled generative model for high-quality Colonoscopy Image Synthesis based on a Diffusion architecture. Our method offers precise control over both the spatial attributes (polyp location and shape) and clinical characteristics of polyps that align with clinical descriptions. Specifically, we introduce a blur mask weighting strategy to seamlessly blend synthesized polyps with the colonic mucosa, and a text-aware attention mechanism to guide the generated images to reflect clinical characteristics. Notably, to achieve this, we construct a new multi-modal colonoscopy dataset that integrates images, mask annotations, and corresponding clinical text descriptions. Experimental results demonstrate that our method generates high-quality, diverse colonoscopy images with fine control over both spatial constraints and clinical consistency, offering valuable support for downstream segmentation and diagnostic tasks., Comment: 5 pages, 4 figures
- Published
- 2024
35. Dimension estimate and existence of holomorphic sections with polynomial growth on gradient Kähler Ricci shrinkers
- Author
-
He, Fei and Ou, Jianyu
- Subjects
Mathematics - Differential Geometry ,Mathematics - Complex Variables - Abstract
We prove an upper bound for the dimension of the linear space of holomorphic functions with polynomial growth on gradient Kähler Ricci shrinkers with bounded curvature. The upper bound is given as a power function of the growth rate. Similar results hold for holomorphic $(p, 0)$-forms, and holomorphic sections of the pluri-anticanonical line bundle $K_M^{-q}$. We also prove the existence of holomorphic sections of $K_M^{-q}$ with polynomial growth when the Kähler Ricci shrinker is asymptotically conical, provided $q$ is sufficiently large; as an application, we show that the Kodaira map constructed using such sections is a holomorphic embedding into a complex projective space.
- Published
- 2024
36. Layered semiconducting electrides in p-block metal oxides
- Author
-
Dai, Jiaqi, Yang, Feng, Wang, Cong, Pang, Fei, Cheng, Zhihai, and Ji, Wei
- Subjects
Condensed Matter - Materials Science - Abstract
In conventional electrides, excess electrons are localized in crystal voids to serve as anions. Most of these electrides are metallic and the metal cations are primarily from the s-block, d-block, or rare-earth elements. Here, we report a class of p-block metal-based electrides found in bilayer SnO and PbO, which are semiconducting and feature electride states in both the valence band (VB) and conduction band (CB), referred to as 2D "bipolar" electrides. These bilayers are hybrid electrides where excess electrons are localized in the interlayer region and hybridize with the orbitals of Sn atoms in the VB, exhibiting strong covalent-like interactions with neighboring metal atoms. Compared to previously studied hybrid electrides, the higher electronegativity of Sn and Pb enhances these covalent-like interactions, leading to a largely enhanced semiconducting bandgap of up to 2.5 eV. Moreover, the CBM primarily arises from the overlap between metal states and interstitial charges, denoting a potential electride and forming a free-electron-like (FEL) state with small effective mass. This state offers high carrier mobilities for both electrons and holes in bilayer SnO, suggesting its potential as a promising p-type semiconductor material.
- Published
- 2024
37. Self-supervised denoising of visual field data improves detection of glaucoma progression
- Author
-
Wu, Sean, Chen, Jun Yu, Mohammadzadeh, Vahid, Besharati, Sajad, Lee, Jaewon, Nouri-Mahdavi, Kouros, Caprioli, Joseph, Fei, Zhe, and Scalzo, Fabien
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Perimetric measurements provide insight into a patient's peripheral vision and day-to-day functioning and are the main outcome measure for identifying progression of visual damage from glaucoma. However, visual field data can be noisy, exhibiting high variance, especially with increasing damage. In this study, we demonstrate the utility of self-supervised deep learning in denoising visual field data from over 4000 patients to enhance its signal-to-noise ratio and its ability to detect true glaucoma progression. We deployed both a variational autoencoder (VAE) and a masked autoencoder to determine which self-supervised model best smooths the visual field data while reconstructing salient features that are less noisy and more predictive of worsening disease. Our results indicate that including a categorical p-value at every visual field location improves the smoothing of visual field data. Masked autoencoders led to cleaner denoised data than previous methods, such as variational autoencoders. A 4.7% increase in detection of progressing eyes with pointwise linear regression (PLR) was observed. The masked and variational autoencoders' smoothed data predicted glaucoma progression 2.3 months earlier when p-values were included compared to when they were not. The faster prediction of time to progression (TTP) and the higher percentage progression detected support our hypothesis that masking out visual field elements during training while including p-values at each location would improve the task of detection of visual field progression. Our study has clinically relevant implications regarding masking when training neural networks to denoise visual field data, resulting in earlier and more accurate detection of glaucoma progression. This denoising model can be integrated into future models for visual field analysis to enhance detection of glaucoma progression., Comment: 10 pages
- Published
- 2024
38. BianCang: A Traditional Chinese Medicine Large Language Model
- Author
-
Wei, Sibo, Peng, Xueping, Wang, Yi-fei, Si, Jiasheng, Zhang, Weiyu, Lu, Wenpeng, Wu, Xiaoming, and Wang, Yinglong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The rise of large language models (LLMs) has driven significant progress in medical applications, including traditional Chinese medicine (TCM). However, current medical LLMs struggle with TCM diagnosis and syndrome differentiation due to substantial differences between TCM and modern medical theory, and the scarcity of specialized, high-quality corpora. This paper addresses these challenges by proposing BianCang, a TCM-specific LLM, using a two-stage training process that first injects domain-specific knowledge and then aligns it through targeted stimulation. To enhance diagnostic and differentiation capabilities, we constructed pre-training corpora, instruction-aligned datasets based on real hospital records, and the ChP-TCM dataset derived from the Pharmacopoeia of the People's Republic of China. We compiled extensive TCM and medical corpora for continuous pre-training and supervised fine-tuning, building a comprehensive dataset to refine the model's understanding of TCM. Evaluations across 11 test sets involving 29 models and 4 tasks demonstrate the effectiveness of BianCang, offering valuable insights for future research. Code, datasets, and models are available at https://github.com/QLU-NLP/BianCang.
- Published
- 2024
39. SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
- Author
-
Jia, Hongrui, Jiang, Chaoya, Xu, Haiyang, Ye, Wei, Dong, Mengfan, Yan, Ming, Zhang, Ji, Huang, Fei, and Zhang, Shikun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, existing LMMs face a critical issue: they often fail to effectively leverage the visual context in multimodal demonstrations and instead simply follow textual patterns. This indicates that LMMs do not achieve effective alignment between multimodal demonstrations and model outputs. To address this problem, we propose Symbol Demonstration Direct Preference Optimization (SymDPO). Specifically, SymDPO aims to break the traditional paradigm of constructing multimodal demonstrations by using random symbols to replace text answers within instances. This forces the model to carefully understand the demonstration images and establish a relationship between the images and the symbols to answer questions correctly. We validate the effectiveness of this method on multiple benchmarks, demonstrating that with SymDPO, LMMs can more effectively understand the multimodal context within examples and utilize this knowledge to answer questions better.
- Published
- 2024
40. SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
- Author
-
Shi, Yu-Fei, Ai, Yang, Lu, Ye-Xin, Du, Hui-Peng, and Ling, Zhen-Hua
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Assessing the naturalness of speech using mean opinion score (MOS) prediction models has positive implications for the automatic evaluation of speech synthesis systems. Early MOS prediction models took the raw waveform or amplitude spectrum of speech as input, whereas more advanced methods employed self-supervised-learning (SSL) based models to extract semantic representations from speech for MOS prediction. These methods utilized limited aspects of speech information for MOS prediction, resulting in restricted prediction accuracy. Therefore, in this paper, we propose SAMOS, a MOS prediction model that leverages both Semantic and Acoustic information of speech to be assessed. Specifically, the proposed SAMOS leverages a pretrained wav2vec2 to extract semantic representations and uses the feature extractor of a pretrained BiVocoder to extract acoustic features. These two types of features are then fed into the prediction network, which includes multi-task heads and an aggregation layer, to obtain the final MOS score. Experimental results demonstrate that the proposed SAMOS outperforms current state-of-the-art MOS prediction models on the BVCC dataset and achieves comparable performance on the BC2019 dataset, according to the results of system-level evaluation metrics.
- Published
- 2024
41. Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
- Author
-
Shi, Yu-Fei, Ai, Yang, Lu, Ye-Xin, Du, Hui-Peng, and Ling, Zhen-Hua
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We participated in track 2 of the VoiceMOS Challenge 2024, which aimed to predict the mean opinion score (MOS) of singing samples. Our submission secured the first place among all participating teams, excluding the official baseline. In this paper, we further improve our submission and propose a novel Pitch-and-Spectrum-aware Singing Quality Assessment (PS-SQA) method. The PS-SQA is designed based on the self-supervised-learning (SSL) MOS predictor, incorporating singing pitch and spectral information, which are extracted using pitch histogram and non-quantized neural codec, respectively. Additionally, the PS-SQA introduces a bias correction strategy to address prediction biases caused by low-resource training samples, and employs model fusion technology to further enhance prediction accuracy. Experimental results confirm that our proposed PS-SQA significantly outperforms all competing systems across all system-level metrics, confirming its strong singing quality assessment capabilities.
- Published
- 2024
42. Unified monogamy relations for the generalized $W$-class states beyond qubits
- Author
-
Shen, Zhong-Xi, Zhou, Wen, Xuan, Dong-Ping, Wang, Zhi-Xi, and Fei, Shao-Ming
- Subjects
Quantum Physics
- Abstract
The monogamy of entanglement stands as an indispensable feature within multipartite quantum systems. We study monogamy relations with respect to any partitions for the generalized $W$-class (GW) states based on the unified-($q,s$) entanglement (UE). We provide the monogamy relation based on the squared UE for a reduced density matrix of a qudit GW state, as well as tighter monogamy relations based on the $\alpha$th ($\alpha\geq2$) power of UE. Furthermore, for an $n$-qudit system $ABC_1...C_{n-2}$, a generalized monogamy relation and an upper bound satisfied by the $\beta$th ($0\leq\beta\leq1$) power of UE for the GW states under the partition $AB$ and $C_1...C_{n-2}$ are established. In particular, two partition-dependent residual entanglements for the GW states are analyzed in detail. Comment: 12 pages, 4 figures
- Published
- 2024
43. HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
- Author
-
Zhao, Huaqin, Li, Jiaxi, Pan, Yi, Liang, Shizhe, Yang, Xiaofeng, Liu, Wei, Li, Xiang, Dou, Fei, Liu, Tianming, and Lu, Jin
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Fine-tuning large language models (LLMs) poses significant memory challenges, as the back-propagation process demands extensive resources, especially with growing model sizes. Recent work, MeZO, addresses this issue using a zeroth-order (ZO) optimization method, which reduces memory consumption by matching the usage to that of the inference phase. However, MeZO experiences slow convergence due to varying curvatures across model parameters. To overcome this limitation, we introduce HELENE, a novel scalable and memory-efficient optimizer that integrates annealed A-GNB gradients with a diagonal Hessian estimation and layer-wise clipping, serving as a second-order pre-conditioner. This combination allows for faster and more stable convergence. Our theoretical analysis demonstrates that HELENE improves convergence rates, particularly for models with heterogeneous layer dimensions, by reducing the dependency on the total parameter space dimension. Instead, the method scales with the largest layer dimension, making it highly suitable for modern LLM architectures. Experimental results on RoBERTa-large and OPT-1.3B across multiple tasks show that HELENE achieves up to a 20x speedup compared to MeZO, with average accuracy improvements of 1.5%. Furthermore, HELENE remains compatible with both full parameter tuning and parameter-efficient fine-tuning (PEFT), outperforming several state-of-the-art optimizers. The code will be released after review.
- Published
- 2024
44. EDBooks: AI-Enhanced Interactive Narratives for Programming Education
- Author
-
Oney, Steve, Shen, Yue, Wu, Fei, Hong, Young Suh, Wang, Ziang, Khajekar, Yamini, Zhang, Jiacheng, and Wang, April Yi
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Large Language Models (LLMs) have shown the potential to be valuable teaching tools, with the potential of giving every student a personalized tutor. However, one challenge with using LLMs to learn new concepts is that when learning a topic in an unfamiliar domain, it can be difficult to know what questions to ask. Further, language models do not always encourage "active learning" where students can test and assess their understanding. In this paper, we propose ways to combine large language models with "traditional" learning materials (like e-books) to give readers the benefits of working with LLMs (the ability to ask personally interesting questions and receive personalized answers) together with the benefits of a traditional e-book (having a structure and content that is pedagogically sound). This work shows one way that LLMs have the potential to improve learning materials and make personalized programming education more accessible to a broader audience. Comment: 21 pages
- Published
- 2024
45. Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts
- Author
-
Long, Jinqiang, Dai, Yanqi, Yang, Guoxing, Lin, Hongpeng, Fei, Nanyi, Gao, Yizhao, and Lu, Zhiwu
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
As research on Multimodal Large Language Models (MLLMs) becomes popular, an advanced MLLM is typically required to handle various textual and visual tasks (e.g., VQA, Detection, OCR, and ChartQA) simultaneously for real-world applications. However, due to the significant differences in representation and distribution among data from various tasks, simply mixing data of all tasks together leads to the well-known "multi-task conflict" issue, resulting in performance degradation across various tasks. To address this issue, we propose Awaker2.5-VL, a Mixture of Experts (MoE) architecture suitable for MLLMs, which acquires the multi-task capabilities through multiple sparsely activated experts. To speed up the training and inference of Awaker2.5-VL, each expert in our model is devised as a low-rank adaptation (LoRA) structure. Extensive experiments on multiple recent benchmarks demonstrate the effectiveness of Awaker2.5-VL. The code and model weights are released on our project page: https://github.com/MetabrainAGI/Awaker.
- Published
- 2024
46. The Effect of Galaxy Interactions on Starbursts in Milky Way-Mass Galaxies in FIRE Simulations
- Author
-
Li, Fei, Rahman, Mubdi, Murray, Norman, Kereš, Dušan, Wetzel, Andrew, Faucher-Giguère, Claude-André, Hopkins, Philip F., and Moreno, Jorge
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
Simulations and observations suggest that galaxy interactions may enhance the star formation rate (SFR) in merging galaxies. One proposed mechanism is the torque exerted on the gas and stars in the larger galaxy by the smaller galaxy. We analyze the interaction torques and star formation activity on six galaxies from the FIRE-2 simulation suite with masses comparable to the Milky Way galaxy at redshift $z=0$. We trace the halos from $z = 3.6$ to $z=0$, calculating the torque exerted by the nearby galaxies on the gas in the central galaxy. We calculate the correlation between the torque and the SFR across the simulations for various mass ratios. For near-equal-stellar-mass-ratio interactions in the galaxy sample, occurring between $z=1.2-3.6$, there is a positive and statistically significant correlation between the torque from nearby galaxies on the gas of the central galaxies and the SFR. For all other samples, no statistically significant correlation is found between the torque and the SFR. Our analysis shows that some, but not all, major interactions cause starbursts in the simulated Milky Way-mass galaxies, and that most starbursts are not caused by galaxy interactions. The transition from 'bursty' at high redshift ($z\gtrsim1$) to 'steady' star-formation state at later times is independent of the interaction history of the galaxies, and most of the interactions do not leave significant imprints on the overall trend of the star formation history of the galaxies. Comment: Submitted to ApJ
- Published
- 2024
47. Propensity Score Matching: Should We Use It in Designing Observational Studies?
- Author
-
Wan, Fei
- Subjects
Statistics - Methodology, Statistics - Applications
- Abstract
Propensity Score Matching (PSM) stands as a widely embraced method in comparative effectiveness research. PSM crafts matched datasets, mimicking some attributes of randomized designs, from observational data. In a valid PSM design where all baseline confounders are measured and matched, the confounders would be balanced, allowing the treatment status to be considered as if it were randomly assigned. Nevertheless, recent research has unveiled a different facet of PSM, termed "the PSM paradox." As PSM approaches exact matching by progressively pruning matched sets in order of decreasing propensity score distance, it can paradoxically lead to greater covariate imbalance, heightened model dependence, and increased bias, contrary to its intended purpose. Methods: We used analytic formulas, simulation, and literature to demonstrate that this paradox stems from the misuse of metrics for assessing chance imbalance and bias. Results: Firstly, matched pairs typically exhibit different covariate values despite having identical propensity scores. However, this disparity represents a "chance" difference and will average to zero over a large number of matched pairs. Common distance metrics cannot capture this "chance" nature in covariate imbalance, instead reflecting increasing variability in chance imbalance as units are pruned and the sample size diminishes. Secondly, the largest estimate among numerous fitted models, because of uncertainty among researchers over the correct model, was used to determine statistical bias. This cherry-picking procedure ignores the most significant benefit of the matching design: reducing model dependence based on its robustness against model misspecification bias. Conclusions: We conclude that the PSM paradox is not a legitimate concern and should not stop researchers from using PSM designs.
- Published
- 2024
48. The Redshift-Space Momentum Power Spectrum III: measuring the growth rate from the SDSSv survey using auto- and cross- power spectrum of the galaxy density and momentum fields
- Author
-
Qin, Fei, Howlett, Cullan, and Parkinson, David
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics
- Abstract
The large-scale structure of the Universe and its evolution over time contain an abundance of cosmological information. One way to unlock this is by measuring the density and momentum power spectra from the positions and peculiar velocities of galaxies, and fitting the cosmological parameters from these power spectra. In this paper, we explore the cross power spectrum between the density and momentum fields of galaxies. We derive the estimator of the density-momentum cross power spectrum multipoles. The growth rate of the large-scale structure, $f\sigma_8$, is measured by fitting the combined density monopole, momentum monopole and cross dipole power spectra. The estimators and models of the power spectra as well as our fitting method have been tested using mock catalogues. We find that they perform well in recovering the fiducial values of the cosmological parameters of the simulations, and that the errors of the parameters can be largely reduced by including the cross power spectrum in the fit. We measure the auto-density, auto-momentum and cross power spectra using the Sloan Digital Sky Survey Data Release 14 peculiar velocity catalogue. The fit result of the growth rate $f\sigma_8$ is $f\sigma_8=0.413^{+0.050}_{-0.058}$ at effective redshift $z_{\mathrm{eff}}=0.073$, and our measurement is consistent with the prediction of the $\Lambda$ Cold Dark Matter cosmological model assuming General Relativity. Comment: 15 pages, 8 figures, 2 tables. Published in ApJ
- Published
- 2024
49. Your Fixed Watermark is Fragile: Towards Semantic-Aware Watermark for EaaS Copyright Protection
- Author
-
Fei, Zekun, Yi, Biao, Geng, Jianing, He, Ruiqi, Nie, Lihai, and Liu, Zheli
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
- Abstract
Embedding-as-a-Service (EaaS) has emerged as a successful business pattern but faces significant challenges related to various forms of copyright infringement, including API misuse and different attacks. Various studies have proposed backdoor-based watermarking schemes to protect the copyright of EaaS services. In this paper, we reveal that previous watermarking schemes possess semantic-independent characteristics and propose the Semantic Perturbation Attack (SPA). Our theoretical and experimental analyses demonstrate that this semantic-independent nature makes current watermarking schemes vulnerable to adaptive attacks that exploit semantic perturbation tests to bypass watermark verification. To address this vulnerability, we propose the Semantic Aware Watermarking (SAW) scheme, a robust defense mechanism designed to resist SPA by injecting a watermark that adapts to the text semantics. Extensive experimental results across multiple datasets demonstrate that the True Positive Rate (TPR) for detecting watermarked samples under SPA can exceed 95%, rendering previous watermarks ineffective. Meanwhile, our watermarking scheme can resist such attacks while ensuring the watermark verification capability. Our code is available at https://github.com/Zk4-ps/EaaS-Embedding-Watermark.
- Published
- 2024
50. JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation
- Author
-
Cao, Xuyang, Wang, Guoxin, Shi, Sheng, Zhao, Jun, Yao, Yang, Fei, Jintao, and Gao, Minyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Audio-driven portrait animation has made significant advances with diffusion-based models, improving video quality and lipsync accuracy. However, the increasing complexity of these models has led to inefficiencies in training and inference, as well as constraints on video length and inter-frame continuity. In this paper, we propose JoyVASA, a diffusion-based method for generating facial dynamics and head motion in audio-driven facial animation. Specifically, in the first stage, we introduce a decoupled facial representation framework that separates dynamic facial expressions from static 3D facial representations. This decoupling allows the system to generate longer videos by combining any static 3D facial representation with dynamic motion sequences. Then, in the second stage, a diffusion transformer is trained to generate motion sequences directly from audio cues, independent of character identity. Finally, a generator trained in the first stage uses the 3D facial representation and the generated motion sequences as inputs to render high-quality animations. With the decoupled facial representation and the identity-independent motion generation process, JoyVASA extends beyond human portraits to animate animal faces seamlessly. The model is trained on a hybrid dataset of private Chinese and public English data, enabling multilingual support. Experimental results validate the effectiveness of our approach. Future work will focus on improving real-time performance and refining expression control, further expanding the applications in portrait animation. The code is available at: https://github.com/jdh-algo/JoyVASA.
- Published
- 2024