60,149 results on '"CHEN, XI"'
Search Results
2. Freely Suspended Nematic and Smectic Films and Free-Standing Smectic Filaments in the Ferroelectric Nematic Realm
- Author
Hedlund, Keith G., Martinez, Vikina, Chen, Xi, Park, Cheol S., Maclennan, Joseph E., Glaser, Matthew A., and Clark, Noel A.
- Subjects
Condensed Matter - Soft Condensed Matter
- Abstract
We show that stable, freely suspended liquid crystal films can be made from the ferroelectric nematic ($\mathrm{N_F}$) phase and from the recently discovered polar, lamellar $\mathrm{SmZ_A}$ and $\mathrm{SmA_F}$ phases. The $\mathrm{N_F}$ films display two-dimensional, smectic-like parabolic focal conic textures comprising director/polarization bend that are a manifestation of the electrostatic suppression of director splay in the film plane. In the $\mathrm{SmZ_A}$ and $\mathrm{SmA_F}$ phases, the smectic layers orient preferentially normal to the film surfaces, a condition never found in typical thermotropic or lyotropic lamellar LC phases, with the $\mathrm{SmZ_A}$ films exhibiting focal-conic fan textures mimicking the appearance of typical smectics in glass cells when the layers are oriented normal to the plates, and the $\mathrm{SmA_F}$ films showing a texture of plaquettes of uniform in-plane orientation where both bend and splay are suppressed, separated by grain boundaries. The $\mathrm{SmA_F}$ phase can also be drawn into thin filaments, in which X-ray scattering reveals that the smectic layer planes are normal to the filament axis. Remarkably, the filaments are mechanically stable even if they break, forming free-standing, fluid filaments supported only at one end. The unique architectures of these films and filaments are stabilized by the electrostatic self-interaction of the liquid crystal polarization field, which enables the formation of confined, fluid structures that are fundamentally different from those of their counterparts made using previously known liquid crystal phases.
Comment: Main paper 25 pages (5 figures); Supplement: 7 pages (7 figures)
- Published
- 2024
3. FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
- Author
Chen, Xi, Yang, Haosen, Jin, Sheng, Zhu, Xiatian, and Yao, Hongxun
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Open-vocabulary segmentation poses significant challenges, as it requires segmenting and recognizing objects across an open set of categories in unconstrained environments. Building on the success of powerful vision-language (ViL) foundation models, such as CLIP, recent efforts sought to harness their zero-shot capabilities to recognize unseen categories. Despite notable performance improvements, these models still encounter the critical issue of generating precise mask proposals for unseen categories and scenarios, ultimately resulting in inferior segmentation performance. To address this challenge, we introduce a novel approach, FrozenSeg, designed to integrate spatial knowledge from a localization foundation model (e.g., SAM) and semantic knowledge extracted from a ViL model (e.g., CLIP), in a synergistic framework. Taking the ViL model's visual encoder as the feature backbone, we inject the space-aware feature into the learnable queries and CLIP features within the transformer decoder. In addition, we devise a mask proposal ensemble strategy for further improving the recall rate and mask quality. To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a lightweight transformer decoder for mask proposal generation, the performance bottleneck. Extensive experiments demonstrate that FrozenSeg advances state-of-the-art results across various segmentation benchmarks, trained exclusively on COCO panoptic data, and tested in a zero-shot manner. Code is available at https://github.com/chenxi52/FrozenSeg.
Comment: 14 pages, 9 figures
- Published
- 2024
4. TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations
- Author
Gao, Mingze, Liu, Jingyu, Li, Mingda, Xie, Jiangtao, Liu, Qingbin, Zhao, Bo, Chen, Xi, and Xiong, Hui
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most efforts concentrate on enhancing the vision encoder and projector components, while the core part, Large Language Models (LLMs), remains comparatively under-explored. In this paper, we propose two strategies to enhance the model's capability in video understanding tasks by improving inter-layer attention computation in LLMs. Specifically, the first approach focuses on the enhancement of Rotary Position Embedding (RoPE) with Temporal-Aware Dual RoPE, which introduces temporal position information to strengthen the MLLM's temporal modeling capabilities while preserving the relative position relationships of both visual and text tokens. The second approach involves enhancing the Attention Mask with the Frame-wise Block Causal Attention Mask, a simple yet effective method that broadens visual token interactions within and across video frames while maintaining the causal inference mechanism. Based on these proposed methods, we adapt LLaVA for video understanding tasks, naming it Temporal-Considered LLaVA (TC-LLaVA). Our TC-LLaVA achieves new state-of-the-art performance across various video understanding benchmarks with only supervised fine-tuning (SFT) on video-related datasets.
- Published
- 2024
5. CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
- Author
Zeng, Rui, Chen, Xi, Pu, Yuwen, Zhang, Xuhong, Du, Tianyu, and Ji, Shouling
- Subjects
Computer Science - Cryptography and Security, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting optimized weight perturbation in the attention layers to make the perturbed model classify a limited number of reference samples as a target label. Subsequently, CLIBE leverages the generalization ability of this few-shot perturbation to determine whether the original model contains a dynamic backdoor. Extensive evaluation on three advanced NLP dynamic backdoor attacks, two widely-used Transformer frameworks, and four real-world classification tasks strongly validates the effectiveness of CLIBE. We also demonstrate the robustness of CLIBE against various adaptive attacks. Furthermore, we employ CLIBE to scrutinize 49 popular Transformer models on Hugging Face and discover one exhibiting a high probability of containing a dynamic backdoor. We have contacted Hugging Face and provided detailed evidence of this model's backdoor behavior. Moreover, we extend CLIBE to detect backdoor text generation models modified to exhibit toxic behavior. 
To the best of our knowledge, CLIBE is the first framework capable of detecting backdoors in text generation models without access to trigger input test samples.
Comment: To appear in the Network and Distributed System Security (NDSS) Symposium, February, 2025
- Published
- 2024
6. Conan-embedding: General Text Embedding with More and Better Negative Samples
- Author
Li, Shiyu, Tang, Yang, Chen, Shizhe, and Chen, Xi
- Subjects
Computer Science - Computation and Language
- Abstract
With the growing popularity of RAG, the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained through contrastive loss learning, with negative examples being a key component. Previous work has proposed various hard negative mining strategies, but these strategies are typically employed as preprocessing steps. In this paper, we propose the conan-embedding model, which maximizes the utilization of more and higher-quality negative examples. Specifically, since the model's ability to handle preprocessed negative examples evolves during training, we propose a dynamic hard negative mining method to expose the model to more challenging negative examples throughout the training process. Secondly, contrastive learning requires as many negative examples as possible but is limited by GPU memory constraints. Therefore, we use a Cross-GPU balancing Loss to provide more negative examples for embedding training and balance the batch size across multiple tasks. Moreover, we also discovered that the prompt-response pairs from LLMs can be used for embedding training. Our approach effectively enhances the capabilities of embedding models, currently ranking first on the Chinese leaderboard of the Massive Text Embedding Benchmark.
- Published
- 2024
7. The Impact of Group Discussion and Formation on Student Performance: An Experience Report in a Large CS1 Course
- Author
Wu, Tong, Tang, Xiaohang, Wong, Sam, Chen, Xi, Shaffer, Clifford A., and Chen, Yan
- Subjects
Computer Science - Computers and Society
- Abstract
Programming instructors often conduct collaborative learning activities, such as Peer Instruction (PI), to enhance student motivation, engagement, and learning gains. However, the impact of group discussion and formation mechanisms on student performance remains unclear. To investigate this, we conducted an 11-session experiment in a large, in-person CS1 course. We employed both random and expertise-balanced grouping methods to examine the efficacy of different group mechanisms and the impact of expert students' presence on collaborative learning. Our observations revealed complex dynamics within the collaborative learning environment. Among 255 groups, 146 actively engaged in discussions, with 96 of these groups demonstrating improvement for poor-performing students. Interestingly, our analysis revealed that different grouping methods (expertise-balanced or random) did not significantly influence discussion engagement or poor-performing students' improvement. In our deeper qualitative analysis, we found that struggling students often derived benefits from interactions with expert peers, but this positive effect was not consistent across all groups. We identified challenges that expert students face in peer instruction interactions, highlighting the complexity of leveraging expertise within group discussions.
- Published
- 2024
8. A Low-dose CT Reconstruction Network Based on TV-regularized OSEM Algorithm
- Author
An, Ran, Zhang, Yinghui, Chen, Xi, Li, Lemeng, Chen, Ke, and Li, Hongwei
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, I.4.5
- Abstract
Low-dose computed tomography (LDCT) offers significant advantages in reducing the potential harm to human bodies. However, reducing the X-ray dose in CT scanning often leads to severe noise and artifacts in the reconstructed images, which might adversely affect diagnosis. By utilizing the expectation maximization (EM) algorithm, statistical priors could be combined with artificial priors to improve LDCT reconstruction quality. However, conventional EM-based regularization methods adopt an alternating solving strategy, i.e. full reconstruction followed by image-regularization, resulting in over-smoothing and slow convergence. In this paper, we propose to integrate TV regularization into the "M"-step of the EM algorithm, thus achieving effective and efficient regularization. Besides, by employing the Chambolle-Pock (CP) algorithm and the ordered subset (OS) strategy, we propose the OSEM-CP algorithm for LDCT reconstruction, in which both reconstruction and regularization are conducted view-by-view. Furthermore, by unrolling OSEM-CP, we propose an end-to-end reconstruction neural network (NN), named OSEM-CPNN, with remarkable performance and efficiency that achieves high-quality reconstructions in just one full-view iteration. Experiments on different models and datasets demonstrate our methods' outstanding performance compared to traditional and state-of-the-art deep-learning methods.
Comment: 11 pages, 8 figures
- Published
- 2024
9. Self-Refined Generative Foundation Models for Wireless Traffic Prediction
- Author
Hu, Chengming, Zhou, Hao, Wu, Di, Chen, Xi, Yan, Jun, and Liu, Xue
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
With a broad range of emerging applications in 6G networks, wireless traffic prediction has become a critical component of network management. However, the dynamically shifting distribution of wireless traffic in non-stationary 6G networks presents significant challenges to achieving accurate and stable predictions. Motivated by recent advancements in Generative AI (GAI)-enabled 6G networks, this paper proposes a novel self-refined Large Language Model (LLM) for wireless traffic prediction, namely TrafficLLM, through in-context learning without parameter fine-tuning or model training. The proposed TrafficLLM harnesses the powerful few-shot learning abilities of LLMs to enhance the scalability of traffic prediction in dynamically changing wireless environments. Specifically, our proposed TrafficLLM embraces an LLM to iteratively refine its predictions through a three-step process: traffic prediction, feedback generation, and prediction refinement. Initially, the proposed TrafficLLM conducts traffic predictions using task-specific demonstration prompts. Recognizing that LLMs may generate incorrect predictions on the first attempt, we subsequently incorporate feedback demonstration prompts designed to provide multifaceted and valuable feedback related to these initial predictions. Following this comprehensive feedback, our proposed TrafficLLM introduces refinement demonstration prompts, enabling the same LLM to further refine its predictions and thereby enhance prediction performance. The evaluations on two realistic datasets demonstrate that the proposed TrafficLLM outperforms state-of-the-art methods with performance improvements of 23.17% and 17.09%, respectively.
- Published
- 2024
10. Demonstration of Hardware Efficient Photonic Variational Quantum Algorithm
- Author
Agresti, Iris, Paul, Koushik, Schiansky, Peter, Steiner, Simon, Yin, Zhengao, Pentangelo, Ciro, Piacentini, Simone, Crespi, Andrea, Ban, Yue, Ceccarelli, Francesco, Osellame, Roberto, Chen, Xi, and Walther, Philip
- Subjects
Quantum Physics
- Abstract
Quantum computing has brought a paradigm change in computer science, where non-classical technologies have promised to outperform their classical counterpart. Such an advantage was only demonstrated for tasks without practical applications, still out of reach for the state-of-the-art quantum technologies. In this context, a promising strategy to find practical use of quantum computers is to exploit hybrid quantum-classical models, where a quantum device estimates a hard-to-compute quantity, while a classical optimizer trains the parameters of the model. In this work, we demonstrate that single photons and linear optical networks are sufficient for implementing Variational Quantum Algorithms, when the problem specification, or ansatz, is tailored to this specific platform. We show this by a proof-of-principle demonstration of a variational approach to tackle an instance of a factorization task, whose solution is encoded in the ground state of a suitable Hamiltonian. This work, which combines Variational Quantum Algorithms with hardware efficient ansatzes for linear-optics networks, showcases a promising pathway towards practical applications for photonic quantum platforms.
- Published
- 2024
11. Learning Robust Treatment Rules for Censored Data
- Author
Cui, Yifan, Liu, Junyi, Shen, Tao, Qi, Zhengling, and Chen, Xi
- Subjects
Statistics - Methodology, Mathematics - Statistics Theory, Statistics - Computation, Statistics - Machine Learning
- Abstract
There is a fast-growing literature on estimating optimal treatment rules directly by maximizing the expected outcome. In biomedical studies and operations applications, censored survival outcome is frequently observed, in which case the restricted mean survival time and survival probability are of great interest. In this paper, we propose two robust criteria for learning optimal treatment rules with censored survival outcomes; the former targets an optimal treatment rule maximizing the restricted mean survival time, where the restriction is specified by a given quantile such as the median; the latter targets an optimal treatment rule maximizing buffered survival probabilities, where the predetermined threshold is adjusted to account for the restricted mean survival time. We provide theoretical justifications for the proposed optimal treatment rules and develop a sampling-based difference-of-convex algorithm for learning them. In simulation studies, our estimators show improved performance compared to existing methods. We also demonstrate the proposed method using AIDS clinical trial data.
- Published
- 2024
12. MatterGPT: A Generative Transformer for Multi-Property Inverse Design of Solid-State Materials
- Author
Chen, Yan, Wang, Xueru, Deng, Xiaobin, Liu, Yilun, Chen, Xi, Zhang, Yunwei, Wang, Lei, and Xiao, Hang
- Subjects
Condensed Matter - Materials Science, Physics - Computational Physics
- Abstract
Inverse design of solid-state materials with desired properties represents a formidable challenge in materials science. Although recent generative models have demonstrated potential, their adoption has been hindered by limitations such as inefficiency, architectural constraints and restricted open-source availability. The representation of crystal structures using the SLICES (Simplified Line-Input Crystal-Encoding System) notation as a string of characters enables the use of state-of-the-art natural language processing models, such as Transformers, for crystal design. Drawing inspiration from the success of GPT models in generating coherent text, we trained a generative Transformer on the next-token prediction task to generate solid-state materials with targeted properties. We demonstrate MatterGPT's capability to generate de novo crystal structures with targeted single properties, including both lattice-insensitive (formation energy) and lattice-sensitive (band gap) properties. Furthermore, we extend MatterGPT to simultaneously target multiple properties, addressing the complex challenge of multi-objective inverse design of crystals. Our approach showcases high validity, uniqueness, and novelty in generated structures, as well as the ability to generate materials with properties beyond the training data distribution. This work represents a significant step forward in computational materials discovery, offering a powerful and open tool for designing materials with tailored properties for various applications in energy, electronics, and beyond.
Comment: 20 pages, 6 figures
- Published
- 2024
13. FAST detection of OH emission in the carbon-rich planetary nebula NGC 7027
- Author
Ouyang, Xu-Jia, Zhang, Yong, Zhang, Chuan-Peng, Jiang, Peng, Nakashima, Jun-ichi, Chen, Xi, Qiao, Hai-Hua, Zhang, Xu-Ying, Sun, Hao-Min, Li, Xiao-Hu, and Zijlstra, Albert
- Subjects
Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics
- Abstract
We present the first detection of the ground-state OH emission line at 1612 MHz toward the prototypical carbon-rich planetary nebula (PN) NGC 7027, utilizing the newly installed ultra-wideband (UWB) receiver of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). This emission is likely to originate from the interface of the neutral shell and the ionized region. The other three ground-state OH lines at 1665, 1667, and 1721 MHz are observed in absorption and have velocities well matched with that of HCO$^+$ absorption. We infer that the OH absorption is from the outer shell of NGC 7027, although the possibility that they are associated with a foreground cloud cannot be completely ruled out. All the OH lines exhibit a single blue-shifted component with respect to the central star. The formation of OH in carbon-rich environments might be via photodissociation-induced chemical processes. Our observations offer significant constraints for chemical simulations, and they underscore the potent capability of the UWB receiver of FAST to search for nascent PNe.
Comment: 17 pages, 3 figures, accepted for publication in ApJ
- Published
- 2024
14. Conformal Trajectory Prediction with Multi-View Data Integration in Cooperative Driving
- Author
Chen, Xi, Bhadani, Rahul, and Head, Larry
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Current research on trajectory prediction primarily relies on data collected by onboard sensors of an ego vehicle. With the rapid advancement in connected technologies, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, valuable information from alternate views becomes accessible via wireless networks. The integration of information from alternative views has the potential to overcome the inherent limitations associated with a single viewpoint, such as occlusions and limited field of view. In this work, we introduce V2INet, a novel trajectory prediction framework designed to model multi-view data by extending existing single-view models. Unlike previous approaches where the multi-view data is manually fused or formulated as a separate training stage, our model supports end-to-end training, enhancing both flexibility and performance. Moreover, the predicted multimodal trajectories are calibrated by a post-hoc conformal prediction module to get valid and efficient confidence regions. We evaluated the entire framework using the real-world V2I dataset V2X-Seq. Our results demonstrate superior performance in terms of Final Displacement Error (FDE) and Miss Rate (MR) using a single GPU. The code is publicly available at: https://github.com/xichennn/V2I_trajectory_prediction.
- Published
- 2024
15. MSMA: Multi-agent Trajectory Prediction in Connected and Autonomous Vehicle Environment with Multi-source Data Integration
- Author
Chen, Xi, Bhadani, Rahul, Sun, Zhanbo, and Head, Larry
- Subjects
Computer Science - Robotics, Computer Science - Machine Learning
- Abstract
The prediction of surrounding vehicle trajectories is crucial for collision-free path planning. In this study, we focus on a scenario where a connected and autonomous vehicle (CAV) serves as the central agent, utilizing both sensors and communication technologies to perceive the surrounding traffic, consisting of autonomous vehicles (AVs), connected vehicles (CVs), and human-driven vehicles (HDVs). Our trajectory prediction task is aimed at all the detected surrounding vehicles. To effectively integrate the multi-source data from both sensor and communication technologies, we propose a deep learning framework called MSMA utilizing a cross-attention module for multi-source data fusion. Vector map data is utilized to provide contextual information. The trajectory dataset is collected in the CARLA simulator with synthesized data errors introduced. Numerical experiments demonstrate that in a mixed traffic flow scenario, the integration of data from different sources enhances our understanding of the environment. This notably improves trajectory prediction accuracy, particularly in situations with a high CV market penetration rate. The code is available at: https://github.com/xichennn/MSMA.
- Published
- 2024
16. Shortcuts for Adiabatic and Variational Algorithms in Molecular Simulation
- Author
Ferreiro-Vélez, Julián, Iriarte-Zendoia, Iñaki, Ban, Yue, and Chen, Xi
- Subjects
Quantum Physics
- Abstract
Quantum algorithms are prominent in the pursuit of achieving quantum advantage in various computational tasks. However, addressing challenges, such as limited qubit coherence and high error rate in near-term devices, requires extensive efforts. In this paper, we present a substantial stride in quantum chemistry by integrating shortcuts-to-adiabaticity techniques into adiabatic and variational algorithms for calculating the molecular ground state. Our approach includes the counter-diabatic driving that accelerates adiabatic evolution by mitigating adiabatic errors. Additionally, we introduce the counter-diabatic terms as the adiabatic gauge ansatz for the variational quantum eigensolver, which exhibits favorable convergence properties with fewer parameters, thereby reducing the circuit depth. Our approach achieves comparable accuracy to other established ansatzes, while enhancing the potential for applications in material science, drug discovery, and molecular simulations.
Comment: 10 pages, 3 figures
- Published
- 2024
17. Stronger sum uncertainty relations for non-Hermitian operators
- Author
Song, Xiao-Feng, Ren, Yi-Fang, Liu, Shuang, Chen, Xi-Hao, and Turek, Yusuf
- Subjects
Quantum Physics, Physics - Optics
- Abstract
Uncertainty relations for two arbitrary incompatible observables have traditionally been represented by the product of variances; representing them by the sum of variances is better, as the sum form is guaranteed to be nontrivial for two incompatible operators in some special cases. Although the uncertainty relation formulated as the sum of variances has been confirmed for unitary operators, its general forms for arbitrary non-Hermitian operators have not yet been investigated in detail. Thus, this study develops four sum uncertainty relations for arbitrary non-Hermitian operators acting on system states by utilizing an appropriate Hilbert-space metric. The forms of our sum inequalities compatible with conventional quantum mechanics are also provided via the $G$-metric formalism. Concrete examples demonstrate the validity of the proposed sum uncertainty relations in both $\mathcal{PT}$-symmetric and $\mathcal{PT}$-broken phases. The proposed methods and results can help the reader understand in depth the usefulness of the $G$-metric formalism in non-Hermitian quantum mechanics and the sum uncertainty relations of incompatible operators within it.
- Published
- 2024
18. Synthetic monopole with half-integer magnetic charge in Bose-Einstein condensates
- Author
Chen, Xi-Yu, Jiang, Lijia, Bai, Wen-Kai, Yang, Tao, and Zheng, Jun-Hui
- Subjects
Condensed Matter - Quantum Gases, Nonlinear Sciences - Pattern Formation and Solitons, Quantum Physics
- Abstract
We propose a scheme to create monopoles with half-integer magnetic charges in a spinful cold atom system. With a minimal monopole in the center, we derive the ground-state single-vortex wave function on the sphere and develop the vortex's kinematic equation in the presence of an external electromagnetic field. The vortex's trajectory is generally depicted by the precession of the system. We further formulate the inter-vortex interaction and build up a theory of multi-vortex dynamics in high-charge monopole systems. We predict the vortices' trajectories in the bi-vortex system and identify stable vortex (line) patterns in multi-vortex systems. Our study provides deep insights into properties of magnetic monopoles and vortices and paves the way for experimental verification.
Comment: 6+2+3 pages, 4+1 figures, 1 table
- Published
- 2024
19. DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning
- Author
Chen, Xi, Xiong, Yun, Zhang, Siwei, Zhang, Jiawei, Zhang, Yao, Zhou, Shiyang, Wu, Xixi, Zhang, Mingyang, Liu, Tengfei, and Wang, Weiqiang
- Subjects
Computer Science - Machine Learning
- Abstract
Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG representation learning predominantly relies on GNN+RNN architectures, which manifest the inherent limitations of both Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs). GNNs suffer from the over-smoothing issue as the model architecture goes deeper, while RNNs struggle to capture long-term dependencies effectively. GNN+RNN architectures also grapple with scaling to large graph sizes and long sequences. Additionally, these methods often compute node representations separately and focus solely on individual node characteristics, thereby overlooking the behavior intersections between the two nodes whose link is being predicted, such as instances where the two nodes appear together in the same context or share common neighbors. This paper introduces a novel representation learning method DTFormer for DTDGs, pivoting from the traditional GNN+RNN framework to a Transformer-based architecture. Our approach exploits the attention mechanism to concurrently process topological information within the graph at each timestamp and temporal dynamics of graphs along the timestamps, circumventing the aforementioned fundamental weakness of both GNNs and RNNs. Moreover, we enhance the model's expressive capability by incorporating the intersection relationships among nodes and integrating a multi-patching module. Extensive experiments conducted on six public dynamic graph benchmark datasets confirm our model's efficacy, achieving the SOTA performance.
Comment: 11 pages, 3 figures
- Published
- 2024
20. Self-Reasoning Assistant Learning for non-Abelian Gauge Fields Design
- Author
Sun, Jinyang, Chen, Xi, Wang, Xiumei, Zhu, Dandan, and Zhou, Xingping
- Subjects
Computer Science - Machine Learning, Condensed Matter - Mesoscale and Nanoscale Physics, Computer Science - Artificial Intelligence
- Abstract
Non-Abelian braiding has attracted substantial attention because of its pivotal role in describing the exchange behaviour of anyons, in which the input and outcome of non-Abelian braiding are connected by a unitary matrix. Implementing braiding in a classical system can assist the experimental investigation of non-Abelian physics. However, the design of non-Abelian gauge fields faces numerous challenges stemming from the intricate interplay of group structures, Lie algebra properties, representation theory, topology, and symmetry breaking. The extreme diversity makes it a powerful tool for the study of condensed matter physics. Whereas the widely used artificial intelligence with data-driven approaches has greatly promoted the development of physics, most works are limited to data-to-data design. Here we propose a self-reasoning assistant learning framework capable of directly generating non-Abelian gauge fields. This framework utilizes the forward diffusion process to capture and reproduce the complex patterns and details inherent in the target distribution through continuous transformation. Then the reverse diffusion process is used to make the generated data closer to the distribution of the original situation. Thus, it has strong self-reasoning capabilities, allowing it to automatically discover the feature representation and capture more subtle relationships from the dataset. Moreover, the self-reasoning eliminates the need for manual feature engineering and simplifies the process of model building. Our framework offers a disruptive paradigm shift to parse complex physical processes, automatically uncovering patterns from massive datasets.
- Published
- 2024
21. LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
- Author
-
Chen, Xi, Zhang, Songyang, Bai, Qibing, Chen, Kai, and Nakamura, Satoshi
- Subjects
Computer Science - Computation and Language - Abstract
We introduce LLaST, a framework for building high-performance Large Language model based Speech-to-text Translation systems. We address the limitations of end-to-end speech translation (E2E ST) models by exploring model architecture design and optimization techniques tailored for LLMs. Our approach includes LLM-based speech translation architecture design, ASR-augmented training, multilingual data augmentation, and dual-LoRA optimization. Our approach demonstrates superior performance on the CoVoST-2 benchmark and showcases exceptional scaling capabilities powered by LLMs. We believe this effective method will serve as a strong baseline for speech translation and provide insights for future improvements of the LLM-based speech translation framework. We release the data, code and models at https://github.com/openaudiolab/LLaST.
- Published
- 2024
22. Gaussian Process Model with Tensorial Inputs and Its Application to the Design of 3D Printed Antennas
- Author
-
Chen, Xi, Sharma, Yashika, Zhang, Hao Helen, Hao, Xin, and Zhou, Qiang
- Subjects
Computer Science - Machine Learning - Abstract
In simulation-based engineering design with time-consuming simulators, Gaussian process (GP) models are widely used as fast emulators to speed up the design optimization process. In its most commonly used form, the input of GP is a simple list of design parameters. With rapid development of additive manufacturing (also known as 3D printing), design inputs with 2D/3D spatial information become prevalent in some applications, for example, neighboring relations between pixels/voxels and material distributions in heterogeneous materials. Such spatial information, vital to 3D printed designs, is hard to incorporate into existing GP models with common kernels such as squared exponential or Matérn. In this work, we propose to embed a generalized distance measure into a GP kernel, offering a novel and convenient technique to incorporate spatial information from freeform 3D printed designs into the GP framework. The proposed method allows complex design problems for 3D printed objects to take advantage of a plethora of tools available from the GP surrogate-based simulation optimization such as designed experiments and GP-based optimizations including Bayesian optimization. We investigate the properties of the proposed method and illustrate its performance by several numerical examples of 3D printed antennas. The dataset is publicly available at: https://github.com/xichennn/GP_dataset.
- Published
- 2024
23. ViLLa: Video Reasoning Segmentation with Large Language Model
- Author
-
Zheng, Rongkun, Qi, Lu, Chen, Xi, Wang, Yi, Wang, Kun, Qiao, Yu, and Zhao, Hengshuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Although video perception models have made remarkable advancements in recent years, they still heavily rely on explicit text descriptions or pre-defined categories to identify target instances before executing video perception tasks. These models, however, fail to proactively comprehend and reason about the user's intentions via textual input. Even though previous works attempt to investigate solutions to incorporate reasoning with image segmentation, they fail to reason with videos due to the video's complexity in object motion. To bridge the gap between image and video, in this work, we propose a new video segmentation task - video reasoning segmentation. The task is designed to output tracklets of segmentation masks given a complex input text query. What's more, to promote research in this unexplored area, we construct a reasoning video segmentation benchmark. Finally, we present ViLLa: Video reasoning segmentation with a Large Language Model, which incorporates the language generation capabilities of multimodal Large Language Models (LLMs) while retaining the capabilities of detecting, segmenting, and tracking multiple instances. We use a temporal-aware context aggregation module to incorporate contextual visual cues to text embeddings and propose a video-frame decoder to build temporal correlations across segmentation tokens. Remarkably, our ViLLa demonstrates capability in handling complex reasoning and referring video segmentation. Also, our model shows impressive ability in different temporal understanding benchmarks. Both quantitative and qualitative experiments show our method effectively unlocks new video reasoning segmentation capabilities for multimodal LLMs. The code and dataset will be available at https://github.com/rkzheng99/ViLLa., Comment: 15 pages, 6 figures
- Published
- 2024
24. Composable Generation Strategy Framework Enabled Bidirectional Design on Topological Circuits
- Author
-
Chen, Xi, Sun, Jinyang, Wang, Xiumei, Chen, Maoxin, Lin, Qingyuan, Xia, Minggang, and Zhou, Xingping
- Subjects
Physics - Applied Physics ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Topological insulators show important properties, such as topological phase transitions and topological edge states. Although these properties and phenomena can be simulated by well-designed circuits, it is undoubtedly difficult to design such topological circuits due to the complex physical principles and calculations involved. Therefore, achieving a framework that can automatically complete the bidirectional design of topological circuits is very significant. Here, we propose an effective bidirectional collaborative design framework with strong task adaptability, which can automatically generate specific results according to our requirements. In the framework, a composable generation strategy is employed, which involves building a shared multimodal space by bridging alignment in the diffusion process. For simplicity, a series of two-dimensional (2D) Su-Schrieffer-Heeger (SSH) circuits are constructed with different structural parameters. The framework is first applied to find the relationship between the structural information and topological features. Then, the correctness of the results can be verified through experimental measurements on a Printed Circuit Board (PCB) manufactured from the automatically generated circuit diagram. The framework is demonstrated by achieving good results in the reverse design of circuit structures and forward prediction of topological edge states, reaching an accuracy of 94%. Overall, our research demonstrates the enormous potential of the proposed bidirectional deep learning framework in complex tasks and provides insights for collaborative design tasks.
- Published
- 2024
25. LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
- Author
-
Zhu, Mingkang, Chen, Xi, Wang, Zhongdao, Zhao, Hengshuang, and Jia, Jiaya
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent advances in text-to-image model customization have underscored the importance of integrating new concepts with a few examples. Yet, this progress is largely confined to widely recognized subjects, which can be learned with relative ease through models' adequate shared prior knowledge. In contrast, logos, characterized by unique patterns and textual elements, are hard to establish shared knowledge within diffusion models, thus presenting a unique challenge. To bridge this gap, we introduce the task of logo insertion. Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts. We present a novel two-phase pipeline LogoSticker to tackle this task. First, we propose the actor-critic relation pre-training algorithm, which addresses the nontrivial gaps in models' understanding of the potential spatial positioning of logos and interactions with other objects. Second, we propose a decoupled identity learning algorithm, which enables precise localization and identity extraction of logos. LogoSticker can generate logos accurately and harmoniously in diverse contexts. We comprehensively validate the effectiveness of LogoSticker over customization methods and large models such as DALLE 3. Project page: https://mingkangz.github.io/logosticker., Comment: ECCV2024
- Published
- 2024
26. On the global complexity of a derivative-free Levenberg-Marquardt algorithm via orthogonal spherical smoothing
- Author
-
Chen, Xi and Fan, Jinyan
- Subjects
Mathematics - Numerical Analysis ,Mathematics - Optimization and Control - Abstract
In this paper, we propose a derivative-free Levenberg-Marquardt algorithm for nonlinear least squares problems, where the Jacobian matrices are approximated via orthogonal spherical smoothing. It is shown that the gradient models which use the approximate Jacobian matrices are probabilistically first-order accurate, and the high probability complexity bound of the algorithm is also given.
- Published
- 2024
27. Pulse-based variational quantum optimization and metalearning in superconducting circuits
- Author
-
Wang, Yapeng, Ding, Yongcheng, Cárdenas-López, Francisco Andrés, and Chen, Xi
- Subjects
Quantum Physics - Abstract
Solving optimization problems using variational algorithms stands out as a crucial application for noisy intermediate-scale devices. Instead of constructing gate-based quantum computers, our focus centers on designing variational quantum algorithms within the analog paradigm. This involves optimizing parameters that directly control pulses, driving quantum states towards target states without the necessity of compiling a quantum circuit. In this work, we introduce pulse-based variational quantum optimization (PBVQO) as a hardware-level framework. We illustrate the framework by optimizing external fluxes on superconducting quantum interference devices, effectively driving the wave function of this specific quantum architecture to the ground state of an encoded problem Hamiltonian. Given that the performance of variational algorithms heavily relies on appropriate initial parameters, we introduce a global optimizer as a meta-learning technique to tackle a simple problem. The synergy between PBVQO and meta-learning provides an advantage over conventional gate-based variational algorithms., Comment: 9 pages, 4 figures
- Published
- 2024
28. Trace reconstruction from local statistical queries
- Author
-
Chen, Xi, De, Anindya, Lee, Chin Ho, and Servedio, Rocco A.
- Subjects
Computer Science - Data Structures and Algorithms - Abstract
The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual traces themselves. Such an algorithm is said to be $\ell$-local if each of its statistical queries corresponds to an $\ell$-junta function over some block of $\ell$ consecutive bits in the trace. Since several -- but not all -- known algorithms for trace reconstruction fall under the local statistical query paradigm, it is interesting to understand the abilities and limitations of local SQ algorithms for trace reconstruction. In this paper we establish nearly-matching upper and lower bounds on local Statistical Query algorithms for both worst-case and average-case trace reconstruction. For the worst-case problem, we show that there is an $\tilde{O}(n^{1/5})$-local SQ algorithm that makes all its queries with tolerance $\tau \geq 2^{-\tilde{O}(n^{1/5})}$, and also that any $\tilde{O}(n^{1/5})$-local SQ algorithm must make some query with tolerance $\tau \leq 2^{-\tilde{\Omega}(n^{1/5})}$. For the average-case problem, we show that there is an $O(\log n)$-local SQ algorithm that makes all its queries with tolerance $\tau \geq 1/\mathrm{poly}(n)$, and also that any $O(\log n)$-local SQ algorithm must make some query with tolerance $\tau \leq 1/\mathrm{poly}(n).$, Comment: RANDOM 2024
- Published
- 2024
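The deletion channel and the notion of a local statistical query in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the deletion rate, the helper names `random_trace` and `local_query_estimate`, and the choice of query (the indicator that a fixed pattern appears in a block of consecutive trace positions) are all hypothetical, and the estimate is obtained by sampling rather than from an SQ oracle with tolerance $\tau$.

```python
import random

def random_trace(x: str, deletion_rate: float, rng: random.Random) -> str:
    """Pass the bit string x through a deletion channel.

    Each bit is deleted independently with probability `deletion_rate`
    (an assumed parameter); the surviving bits keep their order.
    """
    return "".join(bit for bit in x if rng.random() >= deletion_rate)

def local_query_estimate(x: str, deletion_rate: float, pattern: str,
                         start: int, num_traces: int,
                         rng: random.Random) -> float:
    """Estimate by sampling the probability that a random trace of x
    shows `pattern` in the len(pattern) consecutive positions beginning
    at index `start`. The indicator depends only on that block of the
    trace, which is one simple example of a local query.
    """
    hits = 0
    for _ in range(num_traces):
        trace = random_trace(x, deletion_rate, rng)
        if trace[start:start + len(pattern)] == pattern:
            hits += 1
    return hits / num_traces

if __name__ == "__main__":
    rng = random.Random(2024)
    x = "1011001110"
    print(random_trace(x, 0.3, rng))
    print(local_query_estimate(x, 0.3, "10", 0, 1000, rng))
```

A reconstruction algorithm in this paradigm would only see values like the one returned by `local_query_estimate`, never the individual traces.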
29. Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce
- Author
-
Lin, Zhe, Tan, Jiwei, Ou, Dan, Chen, Xi, Yao, Shaowei, and Zheng, Bo
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products can match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these models perform well on the offline test dataset, there are still obstacles to deploying the pre-trained language model to the online system owing to their high latency. The two-tower model is extensively employed in industrial scenarios, owing to its ability to harmonize performance with computational efficiency. Regrettably, such models present an opaque "black box" nature, which prevents developers from making special optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach proposes to encode the query and the product into the sparse BoW representation, which is a set of word-weight pairs. The weight represents the importance or relevance score between the corresponding word and the raw text. The relevance score is measured by the accumulation of the matched words between the sparse BoW representations of the query and the product. Compared to the popular dense distributed representation, which usually suffers from the black-box drawback, the main advantage of the proposed representation model is that it is highly explainable and interventionable, which is a superior advantage to the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ..., Comment: KDD'24 accepted paper
- Published
- 2024
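The matching step described in the abstract above, where relevance is accumulated over words shared by the sparse BoW representations of the query and the product, might look like the following. This is a minimal sketch and not the DeepBoW implementation: the abstract does not say how matched weights are combined, so summing the products of the two weights is an assumption, and the function names and example weights are hypothetical.

```python
from typing import Dict, List

def matched_words(query_bow: Dict[str, float],
                  product_bow: Dict[str, float]) -> List[str]:
    """Return the words present in both sparse BoW representations, sorted."""
    return sorted(query_bow.keys() & product_bow.keys())

def relevance_score(query_bow: Dict[str, float],
                    product_bow: Dict[str, float]) -> float:
    """Accumulate a score over matched words.

    Assumption: each matched word contributes the product of its query
    weight and its product weight. The source only states that matched
    words are accumulated.
    """
    return sum(query_bow[w] * product_bow[w]
               for w in matched_words(query_bow, product_bow))

if __name__ == "__main__":
    # Hypothetical word-weight pairs for illustration only.
    query = {"red": 0.5, "shoes": 1.0}
    product = {"shoes": 0.5, "leather": 0.25}
    print(matched_words(query, product))
    print(relevance_score(query, product))
```

Because the score is a sum over named words, each contribution can be inspected or overridden individually, which is the kind of explainability and intervention the abstract emphasizes.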
30. PaliGemma: A versatile 3B VLM for transfer
- Author
-
Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, and Zhai, Xiaohua
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more specialized tasks such as remote-sensing and segmentation.
- Published
- 2024
31. A Re-solving Heuristic for Dynamic Assortment Optimization with Knapsack Constraints
- Author
-
Chen, Xi, Liu, Mo, Wang, Yining, and Zhou, Yuan
- Subjects
Mathematics - Optimization and Control ,Statistics - Machine Learning - Abstract
In this paper, we consider a multi-stage dynamic assortment optimization problem with multi-nomial choice modeling (MNL) under resource knapsack constraints. Given the current resource inventory levels, the retailer makes an assortment decision at each period, and the goal of the retailer is to maximize the total profit from purchases. With the exact optimal dynamic assortment solution being computationally intractable, a practical strategy is to adopt the re-solving technique that periodically re-optimizes deterministic linear programs (LP) arising from fluid approximation. However, the fractional structure of MNL makes the fluid approximation in assortment optimization highly non-linear, which brings new technical challenges. To address this challenge, we propose a new epoch-based re-solving algorithm that effectively transforms the denominator of the objective into the constraint. Theoretically, we prove that the regret (i.e., the gap between the resolving policy and the optimal objective of the fluid approximation) scales logarithmically with the length of time horizon and resource capacities.
- Published
- 2024
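The "fractional structure of MNL" mentioned in the abstract above refers to the ratio form of choice probabilities in the standard multinomial logit model. The sketch below shows that ratio and the resulting expected per-period revenue; it is not the paper's re-solving algorithm. The no-purchase option is given weight 1 (a common normalization, assumed here), and the function names, preference weights, and revenues are hypothetical.

```python
from typing import Dict, Iterable

def mnl_choice_probabilities(assortment: Iterable[str],
                             preference: Dict[str, float]) -> Dict[str, float]:
    """Standard MNL purchase probabilities for an offered assortment.

    P(i | S) = v_i / (1 + sum of v_j over j in S), where the 1 is the
    weight of the no-purchase option (assumed normalization).
    """
    items = list(assortment)
    denominator = 1.0 + sum(preference[i] for i in items)
    return {i: preference[i] / denominator for i in items}

def expected_revenue(assortment: Iterable[str],
                     preference: Dict[str, float],
                     revenue: Dict[str, float]) -> float:
    """Expected revenue of one customer arrival under the MNL model."""
    items = list(assortment)
    probs = mnl_choice_probabilities(items, preference)
    return sum(revenue[i] * probs[i] for i in items)

if __name__ == "__main__":
    # Hypothetical preference weights and per-item revenues.
    preference = {"a": 1.0, "b": 2.0}
    revenue = {"a": 4.0, "b": 2.0}
    print(mnl_choice_probabilities(["a", "b"], preference))
    print(expected_revenue(["a", "b"], preference, revenue))
```

The assortment-dependent denominator is what makes the fluid approximation non-linear in the assortment decision, which is the difficulty the abstract says the epoch-based algorithm addresses by moving the denominator into a constraint.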
32. 2.4-THz Bandwidth Optical Coherent Receiver Based on a Photonic Crystal Microcomb
- Author
-
Deakin, Callum, Zang, Jizhao, Chen, Xi, Che, Di, Dallachiesa, Lauren, Stern, Brian, Fontaine, Nicolas K., and Papp, Scott
- Subjects
Physics - Optics ,Electrical Engineering and Systems Science - Signal Processing - Abstract
We demonstrate a spectrally-sliced single-polarization optical coherent receiver with a record 2.4-THz bandwidth, using a 200-GHz tantalum pentoxide photonic crystal microring resonator as the local oscillator frequency comb., Comment: 2024 European Conference on Optical Communication (ECOC)
- Published
- 2024
33. To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
- Author
-
Tian, Bozhong, Liang, Xiaozhuan, Cheng, Siyuan, Liu, Qingbin, Wang, Mengru, Sui, Dianbo, Chen, Xi, Chen, Huajun, and Zhang, Ningyu
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Multimedia - Abstract
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material. Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge. However, current unlearning paradigms are mired in vague forgetting boundaries, often erasing knowledge indiscriminately. In this work, we introduce KnowUnDo, a benchmark containing copyrighted content and user privacy domains to evaluate if the unlearning process inadvertently erases essential knowledge. Our findings indicate that existing unlearning methods often suffer from excessive unlearning. To address this, we propose a simple yet effective method, MemFlex, which utilizes gradient information to precisely target and unlearn sensitive parameters. Experimental results show that MemFlex is superior to existing methods in both precise knowledge unlearning and general knowledge retaining of LLMs. Code and dataset will be released at https://github.com/zjunlp/KnowUnDo., Comment: Work in progress
- Published
- 2024
34. Evaluation of Bias Towards Medical Professionals in Large Language Models
- Author
-
Chen, Xi, Xu, Yang, You, MingKe, Wang, Li, Liu, WeiZhi, and Li, Jian
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence - Abstract
This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created to control for identity factors while maintaining consistent qualifications. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by changing gender and race information, while implicit bias was tested by changing names while hiding race and gender. Physician data from the Association of American Medical Colleges was used to compare with real-world demographics. 900,000 resumes were evaluated. All LLMs exhibited significant gender and racial biases across medical specialties. Gender preferences varied, favoring male candidates in surgery and orthopedics, while preferring females in dermatology, family medicine, obstetrics and gynecology, pediatrics, and psychiatry. Claude-3 and Mistral-Large generally favored Asian candidates, while GPT-4 preferred Black and Hispanic candidates in several specialties. Tests revealed strong preferences towards Hispanic females and Asian males in various specialties. Compared to real-world data, LLMs consistently chose higher proportions of female and underrepresented racial candidates than their actual representation in the medical workforce. GPT-4, Claude-3, and Mistral-Large showed significant gender and racial biases when evaluating medical professionals for residency selection. These findings highlight the potential for LLMs to perpetuate biases and compromise healthcare workforce diversity if used without proper bias mitigation strategies., Comment: 36 pages, 6 figures
- Published
- 2024
35. Non-modal growth analysis of high-speed flows over an inclined cone
- Author
-
Chen, Xi, Wan, Bingbing, Tu, Guohua, Duan, Maochang, Li, Xiaohu, and Chen, Jianqiang
- Subjects
Physics - Fluid Dynamics - Abstract
Spatial optimal responses to both inlet disturbances and harmonic external forcing for hypersonic flows over a blunt cone at nonzero angles of attack are obtained by efficiently solving the direct-adjoint equations with a parabolic approach. In either case, the most amplified disturbances initially take the form of localized streamwise vortices on the windward side and will undergo a two-stage evolution process when propagating downstream: they first experience a substantial algebraic growth by exploiting the Orr and lift-up mechanisms, and then smoothly transition to a quasi exponential-growth stage driven by the crossflow-instability mechanism, accompanied by an azimuthal advection of the disturbance structure towards the leeward side. The algebraic-growth phase is most receptive to the external forcing, whereas the exponential-growth stage relies on the disturbance frequency and can be significantly strengthened by increasing the angle of attack. The wavemaker delineating the structural sensitivity region for the optimal gain is shown to lie on the windward side immediately downstream of the inlet, implying a potent control strategy. Additionally, considerable non-modal growth is also observed for broadband high-frequency disturbances residing in the entropy layer.
- Published
- 2024
36. Bounded asymptotics for high-order moments in wall turbulence
- Author
-
Chen, Xi and Sreenivasan, Katepalli R.
- Subjects
Physics - Fluid Dynamics - Abstract
Turbulent wall-flows are the most important means for understanding the effects of boundary conditions and fluid viscosity on turbulent fluctuations. There has been considerable recent research on mean-square fluctuations. Here, we present expressions for high-order moments of streamwise velocity fluctuation $u$, in the form $\langle u^{+2q} \rangle^{1/q}=\alpha_q-\beta_q y^{\ast1/4}$; $q$ is an integer, $\alpha_q$ and $\beta_q$ are constants independent of the friction Reynolds number $Re_\tau$, and $y^{\ast} = y/\delta$ is the distance away from the wall, normalized by the flow thickness $\delta$; in particular, $\alpha_q =\mu+\sigma q$ according to the 'linear q-norm Gaussian' process, where $\mu$ and $\sigma$ are flow-independent constants. Excellent agreement is found between these formulae and available data in pipes, channels and boundary layers for $1 \leq q \leq 5$. For fixed $y^+ = y^{\ast} Re_\tau$, the present formulation leads to the bounded state $\langle u^{+2q} \rangle^{1/q}=\alpha_q$ as $Re_\tau\rightarrow\infty$ while the attached eddy model predicts that the moments continually grow as log Reynolds number., Comment: 5 pages, 4 figures
- Published
- 2024
37. Towards LLM-Powered Ambient Sensor Based Multi-Person Human Activity Recognition
- Author
-
Chen, Xi, Cumin, Julien, Ramparany, Fano, and Vaufreydaz, Dominique
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Human Activity Recognition (HAR) is one of the central problems in fields such as healthcare, elderly care, and security at home. However, traditional HAR approaches face challenges including data scarcity, difficulties in model generalization, and the complexity of recognizing activities in multi-person scenarios. This paper proposes a system framework called LAHAR, based on large language models. Utilizing prompt engineering techniques, LAHAR addresses HAR in multi-person scenarios by enabling subject separation and action-level descriptions of events occurring in the environment. We validated our approach on the ARAS dataset, and the results demonstrate that LAHAR achieves comparable accuracy to the state-of-the-art method at higher resolutions and maintains robustness in multi-person scenarios.
- Published
- 2024
38. Shortcut to adiabaticity improvement of STIRAP based qubit rotation
- Author
-
Black, Khayla, Chen, Xi, and Byrnes, Tim
- Subjects
Quantum Physics - Abstract
Robust quantum control is essential for the development of quantum computers, which rely on precise manipulation of qubits. One form of quantum control is stimulated Raman adiabatic passage (STIRAP), which ordinarily is a state transfer protocol but was extended by Kis and Renzoni (Phys. Rev. A 65, 032318 (2002)) to perform qubit rotations. Shortcut methods to adiabaticity for STIRAP have been shown to speed up adiabatic processes, beyond the adiabatic criterion, with high fidelity. Here, we apply shortcut to adiabaticity methods to the STIRAP qubit rotation scheme to improve the performance of quantum logic gates. The scheme can be implemented via direct connections between ground states in a 4-level $\Lambda$ system or effective connections in a 5-level $\Lambda$ system with modified pulses that implement transitionless quantum driving via the addition of a counterdiabatic driving term. We show that the extended shortcut to adiabaticity method serves to improve the fidelity of qubit rotations in the diabatic regime., Comment: 11 pages, 6 figures
- Published
- 2024
39. An Active Search Strategy with Multiple Unmanned Aerial Systems for Multiple Targets
- Author
-
Gao, Chuanxiang, Wang, Xinyi, Chen, Xi, and Chen, Ben M.
- Subjects
Computer Science - Robotics - Abstract
The challenge of efficient target searching in vast natural environments has driven the need for advanced multi-UAV active search strategies. This paper introduces a novel method in which global and local information is adeptly merged to avoid issues such as myopia and redundant back-and-forth movements. In addition, a trajectory generation method is used to ensure the search pattern within continuous space. To further optimize multi-agent cooperation, the Voronoi partition technique is employed, reducing repetitive flight patterns and enabling decentralized control of multiple agents. Through a series of experiments, the evaluation and comparison results demonstrate the efficiency of our approach in various environments. The primary application of this innovative approach is demonstrated in the search for horseshoe crabs within their wild habitats, showcasing its potential to revolutionize ecological survey and conservation efforts.
- Published
- 2024
40. Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
- Author
-
Liu, Deyuan, Qin, Zhanyue, Wang, Hairu, Yang, Zhao, Wang, Zecheng, Rong, Fangying, Liu, Qingbin, Hao, Yanchao, Chen, Xi, Fan, Cunhang, Lv, Zhao, Tu, Zhiying, Chu, Dianhui, Li, Bo, and Sui, Dianbo
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Normalized Pairwise Information Bottleneck (NPIB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75% with a minimal performance decrease of only 2.82%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs.
- Published
- 2024
41. Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking
- Author
-
Chen, Xi, Qin, Chuan, Fang, Chuyu, Wang, Chao, Zhu, Chen, Zhuang, Fuzhen, Zhu, Hengshu, and Xiong, Hui
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
In a rapidly evolving job market, skill demand forecasting is crucial as it enables policymakers and businesses to anticipate and adapt to changes, ensuring that workforce skills align with market needs, thereby enhancing productivity and competitiveness. Additionally, by identifying emerging skill requirements, it directs individuals towards relevant training and education opportunities, promoting continuous self-learning and development. However, the absence of comprehensive datasets presents a significant challenge, impeding research and the advancement of this field. To bridge this gap, we present Job-SDF, a dataset designed to train and benchmark job-skill demand forecasting models. Based on 10.35 million public job advertisements collected from major online recruitment platforms in China between 2021 and 2023, this dataset encompasses monthly recruitment demand for 2,324 types of skills across 521 companies. Our dataset uniquely enables evaluating skill demand forecasting models at various granularities, including occupation, company, and regional levels. We benchmark a range of models on this dataset, evaluating their performance in standard scenarios, in predictions focused on lower value ranges, and in the presence of structural breaks, providing new insights for further research. Our code and dataset are publicly accessible at https://github.com/Job-SDF/benchmark.
- Published
- 2024
42. Towards Adaptive Neighborhood for Advancing Temporal Interaction Graph Modeling
- Author
-
Zhang, Siwei, Chen, Xi, Xiong, Yun, Wu, Xixi, Zhang, Yao, Fu, Yongrui, Zhao, Yinglong, and Zhang, Jiawei
- Subjects
Computer Science - Social and Information Networks ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Temporal Graph Networks (TGNs) have demonstrated their remarkable performance in modeling temporal interaction graphs. These works can generate temporal node representations by encoding the surrounding neighborhoods for the target node. However, an inherent limitation of existing TGNs is their reliance on fixed, hand-crafted rules for neighborhood encoding, overlooking the necessity for an adaptive and learnable neighborhood that can accommodate both personalization and temporal evolution across different timestamps. In this paper, we aim to enhance existing TGNs by introducing an adaptive neighborhood encoding mechanism. We present SEAN, a flexible plug-and-play model that can be seamlessly integrated with existing TGNs, effectively boosting their performance. To achieve this, we decompose the adaptive neighborhood encoding process into two phases: (i) representative neighbor selection, and (ii) temporal-aware neighborhood information aggregation. Specifically, we propose the Representative Neighbor Selector component, which automatically pinpoints the most important neighbors for the target node. It offers a tailored understanding of each node's unique surrounding context, facilitating personalization. Subsequently, we propose a Temporal-aware Aggregator, which synthesizes neighborhood aggregation by selectively determining the utilization of aggregation routes and decaying the outdated information, allowing our model to adaptively leverage both the contextually significant and current information during aggregation. We conduct extensive experiments by integrating SEAN into three representative TGNs, evaluating their performance on four public datasets and one financial benchmark dataset introduced in this paper. The results demonstrate that SEAN consistently leads to performance improvements across all models, achieving SOTA performance and exceptional robustness., Comment: KDD'2024 Research Track Paper
- Published
- 2024
43. Anomalous Enhancement of the Electrocatalytic Hydrogen Evolution Reaction in AuPt Nanoclusters
- Author
-
Kang, Jiahui, Kloppenburg, Jan, Sheng, Jiali, Xu, Zhenyu, Meinander, Kristoffer, Jiang, Hua, Lv, Zhong-Peng, Kauppinen, Esko I., Zhang, Qiang, Chen, Xi, Ikkala, Olli, Caro, Miguel A., and Peng, Bo
- Subjects
Physics - Chemical Physics - Abstract
Energy- and resource-efficient electrocatalytic water splitting is of paramount importance to enable sustainable hydrogen production. The best bulk catalyst for the hydrogen evolution reaction (HER), i.e., platinum, is one of the scarcest elements on Earth. The use of raw material for HER can be dramatically reduced by utilizing nanoclusters. In addition, nanoalloying can further improve the performance of these nanoclusters. In this paper, we present results for HER on nanometer-sized ligand-free AuPt nanoclusters grafted on carbon nanotubes. These results demonstrate excellent monodispersity and a significant reduction of the overpotential for the electrocatalytic HER. We utilize atomistic machine learning techniques to elucidate the atomic-scale origin of the synergistic effect between Pt and Au. We show that the presence of surface Au atoms, known to be poor HER catalysts, in a Pt(core)/AuPt(shell) nanocluster structure, drives an anomalous enhancement of the inherently high catalytic activity of Pt atoms.
- Published
- 2024
44. Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver
- Author
-
Chen, Hegan, Yang, Jichang, Chen, Jia, Wang, Songqi, Wang, Shaocong, Wang, Dingchen, Tian, Xinyu, Yu, Yifei, Chen, Xi, Lin, Yinan, He, Yangu, Wu, Xiaoshan, Li, Yi, Zhang, Xinyuan, Lin, Ning, Xu, Meng, Zhang, Xumeng, Wang, Zhongrui, Wang, Han, Shang, Dashan, Liu, Qi, Cheng, Kwang-Ting, and Liu, Ming
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence ,Computer Science - Emerging Technologies ,Computer Science - Neural and Evolutionary Computing - Abstract
Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0., Comment: 14 pages, 4 figures
- Published
- 2024
45. Massive 1D Dirac Line, Solitons and Reversible Manipulation on the Surface of a Prototype Obstructed Atomic Insulator, Silicon
- Author
-
Liu, Zhongkai, Deng, Peng, Xu, Yuanfeng, Yang, Haifeng, Pei, Ding, Chen, Cheng, He, Shanmei, Liu, Defa, Mo, Sung-Kwan, Kim, Timur, Cacho, Cephise, Yao, Hong, Song, Zhi-Da, Chen, Xi, Wang, Zhong, Yan, Binghai, Yang, Lexian, Bernevig, Bogdan A., and Chen, Yulin
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Strongly Correlated Electrons ,Condensed Matter - Superconductivity - Abstract
Topologically trivial insulators can be classified into atomic insulators (AIs) and obstructed atomic insulators (OAIs) depending on whether the Wannier charge centers are localized or not at spatial positions occupied by atoms. An OAI can possess unusual properties such as surface states along certain crystalline surfaces, which advantageously appear in materials with much larger bulk energy gap than topological insulators, making them more attractive for potential applications. In this work, we show that a well-known crystal, silicon (Si) is a model OAI, which naturally explains some of Si's unusual properties such as its famous (111) surface states. On this surface, using angle resolved photoemission spectroscopy (ARPES), we reveal sharp quasi-1D massive Dirac line dispersions; we also observe, using scanning tunneling microscopy/spectroscopy (STM/STS), topological solitons at the interface of the two atomic chains. Remarkably, we show that the different chain domains can be reversibly switched at the nanometer scale, suggesting the application potential in ultra-high density storage devices.
- Published
- 2024
46. Zero-shot Image Editing with Reference Imitation
- Author
-
Chen, Xi, Feng, Yutong, Chen, Mengting, Wang, Yiyang, Zhang, Shilong, Liu, Yu, Shen, Yujun, and Zhao, Hengshuang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe what the edited image should look like. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to directly draw inspiration from some in-the-wild references (e.g., some related pictures encountered online), without having to cope with the fit between the reference and the source. Such a design requires the system to automatically figure out what to expect from the reference to perform the editing. For this purpose, we propose a generative training framework, dubbed MimicBrush, which randomly selects two frames from a video clip, masks some regions of one frame, and learns to recover the masked regions using the information from the other frame. That way, our model, developed from a diffusion prior, is able to capture the semantic correspondence between separate images in a self-supervised manner. We experimentally show the effectiveness of our method under various test cases as well as its superiority over existing alternatives. We also construct a benchmark to facilitate further research., Comment: https://xavierchen34.github.io/MimicBrush-Page
- Published
- 2024
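The pair-sampling step described in the abstract of record 46 (two frames from one clip, with regions of one frame masked) can be pictured with the sketch below. It is only an illustration: the function name `sample_training_pair`, the square patch grid, and the `mask_ratio` parameter are assumptions made here, not details taken from the paper.

```python
import random


def sample_training_pair(num_frames, grid=4, mask_ratio=0.5, rng=None):
    """Pick two distinct frames of a clip and a random patch mask.

    Returns (source_index, reference_index, masked_cells). The source frame
    is the one to be masked and recovered; the reference frame is left
    intact. The grid-based mask is a hypothetical choice for illustration.
    """
    rng = rng or random.Random()
    # Two different frame indices from the same clip.
    source_index, reference_index = rng.sample(range(num_frames), 2)
    # Split the source frame into grid x grid cells and mask a fraction of them.
    cells = [(row, col) for row in range(grid) for col in range(grid)]
    num_masked = max(1, int(len(cells) * mask_ratio))
    masked_cells = set(rng.sample(cells, num_masked))
    return source_index, reference_index, masked_cells


source_index, reference_index, masked_cells = sample_training_pair(
    num_frames=10, rng=random.Random(0)
)
```

A model trained on such pairs would be asked to fill in the masked cells of the source frame using the reference frame, which is the self-supervised signal the abstract refers to.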
47. Probing the distinct extinction law of the Pillars of Creation in M16 with JWST
- Author
-
Li, Jun, Chen, Bingqiu, Jiang, Biwei, Gao, Jian, and Chen, Xi
- Subjects
Astrophysics - Astrophysics of Galaxies ,Astrophysics - Solar and Stellar Astrophysics - Abstract
Investigating the extinction law in regions of high dust extinction, such as the Pillars of Creation within the M16 region, is crucial for understanding the densest parts of the interstellar medium (ISM). In this study, we utilize observations from the Near-Infrared Camera (NIRCam) and the Mid-Infrared Instrument (MIRI) onboard the James Webb Space Telescope (JWST) to analyze the color-excess ratios $E(F090W-\lambda)/E(F090W-F200W)$ across a wavelength range of $0.9-7.7\,\mu\mathrm{m}$. Our method involves performing linear regression on color-color diagrams to derive these ratios. The enhanced detection capabilities of JWST data allow us to probe the distinct extinction law to the densest regions in M16 corresponding to an extinction depth up to $A_V \sim 60$\,mag. Remarkably, the resultant color-excess ratio curve exhibits a flatter profile than predicted by typical dust extinction models with $R_V = 5.5$ for dense ISM environments. Moreover, we observe that the mid-infrared (MIR) extinction law diverges from the near-infrared (NIR) power-law, showing a tendency for the slope to flatten as the wavelength increases. These findings have significant implications for our understanding of the dust properties in dense interstellar environments., Comment: Accepted for publication in The Astrophysical Journal Letters (9 pages, 4 figures, 2 tables)
- Published
- 2024
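The abstract of record 47 derives color-excess ratios by linear regression on color-color diagrams. A minimal sketch of that idea follows, assuming the stars share similar intrinsic colors so that the fitted slope can be read as $E(F090W-\lambda)/E(F090W-F200W)$. The function name `fit_slope` and the synthetic colors are illustrative assumptions, not data or code from the paper.

```python
def fit_slope(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic colors (magnitudes), for illustration only.
# x: observed F090W - F200W color; y: observed F090W - lambda color.
true_ratio = 1.3
color_ref = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
color_band = [true_ratio * c + 0.2 for c in color_ref]

ratio, offset = fit_slope(color_ref, color_band)
```

With real photometry the points scatter around the line, and the fitted slope for each band $\lambda$ gives one point on the color-excess ratio curve.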
48. Phy-Diff: Physics-guided Hourglass Diffusion Model for Diffusion MRI Synthesis
- Author
-
Zhang, Juanhua, Yan, Ruodan, Perelli, Alessandro, Chen, Xi, and Li, Chao
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion MRI (dMRI) is an important neuroimaging technique with high acquisition costs. Deep learning approaches have been used to enhance dMRI and predict diffusion biomarkers through undersampled dMRI. To generate more comprehensive raw dMRI, generative adversarial network based methods are proposed to include b-values and b-vectors as conditions, but they are limited by unstable training and less desirable diversity. The emerging diffusion model (DM) promises to improve generative performance. However, it remains challenging to include essential information in conditioning DM for more relevant generation, i.e., the physical principles of dMRI and white matter tract structures. In this study, we propose a physics-guided diffusion model to generate high-quality dMRI. Our model introduces the physical principles of dMRI in the noise evolution in the diffusion process and introduces a query-based conditional mapping within the diffusion model. In addition, to enhance the anatomical fine details of the generation, we introduce the XTRACT atlas as a prior of white matter tracts by adopting an adapter technique. Our experimental results show that our method outperforms other state-of-the-art methods and has the potential to advance dMRI enhancement., Comment: Accepted by MICCAI 2024
- Published
- 2024
49. Untrained Neural Nets for Snapshot Compressive Imaging: Theory and Algorithms
- Author
-
Zhao, Mengyu, Chen, Xi, Yuan, Xin, and Jalali, Shirin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Information Theory - Abstract
Snapshot compressive imaging (SCI) recovers high-dimensional (3D) data cubes from a single 2D measurement, enabling diverse applications like video and hyperspectral imaging to go beyond standard techniques in terms of acquisition speed and efficiency. In this paper, we focus on SCI recovery algorithms that employ untrained neural networks (UNNs), such as deep image prior (DIP), to model source structure. Such UNN-based methods are appealing as they have the potential of avoiding the computationally intensive retraining required for different source models and different measurement scenarios. We first develop a theoretical framework for characterizing the performance of such UNN-based methods. The theoretical framework, on the one hand, enables us to optimize the parameters of data-modulating masks, and on the other hand, provides a fundamental connection between the number of data frames that can be recovered from a single measurement to the parameters of the untrained NN. We also employ the recently proposed bagged-deep-image-prior (bagged-DIP) idea to develop SCI Bagged Deep Video Prior (SCI-BDVP) algorithms that address the common challenges faced by standard UNN solutions. Our experimental results show that in video SCI our proposed solution achieves state-of-the-art among UNN methods, and in the case of noisy measurements, it even outperforms supervised solutions.
- Published
- 2024
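The measurement process mentioned in the abstract of record 49, where several frames are collapsed into a single 2D measurement through data-modulating masks, is commonly written as $y = \sum_t C_t \odot x_t$. The sketch below shows that forward model only, assuming noiseless measurements and binary masks. The function name `sci_measure` and the toy values are illustrative and not taken from the paper.

```python
def sci_measure(frames, masks):
    """Collapse T frames into one 2D snapshot: y[i][j] = sum_t masks[t][i][j] * frames[t][i][j]."""
    height = len(frames[0])
    width = len(frames[0][0])
    snapshot = [[0 for _ in range(width)] for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Each frame is modulated element-wise by its own mask, then summed.
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


# Two 2x2 frames and two complementary binary masks (toy values).
frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
masks = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
snapshot = sci_measure(frames, masks)
```

Recovery algorithms such as the UNN-based methods in the abstract work in the opposite direction, estimating all frames from the single snapshot and the known masks.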
50. It Takes Two: A Peer-Prediction Solution for Blockchain Verifier's Dilemma
- Author
-
Zhao, Zishuo, Chen, Xi, and Zhou, Yuan
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Computer Science and Game Theory - Abstract
The security of blockchain systems is fundamentally based on the decentralized consensus in which the majority of parties behave honestly, and the process of content verification is essential to keep the robustness of blockchain systems. However, the phenomenon that a secure blockchain system with few or no cheaters could not provide sufficient incentive for verifiers to honestly perform the costly verification, referred to as the Verifier's Dilemma, could severely undermine the fundamental security of blockchain systems. While existing works have attempted to insert deliberate errors to disincentivize lazy verification, the decentralized environment makes it impossible to judge the correctness of verification or detect malicious verifiers directly. In this paper, we initiate the research that leverages the peer prediction approach towards the design of Bayesian truthful mechanisms for the decentralized verification game among multiple verifiers, incentivizing all verifiers to perform honest verification without access to the ground truth even in the presence of noisy observations in the verification process. With theoretically guaranteed truthfulness of our mechanism for the verification game, our work provides a framework of verification mechanisms that enhances the security and robustness of the blockchain and potentially other decentralized systems., Comment: 10 pages, 1 figure
- Published
- 2024