Author: "Chen XU" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chen XU"' showing total 442 results

1. HiFiVFS: High Fidelity Video Face Swapping

Author: Chen, Xu, He, Keke, Zhu, Junwei, Ge, Yanhao, Li, Wei, and Wang, Chengjie
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Face swapping aims to generate results that combine the identity from the source with attributes from the target. Existing methods primarily focus on image-based face swapping. When processing videos, each frame is handled independently, making it difficult to ensure temporal stability. From a model perspective, face swapping is gradually shifting from generative adversarial networks (GANs) to diffusion models (DMs), as DMs have been shown to possess stronger generative capabilities. Current diffusion-based approaches often employ inpainting techniques, which struggle to preserve fine-grained attributes like lighting and makeup. To address these challenges, we propose a high fidelity video face swapping (HiFiVFS) framework, which leverages the strong generative capability and temporal prior of Stable Video Diffusion (SVD). We build a fine-grained attribute module to extract identity-disentangled and fine-grained attribute features through identity desensitization and adversarial learning. Additionally, we introduce detailed identity injection to further enhance identity similarity. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) in video face swapping, both qualitatively and quantitatively.
Published: 2024

2. Fast High-Quality Enhanced Imaging Algorithm for Layered Dielectric Targets Based on MMW MIMO-SAR System

Author: Chen, Xu, Yu, Guangsheng, Yuan, Zhian, Wu, Hao, Jiang, Yilin, Wang, Ying, Deng, Bin, and Guo, Limin
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: Millimeter-wave (MMW) multiple-input multiple-output synthetic aperture radar (MIMO-SAR) system is a technology that can achieve high resolution, high frame rate, and all-weather imaging and has received extensive attention in the non-destructive testing and internal imaging applications of layered dielectric targets. However, the non-ideal scattering effect caused by dielectric materials can significantly deteriorate the imaging quality when using the existing MIMO-SAR fast algorithms. This paper proposes a rapid, high-quality dielectric target-enhanced imaging algorithm for a new universal non-uniform MIMO-SAR system. The algorithm builds on the existing non-uniform MIMO-SAR dielectric target frequency-domain algorithm (DT-FDA) by constructing a forward sensing operator and incorporating it into the alternating direction method of multipliers (ADMM) framework. This approach avoids large matrix operations while maintaining computational efficiency. By integrating an optimal regularization parameter search, the algorithm enhances the image reconstruction quality of dielectric internal structures or defects. Experimental results show the proposed algorithm outperforms IBP and DT-FDA, achieving better focusing, sidelobe suppression, and 3D imaging accuracy. It yields the lowest image entropy (8.864) and significantly improves efficiency (imaging time: 15.29 s vs. 23295.3 s for IBP).
Comment: 8 pages
Published: 2024

3. Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao

Author: Chen, Xu, Cheng, Zida, Pan, Yuangang, Xiao, Shuai, Liu, Xiaoming, Lan, Jinsong, Liu, Qingwen, and Tsang, Ivor W.
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: Existing click-through rate (CTR) prediction works have studied the role of feature interaction through a variety of techniques. Each interaction technique exhibits its own strength, and solely using one type could constrain the model's capability to capture the complex feature relationships, especially for industrial large-scale data with enormous users and items. Recent research shows that effective CTR models often combine an MLP network with a dedicated feature interaction network in a two-parallel structure. However, the interplay and cooperative dynamics between different streams or branches remain under-researched. In this work, we introduce a novel Multi-Branch Cooperation Network (MBCnet) which enables multiple branch networks to collaborate with each other for better complex feature interaction modeling. Specifically, MBCnet consists of three branches: the Expert-based Feature Grouping and Crossing (EFGC) branch that promotes the model's memorization ability of specific feature fields, the low rank Cross Net branch and Deep branch to enhance both explicit and implicit feature crossing for improved generalization. Among branches, a novel cooperation scheme is proposed based on two principles: branch co-teaching and moderate differentiation. Branch co-teaching encourages well-learned branches to support poorly-learned ones on specific training samples. Moderate differentiation advocates branches to maintain a reasonable level of difference in their feature representations. The cooperation strategy improves learning through mutual knowledge sharing via co-teaching and boosts the discovery of diverse feature interactions across branches. Extensive experiments on large-scale industrial datasets and online A/B test demonstrate MBCnet's superior performance, delivering a 0.09 point increase in CTR, 1.49% growth in deals, and 1.62% rise in GMV. Core codes will be released soon.
Comment: 10 pages
Published: 2024

4. Detection of the lowest mass ratio contact binary in the universe: TYC 3801-1529-1

Author: Li, Kai, Gao, Xiang, Guo, Di-Fu, Gao, Dong-Yang, Chen, Xu, Wang, Li-Heng, Xin, Yu-Xin, Han, Yu-Xin, Kim, Chun-Hwey, and Jeong, Min-Ji
Subjects: Astrophysics - Solar and Stellar Astrophysics, Astrophysics - Astrophysics of Galaxies
Abstract: This paper presents the first analysis of the contact binary TYC 3801-1529-1. We observed four sets of complete multi-band light curves and one set of radial velocity curve of the primary component. Based on a simultaneous investigation of our observed and TESS light curves and the radial velocity curve, we found that TYC 3801-1529-1 is an extremely low-mass-ratio, medium contact binary with $q=0.0356$, with the contribution of the third light at a level of about 10%. Its mass ratio is lower than that of V1187 Her, making TYC 3801-1529-1 the lowest mass-ratio contact binary ever found in the universe. The light curves observed in 2022 are asymmetric, which is aptly explained by a hot spot on the primary component. A 16-year eclipse timings analysis indicates a secular increase in the orbital period at a rate of dp/dt$=7.96(\pm0.35)\times10^{-7}$ d yr$^{-1}$. We studied the stability of this target and identified that not only the value of $J_{spin}/J_{orb}$, but also the mass ratio surpasses the unstable boundary. Hence, TYC 3801-1529-1 presents a challenge to theoretical research and ought to be considered a progenitor of a contact binary merger.
Comment: 6 pages, 3 figures, and 1 table, accepted by A&A Letters, Data available via China-VO PaperData repository
Published: 2024

5. Counterfactual Learning-Driven Representation Disentanglement for Search-Enhanced Recommendation

Author: Cui, Jiajun, Chen, Xu, Xiao, Shuai, Ju, Chen, Lan, Jinsong, Liu, Qingwen, and Zhang, Wei
Subjects: Computer Science - Information Retrieval
Abstract: For recommender systems in internet platforms, search activities provide additional insights into user interest through query-click interactions with items, and are thus widely used for enhancing personalized recommendation. However, these interacted items not only have transferable features matching users' interest helpful for the recommendation domain, but also have features related to users' unique intents in the search domain. Such domain gap of item features is neglected by most current search-enhanced recommendation methods. They directly incorporate these search behaviors into recommendation, and thus introduce partial negative transfer. To address this, we propose a Counterfactual learning-driven representation disentanglement framework for search-enhanced recommendation, based on the common belief that a user would click an item under a query not solely because of the item-query match but also due to the item's query-independent general features (e.g., color or style) that interest the user. These general features exclude the reflection of search-specific intents contained in queries, ensuring a pure match to users' underlying interest to complement recommendation. Following the counterfactual question of how user preferences and query match would change for items if we removed their query-related features in search, we leverage search queries to construct counterfactual signals to disentangle item representations, isolating only query-independent general features. These representations subsequently enable feature augmentation and data augmentation for the recommendation scenario. Comprehensive experiments on real datasets demonstrate ClardRec is effective in both collaborative filtering and sequential recommendation scenarios.
Published: 2024

6. Learned Slip-Detection-Severity Framework using Tactile Deformation Field Feedback for Robotic Manipulation

Author: Jawale, Neel, Kaur, Navneet, Santoso, Amy, Hu, Xiaohai, and Chen, Xu
Subjects: Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: Safely handling objects and avoiding slippage are fundamental challenges in robotic manipulation, yet traditional techniques often oversimplify the issue by treating slippage as a binary occurrence. Our research presents a framework that both identifies slip incidents and measures their severity. We introduce a set of features based on detailed vector field analysis of tactile deformation data captured by the GelSight Mini sensor. Two distinct machine learning models use these features: one focuses on slip detection, and the other evaluates the slip's severity, which is the slipping velocity of the object against the sensor surface. Our slip detection model achieves an average accuracy of 92%, and the slip severity estimation model exhibits a mean absolute error (MAE) of 0.6 cm/s for unseen objects. To demonstrate the synergistic approach of this framework, we employ both the models in a tactile feedback-guided vertical sliding task. Leveraging the high accuracy of slip detection, we utilize it as the foundational and corrective model and integrate the slip severity estimation into the feedback control loop to address slips without overcompensating.
Comment: Accepted at IROS 2024
Published: 2024

7. FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation

Author: Zhan, Ziwei, Zhao, Wenkuan, Li, Yuanqing, Liu, Weijie, Zhang, Xiaoxi, Tan, Chee Wei, Wu, Chuan, Guo, Deke, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated learning (FL) is a collaborative machine learning approach that enables multiple clients to train models without sharing their private data. With the rise of deep learning, large-scale models have garnered significant attention due to their exceptional performance. However, a key challenge in FL is the limitation imposed by clients with constrained computational and communication resources, which hampers the deployment of these large models. The Mixture of Experts (MoE) architecture addresses this challenge with its sparse activation property, which reduces computational workload and communication demands during inference and updates. Additionally, MoE facilitates better personalization by allowing each expert to specialize in different subsets of the data distribution. To alleviate the communication burdens between the server and clients, we propose FedMoE-DA, a new FL model training framework that leverages the MoE architecture and incorporates a novel domain-aware, fine-grained aggregation strategy to enhance the robustness, personalizability, and communication efficiency simultaneously. Specifically, the correlation between both intra-client expert models and inter-client data heterogeneity is exploited. Moreover, we utilize peer-to-peer (P2P) communication between clients for selective expert model synchronization, thus significantly reducing the server-client transmissions. Experiments demonstrate that our FedMoE-DA achieves excellent performance while reducing the communication pressure on the server.
Published: 2024

8. Mixing angle of $K_1(1270/1400)$ and the $K\bar K_1(1400)$ molecular interpretation of $\eta_1(1855)$

Author: Liu, Zheng-Shu, Chen, Xu-Liang, Lian, Ding-Kun, Li, Ning, and Chen, Wei
Subjects: High Energy Physics - Phenomenology, High Energy Physics - Experiment
Abstract: Due to the SU(3) symmetry breaking effect, the axial-vector kaons $K_1(1270)$ and $K_1(1400)$ are established to be mixtures of two P-wave $K_{1A}\left( {^3{P_1}} \right)$ and $K_{1B}\left( {^1{P_1}} \right)$ states. In QCD sum rules, we propose a new construction of the $K_1$ current operators and calculate the two-point correlation functions by including the next-to-leading order four-quark condensates. The mixing angle is determined as $\theta = \left( {46.95_{ - 0.23}^{ + 0.25}} \right)^\circ$ by reproducing the masses of $K_1(1270)$ and $K_1(1400)$. We further compose the $K\bar K_1\left( {1270} \right)$ and $K\bar K_1\left( {1400} \right)$ interpolating currents with exotic quantum numbers $J^{PC}=1^{-+}$ to investigate the possible molecular interpretation of the recently observed ${\eta _1}(1855)$ state. We calculate the correlation functions and perform the QCD sum rule analyses for these two molecular systems. However, the spectral functions are found to be negative in physical regions so that they are not able to provide reliable investigations of the $K\bar K_1$ molecular states.
Comment: 10 pages, 9 figures. More references added, some typos are corrected
Published: 2024

9. FedReMa: Improving Personalized Federated Learning via Leveraging the Most Relevant Clients

Author: Liang, Han, Zhan, Ziwei, Liu, Weijie, Zhang, Xiaoxi, Tan, Chee Wei, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Federated Learning (FL) is a distributed machine learning paradigm that achieves a globally robust model through decentralized computation and periodic model synthesis, primarily focusing on the global model's accuracy over aggregated datasets of all participating clients. Personalized Federated Learning (PFL) instead tailors exclusive models for each client, aiming to enhance the accuracy of clients' individual models on specific local data distributions. Despite their wide adoption, existing FL and PFL works have yet to comprehensively address the class-imbalance issue, one of the most critical challenges within the realm of data heterogeneity in PFL and FL research. In this paper, we propose FedReMa, an efficient PFL algorithm that can tackle class-imbalance by 1) utilizing an adaptive inter-client co-learning approach to identify and harness different clients' expertise on different data classes throughout various phases of the training process, and 2) employing distinct aggregation methods for clients' feature extractors and classifiers, with the choices informed by the different roles and implications of these model components. Specifically, driven by our experimental findings on inter-client similarity dynamics, we develop a critical co-learning period (CCP), wherein we introduce a module named maximum difference segmentation (MDS) to assess and manage task relevance by analyzing the similarities between clients' logits of their classifiers. Outside the CCP, we employ an additional scheme for model aggregation that utilizes historical records of each client's most relevant peers to further enhance the personalization stability. We demonstrate the superiority of our FedReMa in extensive experiments.
Comment: 8 pages, 4 figures, accepted by European Conference on Artificial Intelligence (2024 ECAI)
Published: 2024

10. GenSim: A General Social Simulation Platform with Large Language Model based Agents

Author: Tang, Jiakai, Gao, Heyang, Pan, Xuchen, Wang, Lei, Tan, Haoran, Gao, Dawei, Chen, Yushuo, Chen, Xu, Lin, Yankai, Li, Yaliang, Ding, Bolin, Zhou, Jingren, Wang, Jun, and Wen, Ji-Rong
Subjects: Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence
Abstract: With the rapid advancement of large language models (LLMs), recent years have witnessed many promising studies on leveraging LLM-based agents to simulate human social behavior. While prior work has demonstrated significant potential across various domains, much of it has focused on specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose a novel LLM-agent-based simulation platform called GenSim, which: (1) abstracts a set of general functions to simplify the simulation of customized social scenarios; (2) supports one hundred thousand agents to better simulate large-scale populations in real-world contexts; (3) incorporates error-correction mechanisms to ensure more reliable and long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, promising to further advance the field of social science.
Published: 2024

11. Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation

Author: Xiao, Jie, Huang, Qianyi, Chen, Xu, and Tian, Chen
Subjects: Computer Science - Machine Learning
Abstract: As large language models (LLMs) increasingly integrate into every aspect of our work and daily lives, there are growing concerns about user privacy, which push the trend toward local deployment of these models. There are a number of lightweight LLMs (e.g., Gemini Nano, LLAMA2 7B) that can run locally on smartphones, providing users with greater control over their personal data. As a rapidly emerging application, we are concerned about their performance on commercial-off-the-shelf mobile devices. To fully understand the current landscape of LLM deployment on mobile platforms, we conduct a comprehensive measurement study on mobile devices. We evaluate both metrics that affect user experience, including token throughput, latency, and battery consumption, as well as factors critical to developers, such as resource utilization, DVFS strategies, and inference engines. In addition, we provide a detailed analysis of how these hardware capabilities and system dynamics affect on-device LLM performance, which may help developers identify and address bottlenecks for mobile LLM applications. We also provide comprehensive comparisons across the mobile system-on-chips (SoCs) from major vendors, highlighting their performance differences in handling LLM workloads. We hope that this study can provide insights for both the development of on-device LLMs and the design for future mobile system architecture.
Published: 2024

12. Cafca: High-quality Novel View Synthesis of Expressive Faces from Casual Few-shot Captures

Author: Bühler, Marcel C., Li, Gengyan, Wood, Erroll, Helminger, Leonhard, Chen, Xu, Shah, Tanmay, Wang, Daoye, Garbin, Stephan, Orts-Escolano, Sergio, Hilliges, Otmar, Lagun, Dmitry, Riviere, Jérémy, Gotardo, Paulo, Beeler, Thabo, Meka, Abhimitra, and Sarkar, Kripasindhu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Volumetric modeling and neural radiance field representations have revolutionized 3D face capture and photorealistic novel view synthesis. However, these methods often require hundreds of multi-view input images and are thus inapplicable to cases with fewer than a handful of inputs. We present a novel volumetric prior on human faces that allows for high-fidelity expressive face modeling from as few as three input views captured in the wild. Our key insight is that an implicit prior trained on synthetic data alone can generalize to extremely challenging real-world identities and expressions and render novel views with fine idiosyncratic details like wrinkles and eyelashes. We leverage a 3D Morphable Face Model to synthesize a large training set, rendering each identity with different expressions, hair, clothing, and other assets. We then train a conditional Neural Radiance Field prior on this synthetic dataset and, at inference time, fine-tune the model on a very sparse set of real images of a single subject. On average, the fine-tuning requires only three inputs to cross the synthetic-to-real domain gap. The resulting personalized 3D model reconstructs strong idiosyncratic facial expressions and outperforms the state-of-the-art in high-quality novel view synthesis of faces from sparse inputs in terms of perceptual and photometric quality.
Comment: Siggraph Asia Conference Papers 2024
Published: 2024

13. MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

Author: Zhang, Zeyu, Dai, Quanyu, Chen, Luyu, Jiang, Zeren, Li, Rui, Zhu, Jieming, Chen, Xu, Xie, Yi, Dong, Zhenhua, and Wen, Ji-Rong
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: LLM-based agents have been widely applied as personal assistants, capable of memorizing information from user messages and responding to personal queries. However, an objective and automatic evaluation of their memory capability is still lacking, largely due to the challenges in constructing reliable questions and answers (QAs) according to user messages. In this paper, we propose MemSim, a Bayesian simulator designed to automatically construct reliable QAs from generated user messages, simultaneously keeping their diversity and scalability. Specifically, we introduce the Bayesian Relation Network (BRNet) and a causal generation mechanism to mitigate the impact of LLM hallucinations on factual information, facilitating the automatic creation of an evaluation dataset. Based on MemSim, we generate a dataset in the daily-life scenario, named MemDaily, and conduct extensive experiments to assess the effectiveness of our approach. We also provide a benchmark for evaluating different memory mechanisms in LLM-based agents with the MemDaily dataset. To benefit the research community, we have released our project at https://github.com/nuster1128/MemSim.
Comment: 26 pages, 25 tables, 1 figure
Published: 2024

14. SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement

Author: Pang, Yunkui, Liu, Yilin, Chen, Xu, Yap, Pew-Thian, and Lian, Jun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Cone Beam Computed Tomography (CBCT) finds diverse applications in medicine. Ensuring high image quality in CBCT scans is essential for accurate diagnosis and treatment delivery. Yet, the susceptibility of CBCT images to noise and artifacts undermines both their usefulness and reliability. Existing methods typically address CBCT artifacts through image-to-image translation approaches. These methods, however, are limited by the artifact types present in the training data, which may not cover the complete spectrum of CBCT degradations stemming from variations in imaging protocols. Gathering additional data to encompass all possible scenarios can often pose a challenge. To address this, we present SinoSynth, a physics-based degradation model that simulates various CBCT-specific artifacts to generate a diverse set of synthetic CBCT images from high-quality CT images without requiring pre-aligned data. Through extensive experiments, we demonstrate that several different generative networks trained on our synthesized data achieve remarkable results on heterogeneous multi-institutional datasets, outperforming even the same networks trained on actual data. We further show that our degradation model conveniently provides an avenue to enforce anatomical constraints in conditional generative models, yielding high-quality and structure-preserving synthetic CT images.
Comment: MICCAI 2024
Published: 2024

15. Towards heavy double-gluon hybrid mesons with exotic quantum numbers in QCD sum rules

Author: Lian, Ding-Kun, Wang, Qi-Nan, Chen, Xu-Liang, Yang, Peng-Fei, Chen, Wei, and Chen, Hua-Xing
Subjects: High Energy Physics - Phenomenology
Abstract: The double-gluon hybrid meson configuration was recently proposed and investigated within QCD sum rules. In this talk, we discuss the color structures of the double-gluon hybrid meson and construct current operators with exotic quantum numbers $J^{PC}=1^{-+}$ and $2^{+-}$ for two of the structures. In the framework of QCD sum rules, we consider the condensates up to dimension-8 at the leading order of $\alpha_{s}$ for both the charmonium and bottomonium systems. The results indicate that the masses of the $1^{-+}$ and $2^{+-}$ charmonium double-gluon hybrid mesons are approximately $6.1-7.2$ GeV and $6.3-6.4$ GeV, respectively. As for the bottomonium systems, their masses fall within the range of $13.7-14.3$ GeV and $12.6-13.3$ GeV for the $1^{-+}$ and $2^{+-}$ channels, respectively. Additionally, the charmonium hybrids could be produced in the radiative decays of bottomonium mesons in the Belle II experiment.
Comment: 9 pages, 5 figures, 4 tables. Proceedings article for QCD24: 27th High-Energy Physics International Conference in Quantum Chromodynamics. arXiv admin note: substantial text overlap with arXiv:2403.18696
Published: 2024

16. Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

Author: Wang, Yifan, Stevens, David, Shah, Pranay, Jiang, Wenwen, Liu, Miao, Chen, Xu, Kuo, Robert, Li, Na, Gong, Boying, Lee, Daniel, Hu, Jiabo, Zhang, Ning, and Kamma, Bob
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation and real-time assistants, and judges on annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation alone, and promoting better alignment between human and machine values.
Published: 2024

17. Research on LLM Acceleration Using the High-Performance RISC-V Processor 'Xiangshan' (Nanhu Version) Based on the Open-Source Matrix Instruction Set Extension (Vector Dot Product)

Author: Chen, Xu-Hao, Hu, Si-Peng, Liu, Hong-Chao, Liu, Bo-Ran, Tang, Dan, and Zhao, Di
Subjects: Computer Science - Hardware Architecture, C.1.3 [Other Architecture Styles]: RISC (Reduced Instruction Set Computing)
Abstract: Considering the high-performance and low-power requirements of edge AI, this study designs a specialized instruction set processor for edge AI based on the RISC-V instruction set architecture, addressing practical issues in digital signal processing for edge devices. This design enhances the execution efficiency of edge AI and reduces its energy consumption with limited hardware overhead, meeting the demands for efficient large language model (LLM) inference computation in edge AI applications. The main contributions of this paper are as follows: For the characteristics of large language models, custom instructions were extended based on the RISC-V instruction set to perform vector dot product calculations, accelerating the computation of large language models on dedicated vector dot product acceleration hardware. Based on the open-source high-performance RISC-V processor core XiangShan Nanhu architecture, the vector dot product specialized instruction set processor Nanhu-vdot was implemented, which adds vector dot product calculation units and pipeline processing logic on top of the XiangShan Nanhu. The Nanhu-vdot underwent FPGA hardware testing, achieving over four times the speed of scalar methods in vector dot product computation. Using a hardware-software co-design approach for second-generation Generative Pre-Trained Transformer (GPT-2) model inference, the speed improved by approximately 30% compared to pure software implementation, with almost no additional hardware resource or power consumption.
Comment: 10 pages, in Chinese language, 6 figures
Published: 2024

18. DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving

Author: Fu, Yongjie, Jain, Anmol, Di, Xuan, Chen, Xu, and Mo, Zhaobin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The advancement of autonomous driving technologies necessitates increasingly sophisticated methods for understanding and predicting real-world scenarios. Vision language models (VLMs) are emerging as revolutionary tools with significant potential to influence autonomous driving. In this paper, we propose the DriveGenVLM framework to generate driving videos and use VLMs to understand them. To achieve this, we employ a video generation framework grounded in denoising diffusion probabilistic models (DDPM) aimed at predicting real-world video sequences. We then explore the adequacy of our generated videos for use in VLMs by employing a pre-trained model known as Efficient In-context Learning on Egocentric Videos (EILEV). The diffusion model is trained with the Waymo open dataset and evaluated using the Fréchet Video Distance (FVD) score to ensure the quality and realism of the generated videos. Corresponding narrations are provided by EILEV for these generated videos, which may be beneficial in the autonomous driving domain. These narrations can enhance traffic scene understanding, aid in navigation, and improve planning capabilities. The integration of video generation with VLMs in the DriveGenVLM framework represents a significant step forward in leveraging advanced AI models to address complex challenges in autonomous driving.
Published: 2024

19. See or Guess: Counterfactually Regularized Image Captioning

Author: Cao, Qian, Chen, Xu, Song, Ruihua, Wang, Xiting, Huang, Xinting, and Ren, Yuchen
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities of machines with human intelligence through statistical fitting of existing datasets. While effective for normal images, they may struggle to accurately describe those where certain parts of the image are obscured or edited, unlike humans who excel in such cases. These weaknesses they exhibit, including hallucinations and limited interpretability, often hinder performance in scenarios with shifted association patterns. In this paper, we present a generic image captioning framework that employs causal inference to make existing models more capable of interventional tasks, and counterfactually explainable. Our approach includes two variants leveraging either total effect or natural direct effect. Integrating them into the training process enables models to handle counterfactual scenarios, increasing their generalizability. Extensive experiments on various datasets show that our method effectively reduces hallucinations and improves the model's faithfulness to images, demonstrating high portability across both small-scale and large-scale image-to-text models. The code is available at https://github.com/Aman-4-Real/See-or-Guess.
Comment: Accepted by ACM MM 2024
Published: 2024

20. Can LLMs Understand Social Norms in Autonomous Driving Games?

Author: Wang, Boxuan, Duan, Haonan, Feng, Yanhao, Chen, Xu, Fu, Yongjie, Mo, Zhaobin, and Di, Xuan
Subjects: Computer Science - Artificial Intelligence
Abstract: Social norm is defined as a shared standard of acceptable behavior in a society. The emergence of social norms fosters coordination among agents without any hard-coded rules, which is crucial for the large-scale deployment of AVs in an intelligent transportation system. This paper explores the application of LLMs in understanding and modeling social norms in autonomous driving games. We introduce LLMs into autonomous driving games as intelligent agents who make decisions according to text prompts. These agents are referred to as LLM-based agents. Our framework involves LLM-based agents playing Markov games in a multi-agent system (MAS), allowing us to investigate the emergence of social norms among individual agents. We aim to identify social norms by designing prompts and utilizing LLMs on textual information related to the environment setup and the observations of LLM-based agents. Using the OpenAI Chat API powered by GPT-4.0, we conduct experiments to simulate interactions and evaluate the performance of LLM-based agents in two driving scenarios: unsignalized intersection and highway platoon. The results show that LLM-based agents can handle dynamically changing environments in Markov games, and social norms evolve among LLM-based agents in both scenarios. In the intersection game, LLM-based agents tend to adopt a conservative driving policy when facing a potential car crash. The advantage of LLM-based agents in games lies in their strong operability and analyzability, which facilitate experimental design.
Published: 2024

21. Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning

Author: Ouyang, Bei, Ye, Shengyuan, Zeng, Liekang, Qian, Tianyi, Li, Jingyi, and Chen, Xu
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture
Abstract: Large language models (LLMs) have unlocked a plethora of powerful applications at the network edge, such as intelligent personal assistants. Data privacy and security concerns have prompted a shift towards edge-based fine-tuning of personal LLMs, away from cloud reliance. However, this raises issues of computational intensity and resource scarcity, hindering training efficiency and feasibility. While current studies investigate parameter-efficient fine-tuning (PEFT) techniques to mitigate resource constraints, our analysis indicates that these techniques are not sufficiently resource-efficient for edge devices. To tackle these challenges, we propose Pluto and Charon (PAC), a time and memory efficient collaborative edge AI framework for personal LLMs fine-tuning. PAC breaks the resource wall of personal LLMs fine-tuning with a sophisticated algorithm-system co-design. (1) Algorithmically, PAC implements a personal LLMs fine-tuning technique that is efficient in terms of parameters, time, and memory. It utilizes Parallel Adapters to circumvent the need for a full backward pass through the LLM backbone. Additionally, an activation cache mechanism further streamlines the process by negating the necessity for repeated forward passes across multiple epochs. (2) Systematically, PAC leverages edge devices in close proximity, pooling them as a collective resource for in-situ personal LLMs fine-tuning, utilizing a hybrid data and pipeline parallelism to orchestrate distributed training. The use of the activation cache eliminates the need for forward pass through the LLM backbone, enabling exclusive fine-tuning of the Parallel Adapters using data parallelism. Extensive evaluation based on prototype implementation demonstrates that PAC remarkably outperforms state-of-the-art approaches, achieving up to 8.64x end-to-end speedup and up to 88.16% reduction in memory footprint., Comment: Accepted by The 53rd International Conference on Parallel Processing (ICPP'24)
Published: 2024

22. Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

Author: Zhang, Chenyu, Chen, Xu, and Di, Xuan
Subjects: Computer Science - Machine Learning, Computer Science - Computer Science and Game Theory, Computer Science - Multiagent Systems, Mathematics - Optimization and Control
Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which calculates best responses and induced population distribution separately and sequentially. However, FPI-type methods suffer from inefficiency and instability, due to oscillations caused by the forward-backward procedure. This paper considers an online learning method for MFGs, where an agent updates its policy and population estimates simultaneously and fully asynchronously, resulting in a simple stochastic gradient descent (SGD) type method called SemiSGD. Not only does SemiSGD exhibit numerical stability and efficiency, but it also provides a novel perspective by treating the value function and population distribution as a unified parameter. We theoretically show that SemiSGD directs this unified parameter along a descent direction to the mean field equilibrium. Motivated by this perspective, we develop a linear function approximation (LFA) for both the value function and the population distribution, resulting in the first population-aware LFA for MFGs on continuous state-action space. Finite-time convergence and approximation error analysis are provided for SemiSGD equipped with population-aware LFA.
Published: 2024

23. Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices

Author: Ye, Shengyuan, Zeng, Liekang, Chu, Xiaowen, Xing, Guoliang, and Chen, Xu
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture
Abstract: On-device Deep Neural Network (DNN) training has been recognized as crucial for privacy-preserving machine learning at the edge. However, the intensive training workload and limited onboard computing resources pose significant challenges to the availability and efficiency of model training. While existing works address these challenges through native resource management optimization, we instead leverage our observation that edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources beyond a single terminal. We propose Asteroid, a distributed edge training system that breaks the resource walls across heterogeneous edge devices for efficient model training acceleration. Asteroid adopts a hybrid pipeline parallelism to orchestrate distributed training, along with a judicious parallelism planning for maximizing throughput under certain resource constraints. Furthermore, a fault-tolerant yet lightweight pipeline replay mechanism is developed to tame the device-level dynamics for training robustness and performance stability. We implement Asteroid on heterogeneous edge devices with both vision and language models, demonstrating up to 12.2x faster training than conventional parallelism methods and 2.1x faster than state-of-the-art hybrid parallelism methods through evaluations. Furthermore, Asteroid can recover training pipeline 14x faster than baseline methods while preserving comparable throughput despite unexpected device exiting and failure., Comment: Accepted by The 30th Annual International Conference on Mobile Computing and Networking (MobiCom'24)
Published: 2024

24. MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Author: Dai, Yanqi, Hu, Huanran, Wang, Lei, Jin, Shengjie, Chen, Xu, and Lu, Zhiwu
Subjects: Computer Science - Artificial Intelligence
Abstract: Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs with the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
Published: 2024

25. On-the-fly Communication-and-Computing to Enable Representation Learning for Distributed Point Clouds

Author: Chen, Xu, Wu, Hai, and Huang, Kaibin
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: The advent of sixth-generation (6G) mobile networks introduces two groundbreaking capabilities: sensing and artificial intelligence (AI). Sensing leverages multi-modal sensors to capture real-time environmental data, while AI brings powerful models to the network edge, enabling intelligent Internet-of-Things (IoT) applications. These features converge in the Integrated Sensing and Edge AI (ISEA) paradigm, where edge devices collect and locally process sensor data before aggregating it centrally for AI tasks. Point clouds (PtClouds), generated by depth sensors, are crucial in this setup, supporting applications such as autonomous driving and mixed reality. However, the heavy computational load and communication demands of PtCloud fusion pose challenges. To address these, the FlyCom$^2$ framework is proposed, optimizing distributed PtCloud fusion through on-the-fly communication and computing, namely streaming on-sensor processing, progressive data uploading integrated with communication-efficient AirComp, and the progressive output of a global PtCloud representation. FlyCom$^2$ distinguishes itself by aligning PtCloud fusion with Gaussian process regression (GPR), ensuring that global PtCloud representation progressively improves as more observations are received. Joint optimization of local observation synthesis and AirComp receiver settings is based on minimizing prediction error, balancing communication distortions, data heterogeneity, and temporal correlation. This framework enhances PtCloud fusion by balancing local processing demands with efficient central aggregation, paving the way for advanced 6G applications. Validation on real-world datasets demonstrates the efficacy of FlyCom$^2$, highlighting its potential in next-generation mobile networks., Comment: This is an ongoing work under revision
Published: 2024

26. Cool-Fusion: Fuse Large Language Models without Training

Author: Liu, Cong, Quan, Xiaojun, Pan, Yan, Lin, Liang, Wu, Weigang, and Chen, Xu
Subjects: Computer Science - Computation and Language
Abstract: We focus on the problem of fusing two or more heterogeneous large language models (LLMs) to facilitate their complementary strengths. One of the challenges on model fusion is high computational load, i.e. to fine-tune or to align vocabularies via combinatorial optimization. To this end, we propose Cool-Fusion, a simple yet effective approach that fuses the knowledge of heterogeneous source LLMs to leverage their complementary strengths. Cool-Fusion is the first method that does not require any type of training like the ensemble approaches. But unlike ensemble methods, it is applicable to any set of source LLMs that have different vocabularies. The basic idea is to have each source LLM individually generate tokens until the tokens can be decoded into a text segment that ends at word boundaries common to all source LLMs. Then, the source LLMs jointly rerank the generated text segment and select the best one, which is the fused text generation in one step. Extensive experiments are conducted across a variety of benchmark datasets. On GSM8K, Cool-Fusion increases accuracy from three strong source LLMs by a significant 8%-17.8%.
Published: 2024

27. Towards Effective and Efficient Continual Pre-training of Large Language Models

Author: Chen, Jie, Chen, Zhipeng, Wang, Jiapeng, Zhou, Kun, Zhu, Yutao, Jiang, Jinhao, Min, Yingqian, Zhao, Wayne Xin, Dou, Zhicheng, Mao, Jiaxin, Lin, Yankai, Song, Ruihua, Xu, Jun, Chen, Xu, Yan, Rui, Wei, Zhewei, Hu, Di, Huang, Wenbing, and Wen, Ji-Rong
Subjects: Computer Science - Computation and Language, 68T50, I.2.7
Abstract: Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents a technical report for continually pre-training Llama-3 (8B), which significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model. To enhance the new abilities while retaining the original abilities, we design specific data mixture and curriculum strategies by utilizing existing datasets and synthesizing high-quality datasets. Specifically, we synthesize multidisciplinary scientific question and answer (QA) pairs based on related web pages, and subsequently incorporate these synthetic data to improve the scientific reasoning ability of Llama-3. We refer to the model after CPT as Llama-3-SynE (Synthetic data Enhanced Llama-3). We also present the tuning experiments with a relatively small model -- TinyLlama, and employ the derived findings to train the backbone model. Extensive experiments on a number of evaluation benchmarks show that our approach can largely improve the performance of the backbone models, including both the general abilities (+8.81 on C-Eval and +6.31 on CMMLU) and the scientific reasoning abilities (+12.00 on MATH and +4.13 on SciEval), without hurting the original capacities. Our model, data, and codes are available at https://github.com/RUC-GSAI/Llama-3-SynE., Comment: 16 pages, 10 figures, 16 tables
Published: 2024

28. The FAST HI 21-cm absorption blind survey. II -- statistic exploration for associated and intervening systems

Author: Hu, Wenkai, Wang, Yougang, Li, Yichao, Pen, Ue-Li, Wang, Jie, Jing, Yingjie, Zhu, Ming, Zhang, Xin, Yang, Wenxiu, Xu, Yidong, Chen, Xu, Chen, Jingze, Zheng, Zheng, Li, Di, and Chen, Xuelei
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: We present an extragalactic HI 21-cm absorption lines catalog from a blind search at z $\leq$ 0.35, using drift-scan data collected in 1616.9 hours by the ongoing Commensal Radio Astronomy FasT Survey (CRAFTS) and FAST All Sky HI Survey (FASHI), which spans a sky area of 7456.8 deg$^{2}$ and covers 84,533 radio sources with a flux density greater than 12 mJy. 14 previously identified HI absorbers and 20 newly discovered HI absorbers were detected, comprising 14 associated systems, 11 intervening systems, and 9 systems with undetermined classifications. We fit HI profiles with multi-component Gaussian functions and calculate the redshift, width, flux density, optical depth, and HI column densities for each source. Through spectral stacking, the mean peak optical path, mean velocity-integrated optical path $\langle \tau\rangle$, mean FWHM and mean HI column density $\langle$ N$_{HI}\rangle$ are measured to be 0.46 and 0.34; 25.85 km/s and 4.62 km/s; 39.80 km/s and 8.95 km/s; 0.470 and 0.085 T$_{s} \times$ 10$^{20}$cm$^{-2}$K$^{-1}$, for the associated and intervening samples, respectively. Statistical analysis also reveals that associated systems tend to be hosted by red (g$-$r$>$0.7) galaxies at lower redshifts, whereas galaxies hosting intervening HI absorption are typically found at higher redshifts and are of a bluer (g$-$r$\leq$0.7) type. Additionally, it has been demonstrated that associated HI 21-cm absorptions connected to compact radio sources display higher N$_{HI}$ values compared to those linked with extended radio sources., Comment: 28 pages, 39 figures, 5 tables
Published: 2024

29. Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

Author: Ju, Chen, Wang, Haicheng, Cheng, Haozhe, Chen, Xu, Zhai, Zhonghua, Huang, Weilin, Lan, Jinsong, Xiao, Shuai, and Zheng, Bo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-Language Large Models (VLMs) have recently become the primary backbone of AI, due to their impressive performance. However, their expensive computation costs, i.e., throughput and delay, impede their potential in real-world scenarios. To achieve acceleration for VLMs, most existing methods focus on the model perspective: pruning, distillation, quantization, but completely overlook the data-perspective redundancy. To fill this gap, this paper is the first to highlight the severity of data redundancy, and designs one plug-and-play Turbo module guided by information degree to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, information degree takes two crucial factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates data duplication between sequential tokens, while the latter evaluates each token by its contribution to the overall semantics. As a result, tokens with high information degree carry less redundancy and stronger semantics. For VLMs' calculation, Turbo works as a user-friendly plug-in that sorts data referring to information degree, utilizing only top-level ones to save costs. Its advantages are multifaceted, e.g., being generally compatible with various VLMs across understanding and generation, and simple to use without re-training and with trivial engineering efforts. On multiple VLM benchmarks, we conduct extensive experiments to demonstrate the good acceleration of Turbo, under negligible performance drop., Comment: ECCV 2024. The first two authors share the same contribution. arXiv admin note: substantial text overlap with arXiv:2312.07408
Published: 2024

30. Weakly Coupled Type-II Superconductivity in a Laves compound ZrRe2

Author: Yu, Yingpeng, Liu, Zhaolong, Li, Qi, Chen, Zhaoxu, Wang, Yulong, Hao, Munan, Yang, Yaling, Gong, Chunsheng, Chen, Long, Xie, Zhenkai, Zhou, Kaiyao, Ren, Huifen, Chen, Xu, and Jin, Shifeng
Subjects: Condensed Matter - Superconductivity
Abstract: We present a comprehensive investigation of the superconducting properties of ZrRe2, a Re-based hexagonal Laves compound. ZrRe2 crystallizes in a C14-type structure (space group P63/mmc), with cell parameters a=b=5.2682(5) and c=8.63045. Resistivity and magnetic susceptibility data both suggest that ZrRe2 exhibits a sharp superconducting transition above 6.1 K. The measured lower and upper critical fields are 6.27 mT and 12.77 T, respectively, with a large upper critical field that approached the Pauli limit. Measurements of the heat capacity confirm the presence of bulk superconductivity, with a normalized specific heat change of 1.24 and an electron-phonon strength of 0.69. DFT calculations revealed that the band structure of ZrRe2 is intricate and without van-Hove singularity. The observed large specific heat jump, combined with the electron-phonon strength, suggests that ZrRe2 is a weakly coupled type II superconductor., Comment: 14 pages, 7 figures, 2 tables
Published: 2024

31. Towards Robust Recommendation via Decision Boundary-aware Graph Contrastive Learning

Author: Tang, Jiakai, Dai, Sunhao, Sun, Zexu, Chen, Xu, Xu, Jun, Yu, Wenhui, Hu, Lantao, Jiang, Peng, and Li, Han
Subjects: Computer Science - Information Retrieval
Abstract: In recent years, graph contrastive learning (GCL) has received increasing attention in recommender systems due to its effectiveness in reducing bias caused by data sparsity. However, most existing GCL models rely on heuristic approaches and usually assume entity independence when constructing contrastive views. We argue that these methods struggle to strike a balance between semantic invariance and view hardness across the dynamic training process, both of which are critical factors in graph contrastive learning. To address the above issues, we propose a novel GCL-based recommendation framework RGCL, which effectively maintains the semantic invariance of contrastive pairs and dynamically adapts as the model capability evolves through the training process. Specifically, RGCL first introduces decision boundary-aware adversarial perturbations to constrain the exploration space of contrastive augmented views, avoiding the decrease of task-specific information. Furthermore, to incorporate global user-user and item-item collaboration relationships for guiding on the generation of hard contrastive views, we propose an adversarial-contrastive learning objective to construct a relation-aware view-generator. Besides, considering that unsupervised GCL could potentially narrower margins between data points and the decision boundary, resulting in decreased model robustness, we introduce the adversarial examples based on maximum perturbations to achieve margin maximization. We also provide theoretical analyses on the effectiveness of our designs. Through extensive experiments on five public datasets, we demonstrate the superiority of RGCL compared against twelve baseline models., Comment: KDD 2024
Published: 2024

32. Resource Management for Low-latency Cooperative Fine-tuning of Foundation Models at the Network Edge

Author: Wu, Hai, Chen, Xu, and Huang, Kaibin
Subjects: Computer Science - Information Theory, Computer Science - Artificial Intelligence
Abstract: The emergence of large-scale foundation models (FoMo's) that can perform human-like intelligence motivates their deployment at the network edge for devices to access state-of-the-art artificial intelligence. For better user experiences, the pre-trained FoMo's need to be adapted to specialized downstream tasks through fine-tuning techniques. To transcend a single device's memory and computation limitations, we advocate multi-device cooperation within the device-edge cooperative fine-tuning (DEFT) paradigm, where edge devices cooperate to simultaneously optimize different parts of fine-tuning parameters within a FoMo. However, the parameter blocks reside at different depths within a FoMo architecture, leading to varied computation latency-and-memory cost due to gradient backpropagation-based calculations. The heterogeneous on-device computation and memory capacities and channel conditions necessitate an integrated communication-and-computation allocation of local computation loads and communication resources to achieve low-latency (LoLa) DEFT. To this end, we consider the depth-aware DEFT block allocation problem. The involved optimal block-device matching is tackled by the proposed low-complexity Cutting-RecoUNting-CHecking (CRUNCH) algorithm, which is designed by exploiting the monotone-increasing property between block depth and computation latency-and-memory cost. Next, the joint bandwidth-and-block allocation makes the problem more sophisticated. We observe a splittable Lagrangian expression through the transformation and analysis of the original problem, where the variables indicating device involvement are introduced. Then, the dual ascent method is employed to tackle this problem iteratively. Through extensive experiments conducted on the GLUE benchmark, our results demonstrate significant latency reduction achievable by LoLa DEFT for fine-tuning a RoBERTa model., Comment: This work has been submitted to the IEEE for possible publication
Published: 2024

33. Edge Graph Intelligence: Reciprocally Empowering Edge Networks with Graph Intelligence

Author: Zeng, Liekang, Ye, Shengyuan, Chen, Xu, Zhang, Xiaoxi, Ren, Ju, Tang, Jian, Yang, Yang, and Shen, Xuemin
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture
Abstract: Recent years have witnessed a thriving growth of computing facilities connected at the network edge, cultivating edge computing networks as a fundamental infrastructure for supporting miscellaneous intelligent services. Meanwhile, Artificial Intelligence frontiers have extrapolated Machine Learning to the graph domain and promoted Graph Intelligence (GI), which unlocks unprecedented ability in learning from massive data in graph structures. Given the inherent relation between graphs and networks, the interdiscipline of graph representation learning and edge networks, i.e., Edge GI or EGI, has revealed a novel interplay between them -- GI models principally open a new door for modeling, understanding, and optimizing edge networks, and conversely, edge networks serve as physical support for training, deploying, and accelerating GI models. Driven by this delicate closed-loop, EGI can be widely recognized as a promising solution to fully unleash the potential of edge computing power and is garnering significant attention. Nevertheless, research on EGI yet remains nascent, and there is a soaring demand within both the communications and AI communities for a dedicated venue to share recent advancements. To this end, this paper promotes the concept of EGI, explores its scope and core principles, and conducts a comprehensive survey concerning recent research efforts on this emerging field and specifically, introduces and discusses: 1) fundamentals of edge computing and graph representation learning, 2) emerging techniques centering on the closed loop between graph intelligence and edge networks, and 3) open challenges and research opportunities of future EGI. By bridging the gap across communication, networking, and graph learning areas, we believe that this survey can garner increased attention, foster meaningful discussions, and inspire further research ideas in EGI., Comment: 38 pages, 14 figures
Published: 2024

34. The plan for a super $\eta$ factory at Huizhou accelerator complex

Author: Chen, Xu-Rong, He, Xiong-Hong, Hu, Qiang, Lin, De-Xu, Liu, Yang, Qiu, Hao, Sun, Xu, Tian, Ye, Wang, Rong, Zhang, Hong-Lin, Zhang, Ya-Peng, and Zhao, Cheng-Xin
Subjects: High Energy Physics - Phenomenology
Abstract: As an approximate Goldstone boson with zero quantum number and zero standard model charge, the decay processes of long-lived $\eta$ meson offer a unique opportunity to explore new physics beyond the standard model and new sources of CP violation, as well as test the low-energy QCD theory and measure the fundamental parameters of light quarks. To pursue these goals in the physics frontiers, we propose a plan to construct a super $\eta$ factory at HIAF high-energy terminal or at CiADS after its energy upgrade. The high-intensity proton beam at HIAF enables the production of a vast number of $\eta$ samples, exceeding $10^{13}$ events per year in the first stage, utilizing multiple layers of thin targets made of light nucleus. This paper presents the physics goals, the first-version conceptual design of the spectrometer, and some preliminary simulation results., Comment: 23 pages, 13 figures
Published: 2024

35. YuLan: An Open-source Large Language Model

Author: Zhu, Yutao, Zhou, Kun, Mao, Kelong, Chen, Wentong, Sun, Yiding, Chen, Zhipeng, Cao, Qian, Wu, Yihan, Chen, Yushuo, Wang, Feng, Zhang, Lei, Li, Junyi, Wang, Xiaolei, Wang, Lei, Zhang, Beichen, Dong, Zican, Cheng, Xiaoxue, Chen, Yuhan, Tang, Xinyu, Hou, Yupeng, Ren, Qiangqiang, Pang, Xincheng, Xie, Shufang, Zhao, Wayne Xin, Dou, Zhicheng, Mao, Jiaxin, Lin, Yankai, Song, Ruihua, Xu, Jun, Chen, Xu, Yan, Rui, Wei, Zhewei, Hu, Di, Huang, Wenbing, Gao, Ze-Feng, Chen, Yueguo, Lu, Weizheng, and Wen, Ji-Rong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billion parameters. The base model of YuLan is pre-trained on approximately 1.7T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was finished in January 2024 and has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and codes are available at https://github.com/RUC-GSAI/YuLan-Chat.
Published: 2024

36. Online Optimization of DNN Inference Network Utility in Collaborative Edge Computing

Author: Li, Rui, Ouyang, Tao, Zeng, Liekang, Liao, Guocheng, Zhou, Zhi, and Chen, Xu
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Collaborative Edge Computing (CEC) is an emerging paradigm that pools heterogeneous edge devices into a shared resource to compute DNN inference tasks in proximity, such as edge video analytics. Nevertheless, existing works mainly focus on workload routing strategies among edge devices, the key knob to improve network utility in CEC, with the aim of minimizing the routing cost, leaving joint workload allocation and routing optimization from a system perspective an open question. To this end, this paper presents a holistic, learned optimization for CEC towards maximizing the total network utility in an online manner, even though the utility functions of task input rates are unknown a priori. In particular, we characterize the CEC system in a flow model and formulate an online learning problem in the form of cross-layer optimization. We propose a nested-loop algorithm to solve workload allocation and distributed routing iteratively, using the tools of gradient sampling and online mirror descent. To improve the convergence rate over the nested-loop version, we further devise a single-loop algorithm. Rigorous analysis is provided to show its inherent convexity, efficient convergence, as well as algorithmic optimality. Finally, extensive numerical simulations demonstrate the superior performance of our solutions., Comment: Accepted by IEEE/ACM TRANSACTIONS ON NETWORKING (ToN)
Published: 2024

37. Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Author: Tang, Weiheng, Li, Jingyi, Chen, Lin, and Chen, Xu
Subjects: Computer Science - Networking and Internet Architecture, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional intermediate layer between the master and workers in the original distributed learning systems, potentially leading to more severe straggler effect. Recently, coding theory-based approaches have been proposed for stragglers mitigation in distributed learning, but the majority focus on the conventional workers-master architecture. In this paper, along a different line, we investigate the problem of mitigating the straggler effect in hierarchical distributed learning systems with an additional layer composed of edge nodes. Technically, we first derive the fundamental trade-off between the computational loads of workers and the stragglers tolerance. Then, we propose a hierarchical gradient coding framework, which provides better stragglers mitigation, to achieve the derived computational trade-off. To further improve the performance of our framework in heterogeneous scenarios, we formulate an optimization problem with the objective of minimizing the expected execution time for each iteration in the learning process. We develop an efficient algorithm to mathematically solve the problem by outputting the optimum strategy. Extensive simulation results demonstrate the superiority of our schemes compared with conventional solutions.
Comment: The paper has been accepted by IEEE Transactions on Communications
Published: 2024

38. IMFL-AIGC: Incentive Mechanism Design for Federated Learning Empowered by Artificial Intelligence Generated Content

Author: Huang, Guangjing, Wu, Qiong, Li, Jingyi, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Computer Science and Game Theory
Abstract: Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data. To alleviate the heterogeneous data quality among clients, artificial intelligence-generated content (AIGC) can be leveraged as a novel data synthesis technique for FL model performance enhancement. Due to various costs incurred by AIGC-empowered FL (e.g., costs of local model computation and data synthesis), however, clients are usually reluctant to participate in FL without adequate economic incentives, which leads to an unexplored critical issue for enabling AIGC-empowered FL. To fill this gap, we first devise a data quality assessment method for data samples generated by AIGC and rigorously analyze the convergence performance of FL model trained using a blend of authentic and AI-generated data samples. We then propose a data quality-aware incentive mechanism to encourage clients' participation. In light of information asymmetry incurred by clients' private multi-dimensional attributes, we investigate clients' behavior patterns and derive the server's optimal incentive strategies to minimize server's cost in terms of both model accuracy loss and incentive payments for both complete and incomplete information scenarios. Numerical results demonstrate that our proposed mechanism exhibits highest training accuracy and reduces up to 53.34% of the server's cost with real-world datasets, compared with existing benchmark mechanisms.
Comment: The paper has been accepted by IEEE Transactions on Mobile Computing
Published: 2024

39. Explainable Machine Learning Identification of Superconductivity from Single-Particle Spectral Functions

Author: Chen, Xu, Sun, Yuanjie, Hruska, Eugen, Dixit, Vivek, Yang, Jinming, He, Yu, Wang, Yao, and Liu, Fang
Subjects: Condensed Matter - Superconductivity, Condensed Matter - Strongly Correlated Electrons
Abstract: The traditional method of identifying symmetry-breaking phase transitions through the emergence of a single-particle gap encounters significant challenges in quantum materials with strong fluctuations. To address this, we have developed a data-driven approach using a domain-adversarial neural network trained on simulated spectra of cuprates. This model compensates for the scarcity of experimental data -- a significant barrier to the wide deployment of machine learning in physical research -- by leveraging the abundance of theoretically simulated data. When applied to unlabeled experimental spectra, our model successfully distinguishes the true superconducting states from gapped fluctuating states, without the need for fine temperature sampling across the transition. Further, the explanation of our machine learning model reveals the crucial role of the Fermi-surface spectral intensity even in gapped states. It paves the way for robust and direct spectroscopic identification of fluctuating orders, particularly in low-dimensional, strongly correlated materials.
Comment: 8 pages, 5 figures
Published: 2024

40. The spectral radius and the distance spectral radius of complements of block graphs

Author: Chen, Xu, Fan, Dongjun, Shao, Rongxiao, and Wang, Guoping
Subjects: Mathematics - Combinatorics, Mathematics - Spectral Theory, 05C12, 05C50, 05C69
Abstract: In this paper, we determine the graphs whose spectral radius and distance spectral radius attain maximum and minimum among all complements of clique trees. Furthermore, we also determine the graphs whose spectral radius and distance spectral radius attain minimum and maximum among all complements of block graphs, respectively.
Published: 2024

41. Discontinuities of banana integrals in dispersion relation representation

Author: Chen, Xu-Liang, Yang, Peng-Fei, and Chen, Wei
Subjects: High Energy Physics - Phenomenology, High Energy Physics - Theory
Abstract: We derive the discontinuities of banana integrals using the dispersion relation iteratively. We find a series of identities between the parameterized discontinuities of banana integrals (p-DOBIs). Similar to elliptic integrals, these identities enable the reduction of various p-DOBIs to be a linear combination of some fundamental ones. We present a practical application of p-DOBIs for deriving Picard-Fuchs operator. Then we establish the expression of generalized dispersion relation, which enables us to obtain the dispersion relation representation of arbitrary banana integrals. Moreover, we propose a hypothesis for generalized dispersion relation and p-DOBIs, which provides a simple way to calculate the discontinuities and transform dispersion relation representation to p-DOBIs.
Comment: 7 pages, 1 figure. Published in Chinese Physics Letters
Published: 2024

42. Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference

Author: Ye, Shengyuan, Du, Jiangsu, Zeng, Liekang, Ou, Wenzhong, Chu, Xiaowen, Lu, Yutong, and Chen, Xu
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Networking and Internet Architecture
Abstract: Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.
Comment: Accepted by IEEE International Conference on Computer Communications 2024
Published: 2024

43. Attaining Human's Desirable Outcomes in Human-AI Interaction via Structural Causal Games

Author: Liu, Anjie, Wang, Jianhong, Li, Haoxuan, Chen, Xu, Wang, Jun, Kaski, Samuel, and Yang, Mengyue
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Human-Computer Interaction
Abstract: In human-AI interaction, a prominent goal is to attain human's desirable outcome with the assistance of AI agents, which can be ideally delineated as a problem of seeking the optimal Nash Equilibrium that matches the human's desirable outcome. However, reaching the outcome is usually challenging due to the existence of multiple Nash Equilibria that are related to the assisting task but do not correspond to the human's desirable outcome. To tackle this issue, we employ a theoretical framework called structural causal game (SCG) to formalize the human-AI interactive process. Furthermore, we introduce a strategy referred to as pre-policy intervention on the SCG to steer AI agents towards attaining the human's desirable outcome. In more detail, a pre-policy is learned as a generalized intervention to guide the agents' policy selection, under a transparent and interpretable procedure determined by the SCG. To make the framework practical, we propose a reinforcement learning-like algorithm to search out this pre-policy. The proposed algorithm is tested in both gridworld environments and realistic dialogue scenarios with large language models, demonstrating its adaptability in a broader class of problems and potential effectiveness in real-world situations.
Comment: 38 pages, 5 figures
Published: 2024

44. Shape of a droplet on a surface in the presence of an external field and its critical disruption condition

Author: Li, Jing, Wen, Kaiqiang, Xiao, Ke, Chen, Xiaoming, and Wu, Chen-Xu
Subjects: Condensed Matter - Soft Condensed Matter, Physics - Fluid Dynamics
Abstract: Due to the potential application of regulating droplet shape by external fields in microfluidic technology and micro devices, it becomes increasingly important to understand the shape formation of a droplet in the presence of an electric field. How to understand and determine such a deformable boundary shape at equilibrium has been a long-term physical and mathematical challenge. Here, based on the theoretical model we propose, and combining the finite element method and the gradient descent algorithm, we successfully obtain the droplet shape by considering the contributions made by electrostatic energy, surface tension energy, and gravitational potential energy. We also carry out scaling analyses and obtain an empirical critical disruption condition with a universal scaling exponent 1/2 for the contact angle in terms of normalized volume. The master curve fits both the experimental and the numerical results very well.
Published: 2024

45. Fast adiabatic preparation of multi-squeezed states by jumping along the path

Author: Chen, Chuan, Lu, Jian-Yu, Chen, Xu-Yang, and Wang, Zhen-Yu
Subjects: Quantum Physics
Abstract: Multi-squeezed states, also known as generalized squeezed states, are valuable quantum non-Gaussian resources, because they can feature non-classical properties such as large phase-space Wigner negativities. In this work, we introduce a novel shortcuts to adiabaticity (STA) method for the fast preparation of multi-squeezed states. In contrast to previous STA methods, which rely on the use of counterdiabatic control to suppress unwanted non-adiabatic effects, our method simplifies the process and accelerates state preparation by selecting an appropriate sampling along a quantum evolution path. We demonstrate the high-fidelity and fast preparation of multi-squeezed states, as well as hybrid entangled states between a bosonic mode and a qubit.
Published: 2024

46. Revisiting Counterfactual Regression through the Lens of Gromov-Wasserstein Information Bottleneck

Author: Yang, Hao, Sun, Zexu, Xu, Hongteng, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: As a promising individualized treatment effect (ITE) estimation method, counterfactual regression (CFR) maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, the selection bias between control and treatment groups often imbalances the two groups' latent distributions and negatively impacts this method's performance. In this study, we revisit counterfactual regression through the lens of information bottleneck and propose a novel learning paradigm called Gromov-Wasserstein information bottleneck (GWIB). In this paradigm, we learn CFR by maximizing the mutual information between covariates' latent representations and outcomes while penalizing the kernelized mutual information between the latent representations and the covariates. We demonstrate that the upper bound of the penalty term can be implemented as a new regularizer consisting of $i)$ the fused Gromov-Wasserstein distance between the latent representations of different groups and $ii)$ the gap between the transport cost generated by the model and the cross-group Gromov-Wasserstein distance between the latent representations and the covariates. GWIB effectively learns the CFR model through alternating optimization, suppressing selection bias while avoiding trivial latent distributions. Experiments on ITE estimation tasks show that GWIB consistently outperforms state-of-the-art CFR methods. To promote the research community, we release our project at https://github.com/peteryang1031/Causal-GWIB.
Comment: 19 pages
Published: 2024

47. Are Long-LLMs A Necessity For Long-Context Tasks?

Author: Qian, Hongjin, Liu, Zheng, Zhang, Peitian, Mao, Kelong, Zhou, Yujia, Chen, Xu, and Dou, Zhicheng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The learning and deployment of long-LLMs remains a challenging problem despite recent progresses. In this work, we argue that the long-LLMs are not a necessity to solve long-context tasks, as common long-context tasks are short-context solvable, i.e. they can be solved by purely working with oracle short-contexts within the long-context tasks' inputs. On top of this argument, we propose a framework called LC-Boost (Long-Context Bootstrapper), which enables a short-LLM to address the long-context tasks in a bootstrapping manner. In our framework, the short-LLM prompts itself to reason for two critical decisions: 1) how to access to the appropriate part of context within the input, 2) how to make effective use of the accessed context. By adaptively accessing and utilizing the context based on the presented tasks, LC-Boost can serve as a general framework to handle diversified long-context processing problems. We comprehensively evaluate different types of tasks from popular long-context benchmarks, where LC-Boost is able to achieve a substantially improved performance with a much smaller consumption of resource.
Comment: 18 pages
Published: 2024

48. Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Author: Han, Yue, Zhu, Junwei, He, Keke, Chen, Xu, Ge, Yanhao, Li, Wei, Li, Xiangtai, Zhang, Jiangning, Wang, Chengjie, and Liu, Yong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current face reenactment and swapping methods mainly rely on GAN frameworks, but recent focus has shifted to pre-trained diffusion models for their superior generation capabilities. However, training these models is resource-intensive, and the results have not yet achieved satisfactory performance levels. To address this issue, we introduce Face-Adapter, an efficient and effective adapter designed for high-precision and high-fidelity face editing for pre-trained diffusion models. We observe that both face reenactment/swapping tasks essentially involve combinations of target structure, ID and attribute. We aim to sufficiently decouple the control of these factors to achieve both tasks in one model. Specifically, our method contains: 1) A Spatial Condition Generator that provides precise landmarks and background; 2) A Plug-and-play Identity Encoder that transfers face embeddings to the text space by a transformer decoder. 3) An Attribute Controller that integrates spatial conditions and detailed attributes. Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality compared to fully fine-tuned face reenactment/swapping models. Additionally, Face-Adapter seamlessly integrates with various StableDiffusion models.
Comment: Accepted to ECCV2024; Project Page: https://faceadapter.github.io/face-adapter.github.io/
Published: 2024

49. Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization

Author: Li, Rui, Li, Chaozhuo, Shen, Yanming, Zhang, Zeyu, and Chen, Xu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent advances in knowledge graph embedding (KGE) rely on Euclidean/hyperbolic orthogonal relation transformations to model intrinsic logical patterns and topological structures. However, existing approaches are confined to rigid relational orthogonalization with restricted dimension and homogeneous geometry, leading to deficient modeling capability. In this work, we move beyond these approaches in terms of both dimension and geometry by introducing a powerful framework named GoldE, which features a universal orthogonal parameterization based on a generalized form of Householder reflection. Such parameterization can naturally achieve dimensional extension and geometric unification with theoretical guarantees, enabling our framework to simultaneously capture crucial logical patterns and inherent topological heterogeneity of knowledge graphs. Empirically, GoldE achieves state-of-the-art performance on three standard benchmarks. Codes are available at https://github.com/xxrep/GoldE.
Comment: Accepted by ICML 2024
Published: 2024

50. Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm

Author: Zhou, Fuzhong, Zhang, Chenyu, Chen, Xu, and Di, Xuan
Subjects: Mathematics - Optimization and Control, Computer Science - Artificial Intelligence, Computer Science - Computer Science and Game Theory, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a discrete time graphon game formulation on continuous state and action spaces using a representative player to study stochastic games with heterogeneous interaction among agents. This formulation admits both philosophical and mathematical advantages, compared to a widely adopted formulation using a continuum of players. We prove the existence and uniqueness of the graphon equilibrium with mild assumptions, and show that this equilibrium can be used to construct an approximate solution for finite player game on networks, which is challenging to analyze and solve due to curse of dimensionality. An online oracle-free learning algorithm is developed to solve the equilibrium numerically, and sample complexity analysis is provided for its convergence.
Comment: Published as a conference paper at ICML 2024
Published: 2024
