102,669 results on "Li Yong"
Search Results
2. Understanding World or Predicting Future? A Comprehensive Survey of World Models
- Author
-
Ding, Jingtao, Zhang, Yunke, Shang, Yu, Zhang, Yuheng, Zong, Zefang, Feng, Jie, Yuan, Yuan, Su, Hongyuan, Li, Nian, Sukiennik, Nicholas, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions.
- Published
- 2024
3. A Survey on Human-Centric LLMs
- Author
-
Wang, Jing Yi, Sukiennik, Nicholas, Li, Tong, Su, Weikang, Hao, Qianyue, Xu, Jingbo, Huang, Zihan, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence - Abstract
The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior have given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.
- Published
- 2024
4. A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction
- Author
-
Yuan, Yuan, Ding, Jingtao, Han, Chonghua, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Machine Learning - Abstract
Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose Spatio-Temporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released on https://github.com/YuanYuan98/UniFlow. A minimal sketch of the unified patching idea follows this entry.
- Published
- 2024
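The UniFlow entry above describes patching grid-based and graph-based flows into a consistent sequential format. Below is a minimal, illustrative sketch of that idea; the patch size, array shapes, and function names are assumptions for the example, not the paper's implementation.

```python
# Illustrative only: unify grid data (T, H, W) and graph data (N, T) into one
# token sequence where every token is a spatial unit's flow series of length T.
import numpy as np

def grid_to_tokens(grid, patch=4):
    """grid: (T, H, W) -> (num_patches, T), one token per spatial patch."""
    T, H, W = grid.shape
    tokens = []
    for i in range(0, H - H % patch, patch):
        for j in range(0, W - W % patch, patch):
            tokens.append(grid[:, i:i + patch, j:j + patch].mean(axis=(1, 2)))
    return np.stack(tokens)

def graph_to_tokens(node_series):
    """node_series: (N, T) -> (N, T), one token per graph node."""
    return np.asarray(node_series)

grid_tokens = grid_to_tokens(np.random.rand(12, 16, 16))   # -> (16, 12)
graph_tokens = graph_to_tokens(np.random.rand(50, 12))     # -> (50, 12)
tokens = np.concatenate([grid_tokens, graph_tokens])       # one sequence, one model
print(tokens.shape)
```

Once both sources share one token format, a single spatio-temporal sequence model can be trained across them, which is the premise of the cross-learning the abstract describes.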
5. UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning
- Author
-
Yuan, Yuan, Han, Chonghua, Ding, Jingtao, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. UrbanDiT pioneers a unified model that integrates diverse spatio-temporal data sources and types while learning universal spatio-temporal patterns across different cities and scenarios. This allows the model to unify both multi-data and multi-task learning, and effectively support a wide range of spatio-temporal applications. Its key innovation lies in the elaborated prompt learning framework, which adaptively generates both data-driven and task-specific prompts, guiding the model to deliver superior performance across various urban applications. UrbanDiT offers three primary advantages: 1) It unifies diverse data types, such as grid-based and graph-based data, into a sequential format, allowing it to capture spatio-temporal dynamics across diverse scenarios of different cities; 2) With masking strategies and task-specific prompts, it supports a wide range of tasks, including bi-directional spatio-temporal prediction, temporal interpolation, spatial extrapolation, and spatio-temporal imputation; and 3) It generalizes effectively to open-world scenarios, with its powerful zero-shot capabilities outperforming nearly all baselines with training data. These features allow UrbanDiT to achieve state-of-the-art performance in different domains such as transportation traffic, crowd flows, taxi demand, bike usage, and cellular traffic, across multiple cities and tasks. UrbanDiT sets a new benchmark for foundation models in the urban spatio-temporal domain.
- Published
- 2024
6. Unveiling Hidden Details: A RAW Data-Enhanced Paradigm for Real-World Super-Resolution
- Author
-
Peng, Long, Li, Wenbo, Guo, Jiaming, Di, Xin, Sun, Haoze, Li, Yong, Pei, Renjing, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition - Abstract
Real-world image super-resolution (Real SR) aims to generate high-fidelity, detail-rich high-resolution (HR) images from low-resolution (LR) counterparts. Existing Real SR methods primarily focus on generating details from the LR RGB domain, often leading to a lack of richness or fidelity in fine details. In this paper, we pioneer the use of details hidden in RAW data to complement existing RGB-only methods, yielding superior outputs. We argue that key image processing steps in Image Signal Processing, such as denoising and demosaicing, inherently result in the loss of fine details in LR images, making LR RAW a valuable information source. To validate this, we present RealSR-RAW, a comprehensive dataset comprising over 10,000 pairs with LR and HR RGB images, along with corresponding LR RAW, captured across multiple smartphones under varying focal lengths and diverse scenes. Additionally, we propose a novel, general RAW adapter to efficiently integrate LR RAW data into existing CNNs, Transformers, and Diffusion-based Real SR models by suppressing the noise contained in LR RAW and aligning its distribution. Extensive experiments demonstrate that incorporating RAW data significantly enhances detail recovery and improves Real SR performance across ten evaluation metrics, including both fidelity and perception-oriented metrics. Our findings open a new direction for the Real SR task; the dataset and code will be made available to support future research., Comment: We sincerely apologize, but due to some commercial confidentiality agreements related to the report, we have decided to withdraw the submission for now and will resubmit after making the necessary revisions
- Published
- 2024
7. Motion Before Action: Diffusing Object Motion as Manipulation Condition
- Author
-
Su, Yue, Zhan, Xinyu, Fang, Hongjie, Li, Yong-Lu, Lu, Cewu, and Yang, Lixin
- Subjects
Computer Science - Robotics - Abstract
Inferring object motion representations from observations enhances the performance of robotic manipulation tasks. This paper introduces a new paradigm for robot imitation learning that generates action sequences by reasoning about object motion from visual observations. We propose MBA (Motion Before Action), a novel module that employs two cascaded diffusion processes for object motion generation and robot action generation under object motion guidance. MBA first predicts the future pose sequence of the object based on observations, then uses this sequence as a condition to guide robot action generation. Designed as a plug-and-play component, MBA can be flexibly integrated into existing robotic manipulation policies with diffusion action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks. Project page: https://selen-suyue.github.io/MBApage/
- Published
- 2024
8. Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
- Author
-
Duan, Chen-Long, Li, Yong, Wei, Xiu-Shen, and Zhao, Lin
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning - Abstract
Pre-training plays a vital role in various vision tasks, such as object recognition and detection. Commonly used pre-training methods, which typically rely on randomized approaches like uniform or Gaussian distributions to initialize model parameters, often fall short when confronted with long-tailed distributions, especially in detection tasks. This is largely due to extreme data imbalance and the issue of simplicity bias. In this paper, we introduce a novel pre-training framework for object detection, called Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a Holistic-Local Contrastive Learning mechanism, which aligns pre-training with object detection by capturing both global contextual semantics and detailed local patterns. To tackle the imbalance inherent in long-tailed data, we design a dynamic rebalancing strategy that adjusts the sampling of underrepresented instances throughout the pre-training process, ensuring better representation of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by enforcing a reconstruction task aligned with the self-consistency principle, specifically benefiting underrepresented tail classes. Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes., Comment: Accepted by NeurIPS 2024
- Published
- 2024
9. LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation
- Author
-
Qiao, Shutong, Gao, Chen, Li, Yong, and Yin, Hongzhi
- Subjects
Computer Science - Information Retrieval - Abstract
Multi-interest modeling in current recommender systems (RS) is mainly based on user behavioral data, capturing user interest preferences from multiple dimensions. However, since behavioral data is implicit and often highly sparse, it is challenging to understand users' complex and diverse interests. Recent studies have shown that the rich semantic information in the text can effectively supplement the deficiencies of behavioral data. Despite this, it is still difficult for small models to directly extract semantic features associated with users' deep interests. Thus, how to effectively align semantics with behavioral information to form a more comprehensive and accurate understanding of user interests has become a critical research problem. To address this, we propose an LLM-assisted explicit and implicit multi-interest learning framework (named EIMF) to model user interests on two levels: behavior and semantics. The framework consists of two parts: the Implicit Behavioral Interest Module (IBIM) and the Explicit Semantic Interest Module (ESIM). The traditional multi-interest RS model in IBIM can learn users' implicit behavioral interests from interactions with items. In ESIM, we first adopt a clustering algorithm to select typical samples and design a prompting strategy for the LLM to obtain explicit semantic interests. Furthermore, in the training phase, the semantic interests of typical samples can enhance the representation learning of behavioral interests based on multi-task learning over semantic prediction and modality alignment. Therefore, in the inference stage, accurate recommendations can be achieved with only the user's behavioral data. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed EIMF framework, which effectively and efficiently combines small models with LLM to improve the accuracy of multi-interest modeling., Comment: 10 pages
- Published
- 2024
10. ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
- Author
-
Zhao, Weibo, Shi, Yubin, Lyu, Xinyu, Sui, Wanchen, Li, Shen, and Li, Yong
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial error, bringing out intolerable performance degradation. This paper is anchored in the basic idea of model compression objectives, and delves into the layer-wise error distribution of LLMs during post-training quantization. Subsequently, we introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; (2) Activation Smoothing: outlier extraction to gain smooth activation and better error compensation. ASER is capable of quantizing typical LLMs to low-bit ones, particularly preserving accuracy even in the W4A8 per-channel setup. Experimental results show that ASER is competitive among the state-of-the-art quantization algorithms, showing potential for activation quantization, with minor overhead. An illustrative sketch of the low-rank error-compensation idea follows this entry.
- Published
- 2024
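The ASER entry above pairs low-bit quantization with a low-rank compensation of the quantization error. The sketch below shows the general shape of such an error-reconstruction step using a plain SVD; ASER itself uses whitening SVD and LoRA-style matrices, so treat the functions, shapes, and rank here as assumptions.

```python
# Illustrative sketch (not ASER's actual algorithm): approximate the per-layer
# quantization error with a rank-r correction obtained from an SVD of the error.
import numpy as np

def quantize(w, bits=4):
    """Naive symmetric per-tensor quantization, for illustration only."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def low_rank_compensation(w, w_q, rank=8):
    """Approximate the error W - W_q with rank-r factors A @ B."""
    err = w - w_q
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # (out, r)
    b = vt[:rank, :]                    # (r, in)
    return a, b

w = np.random.randn(256, 256)
w_q = quantize(w, bits=4)
a, b = low_rank_compensation(w, w_q, rank=8)
# Inference would use w_q + a @ b in place of w.
print(np.linalg.norm(w - w_q), np.linalg.norm(w - (w_q + a @ b)))
```

At inference the layer uses the quantized weight plus the small rank-r product, recovering part of the accuracy lost to low-bit rounding.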
11. Generalizing Hyperedge Expansion for Hyper-relational Knowledge Graph Modeling
- Author
-
Liu, Yu, Yang, Shu, Ding, Jingtao, Yao, Quanming, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
By representing knowledge in a primary triple associated with additional attribute-value qualifiers, the hyper-relational knowledge graph (HKG), which generalizes the triple-based knowledge graph (KG), has been attracting research attention recently. Compared with KG, HKG is enriched with the semantic qualifiers as well as the hyper-relational graph structure. However, to model HKG, existing studies mainly focus on either the semantic information or the structural information therein and thus fail to capture both simultaneously. To tackle this issue, in this paper, we generalize the hyperedge expansion in hypergraph learning and propose an equivalent transformation for HKG modeling, referred to as TransEQ. Specifically, the equivalent transformation transforms an HKG into a KG, which considers both semantic and structural characteristics. Then an encoder-decoder framework is developed to bridge the modeling research between KG and HKG. In the encoder part, KG-based graph neural networks are leveraged for structural modeling; while in the decoder part, various HKG-based scoring functions are exploited for semantic modeling. Especially, we design the sharing embedding mechanism in the encoder-decoder framework with semantic relatedness captured. We further theoretically prove that TransEQ preserves complete information in the equivalent transformation, and also achieves full expressivity. Finally, extensive experiments on three benchmarks demonstrate the superior performance of TransEQ in terms of both effectiveness and efficiency. On the largest benchmark WikiPeople, TransEQ significantly improves the state-of-the-art models by 15% on MRR. A toy illustration of flattening a hyper-relational fact follows this entry.
- Published
- 2024
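The TransEQ entry above rests on transforming a hyper-relational knowledge graph into an ordinary triple-based one. The snippet below is a toy flattening of a single hyper-relational fact via an introduced fact node, shown only to make the HKG-to-KG idea concrete; TransEQ's hyperedge-expansion transformation is more elaborate and provably information-preserving, so this is not the paper's construction.

```python
# Illustrative only: flatten a primary triple plus qualifier pairs into plain
# triples by introducing an auxiliary fact node.
def expand_fact(fact_id, head, relation, tail, qualifiers):
    triples = [
        (fact_id, "has_head", head),
        (fact_id, "has_relation", relation),
        (fact_id, "has_tail", tail),
    ]
    for q_rel, q_val in qualifiers:
        triples.append((fact_id, q_rel, q_val))
    return triples

fact = expand_fact(
    "fact_1", "Marie Curie", "received_award", "Nobel Prize in Physics",
    [("in_year", "1903"), ("together_with", "Pierre Curie")],
)
for t in fact:
    print(t)
```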
12. Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
- Author
-
Yu, Zihan, Ding, Jingtao, and Li, Yong
- Subjects
Computer Science - Machine Learning - Abstract
Symbolic regression, a task discovering the formula that best fits the given data, is typically based on heuristic search. These methods usually update candidate formulas iteratively to obtain new ones with lower prediction errors. However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonically as the search approaches the target formula, causing the low recovery rate of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. To estimate the minimum description length of any input data, we design a neural network, MDLformer, which enables robust and scalable estimation through large-scale training. With the MDLformer's output as the search objective, we implement a symbolic regression method, SR4MDL, that can effectively recover the correct mathematical form of the formula. Extensive experiments illustrate its excellent performance in recovering formulas from data. Our method successfully recovers around 50 formulas across two benchmark datasets comprising 133 problems, outperforming state-of-the-art methods by 43.92%. A toy sketch of swapping the search objective follows this entry.
- Published
- 2024
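The SR4MDL entry above replaces the usual prediction-error objective with an estimate of the remaining minimum description length. The toy loop below only illustrates swapping the objective in a candidate search; the hand-written mdl_objective is a stand-in for the trained MDLformer, and the candidate set is fabricated for the example.

```python
# Illustrative only: the same greedy search, two different objectives.
import numpy as np

def prediction_error(formula, x, y):
    """Conventional objective: mean squared prediction error."""
    return float(np.mean((formula(x) - y) ** 2))

def mdl_objective(formula, x, y):
    """Stand-in for the MDLformer: score the residual's rough description
    length plus a crude constant complexity term (the real estimator is learned)."""
    residual = y - formula(x)
    return float(np.log1p(np.var(residual))) + 0.1

def greedy_search(candidates, x, y, objective):
    """Pick the candidate formula that minimizes the chosen objective."""
    return min(candidates, key=lambda item: objective(item[1], x, y))[0]

x = np.linspace(0.1, 1.0, 50)
y = 2 * x + 1
candidates = [("2*x + 1", lambda t: 2 * t + 1),
              ("x**2", lambda t: t ** 2),
              ("sin(x)", np.sin)]
print(greedy_search(candidates, x, y, prediction_error))
print(greedy_search(candidates, x, y, mdl_objective))
```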
13. Towards Personalized Federated Learning via Comprehensive Knowledge Distillation
- Author
-
Wang, Pengju, Liu, Bochao, Guo, Weijia, Li, Yong, and Ge, Shiming
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition - Abstract
Federated learning is a distributed machine learning paradigm designed to protect data privacy. However, data heterogeneity across various clients results in catastrophic forgetting, where the model rapidly forgets previous knowledge while acquiring new knowledge. To address this challenge, personalized federated learning has emerged to customize a personalized model for each client. However, the inherent limitation of this mechanism is its excessive focus on personalization, potentially hindering the generalization of those models. In this paper, we present a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation. The historical model represents the local model from the last round of client training, containing historical personalized knowledge, while the global model represents the aggregated model from the last round of server aggregation, containing global generalized knowledge. By applying knowledge distillation, we effectively transfer global generalized knowledge and historical personalized knowledge to the local model, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models. Extensive experimental results demonstrate the significant advantages of our method., Comment: Accepted by IEEE SMC 2024
- Published
- 2024
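The entry above distills knowledge into each client's local model from two teachers: the aggregated global model and the client's own historical model. A minimal sketch of such a two-teacher distillation loss is given below; the loss weights, temperature, and exact formulation are assumptions rather than the paper's specification.

```python
# Illustrative sketch of a local client objective combining cross-entropy with
# distillation from a global teacher and a historical (previous-round) teacher.
import torch
import torch.nn.functional as F

def local_loss(student_logits, global_logits, hist_logits, labels,
               alpha=0.5, beta=0.5, temperature=2.0):
    ce = F.cross_entropy(student_logits, labels)
    t = temperature
    kd_global = F.kl_div(F.log_softmax(student_logits / t, dim=1),
                         F.softmax(global_logits / t, dim=1),
                         reduction="batchmean") * t * t
    kd_hist = F.kl_div(F.log_softmax(student_logits / t, dim=1),
                       F.softmax(hist_logits / t, dim=1),
                       reduction="batchmean") * t * t
    return ce + alpha * kd_global + beta * kd_hist

logits = torch.randn(8, 10, requires_grad=True)
loss = local_loss(logits, torch.randn(8, 10), torch.randn(8, 10),
                  torch.randint(0, 10, (8,)))
loss.backward()
```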
14. Enhancing ID-based Recommendation with Large Language Models
- Author
-
Chen, Lei, Gao, Chen, Du, Xiaoyi, Luo, Hengliang, Jin, Depeng, Li, Yong, and Wang, Meng
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have recently garnered significant attention in various domains, including recommendation systems. Recent research leverages the capabilities of LLMs to improve the performance and user modeling aspects of recommender systems. These studies primarily focus on utilizing LLMs to interpret textual data in recommendation tasks. However, it's worth noting that in ID-based recommendations, textual data is absent, and only ID data is available. The untapped potential of LLMs for ID data within the ID-based recommendation paradigm remains relatively unexplored. To this end, we introduce a pioneering approach called "LLM for ID-based Recommendation" (LLM4IDRec). This innovative approach integrates the capabilities of LLMs while exclusively relying on ID data, thus diverging from the previous reliance on textual data. The basic idea of LLM4IDRec is to employ an LLM to augment ID data; if the augmented ID data improves recommendation performance, this demonstrates that the LLM can interpret ID data effectively and points to an innovative way of integrating LLMs into ID-based recommendation. We evaluate the effectiveness of our LLM4IDRec approach using three widely-used datasets. Our results demonstrate a notable improvement in recommendation performance, with our approach consistently outperforming existing methods in ID-based recommendation by solely augmenting input data.
- Published
- 2024
15. Flexible Coded Distributed Convolution Computing for Enhanced Fault Tolerance and Numerical Stability in Distributed CNNs
- Author
-
Tan, Shuo, Liu, Rui, Long, XianLei, Wan, Kai, Song, Linqi, and Li, Yong
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Information Theory, Computer Science - Machine Learning - Abstract
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed systems susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance fault tolerance and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable tensor convolutions to be linearly decomposed and encoded into CDC sub-tasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, fault tolerance, and scalability across various CNN architectures., Comment: 14 pages, 6 figures
- Published
- 2024
16. Zero-Shot Self-Consistency Learning for Seismic Irregular Spatial Sampling Reconstruction
- Author
-
Peng, Junheng, Liu, Yingtian, Wang, Mingwei, Li, Yong, and Li, Huating
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Physics - Geophysics, 68T07, I.4.5 - Abstract
Seismic exploration is currently the most important method for understanding subsurface structures. However, due to surface conditions, seismic receivers may not be uniformly distributed along the measurement line, making the entire exploration work difficult to carry out. Previous deep learning methods for reconstructing seismic data often relied on additional datasets for training. While some existing methods do not require extra data, they lack constraints on the reconstructed data, leading to unstable reconstruction performance. In this paper, we propose a zero-shot self-consistency learning strategy and employ an extremely lightweight network for seismic data reconstruction. Our method does not require additional datasets and utilizes the correlations among different parts of the data to design a self-consistency learning loss function, driving a network with only 90,609 learnable parameters. We applied this method to the USGS National Petroleum Reserve-Alaska public dataset, and the results indicate that our proposed approach achieved good reconstruction results. Additionally, our method also demonstrates a certain degree of noise suppression, which is highly beneficial for large and complex seismic exploration tasks., Comment: 12 pages, 8 figures
- Published
- 2024
17. Seven-octave ultrabroadband metamaterial absorbers via Q-weighted mode density modulation
- Author
-
Wang, Nengyin, Huang, Sibo, Zhou, Zhiling, Tsai, Din Ping, Zhu, Jie, and Li, Yong
- Subjects
Physics - Applied Physics - Abstract
Absorption is a crucial parameter in shaping wave propagation dynamics, yet achieving ultra-broadband absorption remains highly challenging, particularly in balancing low-frequency and broad bandwidth. Here, we present a metamaterial absorber (MMA) capable of achieving simultaneous spectral coverage across a seven-octave range of near-perfect absorption from 100 Hz to 12,800 Hz by engineering the quality-factor-weighted (Q-weighted) mode density. The Q-weighted mode density considers mode density, resonant frequencies, radiative loss, and intrinsic loss of multiple resonant modes, providing a comprehensive approach to govern broadband absorption properties. By optimizing the number of resonant modes and managing intrinsic losses, our approach achieves an intensive Q-weighted mode density across an ultra-wide bandwidth, enabling ultra-broadband absorption with high efficiency. These findings significantly advance the bandwidth capabilities of state-of-the-art MMAs and pave the way for the development of ultra-broadband metamaterial devices across various wave systems.
- Published
- 2024
18. Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN
- Author
-
Zhou, Zhilun, Fan, Jingyang, Liu, Yu, Xu, Fengli, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Social and Information Networks - Abstract
The fast development of location-based social networks (LBSNs) has led to significant changes in society, resulting in popular studies of using LBSN data for socioeconomic prediction, e.g., regional population and commercial activity estimation. Existing studies design various graphs to model heterogeneous LBSN data, and further apply graph representation learning methods for socioeconomic prediction. However, these approaches heavily rely on heuristic ideas and expertise to extract task-relevant knowledge from diverse data, which may not be optimal for specific tasks. Additionally, they tend to overlook the inherent relationships between different indicators, limiting the prediction accuracy. Motivated by the remarkable abilities of large language models (LLMs) in commonsense reasoning, embedding, and multi-agent collaboration, in this work, we synergize LLM agents and knowledge graph for socioeconomic prediction. We first construct a location-based knowledge graph (LBKG) to integrate multi-sourced LBSN data. Then we leverage the reasoning power of LLM agent to identify relevant meta-paths in the LBKG for each type of socioeconomic prediction task, and design a semantic-guided attention module for knowledge fusion with meta-paths. Moreover, we introduce a cross-task communication mechanism to further enhance performance by enabling knowledge sharing across tasks at both LLM agent and KG levels. On the one hand, the LLM agents for different tasks collaborate to generate more diverse and comprehensive meta-paths. On the other hand, the embeddings from different tasks are adaptively merged for better socioeconomic prediction. Experiments on two datasets demonstrate the effectiveness of the synergistic design between LLM and KG, providing insights for information sharing across socioeconomic prediction tasks.
- Published
- 2024
19. TrajAgent: An Agent Framework for Unified Trajectory Modelling
- Author
-
Du, Yuwei, Feng, Jie, Zhao, Jie, and Li, Yong
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
Trajectory modeling, which includes research on trajectory data pattern mining and future prediction, has widespread applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modelling. However, due to the heterogeneity of data and the diversity of trajectory tasks, achieving unified trajectory modelling remains an important yet challenging task. In this paper, we propose TrajAgent, a large language model-based agentic framework, to unify various trajectory modelling tasks. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface, to support the execution and training of various models. Building on UniEnv, we introduce TAgent, an agentic workflow designed for automatic trajectory modelling across various trajectory tasks. Specifically, we design AutOpt, a systematic optimization module within TAgent, to further improve the performance of the integrated model. With diverse trajectory tasks input in natural language, TrajAgent automatically generates competitive results via training and executing appropriate models. Extensive experiments on four tasks using four real-world datasets demonstrate the effectiveness of TrajAgent in unified trajectory modelling, achieving an average performance improvement of 15.43% over baseline methods., Comment: 12 pages; the code will be openly accessible at: https://github.com/tsinghua-fib-lab/TrajAgent
- Published
- 2024
20. PSY: Posterior Sampling Based Privacy Enhancer in Large Language Models
- Author
-
Sun, Yulian, Duan, Li, and Li, Yong
- Subjects
Computer Science - Cryptography and Security - Abstract
Privacy vulnerabilities in LLMs, such as leakage from memorization, have been constantly identified, and various mitigation proposals have been proposed. LoRA is usually used in fine-tuning LLMs and a good entry point to insert privacy-enhancing modules. In this ongoing research, we introduce PSY, a Posterior Sampling based PrivacY enhancer that can be used in LoRA. We propose a simple yet effective realization of PSY using posterior sampling, which effectively prevents privacy leakage from intermediate information and, in turn, preserves the privacy of data owners. We evaluate LoRA extended with PSY against state-of-the-art membership inference and data extraction attacks. The experiments are executed on three different LLM architectures fine-tuned on three datasets with LoRA. In contrast to the commonly used differential privacy method, we find that our proposed modification consistently reduces the attack success rate. Meanwhile, our method has almost no negative impact on model fine-tuning or final performance. Most importantly, PSY reveals a promising path toward privacy enhancement with latent space extensions., Comment: 10 pages, 2 figures
- Published
- 2024
21. PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding
- Author
-
Yu, Wang-Wang, Yang, Kai-Fu, Hu, Xiangrui, Jiang, Jingwen, Yan, Hong-Mei, and Li, Yong-Jie
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The task of macro- and micro-expression spotting aims to precisely localize and categorize temporal expression instances within untrimmed videos. Given the sparse distribution and varying durations of expressions, existing anchor-based methods often represent instances by encoding their deviations from predefined anchors. Additionally, these methods typically slice the untrimmed videos into fixed-length sliding windows. However, anchor-based encoding often fails to capture all training intervals, and slicing the original video into sliding windows can result in valuable training intervals being discarded. To overcome these limitations, we introduce PESFormer, a simple yet effective model based on the vision transformer architecture to achieve point-to-interval expression spotting. PESFormer employs a direct timestamp encoding (DTE) approach to replace anchors, enabling binary classification of each timestamp instead of optimizing entire ground truths. Thus, all training intervals are retained in the form of discrete timestamps. To maximize the utilization of training intervals, we enhance the preprocessing process by replacing the short videos produced through the sliding window method. Instead, we implement a strategy that involves zero-padding the untrimmed training videos to create uniform, longer videos of a predetermined duration. This operation efficiently preserves the original training intervals and eliminates video slice enhancement. Extensive qualitative and quantitative evaluations on three datasets -- CAS(ME)^2, CAS(ME)^3 and SAMM-LV -- demonstrate that our PESFormer outperforms existing techniques, achieving the best performance. A minimal sketch of the timestamp labeling idea follows this entry.
- Published
- 2024
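The PESFormer entry above replaces anchor offsets with direct timestamp encoding over zero-padded videos. The snippet below sketches how per-timestamp binary labels could be built from annotated expression intervals; the padded length and function name are illustrative assumptions, not the paper's preprocessing code.

```python
# Illustrative only: every frame of a zero-padded video gets a binary label
# saying whether it falls inside an annotated expression interval.
import numpy as np

def encode_timestamps(intervals, video_len, padded_len):
    """intervals: list of (start, end) frame indices; returns per-frame labels."""
    labels = np.zeros(padded_len, dtype=np.int64)   # zero padding beyond video_len
    for start, end in intervals:
        labels[start:min(end, video_len)] = 1
    return labels

labels = encode_timestamps([(30, 55), (120, 180)], video_len=400, padded_len=512)
print(labels.sum(), len(labels))
```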
22. Adaptive Convolutional Filter for Seismic Noise Attenuation
- Author
-
Peng, Junheng, Li, Yong, Liu, Yingtian, Wang, Mingwei, and Li, Huating
- Subjects
Physics - Geophysics - Abstract
Seismic exploration is currently the most mature approach for studying subsurface structures, yet the presence of noise greatly restricts its imaging accuracy. Previous methods still face significant challenges: traditional computational methods are often computationally complex and their effectiveness is hard to guarantee; deep learning approaches rely heavily on datasets, and the complexity of network training makes them difficult to apply in practical field scenarios. In this paper, we propose a method that has only 2464 learnable parameters, and its parameter constraints rely on priors rather than requiring training data. The three priors we propose can effectively attenuate random noise while significantly reducing signal leakage, ensuring that the separated noise remains as independent as possible from the processed data. We validated our method on the National Petroleum Reserve-Alaska survey, and the results indicate that our proposed approach effectively enhances noise elimination and seismic data resolution., Comment: 17 pages, 8 figures, this manuscript has been submitted to JGR: Machine Learning and Computation
- Published
- 2024
23. ImDy: Human Inverse Dynamics from Imitated Observations
- Author
-
Liu, Xinpeng, Liang, Junxuan, Lin, Zili, Hou, Haowen, Li, Yong-Lu, and Lu, Cewu
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Robotics - Abstract
Inverse dynamics (ID), which aims at reproducing the driven torques from human kinematic observations, has been a critical tool for gait analysis. However, it is hindered from wider application to general motion due to its limited scalability. Conventional optimization-based ID requires expensive laboratory setups, restricting its availability. To alleviate this problem, we propose to exploit the recently progressive human motion imitation algorithms to learn human inverse dynamics in a data-driven manner. The key insight is that the human ID knowledge is implicitly possessed by motion imitators, though not directly applicable. In light of this, we devise an efficient data collection pipeline with state-of-the-art motion imitation algorithms and physics simulators, resulting in a large-scale human inverse dynamics benchmark as Imitated Dynamics (ImDy). ImDy contains over 150 hours of motion with joint torque and full-body ground reaction force data. With ImDy, we train a data-driven human inverse dynamics solver ImDyS(olver) in a fully supervised manner, which conducts ID and ground reaction force estimation simultaneously. Experiments on ImDy and real-world data demonstrate the impressive competency of ImDyS in human inverse dynamics and ground reaction force estimation. Moreover, the potential of ImDy(-S) as a fundamental motion analysis tool is exhibited with downstream applications. The project page is https://foruck.github.io/ImDy/., Comment: Yong-Lu Li and Cewu Lu are the corresponding authors
- Published
- 2024
24. Physics-driven AI for Channel Estimation in Cellular Network
- Author
-
Qi, Xiaoqian, Chai, Haoye, and Li, Yong
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
In cellular mobile networks, wireless channel quality (CQ) is a crucial factor in determining communication performance and users' network experience. Accurately predicting CQ based on real environmental characteristics, specific base station configurations and user trajectories can help network operators optimize base station deployment, improving coverage and capacity. The Received Signal Reference Power (RSRP) and Signal-to-Interference-plus-Noise Ratio (SINR) of user equipment (UE) are key indicators of CQ in wireless communication. However, existing research has limitations in terms of generation accuracy. Regression methods such as statistical inference and random forests fail to effectively capture the unique characteristics of wireless environments; theoretical derivations relying on specific communication protocols lack generalization capability; data-driven machine learning (ML) methods like the Long Short-Term Memory (LSTM) network often suffer from a lack of interpretability. To overcome these limitations, we propose physics-informed diffusion models, which accurately generate RSRP and SINR at the UE based on the wireless environment, base station configurations, and user trajectories. The model adopts a modular and end-to-end design, employing a teacher-student framework to achieve knowledge distillation. This method integrates expert knowledge into the training of diffusion models, enhancing both the interpretability and accuracy, while also facilitating faster convergence of the model parameters. Furthermore, it allows for self-adaptation in various scenarios through few-shot learning. This approach provides valuable guidance for optimizing base station deployment, predicting user network experience, and building real-world simulators., Comment: 7 pages, 6 figures
- Published
- 2024
25. Shorter Is Different: Characterizing the Dynamics of Short-Form Video Platforms
- Author
-
Chen, Zhilong, Liu, Peijie, Piao, Jinghua, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Multimedia, Computer Science - Computers and Society, Statistics - Computation - Abstract
Emerging short-form video platforms have grown tremendously and have recently become one of the leading forms of social media. Although the expanded popularity of these platforms has attracted increasing research attention, there has been a lack of understanding of whether and how they deviate from traditional long-form video-sharing platforms such as YouTube and Bilibili. To address this, we conduct a large-scale data-driven analysis of Kuaishou, one of the largest short-form video platforms in China. Based on 248 million videos uploaded to the platform across all categories, we identify their notable differences from long-form video platforms through a comparison study with Bilibili, a leading long-form video platform in China. We find that videos are shortened by multiples on Kuaishou, with distinctive categorical distributions over-represented by life-related rather than interest-based videos. Users interact with videos less per view, but top videos can even more effectively acquire users' collective attention. More importantly, ordinary content creators have higher probabilities of producing hit videos. Our results shed light on the uniqueness of short-form video platforms and pave the way for future research and design for better short-form video ecology.
- Published
- 2024
26. FoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model
- Author
-
Chai, Haoye, Zhang, Shiyuan, Qi, Xiaoqian, and Li, Yong
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
Mobile traffic forecasting allows operators to anticipate network dynamics and performance in advance, offering substantial potential for enhancing service quality and improving user experience. However, existing models are often task-oriented and are trained with tailored data, which limits their effectiveness in diverse mobile network tasks such as Base Station (BS) deployment, resource allocation, and energy optimization, and hinders generalization across different urban environments. Foundation models have made remarkable strides across various domains of NLP and CV due to their multi-task adaptation and zero/few-shot learning capabilities. In this paper, we propose an innovative Foundation model for Mobile traffic forecasting (FoMo), aiming to handle diverse forecasting tasks of short/long-term predictions and distribution generation across multiple cities to support network planning and optimization. FoMo combines diffusion models and transformers, where various spatio-temporal masks are proposed to enable FoMo to learn intrinsic features of different tasks, and a contrastive learning strategy is developed to capture the correlations between mobile traffic and urban contexts, thereby improving its transfer learning capability. Extensive experiments on 9 real-world datasets demonstrate that FoMo outperforms current models concerning diverse forecasting tasks and zero/few-shot learning, showcasing a strong universality. We further deploy the FoMo on the JiuTian optimization platform of China Mobile, where we use the predicted mobile data to formulate network planning and optimization applications, including BS deployment, resource block scheduling, and BS sleep control., Comment: 17 pages, 11 figures
- Published
- 2024
27. EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
- Author
-
Gao, Chen, Zhao, Baining, Zhang, Weichen, Mao, Jinzhu, Zhang, Jun, Zheng, Zhiheng, Man, Fanhang, Fang, Jianjie, Zhou, Zile, Cui, Jinqiang, Chen, Xinlei, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Robotics - Abstract
Embodied artificial intelligence emphasizes the role of an agent's body in generating human-like behaviors. The recent efforts on EmbodiedAI pay a lot of attention to building up machine learning models to possess perceiving, planning, and acting abilities, thereby enabling real-time interaction with the world. However, most works focus on bounded indoor environments, such as navigation in a room or manipulating a device, with limited exploration of embodying the agents in open-world scenarios. That is, embodied intelligence in the open and outdoor environment is less explored, for which one potential reason is the lack of high-quality simulators, benchmarks, and datasets. To address it, in this paper, we construct a benchmark platform for embodied intelligence evaluation in real-world city environments. Specifically, we first construct a highly realistic 3D simulation environment based on the real buildings, roads, and other elements in a real city. In this environment, we combine historically collected data and simulation algorithms to conduct simulations of pedestrian and vehicle flows with high fidelity. Further, we designed a set of evaluation tasks covering different EmbodiedAI abilities. Moreover, we provide a complete set of input and output interfaces for access, enabling embodied agents to easily take task requirements and current environmental observations as input and then make decisions and obtain performance evaluations. On the one hand, it expands the capability of existing embodied intelligence to higher levels. On the other hand, it has a higher practical value in the real world and can support more potential applications for artificial general intelligence. Based on this platform, we evaluate some popular large language models for embodied intelligence capabilities of different dimensions and difficulties., Comment: All of the software, Python library, codes, datasets, tutorials, and real-time online service are available on this website: https://embodied-city.fiblab.net
- Published
- 2024
28. OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents
- Author
-
Yan, Yuwei, Zeng, Qingbin, Zheng, Zhiheng, Yuan, Jingzhe, Feng, Jie, Zhang, Jun, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Multiagent Systems, Computer Science - Artificial Intelligence - Abstract
Agent-based models (ABMs) have long been employed to explore how individual behaviors aggregate into complex societal phenomena in urban space. Unlike black-box predictive models, ABMs excel at explaining the micro-macro linkages that drive such emergent behaviors. The recent rise of Large Language Models (LLMs) has led to the development of LLM agents capable of simulating urban activities with unprecedented realism. However, the extremely high computational cost of LLMs presents significant challenges for scaling up the simulations of LLM agents. To address this problem, we propose OpenCity, a scalable simulation platform optimized for both system and prompt efficiencies. Specifically, we propose an LLM request scheduler to reduce communication overhead by parallelizing requests through IO multiplexing. Besides, we design a "group-and-distill" prompt optimization strategy that minimizes redundancy by clustering agents with similar static attributes. Through experiments on six global cities, OpenCity achieves a 600-fold acceleration in simulation time per agent, a 70% reduction in LLM requests, and a 50% reduction in token usage. These improvements enable the simulation of 10,000 agents' daily activities in 1 hour on commodity hardware. Besides, the substantial speedup of OpenCity allows us to establish an urban simulation benchmark for LLM agents for the first time, comparing simulated urban activities with real-world data in 6 major cities around the globe. We believe our OpenCity platform provides a critical infrastructure to harness the power of LLMs for interdisciplinary studies in urban space, fostering the collective efforts of broader research communities. Code repo is available at https://anonymous.4open.science/r/Anonymous-OpenCity-42BD. A minimal sketch of concurrent request scheduling follows this entry.
- Published
- 2024
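The OpenCity entry above attributes much of its speedup to an LLM request scheduler that parallelizes requests through IO multiplexing. The sketch below shows the general pattern with asyncio; call_llm is a hypothetical placeholder rather than OpenCity's actual client, and the concurrency limit is arbitrary.

```python
# Illustrative only: issue many agents' LLM requests concurrently so that
# network waits overlap instead of being serialized.
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)          # stands in for network latency
    return f"response to: {prompt[:20]}"

async def run_agents(prompts, max_in_flight=64):
    sem = asyncio.Semaphore(max_in_flight)
    async def one(p):
        async with sem:
            return await call_llm(p)
    return await asyncio.gather(*(one(p) for p in prompts))

prompts = [f"agent {i}: plan today's activities" for i in range(200)]
results = asyncio.run(run_agents(prompts))
print(len(results))
```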
29. Double opponency serves as a basis for color constancy
- Author
-
Yang, Kai-Fu and Li, Yong-Jie
- Subjects
Quantitative Biology - Neurons and Cognition - Abstract
Color constancy (CC) is one of the important perceptual abilities of the human visual system, which states that despite changes in illumination, the perceived colors of surfaces generally tend to remain constant. Nevertheless, the mechanisms underlying CC have been debated for several decades. A specific type of cell, known as the double opponent cell in the primary visual cortex (V1), is strongly implicated in achieving CC. However, the exact functioning manner of this cell type remains uncertain. In this work, our quantitative analysis of concentric double-opponent cells in V1 revealed their ability to identify gray surfaces within color-biased scenes. These gray surfaces can then be used to estimate the illumination easily. For the first time, this finding offers a clear functional explanation of concentric double-opponent receptive fields of this cell type in the visual system. Building on this insight, we introduced a novel computational theory--gray-anchoring (GA) theory--to explain how CC is achieved in the visual system. Specifically, GA-based CC involves detecting and anchoring gray surfaces within complex scenes. Our new theory serves as a bridge among the retinex theory, anchoring theory, and the neural mechanisms underlying visual CC in color vision., Comment: 12 pages, 3 figures
- Published
- 2024
30. DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
- Author
-
Jiang, Yanfeng, Yang, Zelan, Chen, Bohua, Li, Shen, Li, Yong, and Li, Tao
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence - Abstract
Large language models achieve exceptional performance on various downstream tasks through supervised fine-tuning. However, the diversity of downstream tasks and practical requirements makes deploying multiple full-parameter fine-tuned models challenging. Current methods that compress the delta weight struggle to achieve ultra-high compression, failing to minimize the deployment overhead. To address the above issue, we propose a novel distribution-driven delta compression framework DeltaDQ, which utilizes Group-wise Dropout and Separate Quantization to achieve ultra-high compression for the delta weight. We have observed that the matrix-computed intermediate results for the delta weight exhibit extremely small variance and min-max range characteristics, referred to as Balanced Intermediate Results. Exploiting this phenomenon, we introduce Group-wise Dropout to perform dropout on the delta weight using an optimal group size. Furthermore, using Separate Quantization, sparse weights are quantized and decomposed to achieve a lower bit width. Experimental results show that DeltaDQ achieves 16x compression with improved accuracy compared to baselines for WizardMath and WizardCoder models across different parameter scales. Moreover, DeltaDQ demonstrates the ability to reach ultra-high compression ratios, achieving 128x compression for the WizardMath-7B model and 512x compression for the WizardMath-70B model. An illustrative sketch of the group-wise dropout and quantization idea follows this entry.
- Published
- 2024
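The DeltaDQ entry above compresses the delta between fine-tuned and base weights with group-wise dropout followed by separate quantization. The sketch below is only a rough caricature of that pipeline; the group size, keep ratio, bit width, and quantizer are arbitrary choices made for the example, not the paper's algorithm.

```python
# Illustrative only: drop elements of the delta weight within fixed-size groups,
# then quantize each group separately.
import numpy as np

def groupwise_dropout(delta, group_size=64, keep_ratio=0.25, seed=0):
    rng = np.random.default_rng(seed)
    flat = delta.reshape(-1, group_size).copy()
    mask = rng.random(flat.shape) < keep_ratio
    return (flat * mask).reshape(delta.shape)

def groupwise_quantize(delta, group_size=64, bits=2):
    flat = delta.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    scale[scale == 0] = 1.0
    return (np.round(flat / scale) * scale).reshape(delta.shape)

base = np.random.randn(1024, 1024)
finetuned = base + 0.01 * np.random.randn(1024, 1024)
delta = finetuned - base
compressed = groupwise_quantize(groupwise_dropout(delta), bits=2)
print(np.linalg.norm(delta - compressed) / np.linalg.norm(delta))
```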
31. HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction
- Author
-
Hao, Qianyue, Fan, Jingyang, Xu, Fengli, Yuan, Jian, and Li, Yong
- Subjects
Computer Science - Digital Libraries, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, I.2.7 - Abstract
Citation networks are critical in modern science, and predicting which previous papers (candidates) a new paper (query) will cite is a critical problem. However, the roles of a paper's citations vary significantly, ranging from foundational knowledge basis to superficial contexts. Distinguishing these roles requires a deeper understanding of the logical relationships among papers, beyond simple edges in citation networks. The emergence of LLMs with textual reasoning capabilities offers new possibilities for discerning these relationships, but there are two major challenges. First, in practice, a new paper may select its citations from a gigantic pool of existing papers, whose texts exceed the context length of LLMs. Second, logical relationships between papers are implicit, and directly prompting an LLM to predict citations may result in surface-level textual similarities rather than deeper logical reasoning. In this paper, we introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions. Thereby, we elevate the citation prediction task from a simple binary classification to distinguishing core citations from both superficial citations and non-citations. To address this, we propose HLM-Cite, a Hybrid Language Model workflow for citation prediction, which combines embedding and generative LMs. We design a curriculum finetune procedure to adapt a pretrained text embedding model to coarsely retrieve high-likelihood core citations from vast candidates and then design an LLM agentic workflow to rank the retrieved papers through one-shot reasoning, revealing the implicit relationships among papers. With the pipeline, we can scale the candidate sets to 100K papers. We evaluate HLM-Cite across 19 scientific fields, demonstrating a 17.6% performance improvement compared with SOTA methods., Comment: NeurIPS 2024 paper
- Published
- 2024
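The HLM-Cite entry above chains an embedding-based coarse retrieval with LLM re-ranking. The sketch below shows only the skeleton of such a retrieve-then-rank pipeline; embed and llm_rank are hypothetical stand-ins for the curriculum-finetuned embedding model and the agentic LLM workflow described in the abstract.

```python
# Illustrative only: coarse retrieval by embedding similarity, then re-ranking.
import numpy as np

def embed(texts, dim=64, seed=0):
    """Stand-in embedding model: random vectors keyed by text."""
    rng = np.random.default_rng(seed)
    return {t: rng.standard_normal(dim) for t in texts}

def retrieve(query, candidates, vectors, top_k=5):
    """Keep the top_k candidates by dot-product similarity to the query."""
    q = vectors[query]
    scores = {c: float(np.dot(q, vectors[c])) for c in candidates}
    return sorted(candidates, key=scores.get, reverse=True)[:top_k]

def llm_rank(query, shortlist):
    """Placeholder for a one-shot LLM reasoning call that orders the shortlist."""
    return shortlist

candidates = [f"paper {i}" for i in range(1000)]
vectors = embed(candidates + ["query paper"])
shortlist = retrieve("query paper", candidates, vectors, top_k=5)
print(llm_rank("query paper", shortlist))
```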
32. Continuous-wave amplitude control via the interference phenomenon in acoustic structures
- Author
-
Liu, Bingyi, Liu, Shanshan, Li, Liulin, Bi, Chuanxing, Guo, Kai, Li, Yong, and Guo, Zhongyi
- Subjects
Physics - Applied Physics - Abstract
We propose a strategy to continuously tune the amplitude of acoustic waves based on the interference among two mode-conversion paths in passive acoustic structures. The interference phenomenon is attributed to two conjugate acoustic geometric phases obtained with two mode-conversion processes in hybrid-type geometric-phase meta-atom (HGPM) pair. Notably, 100% modulation depth of the wave amplitude is achievable by simply varying the local orientation angle of meta-atom. We utilize the acoustic structure made of two cylindrical resonators to construct deep-subwavelength secondary source with designated initial phase delay, and HGPM supporting desired mode-conversion functionality is accordingly fabricated with four secondary sources. Both theory and experiment consistently verify the continuous amplitude modulation function of HGPM pair, which showcases a general scheme for reconfigurable amplitude-type acoustic meta-devices, i.e., those that require grayscale amplitude modulation for acoustic field engineering., Comment: 16 pages, 4 figures
- Published
- 2024
33. AgentSquare: Automatic LLM Agent Search in Modular Design Space
- Author
-
Shang, Yu, Li, Yu, Zhao, Keyu, Ma, Likai, Liu, Jiahe, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computation and Language - Abstract
Recent advancements in Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. However, current research largely relies on manual, task-specific design, limiting their adaptability to novel tasks. In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with uniform IO interface: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present a novel LLM agent search framework called AgentSquare, which introduces two core mechanisms, i.e., module evolution and recombination, to efficiently search for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs. Extensive experiments across six benchmarks, covering the diverse scenarios of web, embodied, tool use and game applications, show that AgentSquare substantially outperforms hand-crafted agents, achieving an average performance gain of 17.2% against best-known human designs. Moreover, AgentSquare can generate interpretable design insights, enabling a deeper understanding of agentic architecture and its impact on task performance. We believe that the modular design space and AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of research community. Code repo is available at https://github.com/tsinghua-fib-lab/AgentSquare., Comment: 26 pages
- Published
- 2024
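The modular design space and evolution/recombination search described in the AgentSquare record above can be sketched as a plain evolutionary loop; the module registry contents, the `evaluate` callable, and the population schedule are simplified assumptions, not the released implementation.

```python
# Minimal sketch of a modular agent search loop: an agent is a choice of one
# module per slot (planning, reasoning, tool use, memory), and search alternates
# module "evolution" (mutation of one slot) with "recombination" of two parents.
import random

REGISTRY = {  # hypothetical module names
    "planning":  ["none", "plan-and-solve", "tree-search"],
    "reasoning": ["direct", "cot", "self-consistency"],
    "tooluse":   ["none", "react-style"],
    "memory":    ["none", "episodic"],
}

def random_agent():
    return {slot: random.choice(opts) for slot, opts in REGISTRY.items()}

def mutate(agent):
    """Module evolution: swap one slot for an alternative module."""
    child = dict(agent)
    slot = random.choice(list(REGISTRY))
    child[slot] = random.choice([m for m in REGISTRY[slot] if m != agent[slot]])
    return child

def recombine(a, b):
    """Module recombination: inherit each slot from one of the two parents."""
    return {slot: random.choice([a[slot], b[slot]]) for slot in REGISTRY}

def search(evaluate, generations=10, pop_size=8):
    """`evaluate` maps an agent dict to a task score (e.g. benchmark accuracy)."""
    pop = [random_agent() for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=evaluate, reverse=True)[: pop_size // 2]
        pop = elite \
            + [mutate(random.choice(elite)) for _ in range(pop_size // 4)] \
            + [recombine(*random.sample(elite, 2)) for _ in range(pop_size // 4)]
    return max(pop, key=evaluate)
```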
34. ShieldDiff: Suppressing Sexual Content Generation from Diffusion Models through Reinforcement Learning
- Author
-
Han, Dong, Mohamed, Salaheldin, and Li, Yong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
With the advance of generative AI, text-to-image (T2I) models can generate a wide variety of content. However, the generated content cannot be fully controlled: there is a potential risk that a T2I model generates unsafe images with disturbing content. In our work, we focus on eliminating NSFW (not safe for work) content generation from a T2I model while maintaining high image quality, by fine-tuning the pre-trained diffusion model via reinforcement learning to optimize a well-designed content-safe reward function. The proposed method leverages a customized reward function consisting of CLIP (Contrastive Language-Image Pre-training) and nudity rewards, pruning nudity content while adhering to the pre-trained model and keeping the corresponding semantic meaning on the safe side. In this way, the T2I model is robust to unsafe adversarial prompts, since unsafe visual representations are mitigated in the latent space. Extensive experiments conducted on different datasets demonstrate the effectiveness of the proposed method in alleviating unsafe content generation while preserving the high fidelity of benign images as well as images generated from unsafe prompts. We compare with five existing state-of-the-art (SOTA) methods and achieve competitive performance on sexual content removal and image quality retention. In terms of robustness, our method outperforms counterparts under the SOTA black-box attack model. Furthermore, our method can serve as a benchmark for anti-NSFW generation with semantically relevant safe alignment., Comment: 9 pages, 10 figures
- Published
- 2024
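The combined reward described in the ShieldDiff record above can be illustrated with a minimal sketch, assuming precomputed CLIP alignment and nudity-detector scores; the weighting and function names are illustrative, not the paper's exact reward.

```python
# Sketch of a content-safe reward for RL fine-tuning of a T2I model:
# reward = prompt-image alignment (CLIP similarity) minus a penalty that grows
# with the probability assigned by an NSFW/nudity detector. Weights are assumptions.
def content_safe_reward(clip_score: float, nudity_prob: float,
                        alpha: float = 2.0) -> float:
    """clip_score in [0, 1]: CLIP similarity between prompt and generated image.
    nudity_prob in [0, 1]: NSFW probability from any off-the-shelf detector."""
    return clip_score - alpha * nudity_prob

# Example: a well-aligned but unsafe image scores below a safe, less-aligned one.
assert content_safe_reward(0.9, 0.8) < content_safe_reward(0.6, 0.0)
```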
35. The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs
- Author
-
Li, Hong, Li, Nanxi, Chen, Yuanjie, Zhu, Jianbin, Guo, Qinlu, Lu, Cewu, and Li, Yong-Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Multi-modal Large Language Models (MLLMs) have exhibited impressive capabilities. However, many deficiencies of MLLMs relative to human intelligence have recently been found, $\textit{e.g.}$, hallucination. To drive MLLM research, the community has dedicated efforts to building larger benchmarks with complex tasks. In this paper, we propose benchmarking an essential but usually overlooked intelligence: $\textbf{association}$, a basic human capability to link observations with prior practical memory. To comprehensively investigate MLLM performance on association, we formulate the association task and devise a standard benchmark based on adjective and verb semantic concepts. Instead of costly data annotation and curation, we propose a convenient $\textbf{annotation-free}$ construction method that transforms general datasets for our association tasks. Simultaneously, we devise a rigorous data refinement process to eliminate confusion in the raw dataset. Building on this database, we establish three levels of association tasks: single-step, synchronous, and asynchronous associations. Moreover, we conduct a comprehensive investigation of MLLMs' zero-shot association capabilities, addressing multiple dimensions, including three distinct memory strategies, both open-source and closed-source MLLMs, cutting-edge Mixture-of-Experts (MoE) models, and the involvement of human experts. Our systematic investigation shows that current open-source MLLMs consistently exhibit poor capability on our association tasks, and even the current state-of-the-art GPT-4V(vision) shows a significant gap compared to humans. We believe our benchmark will pave the way for future MLLM studies. $\textit{Our data and code are available at:}$ https://mvig-rhos.com/llm_inception.
- Published
- 2024
36. Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis
- Author
-
Mohamed, Salaheldin, Han, Dong, and Li, Yong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence, enabling the generation of high-quality images in diverse contexts based on specific text prompts. However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image and to create novel representations of those individuals in various settings. To address this, we leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process. Our approach diverges from prior methods that depend on fixed encoders or static face embeddings, which often fail to bridge encoding gaps. Instead, we capitalize on UNet's sophisticated encoding capabilities to process reference images across multiple scales. By innovatively altering the cross-attention layers of the UNet, we effectively fuse individual identities into the generative process. This strategic integration of facial features across various scales not only enhances the robustness and consistency of the generated images but also facilitates efficient multi-reference and multi-identity generation. Our method sets a new benchmark in identity-preserving image generation, delivering state-of-the-art results in similarity metrics while maintaining prompt alignment.
- Published
- 2024
37. Uniform exponential convergence of SAA with AMIS and asymptotics of its optimal value
- Author
-
Zhang, Wenjin and Li, Yong
- Subjects
Mathematics - Optimization and Control ,Mathematics - Probability - Abstract
In this paper, we discuss the uniform exponential convergence of sample average approximation (SAA) with adaptive multiple importance sampling (AMIS) and the asymptotics of its optimal value. Using a concentration inequality for bounded martingale differences, we obtain a new exponential convergence rate. To study the asymptotics, we first derive a functional central limit theorem (CLT) for martingale difference sequences. Subsequently, combining this result with the Delta theorem, we prove the asymptotics of optimal values for SAA with AMIS.
- Published
- 2024
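For readers unfamiliar with the setting of the record above, the following is a generic form of the SAA objective with (adaptive multiple) importance sampling; the notation is standard and is given as context, not taken from the paper.

```latex
% Generic SAA objective with importance sampling: samples xi_i are drawn from
% stage-dependent proposals q_k, and likelihood ratios reweight them toward the
% target density p.
\[
  \min_{x \in X}\; \hat f_N(x)
  \;=\; \frac{1}{N}\sum_{i=1}^{N} \frac{p(\xi_i)}{q_{k(i)}(\xi_i)}\, F(x,\xi_i),
  \qquad \xi_i \sim q_{k(i)},
\]
% which approximates f(x) = E_{xi ~ p}[F(x, xi)]. Because AMIS adapts the
% proposals q_k across stages, the summands form a martingale-difference-type
% structure rather than an i.i.d. sample, which motivates the martingale tools
% mentioned in the abstract.
```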
38. Cyber Food Swamps: Investigating the Impacts of Online-to-Offline Food Delivery Platforms on Healthy Food Choices
- Author
-
Zhang, Yunke, Fan, Yiran, Liu, Peijie, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computers and Society - Abstract
Online-to-offline (O2O) food delivery platforms have substantially enriched the food choices of urban residents by allowing them to conveniently access farther food outlets. However, concerns about the healthiness of delivered food persist, especially because the impact of O2O food delivery platforms on users' healthy food choices remains unclear. This study leverages large-scale empirical data from a leading O2O delivery platform to comprehensively analyze online food choice behaviors and how they are influenced by online exposure to fast food restaurants, i.e., the online food environment. Our analyses reveal significant discrepancies in food preferences across demographic groups and city sizes: male, low-income, and younger users, and those located in larger cities, are more likely to order fast food via O2O platforms. We also perform a comparative analysis of food exposure differences between online and offline environments, confirming that the extended service ranges of O2O platforms can create larger "cyber food swamps". Furthermore, regression analysis highlights that a higher ratio of fast food orders is associated with "cyber food swamps", areas characterized by a higher share of accessible fast food restaurants. A 10% increase in this share raises the probability of ordering fast food by 22.0%. Moreover, a quasi-natural experiment substantiates the long-term causal effect of online food environment changes on healthy food choices. Our findings underscore the need for O2O food delivery platforms to address the health implications of online food choice exposure, thereby informing efforts by various stakeholders to improve residents' dietary health., Comment: 11 pages, 10 figures
- Published
- 2024
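The exposure-ordering association reported in the record above is the kind of relationship typically estimated with a logistic regression and marginal effects. The sketch below uses synthetic data and hypothetical variable names, not the study's dataset, and its numbers do not reproduce the paper's estimate.

```python
# Sketch: estimating how the accessible fast-food share relates to the
# probability of placing a fast-food order, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
fast_food_share = rng.uniform(0, 1, n)      # share of accessible fast-food outlets
income = rng.normal(0, 1, n)                # stand-in for user covariates
logit = -0.5 + 1.2 * fast_food_share - 0.3 * income
ordered_fast_food = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([fast_food_share, income])
model = LogisticRegression().fit(X, ordered_fast_food)

# Average marginal effect of a 10-percentage-point increase in the share.
p0 = model.predict_proba(X)[:, 1]
X_up = X.copy()
X_up[:, 0] = np.clip(X_up[:, 0] + 0.10, 0, 1)
p1 = model.predict_proba(X_up)[:, 1]
print(f"avg. change in order probability: {np.mean(p1 - p0):.3f}")
```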
39. Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition
- Author
-
Zhang, Junzheng, Guo, Weijia, Liu, Bochao, Shi, Ruixin, Li, Yong, and Ge, Shiming
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Multimedia - Abstract
Very low-resolution face recognition is challenging due to the serious loss of informative facial details in resolution degradation. In this paper, we propose a generative-discriminative representation distillation approach that combines generative representation with cross-resolution aligned knowledge distillation. This approach facilitates very low-resolution face recognition by jointly distilling generative and discriminative models via two distillation modules. Firstly, the generative representation distillation takes the encoder of a diffusion model pretrained for face super-resolution as the generative teacher to supervise the learning of the student backbone via feature regression, and then freezes the student backbone. After that, the discriminative representation distillation further considers a pretrained face recognizer as the discriminative teacher to supervise the learning of the student head via cross-resolution relational contrastive distillation. In this way, the general backbone representation can be transformed into discriminative head representation, leading to a robust and discriminative student model for very low-resolution face recognition. Our approach improves the recovery of the missing details in very low-resolution faces and achieves better knowledge transfer. Extensive experiments on face datasets demonstrate that our approach enhances the recognition accuracy of very low-resolution faces, showcasing its effectiveness and adaptability.
- Published
- 2024
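The two distillation stages described in the record above can be summarized with a minimal loss sketch in PyTorch; the function names, the use of MSE for feature regression, and the similarity-structure loss are assumptions standing in for the authors' implementation.

```python
# Sketch of two-stage distillation for very low-resolution face recognition:
# stage 1 regresses student backbone features onto a (frozen) generative
# teacher's features; stage 2 aligns the student head with a discriminative
# teacher by matching pairwise similarity structure across resolutions.
import torch
import torch.nn.functional as F

def generative_distill_loss(student_feat, gen_teacher_feat):
    """Stage 1: feature regression onto the generative teacher."""
    return F.mse_loss(student_feat, gen_teacher_feat.detach())

def relational_contrastive_loss(student_emb, disc_teacher_emb, tau=0.1):
    """Stage 2: match the batch's pairwise similarity distribution to the
    discriminative teacher's (a simple relational distillation loss)."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(disc_teacher_emb, dim=1).detach()
    log_p_s = F.log_softmax(s @ s.T / tau, dim=1)
    p_t = F.softmax(t @ t.T / tau, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")
```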
40. Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management
- Author
-
Wang, Yujie, Zhu, Shenhan, Fu, Fangcheng, Miao, Xupeng, Zhang, Jie, Zhu, Juan, Hong, Fan, Li, Yong, and Cui, Bin
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Machine Learning - Abstract
Recent foundation models are capable of handling multiple machine learning (ML) tasks and multiple data modalities with a unified base model structure and several specialized model components. However, the development of such multi-task (MT) multi-modal (MM) models poses significant model management challenges to existing training systems. Due to the sophisticated model architecture and the heterogeneous workloads of different ML tasks and data modalities, training these models usually requires massive GPU resources and suffers from sub-optimal system efficiency. In this paper, we investigate how to achieve high-performance training of large-scale MT MM models through data heterogeneity-aware model management optimization. The key idea is to decompose the model execution into stages and address the joint optimization problem sequentially, including both heterogeneity-aware workload parallelization and dependency-driven execution scheduling. Based on this, we build a prototype system and evaluate it on various large MT MM models. Experiments demonstrate the superior performance and efficiency of our system, with a speedup ratio of up to 71% compared to state-of-the-art training systems.
- Published
- 2024
41. Large-scale Urban Facility Location Selection with Knowledge-informed Reinforcement Learning
- Author
-
Su, Hongyuan, Zheng, Yu, Ding, Jingtao, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computers and Society ,68T20 - Abstract
The facility location problem (FLP) is a classical combinatorial optimization challenge aimed at strategically laying out facilities to maximize their accessibility. In this paper, we propose a reinforcement learning method tailored to large-scale urban FLP, capable of producing near-optimal solutions at superfast inference speed. We distill the essential swap operation from local search and simulate it by intelligently selecting edges on a graph of urban regions, guided by a knowledge-informed graph neural network, thus sidestepping the heavy computation of local search. Extensive experiments on four US cities with different geospatial conditions demonstrate that our approach achieves performance comparable to commercial solvers with less than 5% accessibility loss, while delivering up to a 1,000-fold speedup. We deploy our model as an online geospatial application at https://huggingface.co/spaces/randommmm/MFLP., Comment: SIGSPATIAL 2024
- Published
- 2024
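The swap operation that the learned policy in the record above imitates is the core move of classical local search for facility location. The following is a plain greedy-swap baseline for reference, not the paper's GNN-guided method.

```python
# Greedy swap local search for a k-facility location problem: repeatedly try
# exchanging an open facility with a closed candidate site and keep the swap
# if it reduces the total client-to-nearest-facility distance.
import numpy as np

def total_cost(dist, open_sites):
    """Sum over clients of the distance to their nearest open facility."""
    return dist[:, sorted(open_sites)].min(axis=1).sum()

def swap_local_search(dist, k, seed=0):
    """dist: (n_clients, n_sites) distance matrix; returns (open sites, cost)."""
    rng = np.random.default_rng(seed)
    n_sites = dist.shape[1]
    open_sites = set(rng.choice(n_sites, size=k, replace=False).tolist())
    best = total_cost(dist, open_sites)
    improved = True
    while improved:
        improved = False
        for out in list(open_sites):
            for inn in set(range(n_sites)) - open_sites:
                cand = (open_sites - {out}) | {inn}
                cost = total_cost(dist, cand)
                if cost < best:               # first-improvement swap
                    open_sites, best, improved = cand, cost, True
                    break
            if improved:
                break
    return open_sites, best
```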
42. Learning Privacy-Preserving Student Networks via Discriminative-Generative Distillation
- Author
-
Ge, Shiming, Liu, Bochao, Wang, Pengju, Li, Yong, and Zeng, Dan
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
While deep models have proved successful in learning rich knowledge from massive well-annotated data, they may pose a privacy leakage risk in practical deployment. It is necessary to find an effective trade-off between high utility and strong privacy. In this work, we propose a discriminative-generative distillation approach to learn privacy-preserving deep models. Our key idea is to use models as a bridge to distill knowledge from private data and then transfer it to a student network via two streams. First, the discriminative stream trains a baseline classifier on the private data and an ensemble of teachers on multiple disjoint private subsets, respectively. Then, the generative stream takes the classifier as a fixed discriminator and trains a generator in a data-free manner. After that, the generator is used to generate massive synthetic data, which are further applied to train a variational autoencoder (VAE). Among these synthetic data, a few are fed into the teacher ensemble to query labels via differentially private aggregation, while most are embedded with the trained VAE for reconstructing synthetic data. Finally, semi-supervised student learning is performed to simultaneously handle two tasks: knowledge transfer from the teachers via distillation on the few privately labeled synthetic data, and knowledge enhancement via tangent-normal adversarial regularization on many triples of reconstructed synthetic data. In this way, our approach can control the query cost over private data and mitigate accuracy degradation in a unified manner, leading to a privacy-preserving student model. Extensive experiments and analysis clearly show the effectiveness of the proposed approach., Comment: This paper is accepted by IEEE Transactions on Image Processing (TIP)
- Published
- 2024
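The differentially private label-query step described in the record above follows the general pattern of noisy aggregation of teacher votes. The sketch below shows that generic pattern (Laplace-noised argmax over a vote histogram), not the paper's exact mechanism, and the noise scale shown is illustrative rather than a calibrated privacy guarantee.

```python
# Sketch of differentially private aggregation over an ensemble of teachers:
# count votes per class, add Laplace noise, and return the noisy argmax.
import numpy as np

def dp_aggregate(teacher_preds, num_classes, epsilon, rng=None):
    """teacher_preds: iterable of class indices, one per teacher.
    The Laplace scale 1/epsilon is illustrative; a deployment would calibrate
    the scale to the mechanism's sensitivity and the overall privacy budget."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(np.asarray(teacher_preds), minlength=num_classes).astype(float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy))

# Example: 10 teachers voting over 3 classes.
print(dp_aggregate([0, 1, 1, 1, 2, 1, 0, 1, 1, 2], num_classes=3, epsilon=1.0))
```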
43. AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework
- Author
-
Feng, Jie, Du, Yuwei, Zhao, Jie, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Information Retrieval - Abstract
Human mobility prediction plays a crucial role in various real-world applications. Although deep learning based models have shown promising results over the past decade, their reliance on extensive private mobility data for training and their inability to perform zero-shot predictions have hindered further advancements. Recently, attempts have been made to apply large language models (LLMs) to the mobility prediction task. However, their performance has been constrained by the absence of a systematically designed workflow: they directly generate the final output using LLMs, which limits the potential of LLMs to uncover complex mobility patterns and underestimates their extensive reserve of global geospatial knowledge. In this paper, we introduce AgentMove, a systematic agentic prediction framework to achieve generalized mobility prediction for any city worldwide. In AgentMove, we first decompose the mobility prediction task into three sub-tasks and then design corresponding modules to complete them: a spatial-temporal memory for individual mobility pattern mining, a world knowledge generator for modeling the effects of urban structure, and a collective knowledge extractor for capturing shared patterns among the population. Finally, we combine the results of the three modules and conduct a reasoning step to generate the final predictions. Extensive experiments on mobility data from two sources in 12 cities demonstrate that AgentMove outperforms the best baseline by more than 8% on various metrics, shows robust predictions with various LLMs as the base model, and exhibits less geographical bias across cities. Code and data can be found at https://github.com/tsinghua-fib-lab/AgentMove., Comment: 13 pages
- Published
- 2024
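The three-module decomposition plus final reasoning step described in the record above can be sketched as a plain workflow skeleton; the module functions return placeholder summaries and the `llm` callable is a generic prompt-to-text interface, not the released AgentMove code.

```python
# Skeleton of a modular mobility-prediction workflow: three modules produce
# intermediate evidence, and a final reasoning call over an LLM combines them.
def spatial_temporal_memory(history):
    """Summarize an individual's recurring visit patterns (placeholder)."""
    return f"Recent visits and times: {history[-5:]}"

def world_knowledge(city, current_area):
    """Describe plausible urban-structure effects around the current area."""
    return f"Functional areas near {current_area} in {city}"

def collective_knowledge(population_stats):
    """Summarize shared transition patterns across similar users."""
    return f"Popular next locations among similar users: {population_stats}"

def predict_next_location(llm, history, city, current_area, population_stats):
    """`llm` is any callable mapping a prompt string to a text completion."""
    prompt = "\n".join([
        "Predict the user's next location given:",
        spatial_temporal_memory(history),
        world_knowledge(city, current_area),
        collective_knowledge(population_stats),
        "Answer with a single location identifier.",
    ])
    return llm(prompt)
```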
44. Adaptive Graded Denoising of Seismic Data Based on Noise Estimation and Local Similarity
- Author
-
Yang, Xueting, Li, Yong, Liao, Zhangquan, Liu, Yingtian, and Peng, Junheng
- Subjects
Physics - Geophysics ,I.4.4 - Abstract
Seismic data denoising is an important part of seismic data processing and directly affects subsequent processing steps. Many methods have been proposed for this problem, based on rank reduction, sparse transformation, domain transformation, and deep learning. However, when the seismic data are noisy, complex, and uneven, these methods often lead to over-denoising or under-denoising. To solve this problem, we propose a novel method that combines noise level estimation and similarity segmentation for graded denoising. Specifically, we first estimate the average noise level of the entire seismic dataset and denoise it using the block matching and three-dimensional filtering (BM3D) method. Then, the denoised data are contrasted with the residual using local similarity, pinpointing regions where noise levels deviate significantly from the average; the remaining data are retained intact. The flagged areas are then re-evaluated and denoised. Finally, we integrate the data retained after the first denoising pass with the re-denoised data to obtain complete and cleaner data. The method is verified on a theoretical model and on real seismic data. The experimental results show that the method performs well on seismic data with uneven noise., Comment: This article has been submitted to Geophysics
- Published
- 2024
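The similarity-guided re-denoising step described in the record above can be illustrated with a small numpy sketch; the window size, threshold, re-denoising policy, and the use of a simple Gaussian smoother in place of BM3D are all assumptions.

```python
# Sketch of similarity-guided graded denoising: after a first global pass,
# measure windowed correlation between the denoised data and the residual,
# flag windows whose local similarity is high (denoising there was imperfect),
# and re-process only those windows with a different smoothing strength.
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def local_similarity(a, b, size=9, eps=1e-8):
    """Windowed normalized cross-correlation between arrays a and b."""
    num = uniform_filter(a * b, size)
    den = np.sqrt(uniform_filter(a * a, size) * uniform_filter(b * b, size)) + eps
    return num / den

def graded_denoise(data, sigma1=1.0, sigma2=2.0, thresh=0.3):
    data = np.asarray(data, dtype=float)
    denoised = gaussian_filter(data, sigma1)        # stand-in for BM3D
    residual = data - denoised
    sim = np.abs(local_similarity(denoised, residual))
    redo = sim > thresh                             # regions needing another pass
    denoised[redo] = gaussian_filter(data, sigma2)[redo]
    return denoised
```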
45. BIPeC: A Combined Change-Point Analyzer to Identify Performance Regressions in Large-scale Database Systems
- Author
-
Lyu, Zhan, Bach, Thomas, Li, Yong, Le, Nguyen Minh, and Hoemke, Lars
- Subjects
Computer Science - Databases - Abstract
Performance testing in large-scale database systems like SAP HANA is a crucial yet labor-intensive task, involving extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, making early detection of performance regressions challenging. We address these issues by proposing an automated approach to detect performance regressions in such measurements. Our approach integrates Bayesian inference with the Pruned Exact Linear Time (PELT) algorithm, detecting change points and performance regressions with higher precision and efficiency than previous approaches. Our method minimizes false negatives and helps ensure SAP HANA's reliability and performance quality. The proposed solution can accelerate testing and contribute to more sustainable performance management practices in large-scale data management environments.
- Published
- 2024
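The change-point component of the approach in the record above can be illustrated with the PELT implementation from the open-source `ruptures` package; the confirmation step shown here is a simple Welch t-test standing in for the Bayesian step, so this is a schematic sketch, not the BIPeC implementation.

```python
# Sketch: detect candidate change points in a performance time series with
# PELT (via the `ruptures` package), then keep only those where a simple
# before/after comparison suggests a genuine slowdown (regression).
import numpy as np
import ruptures as rpt
from scipy import stats

def detect_regressions(series, pen=10.0, alpha=0.01):
    series = np.asarray(series, dtype=float)
    breakpoints = rpt.Pelt(model="rbf").fit(series).predict(pen=pen)[:-1]
    confirmed, prev = [], 0
    for bp, nxt in zip(breakpoints, breakpoints[1:] + [len(series)]):
        before, after = series[prev:bp], series[bp:nxt]
        if len(before) > 1 and len(after) > 1:
            _, p = stats.ttest_ind(before, after, equal_var=False)
            if p < alpha and after.mean() > before.mean():  # higher time = slower
                confirmed.append(bp)
        prev = bp
    return confirmed
```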
46. Generative Diffusion Models for High Dimensional Channel Estimation
- Author
-
Zhou, Xingyu, Liang, Le, Zhang, Jing, Jiang, Peiwen, Li, Yong, and Jin, Shi
- Subjects
Computer Science - Information Theory ,Electrical Engineering and Systems Science - Signal Processing - Abstract
Along with the prosperity of generative artificial intelligence (AI), its potential for solving conventional challenges in wireless communications has also surfaced. Inspired by this trend, we investigate the application of advanced diffusion models (DMs), a representative class of generative AI models, to high-dimensional wireless channel estimation. By capturing the structure of multiple-input multiple-output (MIMO) wireless channels via a deep generative prior encoded by DMs, we develop a novel posterior inference method for channel reconstruction. We further adapt the proposed method to recover channel information from low-resolution quantized measurements. Additionally, to enhance over-the-air viability, we integrate the DM with the unsupervised Stein's unbiased risk estimator to enable learning from noisy observations and circumvent the requirement for ground-truth channel data, which is hardly available in practice. Results reveal that the proposed estimator achieves high-fidelity channel recovery while reducing estimation latency by a factor of 10 compared to state-of-the-art schemes, facilitating real-time implementation. Moreover, our method outperforms existing estimators while halving the pilot overhead, showcasing its scalability to ultra-massive antenna arrays., Comment: This work has been submitted to the IEEE for possible publication
- Published
- 2024
47. Predicting Long-term Dynamics of Complex Networks via Identifying Skeleton in Hyperbolic Space
- Author
-
Li, Ruikun, Wang, Huandong, Piao, Jinghua, Liao, Qingmin, and Li, Yong
- Subjects
Computer Science - Social and Information Networks ,Physics - Physics and Society - Abstract
Learning complex network dynamics is fundamental for understanding, modeling, and controlling real-world complex systems. Although great efforts have been made to predict the future states of nodes on networks, the capability of capturing long-term dynamics remains largely limited. This is because existing methods overlook the fact that long-term dynamics in complex networks are predominantly governed by their inherent low-dimensional manifolds, i.e., skeletons. Therefore, we propose the Dynamics-Invariant Skeleton Neural Network (DiskNet), which identifies skeletons of complex networks based on the renormalization group structure in hyperbolic space to preserve both topological and dynamical properties. Specifically, we first condense complex networks with various dynamics into simple skeletons through physics-informed hyperbolic embeddings. Further, we design graph neural ordinary differential equations to capture the condensed dynamics on the skeletons. Finally, we recover the skeleton networks and dynamics to the original ones using a degree-based super-resolution module. Extensive experiments across three representative dynamics as well as five real-world and two synthetic networks demonstrate the superior performance of the proposed DiskNet, which outperforms state-of-the-art baselines by an average of 10.18% in terms of long-term prediction accuracy. Code for reproduction is available at: https://github.com/tsinghua-fib-lab/DiskNet.
- Published
- 2024
48. TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics
- Author
-
Liu, Chang, Ding, Jingtao, Song, Yiwen, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence - Abstract
Predicting the resilience of complex networks, i.e., their ability to retain fundamental functionality amidst external perturbations or internal failures, plays a critical role in understanding and improving real-world complex systems. Traditional theoretical approaches grounded in nonlinear dynamical systems rely on prior knowledge of network dynamics. On the other hand, data-driven approaches frequently encounter the challenge of insufficient labeled data, a predicament commonly observed in real-world scenarios. In this paper, we introduce a novel resilience prediction framework for complex networks, designed to tackle this issue through generative data augmentation of network topology and dynamics. The core idea is the strategic utilization of the inherent joint distribution present in unlabeled network data, facilitating the learning process of the resilience predictor by illuminating the relationship between network topology and dynamics. Experimental results on three network datasets demonstrate that our proposed framework, TDNetGen, achieves high prediction accuracy of 85%-95%. Furthermore, the framework retains a pronounced augmentation capability in extreme low-data regimes, underscoring its utility and robustness in enhancing the prediction of network resilience. We have open-sourced our code at https://github.com/tsinghua-fib-lab/TDNetGen.
- Published
- 2024
49. A Population-to-individual Tuning Framework for Adapting Pretrained LM to On-device User Intent Prediction
- Author
-
Gong, Jiahui, Ding, Jingtao, Meng, Fanjin, Chen, Guilong, Chen, Hong, Zhao, Shen, Lu, Haisheng, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Human-Computer Interaction - Abstract
Mobile devices, especially smartphones, support rich functions and have become indispensable tools in daily life. With the rise of generative AI services, smartphones can potentially transform into personalized assistants, anticipating user needs and scheduling services accordingly. Predicting user intents on smartphones, i.e., anticipating activities based on past interactions and context, remains a pivotal step towards this vision. Existing research predominantly focuses on specific domains, neglecting the challenge of modeling diverse event sequences across dynamic contexts. Leveraging pre-trained language models (PLMs) offers a promising avenue, yet adapting PLMs to on-device user intent prediction presents significant challenges. To address these challenges, we propose PITuning, a Population-to-Individual Tuning framework. PITuning enhances common pattern extraction through dynamic event-to-intent transition modeling and addresses long-tailed preferences via adaptive unlearning strategies. Experimental results on real-world datasets demonstrate PITuning's superior intent prediction performance, highlighting its ability to capture long-tailed preferences and its practicality for on-device prediction scenarios., Comment: accepted by KDD 2024
- Published
- 2024
50. Quantitative uniform exponential acceleration of averages along decaying waves
- Author
-
Tong, Zhicheng and Li, Yong
- Subjects
Mathematics - Dynamical Systems ,37A25, 37A30, 37A46 - Abstract
In this study, utilizing a specific exponential weighting function, we investigate the uniform exponential convergence of weighted Birkhoff averages along decaying waves and delve into several related variants. A key distinction from traditional scenarios is evident here: despite reduced regularity of the observables, our method still maintains exponential convergence. In particular, we develop new techniques that yield very precise rates of exponential convergence, as evidenced by numerical simulations. Furthermore, this approach extends to quantitative analyses involving the different weighting functions employed by others, surpassing the limitations inherent in prior research. It also enhances the exponential convergence rates of weighted Birkhoff averages along quasi-periodic orbits for analytic observables. To the best of our knowledge, this is the first result on uniform exponential acceleration beyond averages along quasi-periodic or almost periodic orbits, particularly from a quantitative perspective., Comment: 28 pages, 1 figure
- Published
- 2024
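For context on the record above, a weighted Birkhoff average usually takes the following form; the bump-type exponential weight shown is the standard choice in this literature and is given as an assumed example rather than the paper's exact weighting function.

```latex
% Weighted Birkhoff average with a smooth exponential (bump-type) weight:
% the weight vanishes to all orders at the endpoints, which is what drives
% the faster-than-polynomial convergence of the average.
\[
  w(t) \;=\;
  \begin{cases}
    \exp\!\left(-\dfrac{1}{t(1-t)}\right), & t\in(0,1),\\[4pt]
    0, & \text{otherwise},
  \end{cases}
  \qquad
  \mathrm{WB}_N(f)(x) \;=\;
  \frac{\sum_{n=0}^{N-1} w\!\left(\frac{n+1}{N+1}\right) f(T^{n}x)}
       {\sum_{n=0}^{N-1} w\!\left(\frac{n+1}{N+1}\right)} .
\]
```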