100,371 results for "LI, Yong"
Search Results
2. Paternal Origins of Mongolic-Speaking Populations: A Review of Studies from Recent Decades (1999–2019) and Their Implications for Multidisciplinary Research in the Future
- Author
Sharengaowa, Ma, Peng-Cheng, Yang, Wen-Jiao, Ochirbat, Altangoo, Zhabagin, Maxat, Sun, Na, Xie, Yong-Mei, Li, Yong-Lan, and Wei, Lan-Hai
- Published
- 2024
4. Learning Street View Representations with Spatiotemporal Contrast
- Author
Li, Yong, Huang, Yingjing, Mai, Gengchen, and Zhang, Fan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Street view imagery is extensively utilized in representation learning for urban visual environments, supporting various sustainable development tasks such as environmental perception and socio-economic assessment. However, it is challenging for existing image representations to specifically encode the dynamic urban environment (such as pedestrians, vehicles, and vegetation), the built environment (including buildings, roads, and urban infrastructure), and the environmental ambiance (such as the cultural and socioeconomic atmosphere) depicted in street view imagery to address downstream tasks related to the city. In this work, we propose an innovative self-supervised learning framework that leverages temporal and spatial attributes of street view imagery to learn image representations of the dynamic urban environment for diverse downstream tasks. By employing street view images captured at the same location over time and spatially nearby views at the same time, we construct contrastive learning tasks designed to learn the temporal-invariant characteristics of the built environment and the spatial-invariant neighborhood ambiance. Our approach significantly outperforms traditional supervised and unsupervised methods in tasks such as visual place recognition, socioeconomic estimation, and human-environment perception. Moreover, we demonstrate the varying behaviors of image representations learned through different contrastive learning objectives across various downstream tasks. This study systematically discusses representation learning strategies for urban studies based on street view images, providing a benchmark that enhances the applicability of visual data in urban science. The code is available at https://github.com/yonglleee/UrbanSTCL.
- Published
- 2025
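The spatiotemporal contrastive setup described in the abstract above can be illustrated with a toy InfoNCE objective: the positive pair is the same location photographed at two times (or, symmetrically, two nearby locations at the same time), and other locations serve as negatives. This is a minimal sketch under assumed names and 16-dimensional toy embeddings, not the paper's implementation (the actual code is at the UrbanSTCL repository linked in the abstract).

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE loss for one anchor: pull the positive view close,
    push negative views away, in cosine-similarity space."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    logits = logits - logits.max()                 # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))                # positive sits at index 0

# Temporal positive pair: the same location at two capture times
# (encourages time-invariant built-environment features); unrelated
# locations act as negatives.
rng = np.random.default_rng(0)
z_t0 = rng.normal(size=16)                         # location A, year t
z_t1 = z_t0 + 0.1 * rng.normal(size=16)            # location A, year t+1
z_far = [rng.normal(size=16) for _ in range(8)]    # unrelated locations
loss = info_nce(z_t0, z_t1, z_far)                 # small: views agree
```

Swapping the positive for a spatially nearby view at the same timestamp gives the second objective in the abstract (neighborhood-ambiance invariance) with the same loss.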
5. Semi-Supervised Learning for AVO Inversion with Strong Spatial Feature Constraints
- Author
Liu, Yingtian, Li, Yong, Peng, Junheng, and Wang, Mingwei
- Subjects
Physics - Geophysics, I.2.6, I.6.5
- Abstract
One-dimensional convolution is a widely used deep learning technique in prestack amplitude variation with offset (AVO) inversion; however, it lacks lateral continuity. Although two-dimensional convolution improves lateral continuity, due to the sparsity of well-log data, the model only learns weak spatial features and fails to explore the spatial correlations in seismic data fully. To overcome these challenges, we propose a novel AVO inversion method based on semi-supervised learning with strong spatial feature constraints (SSFC-SSL). First, two-dimensional predicted values are obtained through the inversion network, and the predicted values at well locations are sparsely represented using well-log labels. Subsequently, a label-annihilation operator is introduced, enabling the predicted values at non-well locations to learn the spatial features of well locations through the neural network. Ultimately, a two-way strong spatial feature mapping between non-well locations and well locations is achieved. Additionally, to reduce the dependence on well-log labels, we combine the semi-supervised learning strategy with a low-frequency model, further enhancing the robustness of the method. Experimental results on both synthetic example and field data demonstrate that the proposed method significantly improves lateral continuity and inversion accuracy compared to one- and two-dimensional deep learning techniques., Comment: The manuscript has been submitted to IEEE Transactions on Geoscience and Remote Sensing for reviewing
- Published
- 2025
6. Qwen2.5-1M Technical Report
- Author
Yang, An, Yu, Bowen, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Huang, Haoyan, Jiang, Jiandong, Tu, Jianhong, Zhang, Jianwei, Zhou, Jingren, Lin, Junyang, Dang, Kai, Yang, Kexin, Yu, Le, Li, Mei, Sun, Minmin, Zhu, Qin, Men, Rui, He, Tao, Xu, Weijia, Yin, Wenbiao, Yu, Wenyuan, Qiu, Xiafei, Ren, Xingzhang, Yang, Xinlong, Li, Yong, Xu, Zhiying, and Zhang, Zipeng
- Subjects
Computer Science - Computation and Language
- Abstract
We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.
- Published
- 2025
7. Unveiling the Power of Noise Priors: Enhancing Diffusion Models for Mobile Traffic Prediction
- Author
Sheng, Zhi, Yuan, Yuan, Ding, Jingtao, and Li, Yong
- Subjects
Computer Science - Machine Learning
- Abstract
Accurate prediction of mobile traffic, i.e., network traffic from cellular base stations, is crucial for optimizing network performance and supporting urban development. However, the non-stationary nature of mobile traffic, driven by human activity and environmental changes, leads to both regular patterns and abrupt variations. Diffusion models excel in capturing such complex temporal dynamics due to their ability to capture the inherent uncertainties. Most existing approaches prioritize designing novel denoising networks but often neglect the critical role of noise itself, potentially leading to sub-optimal performance. In this paper, we introduce a novel perspective by emphasizing the role of noise in the denoising process. Our analysis reveals that noise fundamentally shapes mobile traffic predictions, exhibiting distinct and consistent patterns. We propose NPDiff, a framework that decomposes noise into prior and residual components, with the prior derived from data dynamics, enhancing the model's ability to capture both regular and abrupt variations. NPDiff can seamlessly integrate with various diffusion-based prediction models, delivering predictions that are effective, efficient, and robust. Extensive experiments demonstrate that it achieves superior performance with an improvement of over 30%, offering a new perspective on leveraging diffusion models in this domain.
- Published
- 2025
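The prior/residual noise decomposition the NPDiff abstract describes can be sketched in a few lines: the training noise is composed of a prior term extracted from data dynamics plus an ordinary Gaussian residual. The choice of the last-step increment as the prior and the mixing weight `alpha` are illustrative assumptions here, not details taken from the paper.

```python
import numpy as np

def npdiff_noise(x_history, alpha=0.6, rng=None):
    """Compose diffusion training noise as prior + residual.

    The prior is derived from data dynamics (here, hypothetically, the
    normalized last-step increment of the traffic series); the Gaussian
    residual keeps coverage of abrupt deviations the prior misses.
    """
    rng = rng or np.random.default_rng(0)
    prior = x_history[-1] - x_history[-2]          # recent dynamics
    prior = prior / (np.abs(prior).max() + 1e-8)   # keep it bounded
    residual = rng.normal(size=prior.shape)
    return alpha * prior + (1.0 - alpha) * residual

# Toy hourly traffic at 4 base stations: 24 steps, steadily increasing.
traffic = np.cumsum(np.ones((24, 4)), axis=0)
eps = npdiff_noise(traffic, rng=np.random.default_rng(1))
```

Because only the noise is changed, a sketch like this can wrap any diffusion-based predictor's noise-sampling step, which is how the abstract motivates the framework's plug-in property.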
8. One Fits All: General Mobility Trajectory Modeling via Masked Conditional Diffusion
- Author
Long, Qingyue, Rong, Can, Wang, Huandong, and Li, Yong
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Trajectory data play a crucial role in many applications, ranging from network optimization to urban planning. Existing studies on trajectory data are task-specific, and their applicability is limited to the specific tasks on which they have been trained, such as generation, recovery, or prediction. However, the potential of a unified model has not yet been fully explored in trajectory modeling. Although various trajectory tasks differ in inputs, outputs, objectives, and conditions, they share common mobility patterns. Based on these common patterns, we can construct a general framework that enables a single model to address different tasks. However, building a trajectory task-general framework faces two critical challenges: 1) the diversity in the formats of different tasks and 2) the complexity of the conditions imposed on different tasks. In this work, we propose a general trajectory modeling framework via masked conditional diffusion (named GenMove). Specifically, we utilize mask conditions to unify diverse formats. To adapt to complex conditions associated with different tasks, we utilize historical trajectory data to obtain contextual trajectory embeddings, which include rich contexts such as spatiotemporal characteristics and user preferences. Integrating the contextual trajectory embedding into diffusion models through a classifier-free guidance approach allows the model to flexibly adjust its outputs based on different conditions. Extensive experiments on mainstream tasks demonstrate that our model significantly outperforms state-of-the-art baselines, with the highest performance improvement exceeding 13% in generation tasks.
- Published
- 2025
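The mask-condition idea in the GenMove abstract, unifying diverse task formats, can be illustrated with a toy mask builder: a single binary observation mask casts generation, recovery, and prediction as the same masked denoising problem (1 = observed and used as a condition, 0 = to be generated). The masking ratios and function name are hypothetical, for illustration only.

```python
import numpy as np

def task_mask(length, task, rng=None):
    """One mask format for three trajectory tasks:
    generation (nothing observed), recovery (random gaps),
    prediction (past observed, future masked)."""
    rng = rng or np.random.default_rng(0)
    m = np.zeros(length, dtype=int)
    if task == "recovery":
        m[rng.random(length) < 0.7] = 1    # ~70% of points observed
    elif task == "prediction":
        m[: length // 2] = 1               # first half observed
    elif task != "generation":
        raise ValueError(f"unknown task: {task}")
    return m

m_pred = task_mask(10, "prediction")       # [1 1 1 1 1 0 0 0 0 0]
```

A conditional diffusion model trained against such masks only ever sees "denoise the zeros given the ones," which is why one model can serve all three tasks.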
9. Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture
- Author
Sun, Zhong-Hua, Zhang, Ru-Yuan, Zhen, Zonglei, Wang, Da-Hui, Li, Yong-Jie, Wan, Xiaohong, and You, Hongzhi
- Subjects
Computer Science - Artificial Intelligence
- Abstract
In abstract visual reasoning, monolithic deep learning models suffer from limited interpretability and generalization, while existing neuro-symbolic approaches fall short in capturing the diversity and systematicity of attributes and relation representations. To address these challenges, we propose a Systematic Abductive Reasoning model with diverse relation representations (Rel-SAR) in Vector-symbolic Architecture (VSA) to solve Raven's Progressive Matrices (RPM). To derive attribute representations with symbolic reasoning potential, we introduce not only various types of atomic vectors that represent numeric, periodic and logical semantics, but also the structured high-dimensional representation (SHDR) for the overall Grid component. For systematic reasoning, we propose novel numerical and logical relation functions and perform rule abduction and execution in a unified framework that integrates these relation representations. Experimental results demonstrate that Rel-SAR achieves significant improvement on RPM tasks and exhibits robust out-of-distribution generalization. Rel-SAR leverages the synergy between HD attribute representations and symbolic reasoning to achieve systematic abductive reasoning with both interpretable and computable semantics.
- Published
- 2025
10. Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
- Author
Xu, Fengli, Hao, Qianyue, Zong, Zefang, Wang, Jingwei, Zhang, Yunke, Wang, Jingyi, Lan, Xiaochong, Gong, Jiahui, Ouyang, Tianjian, Meng, Fanjin, Shao, Chenyang, Yan, Yuwei, Yang, Qinglong, Song, Yiwen, Ren, Sijian, Hu, Xinyuan, Li, Yu, Feng, Jie, Gao, Chen, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought" -- a sequence of tokens representing intermediate steps in the reasoning process. This innovative paradigm enables LLMs to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can further significantly boost reasoning accuracy. Together, train-time and test-time scaling open a new research frontier -- a path toward Large Reasoning Models. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects aimed at building large reasoning models, and conclude with open challenges and future research directions., Comment: 36 pages, 5 figures
- Published
- 2025
11. A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
- Author
Wang, Huandong, Fu, Wenjie, Tang, Yingzhou, Chen, Zhilong, Huang, Yuxi, Piao, Jinghua, Gao, Chen, Xu, Fengli, Jiang, Tao, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security, Computer Science - Computers and Society
- Abstract
While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face significant challenges in terms of the inherent risk of privacy leakage, hallucinated outputs, and value misalignment, and can be maliciously used to generate toxic content for unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast to previous surveys that focus on a single dimension of responsible LLMs, this survey presents a unified framework that encompasses these diverse dimensions, providing a comprehensive view of enhancing LLMs to better serve real-world applications.
- Published
- 2025
12. A Diffusive Data Augmentation Framework for Reconstruction of Complex Network Evolutionary History
- Author
Xu, En, Rong, Can, Ding, Jingtao, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence
- Abstract
The evolutionary processes of complex systems contain critical information regarding their functional characteristics. The generation time of edges provides insights into the historical evolution of various networked complex systems, such as protein-protein interaction networks, ecosystems, and social networks. Recovering these evolutionary processes holds significant scientific value, including aiding in the interpretation of the evolution of protein-protein interaction networks. However, existing methods are capable of predicting the generation times of remaining edges given a partial temporal network but often perform poorly in cross-network prediction tasks. These methods frequently fail in edge generation time recovery tasks for static networks that lack timestamps. In this work, we adopt a comparative paradigm-based framework that fuses multiple networks for training, enabling cross-network learning of the relationship between network structure and edge generation times. Compared to separate training, this approach yields an average accuracy improvement of 16.98%. Furthermore, given the difficulty in collecting temporal networks, we propose a novel diffusion-model-based generation method to produce a large number of temporal networks. By combining real temporal networks with generated ones for training, we achieve an additional average accuracy improvement of 5.46% through joint training.
- Published
- 2025
13. Quantum Birkhoff Normal Form in the $\sigma$-Bruno-R\"{u}ssmann non-resonant condition
- Author
Yuan, Huanhuan, Gao, Yixian, and Li, Yong
- Subjects
Mathematical Physics
- Abstract
The aim of this paper is to construct a Gevrey quantum Birkhoff normal form for the $h$-differential operator $P_{h}(t),$ where $ t\in(-\frac{1}{2},\frac{1}{2})$, in the neighborhood of the union $\Lambda$ of KAM tori. This construction commences from an appropriate Birkhoff normal form of $H$ around $\Lambda$ and proceeds under the $\sigma$-Bruno-R\"{u}ssmann condition with $\sigma>1$.
- Published
- 2025
14. Emergence of human-like polarization among large language model agents
- Author
Piao, Jinghua, Lu, Zhihong, Gao, Chen, Xu, Fengli, Santos, Fernando P., Li, Yong, and Evans, James
- Subjects
Computer Science - Social and Information Networks, Computer Science - Computers and Society
- Abstract
Rapid advances in large language models (LLMs) have empowered autonomous agents to establish social relationships, communicate, and form shared and diverging opinions on political issues. Our understanding of their collective behaviours and underlying mechanisms remains incomplete, however, posing unexpected risks to human society. In this paper, we simulate a networked system involving thousands of large language model agents, discovering that their social interactions, guided through LLM conversation, result in human-like polarization. We discover that these agents spontaneously develop their own social network with human-like properties, including homophilic clustering, but also shape their collective opinions through mechanisms observed in the real world, including the echo chamber effect. Similarities between humans and LLM agents -- encompassing behaviours, mechanisms, and emergent phenomena -- raise concerns about their capacity to amplify societal polarization, but also hold the potential to serve as a valuable testbed for identifying plausible strategies to mitigate polarization and its consequences.
- Published
- 2025
15. DehazeGS: Seeing Through Fog with 3D Gaussian Splatting
- Author
Yu, Jinze, Wang, Yiqun, Lu, Zhengda, Guo, Jianwei, Li, Yong, Qin, Hongxing, and Zhang, Xiaopeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Current novel view synthesis tasks primarily rely on high-quality and clear images. However, in foggy scenes, scattering and attenuation can significantly degrade the reconstruction and rendering quality. Although NeRF-based dehazing reconstruction algorithms have been developed, their use of deep fully connected neural networks and per-ray sampling strategies leads to high computational costs. Moreover, NeRF's implicit representation struggles to recover fine details from hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction by explicitly modeling point clouds into 3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation to explain the foggy image formation process through a physically accurate forward rendering process. We introduce DehazeGS, a method capable of decomposing and rendering a fog-free background from participating media using only multi-view foggy images as input. We model the transmission within each Gaussian distribution to simulate the formation of fog. During this process, we jointly learn the atmospheric light and scattering coefficient while optimizing the Gaussian representation of the hazy scene. In the inference stage, we eliminate the effects of scattering and attenuation on the Gaussians and directly project them onto a 2D plane to obtain a clear view. Experiments on both synthetic and real-world foggy datasets demonstrate that DehazeGS achieves state-of-the-art performance in terms of both rendering quality and computational efficiency. Visualizations are available at https://dehazegs.github.io/, Comment: 9 pages, 4 figures
- Published
- 2025
16. Effects of hair on the image of a rotating black hole illuminated by a thin accretion disk
- Author
Meng, Yuan, Wang, Xi-Jing, Li, Yong-Zhuang, and Kuang, Xiao-Mei
- Subjects
General Relativity and Quantum Cosmology
- Abstract
In this paper, we investigate the shadow and optical appearance of the hairy Kerr black hole illuminated by a thin accretion disk, the materials of which outside the innermost stable circular orbit (ISCO) move on the equatorial circular orbit, while inside the ISCO they quickly plunge into the black hole. The deformation parameter $\alpha$ and hair parameter $l_o$ are found to influence the motions of accretion as well as the redshift effect of the photon, such that they significantly affect the shadow and image of the hairy Kerr black hole. Especially, these two parameters have competing effects on the size of the black hole's shadow, and significantly increase the width of photon ring. This study provides a preliminary theoretical prediction that the image of the hairy Kerr black hole, especially the photon ring structure, may be used to constrain the hair parameters with future high-precision astronomical observation., Comment: 16 pages,7 figures
- Published
- 2025
17. SAM-Aware Graph Prompt Reasoning Network for Cross-Domain Few-Shot Segmentation
- Author
Peng, Shi-Feng, Sun, Guolei, Li, Yong, Wang, Hongsong, and Xie, Guo-Sen
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
The primary challenge of cross-domain few-shot segmentation (CD-FSS) is the domain disparity between the training and inference phases, which can exist in either the input data or the target classes. Previous models struggle to learn feature representations that generalize to various unknown domains from limited training domain samples. In contrast, the large-scale visual model SAM, pre-trained on tens of millions of images from various domains and classes, possesses excellent generalizability. In this work, we propose a SAM-aware graph prompt reasoning network (GPRN) that fully leverages SAM to guide CD-FSS feature representation learning and improve prediction accuracy. Specifically, we propose a SAM-aware prompt initialization module (SPI) to transform the masks generated by SAM into visual prompts enriched with high-level semantic information. Since SAM tends to divide an object into many sub-regions, this may lead to visual prompts representing the same semantic object having inconsistent or fragmented features. We further propose a graph prompt reasoning (GPR) module that constructs a graph among visual prompts to reason about their interrelationships and enable each visual prompt to aggregate information from similar prompts, thus achieving global semantic consistency. Subsequently, each visual prompt embeds its semantic information into the corresponding mask region to assist in feature representation learning. To refine the segmentation mask during testing, we also design a non-parameter adaptive point selection module (APS) to select representative point prompts from query predictions and feed them back to SAM to refine inaccurate segmentation results. Experiments on four standard CD-FSS datasets demonstrate that our method establishes new state-of-the-art results. Code: https://github.com/CVL-hub/GPRN., Comment: AAAI 2025
- Published
- 2024
18. Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
- Author
Liu, Xiaoyang, Wen, Boran, Liu, Xinpeng, Zhou, Zizheng, Fan, Hongwei, Lu, Cewu, Ma, Lizhuang, Chen, Yulong, and Li, Yong-Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Spatio-temporal Human-Object Interaction (ST-HOI) understanding aims at detecting HOIs from videos, which is crucial for activity understanding. However, existing whole-body-object interaction video benchmarks overlook the fact that open-world objects are diverse, that is, they usually provide limited and predefined object classes. Therefore, we introduce a new open-world benchmark: Grounding Interacted Objects (GIO), including 1,098 interacted object classes and 290K interacted object box annotations. Accordingly, an object grounding task is proposed, expecting vision systems to discover interacted objects. Even though today's detectors and grounding methods have succeeded greatly, they perform unsatisfactorily in localizing diverse and rare objects in GIO. This profoundly reveals the limitations of current vision systems and poses a great challenge. Thus, we explore leveraging spatio-temporal cues to address object grounding and propose a 4D question-answering framework (4D-QA) to discover interacted objects from diverse videos. Our method demonstrates significant superiority in extensive experiments compared to current baselines. Data and code will be publicly available at https://github.com/DirtyHarryLYL/HAKE-AVA., Comment: To be published in the Proceedings of AAAI 2025. The first three authors contributed equally. Project: https://github.com/DirtyHarryLYL/HAKE-AVA
- Published
- 2024
19. Position-aware Graph Transformer for Recommendation
- Author
Chen, Jiajia, Wu, Jiancan, Chen, Jiawei, Gao, Chongming, Li, Yong, and Wang, Xiang
- Subjects
Computer Science - Information Retrieval
- Abstract
Collaborative recommendation fundamentally involves learning high-quality user and item representations from interaction data. Recently, graph convolution networks (GCNs) have advanced the field by utilizing high-order connectivity patterns in interaction graphs, as evidenced by state-of-the-art methods like PinSage and LightGCN. However, one key limitation has not been well addressed in existing solutions: capturing long-range collaborative filtering signals, which are crucial for modeling user preference. In this work, we propose a new graph transformer (GT) framework -- Position-aware Graph Transformer for Recommendation (PGTR), which combines the global modeling capability of Transformer blocks with the local neighborhood feature extraction of GCNs. The key insight is to explicitly incorporate node position and structure information from the user-item interaction graph into GT architecture via several purpose-designed positional encodings. The long-range collaborative signals from the Transformer block are then combined linearly with the local neighborhood features from the GCN backbone to enhance node embeddings for final recommendations. Empirical studies demonstrate the effectiveness of the proposed PGTR method when implemented on various GCN-based backbones across four real-world datasets, and the robustness against interaction sparsity as well as noise.
- Published
- 2024
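The fusion step the PGTR abstract describes can be sketched in one function: positional encodings enter the Transformer branch, and its long-range signal is then combined linearly with the GCN neighborhood features. The additive encoding and the equal mixing weight `lam` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pgtr_embed(h_gcn, h_transformer, pos_enc, lam=0.5):
    """Linearly fuse local (GCN) and position-aware global (Transformer)
    node features into the final recommendation embedding."""
    h_global = h_transformer + pos_enc   # position-aware global signal
    return lam * h_gcn + (1.0 - lam) * h_global

# Toy 2-d node features: one local, one long-range, one positional term.
h_local = np.array([1.0, 0.0])
h_far = np.array([0.0, 1.0])
pe = np.array([0.1, 0.1])
emb = pgtr_embed(h_local, h_far, pe)     # [0.55, 0.55]
```

With `lam=1.0` the fusion degenerates to the plain GCN backbone, which makes the Transformer branch an additive enhancement rather than a replacement.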
20. BladeDISC++: Memory Optimizations Based On Symbolic Shape
- Author
Yuan, Xiulong, Yan, Xu, Shen, Wenting, Qiu, Xiafei, Wang, Ang, Zhang, Jie, Li, Yong, and Lin, Wei
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Recent deep learning workloads exhibit dynamic characteristics, leading to the rising adoption of dynamic shape compilers. These compilers can generate efficient kernels for dynamic shape graphs characterized by a fixed graph topology and uncertain tensor shapes. However, memory optimization, although particularly crucial in this large model era, remains relatively underexplored for dynamic shape graphs. The fundamental challenge lies in the lack of precise tensor shapes, which are essential in conventional methods such as operation scheduling (op scheduling) and rematerialization. To address this challenge, we propose op scheduling and rematerialization approaches based on symbolic shapes and develop BladeDISC++. Moreover, since rematerialization decisions cannot be made solely at compile time when tensor shapes are unknown, BladeDISC++ employs a compilation-runtime combined strategy to optimally address shape dynamics. Evaluations indicate that BladeDISC++ effectively reduces memory usage for dynamic shape graphs, achieving memory consumption comparable to optimizations using precise shapes, thereby promoting the broader adoption of dynamic shape compilers.
- Published
- 2024
21. Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
- Author
Zhong, Yunshan, Zhou, Yuyao, Zhang, Yuxin, Li, Shen, Li, Yong, Chao, Fei, Zeng, Zhanpeng, and Ji, Rongrong
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community. Recently, the unique architecture of vision transformers (ViTs) has driven the development of specialized DFQ techniques. However, we observe that the synthetic images from existing methods suffer from the deficient semantics issue compared to real images, thereby compromising performance. Motivated by this, we propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs. First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images. Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization and diverse semantics in synthetic images. Finally, SPDFQ employs Softlabel Learning (SL), where soft learning targets are adapted to encourage more complex semantics and accommodate images augmented by MSR. Experimental results demonstrate that SPDFQ significantly outperforms existing methods. For instance, SPDFQ achieves a 15.52% increase in top-1 accuracy on ImageNet for W4A4 ViT-B.
- Published
- 2024
22. Central limit theorem for periodic solutions of stochastic differential equations driven by Levy noise
- Author
Deng, Xinying, Li, Yong, and Yang, Xue
- Subjects
Mathematics - Probability, Mathematics - Dynamical Systems
- Abstract
Through certain appropriate constructions, we establish periodic solutions in distribution for some stochastic differential equations with infinite-dimensional Levy noise. Additionally, we obtain the corresponding periodic measures and periodic transition semigroup. Under suitable conditions, we also achieve a certain contractivity in the space of probability measures. By constructing an appropriate invariant measure, we standardize the observation functions. Utilizing the classical martingale approximation approach, we establish the law of large numbers and the central limit theorem.
- Published
- 2024
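The martingale-approximation route mentioned in the abstract above typically yields a limit of the following schematic form (a generic statement for ergodic averages, not the paper's exact theorem): for an observable $f$ centered against the relevant (here periodic) measure $\mu$,

```latex
\frac{1}{\sqrt{T}} \int_0^T \Big( f(X_s) - \int f \, \mathrm{d}\mu \Big) \, \mathrm{d}s
\;\xrightarrow[T \to \infty]{d}\; \mathcal{N}\big(0, \sigma_f^2\big),
```

where the asymptotic variance $\sigma_f^2$ is typically expressed through the solution of an associated Poisson equation.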
23. A Universal Model for Human Mobility Prediction
- Author
-
Long, Qingyue, Yuan, Yuan, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Predicting human mobility is crucial for urban planning, traffic control, and emergency response. Mobility behaviors can be categorized into individual and collective, and these behaviors are recorded by diverse mobility data, such as individual trajectory and crowd flow. As different modalities of mobility data, individual trajectory and crowd flow have a close coupling relationship. Crowd flows originate from the bottom-up aggregation of individual trajectories, while the constraints imposed by crowd flows shape these individual trajectories. Existing mobility prediction methods are limited to single tasks due to modal gaps between individual trajectory and crowd flow. In this work, we aim to unify mobility prediction to break through the limitations of task-specific models. We propose a universal human mobility prediction model (named UniMob), which can be applied to both individual trajectory and crowd flow. UniMob leverages a multi-view mobility tokenizer that transforms both trajectory and flow data into spatiotemporal tokens, facilitating unified sequential modeling through a diffusion transformer architecture. To bridge the gap between the different characteristics of these two data modalities, we implement a novel bidirectional individual and collective alignment mechanism. This mechanism enables learning common spatiotemporal patterns from different mobility data, facilitating mutual enhancement of both trajectory and flow predictions. Extensive experiments on real-world datasets validate the superiority of our model over state-of-the-art baselines in trajectory and flow prediction. Especially in noisy and scarce data scenarios, our model achieves the highest performance improvement of more than 14% and 25% in MAPE and Accuracy@5.
- Published
- 2024
24. M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
- Author
-
Chen, Zixuan, Li, Jiaxin, Tan, Liming, Guo, Yejie, Liang, Junxuan, Lu, Cewu, and Li, Yong-Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Intelligent robots need to interact with diverse objects across various environments. The appearance and state of objects frequently undergo complex transformations depending on the object properties, e.g., phase transitions. However, in the vision community, segmenting dynamic objects with phase transitions is overlooked. In light of this, we introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes. Then, we present a new benchmark, Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation (M$^3$-VOS), to verify the ability of models to understand object phases, which consists of 479 high-resolution videos spanning over 10 distinct everyday scenarios. It provides dense instance mask annotations that capture both object phases and their transitions. We evaluate state-of-the-art methods on M$^3$-VOS, yielding several key insights. Notably, current appearance-based approaches show significant room for improvement when handling objects with phase transitions. The inherent changes in disorder suggest that the predictive performance of the forward entropy-increasing process can be improved through a reverse entropy-reducing process. These findings lead us to propose ReVOS, a new plug-and-play model that improves its performance by reversal refinement. Our data and code will be publicly available at https://zixuan-chen.github.io/M-cubeVOS.github.io/., Comment: 18 pages, 12 figures
- Published
- 2024
25. Re-Attentional Controllable Video Diffusion Editing
- Author
-
Wang, Yuanzhi, Li, Yong, Liu, Mengyi, Zhang, Xiaoya, Liu, Xin, Cui, Zhen, and Chan, Antoni B.
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Editing videos with textual guidance has garnered popularity due to its streamlined process, which requires users only to edit the text prompt corresponding to the source video. Recent studies have explored and exploited large-scale text-to-image diffusion models for text-guided video editing, resulting in remarkable video editing capabilities. However, they may still suffer from some limitations such as mislocated objects and an incorrect number of objects. Therefore, the controllability of video editing remains a formidable challenge. In this paper, we aim to address the above limitations by proposing a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. Specifically, to align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose a Re-Attentional Diffusion (RAD) to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, resulting in a spatially location-aligned and semantically high-fidelity manipulated video. In particular, to faithfully preserve the invariant region content with fewer border artifacts, we propose an Invariant Region-guided Joint Sampling (IRJS) strategy to mitigate the intrinsic sampling errors w.r.t the invariant regions at each denoising timestep and constrain the generated content to be harmonized with the invariant region content. Experimental results verify that ReAtCo consistently improves the controllability of video diffusion editing and achieves superior video editing performance., Comment: Accepted by AAAI 2025. Codes are released at: https://github.com/mdswyz/ReAtCo
- Published
- 2024
26. Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
- Author
-
Zhou, Feng, Liu, Ruiyang, Liu, Chen, He, Gaofeng, Li, Yong-Lu, Jin, Xiaogang, and Wang, Huamin
- Subjects
Computer Science - Graphics ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Sewing patterns, the essential blueprints for fabric cutting and tailoring, act as a crucial bridge between design concepts and producible garments. However, existing uni-modal sewing pattern generation models struggle to effectively encode complex design concepts with a multi-modal nature and correlate them with vectorized sewing patterns that possess precise geometric structures and intricate sewing relations. In this work, we propose a novel sewing pattern generation approach Design2GarmentCode based on Large Multimodal Models (LMMs), to generate parametric pattern-making programs from multi-modal design concepts. LMM offers an intuitive interface for interpreting diverse design inputs, while pattern-making programs could serve as well-structured and semantically meaningful representations of sewing patterns, and act as a robust bridge connecting the cross-domain pattern-making knowledge embedded in LMMs with vectorized sewing patterns. Experimental results demonstrate that our method can flexibly handle various complex design expressions such as images, textual descriptions, designer sketches, or their combinations, and convert them into size-precise sewing patterns with correct stitches. Compared to previous methods, our approach significantly enhances training efficiency, generation quality, and authoring flexibility. Our code and data will be publicly available.
- Published
- 2024
27. AI Expands Scientists' Impact but Contracts Science's Focus
- Author
-
Hao, Qianyue, Xu, Fengli, Li, Yong, and Evans, James
- Subjects
Computer Science - Computers and Society - Abstract
The rapid rise of AI in science presents a paradox. Analyzing 67.9 million research papers across six major fields using a validated language model (F1=0.876), we explore AI's impact on science. Scientists who adopt AI tools publish 67.37% more papers, receive 3.16 times more citations, and become team leaders 4 years earlier than non-adopters. This individual success correlates with concerning collective effects: AI-augmented research contracts the diameter of scientific topics studied, and diminishes follow-on scientific engagement. Rather than catalyzing the exploration of new fields, AI accelerates work in established, data-rich domains. This pattern suggests that while AI enhances individual scientific productivity, it may simultaneously reduce scientific diversity and broad engagement, highlighting a tension between personal advancement and collective scientific progress.
- Published
- 2024
28. SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World
- Author
-
Zhang, Jiaqi, Gao, Chen, Zhang, Liyuan, Li, Yong, and Yin, Hongzhi
- Subjects
Computer Science - Artificial Intelligence - Abstract
Recent advances in embodied agents with multimodal perception and reasoning capabilities, based on large vision-language models (LVLMs), enable them to excel at autonomously interacting with either real or cyber worlds, helping people make intelligent decisions in complex environments. However, current works are normally optimized by golden action trajectories or ideal task-oriented solutions toward a definitive goal. This paradigm considers limited user-oriented factors, which could be the reason for their performance reduction in a wide range of personal assistant applications. To address this, we propose Chain-of-User-Thought (COUT), a novel embodied reasoning paradigm that takes a chain of thought from basic action thinking to explicit and implicit personalized preference thought to incorporate personalized factors into autonomous agent learning. To target COUT, we introduce SmartAgent, an agent framework perceiving cyber environments and reasoning personalized requirements as 1) interacting with GUI to access an item pool, 2) generating users' explicit requirements implied by previous actions, and 3) recommending items to fulfill users' implicit requirements. To demonstrate SmartAgent's capabilities, we also create a brand-new dataset SmartSpot that offers a full-stage personalized action-involved environment. To the best of our knowledge, our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning. Our extensive experiments on SmartSpot illuminate SmartAgent's functionality among a series of embodied and personalized sub-tasks. We will release code and data upon paper notification at https://github.com/tsinghua-fib-lab/SmartAgent.
- Published
- 2024
29. AppGen: Mobility-aware App Usage Behavior Generation for Mobile Users
- Author
-
Huang, Zihan, Li, Tong, and Li, Yong
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Mobile app usage behavior reveals human patterns and is crucial for stakeholders, but data collection is costly and raises privacy issues. Data synthesis can address this by generating artificial datasets that mirror real-world data. In this paper, we propose AppGen, an autoregressive generative model designed to generate app usage behavior based on users' mobility trajectories, improving dataset accessibility and quality. Specifically, AppGen employs a probabilistic diffusion model to simulate the stochastic nature of app usage behavior. By utilizing an autoregressive structure, AppGen effectively captures the intricate sequential relationships between different app usage events. Additionally, AppGen leverages latent encoding to extract semantic features from spatio-temporal points, guiding behavior generation. These key designs ensure the generated behaviors are contextually relevant and faithfully represent users' environments and past interactions. Experiments with two real-world datasets show that AppGen outperforms state-of-the-art baselines by over 12% in critical metrics and accurately reflects real-world spatio-temporal patterns. We also test the generated datasets in applications, demonstrating their suitability for downstream tasks by maintaining algorithm accuracy and order.
- Published
- 2024
30. Homogeneous Dynamics Space for Heterogeneous Humans
- Author
-
Liu, Xinpeng, Liang, Junxuan, Zhang, Chenshuo, Cai, Zixuan, Lu, Cewu, and Li, Yong-Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
Analyses of human motion kinematics have achieved tremendous advances. However, the production mechanism, known as human dynamics, remains underexplored. In this paper, we aim to push data-driven human dynamics understanding forward. We identify a major obstacle to this as the heterogeneity of existing human motion understanding efforts. Specifically, heterogeneity exists not only in the diverse kinematics representations and hierarchical dynamics representations but also in the data from different domains, namely biomechanics and reinforcement learning. With an in-depth analysis of the existing heterogeneity, we propose to emphasize the underlying homogeneity: all of them represent the same fact of human motion, though from different perspectives. Given this, we propose Homogeneous Dynamics Space (HDyS) as a fundamental space for human dynamics by aggregating heterogeneous data and training a homogeneous latent space with inspiration from the inverse-forward dynamics procedure. Leveraging the heterogeneous representations and datasets, HDyS achieves decent mapping between human kinematics and dynamics. We demonstrate the feasibility of HDyS with extensive experiments and applications. The project page is https://foruck.github.io/HDyS., Comment: Cewu Lu and Yong-Lu Li are the corresponding authors
- Published
- 2024
31. Memory-enhanced Invariant Prompt Learning for Urban Flow Prediction under Distribution Shifts
- Author
-
Jiang, Haiyang, Chen, Tong, Zhang, Wentao, Hung, Nguyen Quoc Viet, Yuan, Yuan, Li, Yong, and Cui, Lizhen
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
Urban flow prediction is a classic spatial-temporal forecasting task that estimates the amount of future traffic flow for a given location. Though models represented by Spatial-Temporal Graph Neural Networks (STGNNs) have established themselves as capable predictors, they tend to suffer from distribution shifts that are common with the urban flow data due to the dynamics and unpredictability of spatial-temporal events. Unfortunately, in spatial-temporal applications, the dynamic environments can hardly be quantified via a fixed number of parameters, whereas learning time- and location-specific environments can quickly become computationally prohibitive. In this paper, we propose a novel framework named Memory-enhanced Invariant Prompt learning (MIP) for urban flow prediction under constant distribution shifts. Specifically, MIP is equipped with a learnable memory bank that is trained to memorize the causal features within the spatial-temporal graph. By querying a trainable memory bank that stores the causal features, we adaptively extract invariant and variant prompts (i.e., patterns) for a given location at every time step. Then, instead of intervening on the raw data based on simulated environments, we directly perform intervention on variant prompts across space and time. With the intervened variant prompts in place, we use invariant learning to minimize the variance of predictions, so as to ensure that the predictions are only made with invariant features. With extensive comparative experiments on two public urban flow datasets, we thoroughly demonstrate the robustness of MIP against OOD data.
- Published
- 2024
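The memory-bank read described above can be sketched as an attention-style query over a bank of slots (a minimal illustration; the function names, shapes, and the residual-based invariant/variant split are assumptions, not the paper's exact design):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def query_memory(features, memory):
    """Attention-style read: each location's feature queries the memory bank;
    the attended readout serves as the invariant prompt, and the residual as
    the variant prompt (a toy decomposition for illustration only)."""
    attn = softmax(features @ memory.T)    # (num_nodes, num_slots)
    invariant = attn @ memory              # convex combination of memory slots
    variant = features - invariant         # what the memory cannot explain
    return invariant, variant
```

Because the attention weights sum to one, `invariant + variant` reconstructs the input feature exactly, which makes the decomposition easy to sanity-check.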
32. Noise Matters: Diffusion Model-based Urban Mobility Generation with Collaborative Noise Priors
- Author
-
Zhang, Yuheng, Yuan, Yuan, Ding, Jingtao, Yuan, Jian, and Li, Yong
- Subjects
Computer Science - Machine Learning - Abstract
With global urbanization, the focus on sustainable cities has largely grown, driving research into equity, resilience, and urban planning, which often relies on mobility data. The rise of web-based apps and mobile devices has provided valuable user data for mobility-related research. However, real-world mobility data is costly and raises privacy concerns. To protect privacy while retaining key features of real-world movement, the demand for synthetic data has steadily increased. Recent advances in diffusion models have shown great potential for mobility trajectory generation due to their ability to model randomness and uncertainty. However, existing approaches often directly apply independent and identically distributed (i.i.d.) noise sampling from image generation techniques, which fails to account for the spatiotemporal correlations and social interactions that shape urban mobility patterns. In this paper, we propose CoDiffMob, a diffusion-based method for urban mobility generation with collaborative noise priors that emphasizes the critical role of noise in diffusion models for generating mobility data. By leveraging both individual movement characteristics and population-wide dynamics, we construct novel collaborative noise priors that provide richer and more informative guidance throughout the generation process. Extensive experiments demonstrate the superiority of our method, with generated data accurately capturing both individual preferences and collective patterns, achieving an improvement of over 32%. Furthermore, it can effectively replace web-derived mobility data to better support downstream applications, while safeguarding user privacy and fostering a more secure and ethical web. This highlights its tremendous potential for applications in sustainable city-related research.
- Published
- 2024
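The contrast with i.i.d. sampling can be illustrated with a toy noise constructor that blends a per-sample i.i.d. Gaussian component with a shared, population-level pattern (the blending weight `alpha` and this whole construction are illustrative assumptions, not the paper's actual prior):

```python
import numpy as np

def collaborative_noise(shape, pop_pattern, alpha=0.5, rng=None):
    """Toy collaborative prior: blend i.i.d. Gaussian noise with a shared
    population-level pattern; alpha=0 recovers the standard i.i.d. prior."""
    rng = rng if rng is not None else np.random.default_rng()
    iid = rng.standard_normal(shape)
    # Weights chosen so unit-variance inputs yield (approximately) unit-variance noise.
    return alpha * pop_pattern + np.sqrt(1.0 - alpha**2) * iid
```

The point of such a prior is that samples drawn for nearby users or locations share the `pop_pattern` component, injecting spatiotemporal correlation that pure i.i.d. noise lacks.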
33. Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
- Author
-
Wang, Zehao, Liu, Xinpeng, Wu, Xiaoqian, Zhang, Yudonglin, Fang, Zhou, Fang, Yifan, Pu, Junfu, Lu, Cewu, and Li, Yong-Lu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multimodal Large Language Models (MLLMs) have garnered significant attention recently and demonstrate outstanding capabilities in various tasks such as OCR, VQA, captioning, $\textit{etc}$. However, hallucination remains a persistent issue. While numerous methods have been proposed to mitigate hallucinations, achieving notable improvements, these methods primarily focus on mitigating hallucinations about $\textbf{object/noun-related}$ concepts. Verb concepts, crucial for understanding human actions, have been largely overlooked. In this paper, to the best of our knowledge, we are the $\textbf{first}$ to investigate the $\textbf{verb hallucination}$ phenomenon of MLLMs from various perspectives. Our findings reveal that most state-of-the-art MLLMs suffer from severe verb hallucination. To assess the effectiveness of existing mitigation methods for object concept hallucination on verb hallucination, we evaluated these methods and found that they do not effectively address verb hallucination. To address this issue, we propose a novel rich verb knowledge-based tuning method to mitigate verb hallucination. The experiment results demonstrate that our method significantly reduces hallucinations related to verbs. $\textit{Our code and data will be made publicly available}$.
- Published
- 2024
34. Risk and Protective Factors for Suicidal Thoughts and Behaviors Among Asian American Young Adults: A Systematic Review.
- Author
-
Li, Yong, Chang, Tzu-Fen, Zhou, Qing, Li, Kathryn, Baiden, Philip, and Kaplan, Mark
- Subjects
Asian American ,protective factors ,risk factors ,suicidal thoughts and behaviors ,young adults - Abstract
Background: Asian American (AA) young adults, including AA college students, may experience more suicidal thoughts and behaviors (STBs) compared to other racial and ethnic groups of the same age. To the best of our knowledge, this study is the first systematic review of the risk and protective factors for STBs with a focus on AA young adults. Methods: Informed by the social-ecological perspective and the cultural model and theory of suicide, this study systematically reviews the risk and protective factors for STBs among AA young adults. Based on 22 research articles published between 1998 and 2023, we analyzed and discussed the effects of 37 risk and 15 protective factors at the individual, relationship, community, societal, and cultural levels. Results: Most risk factors are at the individual level (e.g., depressive symptoms and hopelessness), followed by factors at the cultural level (e.g., acculturation and acculturative stress), the relationship level (e.g., family problems and romantic relationship problems), the community level (e.g., verbal threats on campus), and the societal level (e.g., public stigma about mental health). Also, most protective factors are at the individual level (e.g., self-reliance and fear of suicide), followed by the relationship level (e.g., social support and family responsibilities), the community level (e.g., religious affiliations), and the cultural level (desire not to burden others). Conclusions: This systematic review emphasizes the need for future research to explore cultural factors, subgroup differences, and longitudinal designs, while advocating for culturally specific prevention and intervention strategies to improve mental health outcomes for AA young adults.
- Published
- 2024
35. KAN See Your Face
- Author
-
Han, Dong, Li, Yong, and Denzler, Joachim
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
With the advancement of face recognition (FR) systems, privacy-preserving face recognition (PPFR) has gained popularity for its secure face recognition, enhanced facial privacy protection, and robustness to various attacks. Besides, specific models and algorithms are proposed for face embedding protection by mapping embeddings to a secure space. However, there is a lack of studies on investigating and evaluating the possibility of extracting face images from embeddings of those systems, especially for PPFR. In this work, we introduce the first approach to exploit Kolmogorov-Arnold Network (KAN) for conducting embedding-to-face attacks against state-of-the-art (SOTA) FR and PPFR systems. Face embedding mapping (FEM) models are proposed to learn the distribution mapping relation between the embeddings from the initial domain and target domain. In comparison with Multi-Layer Perceptrons (MLP), we provide two variants, FEM-KAN and FEM-MLP, for efficient non-linear embedding-to-embedding mapping in order to reconstruct realistic face images from the corresponding face embedding. To verify our methods, we conduct extensive experiments with various PPFR and FR models. We also measure reconstructed face images with different metrics to evaluate the image quality. Through comprehensive experiments, we demonstrate the effectiveness of FEMs in accurate embedding mapping and face reconstruction., Comment: 16 pages, 8 figures
- Published
- 2024
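The simplest instance of an embedding-to-embedding mapper of the kind FEM models generalize is a linear least-squares map fitted on paired embeddings; a minimal sketch (the function names and shapes are illustrative, and FEM-KAN/FEM-MLP replace this with non-linear models):

```python
import numpy as np

def fit_linear_mapper(src, tgt):
    """Fit a linear map W minimizing ||src @ W - tgt||^2, the simplest
    baseline for mapping embeddings from a source (e.g. protected) space
    to a target space."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

def map_embeddings(src, W):
    """Apply the fitted mapper to new source-space embeddings."""
    return src @ W
```

When the true relation between the two spaces is non-linear, as the abstract argues for protected embeddings, this baseline fails, which is precisely the gap the non-linear FEM variants target.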
36. Research on Optimal Portfolio Based on Multifractal Features
- Author
-
Li, Yong
- Subjects
Quantitative Finance - Portfolio Management ,05C99 - Abstract
Providing optimal portfolio selection for investors has always been a hot topic in academia. Because the traditional portfolio model cannot adapt to the actual capital market and can yield erroneous results, this paper constructs a mean-detrended cross-correlation portfolio model (M-DCCP model), designed to embed the detrended cross-correlation between different simultaneously recorded, nonstationary time series into the reward-risk criterion. We illustrate the model's effectiveness with five composite indexes (SSE 50, CSI 300, SSE 500, CSI 1000 and CSI 2000) selected from the China A-share market. The empirical results show that, compared with the traditional mean-variance portfolio model (M-VP model), the M-DCCP model is more conducive to constructing optimal portfolios under different fluctuation-exponent and time-scale preferences, thereby improving portfolio performance., Comment: 18 pages, 3 postscript figures
- Published
- 2024
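The detrended cross-correlation the M-DCCP model builds on is commonly measured with the DCCA cross-correlation coefficient; a minimal NumPy sketch (non-overlapping boxes and linear detrending are standard choices, but the paper's exact estimator may differ):

```python
import numpy as np

def dcca_coefficient(x, y, n):
    """Detrended cross-correlation coefficient rho_DCCA at window size n:
    integrate both series, detrend them linearly in boxes of length n, then
    form the ratio of the detrended covariance to the detrended variances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.cumsum(x - x.mean())          # integrated profiles
    Y = np.cumsum(y - y.mean())
    N = len(X)
    t = np.arange(n)
    cov = var_x = var_y = 0.0
    count = 0
    for start in range(0, N - n + 1, n):  # non-overlapping boxes
        xi, yi = X[start:start + n], Y[start:start + n]
        rx = xi - np.polyval(np.polyfit(t, xi, 1), t)  # local linear detrend
        ry = yi - np.polyval(np.polyfit(t, yi, 1), t)
        cov += np.mean(rx * ry)
        var_x += np.mean(rx * rx)
        var_y += np.mean(ry * ry)
        count += 1
    return (cov / count) / np.sqrt((var_x / count) * (var_y / count))
```

Sweeping `n` over several time scales yields the scale-dependent correlation structure that a time-scale-preference portfolio criterion can then weight.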
37. Understanding World or Predicting Future? A Comprehensive Survey of World Models
- Author
-
Ding, Jingtao, Zhang, Yunke, Shang, Yu, Zhang, Yuheng, Zong, Zefang, Feng, Jie, Yuan, Yuan, Su, Hongyuan, Li, Nian, Sukiennik, Nicholas, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The concept of world models has garnered significant attention due to advancements in multimodal large language models such as GPT-4 and video generation models such as Sora, which are central to the pursuit of artificial general intelligence. This survey offers a comprehensive review of the literature on world models. Generally, world models are regarded as tools for either understanding the present state of the world or predicting its future dynamics. This review presents a systematic categorization of world models, emphasizing two primary functions: (1) constructing internal representations to understand the mechanisms of the world, and (2) predicting future states to simulate and guide decision-making. Initially, we examine the current progress in these two categories. We then explore the application of world models in key domains, including autonomous driving, robotics, and social simulacra, with a focus on how each domain utilizes these aspects. Finally, we outline key challenges and provide insights into potential future research directions.
- Published
- 2024
38. A Survey on Human-Centric LLMs
- Author
-
Wang, Jing Yi, Sukiennik, Nicholas, Li, Tong, Su, Weikang, Hao, Qianyue, Xu, Jingbo, Huang, Zihan, Xu, Fengli, and Li, Yong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior has given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.
- Published
- 2024
39. A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction
- Author
-
Yuan, Yuan, Ding, Jingtao, Han, Chonghua, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Machine Learning - Abstract
Urban spatio-temporal flow prediction, encompassing traffic flows and crowd flows, is crucial for optimizing city infrastructure and managing traffic and emergency responses. Traditional approaches have relied on separate models tailored to either grid-based data, representing cities as uniform cells, or graph-based data, modeling cities as networks of nodes and edges. In this paper, we build UniFlow, a foundational model for general urban flow prediction that unifies both grid-based and graph-based data. We first design a multi-view spatio-temporal patching mechanism to standardize different data into a consistent sequential format and then introduce a spatio-temporal transformer architecture to capture complex correlations and dynamics. To leverage shared spatio-temporal patterns across different data types and facilitate effective cross-learning, we propose SpatioTemporal Memory Retrieval Augmentation (ST-MRA). By creating structured memory modules to store shared spatio-temporal patterns, ST-MRA enhances predictions through adaptive memory retrieval. Extensive experiments demonstrate that UniFlow outperforms existing models in both grid-based and graph-based flow prediction, excelling particularly in scenarios with limited data availability, showcasing its superior performance and broad applicability. The datasets and code implementation have been released on https://github.com/YuanYuan98/UniFlow.
- Published
- 2024
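The grid side of the patching mechanism described above can be sketched as a reshape that turns a (T, H, W) flow tensor into a sequence of p×p patch tokens (a toy version under assumed shapes; the graph side would instead patch per node):

```python
import numpy as np

def patchify_grid(flow, p):
    """Turn grid-based flow data of shape (T, H, W) into a token sequence of
    shape (T*nh*nw, p*p), a toy sketch of standardizing grid data into a
    consistent sequential format."""
    T, H, W = flow.shape
    assert H % p == 0 and W % p == 0, "grid must divide evenly into patches"
    nh, nw = H // p, W // p
    return (flow.reshape(T, nh, p, nw, p)   # split H and W into patch blocks
                .transpose(0, 1, 3, 2, 4)   # group each p*p patch together
                .reshape(T * nh * nw, p * p))
```

The transform is a pure re-indexing, so it is invertible and loses no information about the original grid.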
40. UrbanDiT: A Foundation Model for Open-World Urban Spatio-Temporal Learning
- Author
-
Yuan, Yuan, Han, Chonghua, Ding, Jingtao, Jin, Depeng, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
The urban environment is characterized by complex spatio-temporal dynamics arising from diverse human activities and interactions. Effectively modeling these dynamics is essential for understanding and optimizing urban systems. In this work, we introduce UrbanDiT, a foundation model for open-world urban spatio-temporal learning that successfully scales up diffusion transformers in this field. UrbanDiT pioneers a unified model that integrates diverse spatio-temporal data sources and types while learning universal spatio-temporal patterns across different cities and scenarios. This allows the model to unify both multi-data and multi-task learning, and effectively support a wide range of spatio-temporal applications. Its key innovation lies in the elaborated prompt learning framework, which adaptively generates both data-driven and task-specific prompts, guiding the model to deliver superior performance across various urban applications. UrbanDiT offers three primary advantages: 1) It unifies diverse data types, such as grid-based and graph-based data, into a sequential format, allowing it to capture spatio-temporal dynamics across diverse scenarios of different cities; 2) With masking strategies and task-specific prompts, it supports a wide range of tasks, including bi-directional spatio-temporal prediction, temporal interpolation, spatial extrapolation, and spatio-temporal imputation; and 3) It generalizes effectively to open-world scenarios, with its powerful zero-shot capabilities outperforming nearly all baselines that have access to training data. These features allow UrbanDiT to achieve state-of-the-art performance in different domains such as transportation traffic, crowd flows, taxi demand, bike usage, and cellular traffic, across multiple cities and tasks. UrbanDiT sets a new benchmark for foundation models in the urban spatio-temporal domain.
- Published
- 2024
41. Unveiling Hidden Details: A RAW Data-Enhanced Paradigm for Real-World Super-Resolution
- Author
-
Peng, Long, Li, Wenbo, Guo, Jiaming, Di, Xin, Sun, Haoze, Li, Yong, Pei, Renjing, Wang, Yang, Cao, Yang, and Zha, Zheng-Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Real-world image super-resolution (Real SR) aims to generate high-fidelity, detail-rich high-resolution (HR) images from low-resolution (LR) counterparts. Existing Real SR methods primarily focus on generating details from the LR RGB domain, often leading to a lack of richness or fidelity in fine details. In this paper, we pioneer the use of details hidden in RAW data to complement existing RGB-only methods, yielding superior outputs. We argue that key image processing steps in Image Signal Processing, such as denoising and demosaicing, inherently result in the loss of fine details in LR images, making LR RAW a valuable information source. To validate this, we present RealSR-RAW, a comprehensive dataset comprising over 10,000 pairs with LR and HR RGB images, along with corresponding LR RAW, captured across multiple smartphones under varying focal lengths and diverse scenes. Additionally, we propose a novel, general RAW adapter to efficiently integrate LR RAW data into existing CNNs, Transformers, and Diffusion-based Real SR models by suppressing the noise contained in LR RAW and aligning its distribution. Extensive experiments demonstrate that incorporating RAW data significantly enhances detail recovery and improves Real SR performance across ten evaluation metrics, including both fidelity and perception-oriented metrics. Our findings open a new direction for the Real SR task; the dataset and code will be made available to support future research., Comment: We sincerely apologize, but due to some commercial confidentiality agreements related to the report, we have decided to withdraw the submission for now and will resubmit after making the necessary revisions
- Published
- 2024
42. Motion Before Action: Diffusing Object Motion as Manipulation Condition
- Author
-
Su, Yue, Zhan, Xinyu, Fang, Hongjie, Li, Yong-Lu, Lu, Cewu, and Yang, Lixin
- Subjects
Computer Science - Robotics - Abstract
Inferring object motion representations from observations enhances the performance of robotic manipulation tasks. This paper introduces a new paradigm for robot imitation learning that generates action sequences by reasoning about object motion from visual observations. We propose MBA (Motion Before Action), a novel module that employs two cascaded diffusion processes for object motion generation and robot action generation under object motion guidance. MBA first predicts the future pose sequence of the object based on observations, then uses this sequence as a condition to guide robot action generation. Designed as a plug-and-play component, MBA can be flexibly integrated into existing robotic manipulation policies with diffusion action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks. Project page: https://selen-suyue.github.io/MBApage/
- Published
- 2024
43. Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
- Author
-
Duan, Chen-Long, Li, Yong, Wei, Xiu-Shen, and Zhao, Lin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Pre-training plays a vital role in various vision tasks, such as object recognition and detection. Commonly used pre-training methods, which typically rely on randomized approaches like uniform or Gaussian distributions to initialize model parameters, often fall short when confronted with long-tailed distributions, especially in detection tasks. This is largely due to extreme data imbalance and the issue of simplicity bias. In this paper, we introduce a novel pre-training framework for object detection, called Dynamic Rebalancing Contrastive Learning with Dual Reconstruction (2DRCL). Our method builds on a Holistic-Local Contrastive Learning mechanism, which aligns pre-training with object detection by capturing both global contextual semantics and detailed local patterns. To tackle the imbalance inherent in long-tailed data, we design a dynamic rebalancing strategy that adjusts the sampling of underrepresented instances throughout the pre-training process, ensuring better representation of tail classes. Moreover, Dual Reconstruction addresses simplicity bias by enforcing a reconstruction task aligned with the self-consistency principle, specifically benefiting underrepresented tail classes. Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes., Comment: Accepted by NeurIPS 2024
- Published
- 2024
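The dynamic rebalancing idea in entry 43 can be illustrated with a small sketch. The abstract does not give the exact sampling rule, so the function below is an assumed inverse-frequency scheme with a hypothetical `power` knob, not the paper's 2DRCL formula: it up-weights tail classes so underrepresented instances are sampled more often during pre-training.

```python
import numpy as np

def rebalanced_sampling_weights(class_counts, power=0.5):
    """Per-instance sampling probabilities that up-weight tail classes.

    `power` (an assumed hyperparameter, not from the paper) interpolates
    between uniform instance sampling (0.0) and uniform class sampling (1.0).
    """
    counts = np.asarray(class_counts, dtype=float)
    class_weights = counts ** (-power)            # rarer class -> larger weight
    # Normalize so that summing the per-instance probability over all
    # instances of all classes gives exactly 1.
    return class_weights / (class_weights * counts).sum()

# A long-tailed toy distribution: head class has 1000 images, tail has 10.
counts = np.array([1000, 100, 10])
p = rebalanced_sampling_weights(counts)
mass = p * counts                                  # total mass drawn per class
print(mass)                                        # tail mass rises above its raw 0.9% share
```

With `power=0.5` the tail class receives roughly 7% of the sampled batch instead of its raw 0.9% share, while the head still dominates, a middle ground between instance-balanced and class-balanced sampling.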
44. LLM-assisted Explicit and Implicit Multi-interest Learning Framework for Sequential Recommendation
- Author
-
Qiao, Shutong, Gao, Chen, Li, Yong, and Yin, Hongzhi
- Subjects
Computer Science - Information Retrieval - Abstract
Multi-interest modeling in current recommender systems (RS) is mainly based on user behavioral data, capturing user interest preferences from multiple dimensions. However, since behavioral data is implicit and often highly sparse, it is challenging to understand users' complex and diverse interests. Recent studies have shown that the rich semantic information in text can effectively supplement the deficiencies of behavioral data. Despite this, it is still difficult for small models to directly extract the semantic features associated with users' deep interests. That is, how to effectively align semantics with behavioral information to form a more comprehensive and accurate understanding of user interests has become a critical research problem. To address this, we propose an LLM-assisted explicit and implicit multi-interest learning framework (named EIMF) to model user interests on two levels: behavior and semantics. The framework consists of two parts: an Implicit Behavioral Interest Module (IBIM) and an Explicit Semantic Interest Module (ESIM). The traditional multi-interest RS model in IBIM learns users' implicit behavioral interests from interactions with items. In ESIM, we first adopt a clustering algorithm to select typical samples and design a prompting strategy for the LLM to obtain explicit semantic interests. Furthermore, in the training phase, the semantic interests of typical samples enhance the representation learning of behavioral interests through multi-task learning on semantic prediction and modality alignment. Therefore, in the inference stage, accurate recommendations can be achieved with only the user's behavioral data. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed EIMF framework, which effectively and efficiently combines small models with LLMs to improve the accuracy of multi-interest modeling., Comment: 10 pages
- Published
- 2024
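ESIM's first step in entry 44, "adopt a clustering algorithm to select typical samples", can be sketched as follows. The abstract does not name the clustering algorithm or the features, so this is a minimal assumed version: a tiny k-means over user embeddings, returning the index nearest each centroid as the "typical" user to prompt the LLM with.

```python
import numpy as np

def typical_samples(embeddings, k=2, iters=20, seed=0):
    """Pick one 'typical' user per cluster via a tiny k-means.

    A sketch of ESIM's sample-selection step; the paper only says
    'a clustering algorithm', so k-means and the embedding input are
    assumptions made for illustration.
    """
    X = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return [int(d[:, j].argmin()) for j in range(k)]   # nearest-to-centroid index

# Two obvious clusters; one representative per cluster would be prompted.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
reps = typical_samples(emb, k=2)
print(reps)
```

Prompting only these representatives keeps LLM calls proportional to the number of clusters rather than the number of users.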
45. ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
- Author
-
Zhao, Weibo, Shi, Yubin, Lyu, Xinyu, Sui, Wanchen, Li, Shen, and Li, Yong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce non-trivial errors, causing intolerable performance degradation. Anchored in the basic objectives of model compression, this paper delves into the layer-wise error distribution of LLMs during post-training quantization. Subsequently, we introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error with LoRA-style matrices constructed by whitening SVD; (2) Activation Smoothing: outlier extraction to obtain smooth activations and better error compensation. ASER is capable of quantizing typical LLMs to low-bit ones, preserving accuracy even in the W4A8 per-channel setup. Experimental results show that ASER is competitive among state-of-the-art quantization algorithms and shows potential for activation quantization, with minor overhead., Comment: Accepted at AAAI 2025
- Published
- 2024
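The Error Reconstruction idea in entry 45 — low-rank compensation of quantization error with LoRA-style factors from an SVD — can be sketched in a few lines. This is a simplified version: per-tensor symmetric quantization and a plain (not whitened) SVD, so it omits ASER's whitening and per-channel W4A8 details.

```python
import numpy as np

def quantize_sym(w, bits=4):
    # Symmetric uniform per-tensor quantization (a simplification; ASER
    # itself works per-channel and applies activation smoothing first).
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))          # a stand-in weight matrix
Wq = quantize_sym(W, bits=4)
E = W - Wq                             # quantization error to reconstruct

# Rank-r compensation: E ~= A @ B, with LoRA-style factors from a
# truncated SVD. Serving then uses Wq plus the cheap low-rank term.
r = 8
U, S, Vt = np.linalg.svd(E, full_matrices=False)
A = U[:, :r] * S[:r]                   # (64, r)
B = Vt[:r, :]                          # (r, 64)

residual = np.linalg.norm(E - A @ B) / np.linalg.norm(E)
print(f"relative error left after rank-{r} compensation: {residual:.3f}")
```

The whitening SVD in the paper reweights this decomposition by activation statistics so the retained rank-r term compensates the error directions that matter most for the layer's outputs, rather than the largest error directions in plain Frobenius norm as here.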
46. Generalizing Hyperedge Expansion for Hyper-relational Knowledge Graph Modeling
- Author
-
Liu, Yu, Yang, Shu, Ding, Jingtao, Yao, Quanming, and Li, Yong
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
By representing knowledge as a primary triple associated with additional attribute-value qualifiers, the hyper-relational knowledge graph (HKG), which generalizes the triple-based knowledge graph (KG), has been attracting research attention recently. Compared with KGs, HKGs are enriched with semantic qualifiers as well as the hyper-relational graph structure. However, to model HKGs, existing studies mainly focus on either the semantic information or the structural information therein, and fail to capture both simultaneously. To tackle this issue, we generalize the hyperedge expansion in hypergraph learning and propose an equivalent transformation for HKG modeling, referred to as TransEQ. Specifically, the equivalent transformation transforms an HKG into a KG, which considers both semantic and structural characteristics. An encoder-decoder framework is then developed to bridge the modeling research between KGs and HKGs. In the encoder part, KG-based graph neural networks are leveraged for structural modeling, while in the decoder part, various HKG-based scoring functions are exploited for semantic modeling. In particular, we design a sharing embedding mechanism in the encoder-decoder framework that captures semantic relatedness. We further theoretically prove that TransEQ preserves complete information in the equivalent transformation and also achieves full expressivity. Finally, extensive experiments on three benchmarks demonstrate the superior performance of TransEQ in terms of both effectiveness and efficiency. On the largest benchmark, WikiPeople, TransEQ significantly improves on state-of-the-art models by 15% on MRR.
- Published
- 2024
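To make the HKG-to-KG transformation in entry 46 concrete, here is one common reification-style expansion: each hyper-relational fact becomes an auxiliary node connected to its head, tail, and qualifiers by plain triples. This mirrors the spirit of hyperedge expansion, but the exact TransEQ construction (and its information-preservation guarantees) may differ; all relation-name conventions below are illustrative.

```python
def expand_fact(head, relation, tail, qualifiers, fact_id):
    """Reify one hyper-relational fact as an auxiliary node, yielding
    binary triples a KG encoder can consume. A sketch only; TransEQ's
    actual equivalent transformation is defined in the paper.
    """
    node = f"fact_{fact_id}"                 # auxiliary node for this fact
    triples = [
        (node, f"{relation}_head", head),    # link back to the primary triple
        (node, f"{relation}_tail", tail),
    ]
    for q_rel, q_val in qualifiers:          # one triple per qualifier
        triples.append((node, q_rel, q_val))
    return triples

triples = expand_fact(
    "MarieCurie", "educated_at", "UniversityOfParis",
    [("degree", "PhD"), ("year", "1903")], fact_id=0)
for t in triples:
    print(t)
```

After expansion, a standard KG graph neural network can encode the auxiliary nodes alongside entities, which is what lets the encoder capture structure while the HKG-based decoder still scores the original hyper-relational facts.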
47. Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
- Author
-
Yu, Zihan, Ding, Jingtao, and Li, Yong
- Subjects
Computer Science - Machine Learning - Abstract
Symbolic regression, the task of discovering the formula that best fits the given data, is typically based on heuristic search. These methods iteratively update candidate formulas to obtain new ones with lower prediction errors. However, since formulas with similar function shapes may have completely different symbolic forms, the prediction error does not decrease monotonically as the search approaches the target formula, causing the low recovery rates of existing methods. To solve this problem, we propose a novel search objective based on the minimum description length, which reflects the distance from the target and decreases monotonically as the search approaches the correct form of the target formula. To estimate the minimum description length of any input data, we design a neural network, MDLformer, which enables robust and scalable estimation through large-scale training. With the MDLformer's output as the search objective, we implement a symbolic regression method, SR4MDL, that can effectively recover the correct mathematical form of the formula. Extensive experiments illustrate its excellent performance in recovering formulas from data. Our method successfully recovers around 50 formulas across two benchmark datasets comprising 133 problems, outperforming state-of-the-art methods by 43.92%.
- Published
- 2024
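The key point of entry 47 is that the search loop is unchanged; only the objective is swapped from prediction error to an estimated description length. A minimal skeleton makes this separation visible. The `score` callable stands in for the MDLformer's estimate (which we cannot reproduce here), and the toy instance uses integers as "formulas" with a monotone distance-to-target score, all purely illustrative.

```python
import random

def search(candidates, neighbors, score, steps=100, seed=0):
    """Greedy search with a pluggable objective.

    In SR4MDL, `score` would be the MDLformer's estimated description
    length and `neighbors` would enumerate symbolic edits to a formula;
    both are hypothetical stand-ins here.
    """
    rng = random.Random(seed)
    best = rng.choice(candidates)
    for _ in range(steps):
        cand = rng.choice(neighbors(best))
        if score(cand) < score(best):     # lower estimated MDL = closer to target
            best = cand
    return best

# Toy instance: "formulas" are integers, the target is 42, the score is a
# monotone distance to the target, and neighbors are +/-1 edits.
result = search(
    candidates=list(range(100)),
    neighbors=lambda f: [f - 1, f + 1],
    score=lambda f: abs(f - 42),
    steps=500)
print(result)
```

With a monotone score the greedy walk reaches the target; replace `score` with a prediction-error objective over symbolic formulas and the same loop stalls in local minima, which is exactly the failure mode the paper's MDL objective is designed to remove.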
48. Towards Personalized Federated Learning via Comprehensive Knowledge Distillation
- Author
-
Wang, Pengju, Liu, Bochao, Guo, Weijia, Li, Yong, and Ge, Shiming
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Federated learning is a distributed machine learning paradigm designed to protect data privacy. However, data heterogeneity across various clients results in catastrophic forgetting, where the model rapidly forgets previous knowledge while acquiring new knowledge. To address this challenge, personalized federated learning has emerged to customize a personalized model for each client. However, the inherent limitation of this mechanism is its excessive focus on personalization, potentially hindering the generalization of those models. In this paper, we present a novel personalized federated learning method that uses global and historical models as teachers and the local model as the student to facilitate comprehensive knowledge distillation. The historical model represents the local model from the last round of client training, containing historical personalized knowledge, while the global model represents the aggregated model from the last round of server aggregation, containing global generalized knowledge. By applying knowledge distillation, we effectively transfer global generalized knowledge and historical personalized knowledge to the local model, thus mitigating catastrophic forgetting and enhancing the general performance of personalized models. Extensive experimental results demonstrate the significant advantages of our method., Comment: Accepted by IEEE SMC 2024
- Published
- 2024
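The two-teacher distillation in entry 48 can be written down directly: the student (local model) matches softened outputs of both the global model (generalized knowledge) and last round's local model (personalized knowledge). The weighting `alpha` and temperature `T` are assumed hyperparameters; the paper's exact loss may combine the terms differently.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T        # temperature-softened logits
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for dense probability vectors (both strictly positive here).
    return float(np.sum(p * np.log(p / q)))

def comprehensive_kd_loss(student_logits, global_logits, hist_logits,
                          T=2.0, alpha=0.5):
    """Distill the local student from two teachers at once.

    `alpha` balances global generalized knowledge against historical
    personalized knowledge; it and T are illustrative assumptions.
    """
    s = softmax(student_logits, T)
    g = softmax(global_logits, T)
    h = softmax(hist_logits, T)
    return alpha * kl(g, s) + (1 - alpha) * kl(h, s)

loss = comprehensive_kd_loss([2.0, 0.5, 0.1], [1.8, 0.6, 0.2], [2.2, 0.3, 0.0])
print(f"{loss:.4f}")
```

Because the historical teacher is the student's own previous round, its term acts as a brake on catastrophic forgetting, while the global term pulls the student back toward knowledge shared across clients.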
49. Enhancing ID-based Recommendation with Large Language Models
- Author
-
Chen, Lei, Gao, Chen, Du, Xiaoyi, Luo, Hengliang, Jin, Depeng, Li, Yong, and Wang, Meng
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence - Abstract
Large Language Models (LLMs) have recently garnered significant attention in various domains, including recommendation systems. Recent research leverages the capabilities of LLMs to improve the performance and user modeling aspects of recommender systems. These studies primarily focus on utilizing LLMs to interpret textual data in recommendation tasks. However, in ID-based recommendation, textual data is absent and only ID data is available, and the potential of LLMs for ID data within this paradigm remains relatively unexplored. To this end, we introduce a pioneering approach called "LLM for ID-based Recommendation" (LLM4IDRec), which integrates the capabilities of LLMs while relying exclusively on ID data, thus diverging from the previous reliance on textual data. The basic idea of LLM4IDRec is to employ an LLM to augment ID data; if the augmented ID data improves recommendation performance, this demonstrates the LLM's ability to interpret ID data effectively and opens an innovative path for integrating LLMs into ID-based recommendation. We evaluate the effectiveness of our LLM4IDRec approach using three widely-used datasets. Our results demonstrate a notable improvement in recommendation performance, with our approach consistently outperforming existing ID-based methods by solely augmenting the input data.
- Published
- 2024
50. Flexible Coded Distributed Convolution Computing for Enhanced Fault Tolerance and Numerical Stability in Distributed CNNs
- Author
-
Tan, Shuo, Liu, Rui, Long, XianLei, Wan, Kai, Song, Linqi, and Li, Yong
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Information Theory ,Computer Science - Machine Learning - Abstract
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed systems susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance fault tolerance and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as Numerically Stable Coded Tensor Convolution (NSCTC), we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable linear decomposition of tensor convolutions and their encoding into CDC sub-tasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, fault tolerance, and scalability across various CNN architectures., Comment: 14 pages, 6 figures
- Published
- 2024
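The coded-redundancy idea behind entry 50 rests on linearity: because convolution is linear in its input, coded linear combinations of input partitions can be processed independently and the true result decoded from any k of n workers. The sketch below demonstrates this with a matrix product as a stand-in for a channel-partitioned convolution, and a Vandermonde code instead of the paper's numerically stabler CRME encoding; partition counts and straggler indices are arbitrary choices.

```python
import numpy as np

# Split a linear job into k parts, encode into n > k coded parts, and
# recover the full result from any k surviving workers.
k, n = 3, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))          # input, row-partitioned into k blocks
W = rng.normal(size=(4, 2))          # shared "filter"

parts = np.split(X, k)               # k uncoded partitions of shape (2, 4)
G = np.vander(np.arange(1, n + 1), k, increasing=True)   # (n, k) encoding matrix

coded = [sum(G[i, j] * parts[j] for j in range(k)) for i in range(n)]
results = [c @ W for c in coded]     # each worker computes on its coded share

# Suppose workers 1 and 3 straggle: decode from any k survivors, here {0, 2, 4}.
alive = [0, 2, 4]
stacked = np.stack([results[i] for i in alive]).reshape(k, -1)
dec = np.linalg.solve(G[alive], stacked)   # invert the surviving code rows
recovered = dec.reshape(6, 2)
print(np.allclose(recovered, X @ W))
```

Real Vandermonde matrices become ill-conditioned as n grows, which is precisely the numerical-stability problem the framework's CRME-based encoding is designed to avoid.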