34,046 results on '"LIU, QI"'
Search Results
2. Collaborative Cognitive Diagnosis with Disentangled Representation Learning for Learner Modeling
- Author
-
Gao, Weibo, Liu, Qi, Yue, Linan, Yao, Fangzhou, Wang, Hao, Gu, Yin, and Zhang, Zheng
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Learners sharing similar implicit cognitive states often display comparable observable problem-solving performances. Leveraging collaborative connections among such similar learners proves valuable in comprehending human learning. Motivated by the success of collaborative modeling in various domains, such as recommender systems, we aim to investigate how collaborative signals among learners contribute to the diagnosis of human cognitive states (i.e., knowledge proficiency) in the context of intelligent education. The primary challenges lie in identifying implicit collaborative connections and disentangling the entangled cognitive factors of learners for improved explainability and controllability in learner Cognitive Diagnosis (CD). However, there has been no work on CD capable of simultaneously modeling collaborative and disentangled cognitive states. To address this gap, we present Coral, a Collaborative cognitive diagnosis model with disentangled representation learning. Specifically, Coral first introduces a disentangled state encoder to achieve the initial disentanglement of learners' states. Subsequently, a meticulously designed collaborative representation learning procedure captures collaborative signals. It dynamically constructs a collaborative graph of learners by iteratively searching for optimal neighbors in a context-aware manner. Using the constructed graph, collaborative information is extracted through node representation learning. Finally, a decoding process aligns the initial cognitive states and collaborative states, achieving co-disentanglement with practice performance reconstructions. Extensive experiments demonstrate the superior performance of Coral, showcasing significant improvements over state-of-the-art methods across several real-world datasets. Our code is available at https://github.com/bigdata-ustc/Coral., Comment: Accepted by NeurIPS2024
- Published
- 2024
3. LE-PDE++: Mamba for accelerating PDEs Simulations
- Author
-
Liang, Aoming, Mu, Zhaoyang, liu, Qi, Li, Ruipeng, Ge, Mingming, and Fan, Dixia
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Partial Differential Equations are foundational in modeling science and natural systems such as fluid dynamics and weather forecasting. The Latent Evolution of PDEs method is designed to address the computational intensity of classical and deep learning-based PDE solvers by proposing a scalable and efficient alternative. To enhance the efficiency and accuracy of LE-PDE, we incorporate the Mamba model, an advanced machine learning model known for its predictive efficiency and robustness in handling complex dynamic systems with a progressive learning strategy. The LE-PDE was tested on several benchmark problems. The method demonstrated a marked reduction in computational time compared to traditional solvers and standalone deep learning models while maintaining high accuracy in predicting system behavior over time. Our method doubles the inference speed compared to the LE-PDE while retaining the same level of parameter efficiency, making it well-suited for scenarios requiring long-term predictions., Comment: submitted in 08/15/2025 for artificial intelligence meeting
- Published
- 2024
4. DisenTS: Disentangled Channel Evolving Pattern Modeling for Multivariate Time Series Forecasting
- Author
-
Liu, Zhiding, Yang, Jiqian, Mao, Qingyang, Zhao, Yuze, Cheng, Mingyue, Li, Zhi, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Machine Learning - Abstract
Multivariate time series forecasting plays a crucial role in various real-world applications. Significant efforts have been made to integrate advanced network architectures and training strategies that enhance the capture of temporal dependencies, thereby improving forecasting accuracy. On the other hand, mainstream approaches typically utilize a single unified model with simplistic channel-mixing embedding or cross-channel attention operations to account for the critical intricate inter-channel dependencies. Moreover, some methods even trade capacity for robust prediction based on the channel-independent assumption. Nonetheless, as time series data may display distinct evolving patterns due to the unique characteristics of each channel (including multiple strong seasonalities and trend changes), the unified modeling methods could yield suboptimal results. To this end, we propose DisenTS, a tailored framework for modeling disentangled channel evolving patterns in general multivariate time series forecasting. The central idea of DisenTS is to model the potential diverse patterns within the multivariate time series data in a decoupled manner. Technically, the framework employs multiple distinct forecasting models, each tasked with uncovering a unique evolving pattern. To guide the learning process without supervision of pattern partition, we introduce a novel Forecaster Aware Gate (FAG) module that generates the routing signals adaptively according to both the forecasters' states and input series' characteristics. The forecasters' states are derived from the Linear Weight Approximation (LWA) strategy, which quantizes the complex deep neural networks into compact matrices. Additionally, the Similarity Constraint (SC) is further proposed to guide each model to specialize in an underlying pattern by minimizing the mutual information between the representations.
- Published
- 2024
5. Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach
- Author
-
Li, Qingchuan, Li, Jiatong, Liu, Tongxuan, Zeng, Yuting, Cheng, Mingyue, Huang, Weizhe, and Liu, Qi
- Subjects
Computer Science - Computation and Language - Abstract
Large Language Models (LLMs) have exhibited remarkable potential across a wide array of reasoning tasks, including logical reasoning. Although massive efforts have been made to empower the logical reasoning ability of LLMs via external logical symbolic solvers, crucial challenges of the poor generalization ability to questions with different features and inevitable question information loss of symbolic solver-driven approaches remain unresolved. To mitigate these issues, we introduce LINA, a LLM-driven neuro-symbolic approach for faithful logical reasoning. By enabling an LLM to autonomously perform the transition from propositional logic extraction to sophisticated logical reasoning, LINA not only bolsters the resilience of the reasoning process but also eliminates the dependency on external solvers. Additionally, through its adoption of a hypothetical-deductive reasoning paradigm, LINA effectively circumvents the expansive search space challenge that plagues traditional forward reasoning methods. Empirical evaluations demonstrate that LINA substantially outperforms both established propositional logic frameworks and conventional prompting techniques across a spectrum of five logical reasoning tasks. Specifically, LINA achieves an improvement of 24.34% over LINC on the FOLIO dataset, while also surpassing prompting strategies like CoT and CoT-SC by up to 24.02%. Our code is available at https://github.com/wufeiwuwoshihua/nshy.
- Published
- 2024
6. Temporal and spectral variations of the X-ray pulsar Cen X-3 observed by NuSTAR
- Author
-
Liu, Qi, Wang, Wei, Santangelo, Andrea, Kong, Lingda, Ji, Long, and Ducci, Lorenzo
- Subjects
Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Solar and Stellar Astrophysics - Abstract
We report a time-resolved analysis of the accreting X-ray pulsar Cen X-3 using observations carried out by NuSTAR, which cover approximately two binary orbits in different intensity states. The pulse profile is relatively stable over the orbital phase and shows energy dependence. It has an obvious double-peaked shape in the energy band below 15 keV -- with the second pulse peak decreasing as energy increases -- and is gradually dominated by a single peak in higher energy bands. We find that the pulse profile in the energy band of 3-5 keV at high-intensity states shows a subtle triple-peaked shape, with the main peak divided into two subpeaks. We also find a positive correlation between the pulse fraction and both energy and flux. Our spectral analysis reveals that the spectra can be well described by the continuum of Fermi-Dirac cutoff and NPEX models, and the cyclotron line is detected with the centroid energies varying from 26 keV to 29 keV, along with the iron emission line around 6.4 keV. We investigated the dependence between the cyclotron resonant scattering feature (CRSF) centroid energy and luminosity and discuss the theoretical critical luminosity. Although the variation of $E_{\rm cyc}- L_X$ is not distinct, there is a possibility that the critical luminosity lies within the range of $\sim (0.5-4)\times 10^{37}$ erg s$^{-1}$ in the band of $4-78$ keV. The photon index shows a strong positive correlation with luminosity. Our orbital-phase analysis reveals that the spectral parameters show orbital variability, and the highly variable photoelectric absorption may indicate the existence of clumpy stellar winds., Comment: 10 pages, 10 figures
- Published
- 2024
- Full Text
- View/download PDF
7. RecFlow: An Industrial Full Flow Recommendation Dataset
- Author
-
Liu, Qi, Zheng, Kai, Huang, Rui, Li, Wuchao, Cai, Kuo, Chai, Yuan, Niu, Yanan, Hui, Yiqun, Han, Bing, Mou, Na, Wang, Hongning, Bao, Wentian, Yu, Yunen, Zhou, Guorui, Li, Han, Song, Yang, Lian, Defu, and Gai, Kun
- Subjects
Computer Science - Information Retrieval - Abstract
Industrial recommendation systems (RS) rely on the multi-stage pipeline to balance effectiveness and efficiency when delivering items from a vast corpus to users. Existing RS benchmark datasets primarily focus on the exposure space, where novel RS algorithms are trained and evaluated. However, when these algorithms transition to real world industrial RS, they face a critical challenge of handling unexposed items which are a significantly larger space than the exposed one. This discrepancy profoundly impacts their practical performance. Additionally, these algorithms often overlook the intricate interplay between multiple RS stages, resulting in suboptimal overall system performance. To address this issue, we introduce RecFlow, an industrial full flow recommendation dataset designed to bridge the gap between offline RS benchmarks and the real online environment. Unlike existing datasets, RecFlow includes samples not only from the exposure space but also unexposed items filtered at each stage of the RS funnel. Our dataset comprises 38M interactions from 42K users across nearly 9M items with additional 1.9B stage samples collected from 9.3M online requests over 37 days and spanning 6 stages. Leveraging the RecFlow dataset, we conduct courageous exploration experiments, showcasing its potential in designing new algorithms to enhance effectiveness by incorporating stage-specific samples. Some of these algorithms have already been deployed online, consistently yielding significant gains. We propose RecFlow as the first comprehensive benchmark dataset for the RS community, supporting research on designing algorithms at any stage, study of selection bias, debiased algorithms, multi-stage consistency and optimality, multi-task recommendation, and user behavior modeling. The RecFlow dataset, along with the corresponding source code, is available at https://github.com/RecFlow-ICLR/RecFlow.
- Published
- 2024
8. Studying the variations of the cyclotron line in Cen X-3 using Insight-HXMT
- Author
-
Liu, Qi, Wang, Wei, Yang, Wen, Chen, Xiao, and Wu, Hanji
- Subjects
Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Solar and Stellar Astrophysics - Abstract
We investigate the cyclotron resonant scattering features (CRSFs) of the accreting X-ray pulsar Cen X-3 and significantly detect the 29 keV cyclotron line features in the hard X-ray averaged spectroscopy studies based on the recent Insight-HXMT observations in 2022, when Cen X-3 has X-ray luminosity $L_{\rm X} > \sim 5 \times 10^{37}$ erg\ s$^{-1}$ in the bands of 2 -- 60 keV. We do not find a harmonic line in the average spectra based on different continuum models. We showed that the CRSF energies have no correlation with time or luminosity in the average spectra. In addition, by performing a pulse phase-dependent spectral analysis, we revealed the fundamental line with the centroid energy ranging from 25 to 29 keV with a high significance over the spin phases. The evolution of the cyclotron line centroid energies over pulse phase is similar to the shape of pulse profiles, illustrating a positive correlation between the energy of CRSFs and the pulse phase flux., Comment: 10 pages, 7 figures
- Published
- 2024
- Full Text
- View/download PDF
9. Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models
- Author
-
Yuan, Yu, Zhao, Lili, Zhang, Kai, Zheng, Guangting, and Liu, Qi
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their robustness and generalization capabilities. This paper presents Shortcut Suite, a comprehensive test suite designed to evaluate the impact of shortcuts on LLMs' performance, incorporating six shortcut types, five evaluation metrics, and four prompting strategies. Our extensive experiments yield several key findings: 1) LLMs demonstrate varying reliance on shortcuts for downstream tasks, significantly impairing their performance. 2) Larger LLMs are more likely to utilize shortcuts under zero-shot and few-shot in-context learning prompts. 3) Chain-of-thought prompting notably reduces shortcut reliance and outperforms other prompting strategies, while few-shot prompts generally underperform compared to zero-shot prompts. 4) LLMs often exhibit overconfidence in their predictions, especially when dealing with datasets that contain shortcuts. 5) LLMs generally have a lower explanation quality in shortcut-laden datasets, with errors falling into three types: distraction, disguised comprehension, and logical fallacy. Our findings offer new insights for evaluating robustness and generalization in LLMs and suggest potential directions for mitigating the reliance on shortcuts. The code is available at \url {https://github.com/yyhappier/ShortcutSuite.git}.
- Published
- 2024
10. Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
- Author
-
Jiang, Botian, Li, Lei, Li, Xiaonan, Li, Zhaowei, Feng, Xiachong, Kong, Lingpeng, Liu, Qi, and Qiu, Xipeng
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
The rapid advancement of Multimodal Large Language Models (MLLMs) has been accompanied by the development of various benchmarks to evaluate their capabilities. However, the true nature of these evaluations and the extent to which they assess multimodal reasoning versus merely leveraging the underlying Large Language Model (LLM) backbone remain unclear. This paper presents a comprehensive investigation into the role of LLM backbones in MLLM evaluation, focusing on two critical aspects: the degree to which current benchmarks truly assess multimodal reasoning and the influence of LLM prior knowledge on performance. Specifically, we introduce a modified evaluation protocol to disentangle the contributions of the LLM backbone from multimodal integration, and an automatic knowledge identification technique for diagnosing whether LLMs equip the necessary knowledge for corresponding multimodal questions. Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs. Key findings reveal that some benchmarks allow high performance even without visual inputs and up to 50\% of error rates can be attributed to insufficient world knowledge in the LLM backbone, indicating a heavy reliance on language capabilities. To address knowledge deficiencies, we propose a knowledge augmentation pipeline that achieves significant performance gains, with improvements of up to 60\% on certain datasets, resulting in a approximately 4x increase in performance. Our work provides crucial insights into the role of the LLM backbone in MLLMs, and highlights the need for more nuanced benchmarking approaches.
- Published
- 2024
11. DeltaDock: A Unified Framework for Accurate, Efficient, and Physically Reliable Molecular Docking
- Author
-
Yan, Jiaxian, Zhang, Zaixi, Zhu, Jintao, Zhang, Kai, Pei, Jianfeng, and Liu, Qi
- Subjects
Quantitative Biology - Biomolecules ,Computer Science - Machine Learning - Abstract
Molecular docking, a technique for predicting ligand binding poses, is crucial in structure-based drug design for understanding protein-ligand interactions. Recent advancements in docking methods, particularly those leveraging geometric deep learning (GDL), have demonstrated significant efficiency and accuracy advantages over traditional sampling methods. Despite these advancements, current methods are often tailored for specific docking settings, and limitations such as the neglect of protein side-chain structures, difficulties in handling large binding pockets, and challenges in predicting physically valid structures exist. To accommodate various docking settings and achieve accurate, efficient, and physically reliable docking, we propose a novel two-stage docking framework, DeltaDock, consisting of pocket prediction and site-specific docking. We innovatively reframe the pocket prediction task as a pocket-ligand alignment problem rather than direct prediction in the first stage. Then we follow a bi-level coarse-to-fine iterative refinement process to perform site-specific docking. Comprehensive experiments demonstrate the superior performance of DeltaDock. Notably, in the blind docking setting, DeltaDock achieves a 31\% relative improvement over the docking success rate compared with the previous state-of-the-art GDL model. With the consideration of physical validity, this improvement increases to about 300\%., Comment: Accepted by NeurIPS'24
- Published
- 2024
12. The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers
- Author
-
Liu, Qi and Ma, Wanjing
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
In this paper, we identify and analyze a recurring training loss pattern, which we term the \textit{Epochal Sawtooth Effect (ESE)}, commonly observed during training with adaptive gradient-based optimizers, particularly Adam optimizer. This pattern is characterized by a sharp drop in loss at the beginning of each epoch, followed by a gradual increase, resulting in a sawtooth-shaped loss curve. Through empirical observations, we demonstrate that while this effect is most pronounced with Adam, it persists, although less severely, with other optimizers such as RMSProp. We provide an in-depth explanation of the underlying mechanisms that lead to the Epochal Sawtooth Effect. The influences of factors like \(\beta\), batch size, data shuffling on this pattern have been studied. We quantify the influence of \(\beta_2\) on the shape of the loss curve, showing that higher values of \(\beta_2\) result in a nearly linear increase in loss, while lower values create a concave upward trend. Our analysis reveals that this behavior stems from the adaptive learning rate controlled by the second moment estimate, with \(\beta_1\) playing a minimal role when \(\beta_2\) is large. To support our analysis, we replicate this phenomenon through a controlled quadratic minimization task. By incrementally solving a series of quadratic optimization problems using Adam, we demonstrate that the Epochal Sawtooth Effect can emerge even in simple optimization scenarios, reinforcing the generality of this pattern. This paper provides both theoretical insights and quantitative analysis, offering a comprehensive understanding of this ubiquitous phenomenon in modern optimization techniques., Comment: 15 pages, 21 figures
- Published
- 2024
13. VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
- Author
-
Li, Lei, Xie, Zhihui, Li, Mukai, Chen, Shunian, Wang, Peiyi, Chen, Liang, Yang, Yazheng, Wang, Benyou, Kong, Lingpeng, and Liu, Qi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
As large vision-language models (LVLMs) evolve rapidly, the demand for high-quality and diverse data to align these models becomes increasingly crucial. However, the creation of such data with human supervision proves costly and time-intensive. In this paper, we investigate the efficacy of AI feedback to scale supervision for aligning LVLMs. We introduce VLFeedback, the first large-scale vision-language feedback dataset, comprising over 82K multi-modal instructions and comprehensive rationales generated by off-the-shelf models without human annotations. To evaluate the effectiveness of AI feedback for vision-language alignment, we train Silkie, an LVLM fine-tuned via direct preference optimization on VLFeedback. Silkie showcases exceptional performance regarding helpfulness, visual faithfulness, and safety metrics. It outperforms its base model by 6.9\% and 9.5\% in perception and cognition tasks, reduces hallucination issues on MMHal-Bench, and exhibits enhanced resilience against red-teaming attacks. Furthermore, our analysis underscores the advantage of AI feedback, particularly in fostering preference diversity to deliver more comprehensive improvements. Our dataset, training code and models are available at https://vlf-silkie.github.io., Comment: EMNLP 2024 Main Conference camera-ready version (fixed small typos). This article supersedes arXiv:2312.10665
- Published
- 2024
14. Perceptual Quality Assessment of Octree-RAHT Encoded 3D Point Clouds
- Author
-
Duan, Dongshuai, Su, Honglei, Liu, Qi, Yuan, Hui, Gao, Wei, Song, Jiarun, and Wang, Zhou
- Subjects
Computer Science - Multimedia - Abstract
No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we focus on the PCQA problem dedicated to Octree-RAHT encoding mode. First, to address the issue that existing PCQA databases have a small scale and limited distortion levels, we establish the WPC5.0 database which is the first one dedicated to Octree-RAHT encoding mode with a scale of 400 distorted point clouds (PCs) including 4 geometric multiplied by 5 attitude distortion levels. Then, we propose the first PCQA model dedicated to Octree-RAHT encoding mode by parsing PC bitstreams without full decoding. The model introduces texture bitrate (TBPP) to predict texture complexity (TC) and further derives the texture distortion factor. In addition, the Geometric Quantization Parameter (PQS) is used to estimate the geometric distortion factor, which is then integrated into the model along with the texture distortion factor to obtain the proposed PCQA model named streamPCQ-OR. The proposed model has been compared with other advanced PCQA methods on the WPC5.0, BASICS and M-PCCD databases, and experimental results show that our model has excellent performance while having very low computational complexity, providing a reliable choice for time-critical applications. To facilitate subsequent research, the database and source code will be publicly released at https://github.com/qdushl/Waterloo-Point-Cloud-Database-5.0.
- Published
- 2024
15. Perceptual Quality Assessment of Trisoup-Lifting Encoded 3D Point Clouds
- Author
-
Long, Juncheng, Su, Honglei, Liu, Qi, Yuan, Hui, Gao, Wei, Song, Jiarun, and Wang, Zhou
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we develop the first PCQA model dedicated to Trisoup-Lifting encoded 3D point clouds by analyzing bitstreams without full decoding. Specifically, we investigate the relationship among texture bitrate per point (TBPP), texture complexity (TC) and texture quantization parameter (TQP) while geometry encoding is lossless. Subsequently, we estimate TC by utilizing TQP and TBPP. Then, we establish a texture distortion evaluation model based on TC, TBPP and TQP. Ultimately, by integrating this texture distortion model with a geometry attenuation factor, a function of trisoupNodeSizeLog2 (tNSL), we acquire a comprehensive NR bitstream-layer PCQA model named streamPCQ-TL. In addition, this work establishes a database named WPC6.0, the first and largest PCQA database dedicated to Trisoup-Lifting encoding mode, encompassing 400 distorted point clouds with both 4 geometric multiplied by 5 texture distortion levels. Experiment results on M-PCCD, ICIP2020 and the proposed WPC6.0 database suggest that the proposed streamPCQ-TL model exhibits robust and notable performance in contrast to existing advanced PCQA metrics, particularly in terms of computational cost. The dataset and source code will be publicly released at https://github.com/qdushl/Waterloo-Point-Cloud-Database-6.0
- Published
- 2024
16. Learning Recommender Systems with Soft Target: A Decoupled Perspective
- Author
-
Zhang, Hao, Cheng, Mingyue, Liu, Qi, Luo, Yucong, Li, Rui, and Chen, Enhong
- Subjects
Computer Science - Information Retrieval - Abstract
Learning recommender systems with multi-class optimization objective is a prevalent setting in recommendation. However, as observed user feedback often accounts for a tiny fraction of the entire item pool, the standard Softmax loss tends to ignore the difference between potential positive feedback and truly negative feedback. To address this challenge, we propose a novel decoupled soft label optimization framework to consider the objectives as two aspects by leveraging soft labels, including target confidence and the latent interest distribution of non-target items. Futhermore, based on our carefully theoretical analysis, we design a decoupled loss function to flexibly adjust the importance of these two aspects. To maximize the performance of the proposed method, we additionally present a sensible soft-label generation algorithm that models a label propagation algorithm to explore users' latent interests in unobserved feedback via neighbors. We conduct extensive experiments on various recommendation system models and public datasets, the results demonstrate the effectiveness and generality of the proposed method., Comment: Accepted by DASFAA 2024
- Published
- 2024
17. OneNet: A Fine-Tuning Free Framework for Few-Shot Entity Linking via Large Language Model Prompting
- Author
-
Liu, Xukai, Liu, Ye, Zhang, Kai, Wang, Kehang, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Entity Linking (EL) is the process of associating ambiguous textual mentions to specific entities in a knowledge base. Traditional EL methods heavily rely on large datasets to enhance their performance, a dependency that becomes problematic in the context of few-shot entity linking, where only a limited number of examples are available for training. To address this challenge, we present OneNet, an innovative framework that utilizes the few-shot learning capabilities of Large Language Models (LLMs) without the need for fine-tuning. To the best of our knowledge, this marks a pioneering approach to applying LLMs to few-shot entity linking tasks. OneNet is structured around three key components prompted by LLMs: (1) an entity reduction processor that simplifies inputs by summarizing and filtering out irrelevant entities, (2) a dual-perspective entity linker that combines contextual cues and prior knowledge for precise entity linking, and (3) an entity consensus judger that employs a unique consistency algorithm to alleviate the hallucination in the entity linking reasoning. Comprehensive evaluations across seven benchmark datasets reveal that OneNet outperforms current state-of-the-art entity linking methods., Comment: Accepted by EMNLP 2024 Main
- Published
- 2024
18. CursorCore: Assist Programming through Aligning Anything
- Author
-
Jiang, Hao, Liu, Qi, Li, Rui, Ye, Shengyu, and Wang, Shijin
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Software Engineering - Abstract
Large language models have been successfully applied to programming assistance tasks, such as code completion, code insertion, and instructional code editing. However, these applications remain insufficiently automated and struggle to effectively integrate various types of information during the programming process, including coding history, current code, and user instructions. In this work, we propose a new conversational framework that comprehensively integrates these information sources, collect data to train our models and evaluate their performance. Firstly, to thoroughly evaluate how well models align with different types of information and the quality of their outputs, we introduce a new benchmark, APEval (Assist Programming Eval), to comprehensively assess the performance of models in programming assistance tasks. Then, for data collection, we develop a data generation pipeline, Programming-Instruct, which synthesizes training data from diverse sources, such as GitHub and online judge platforms. This pipeline can automatically generate various types of messages throughout the programming process. Finally, using this pipeline, we generate 219K samples, fine-tune multiple models, and develop the CursorCore series. We show that CursorCore outperforms other models of comparable size. This framework unifies applications such as inline chat and automated editing, contributes to the advancement of coding assistants. Code, models and data are freely available at https://github.com/TechxGenus/CursorCore.
- Published
- 2024
19. Diffusion Auto-regressive Transformer for Effective Self-supervised Time Series Forecasting
- Author
-
Wang, Daoyu, Cheng, Mingyue, Liu, Zhiding, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Machine Learning - Abstract
Self-supervised learning has become a popular and effective approach for enhancing time series forecasting, enabling models to learn universal representations from unlabeled data. However, effectively capturing both the global sequence dependence and local detail features within time series data remains challenging. To address this, we propose a novel generative self-supervised method called TimeDART, denoting Diffusion Auto-regressive Transformer for Time series forecasting. In TimeDART, we treat time series patches as basic modeling units. Specifically, we employ an self-attention based Transformer encoder to model the dependencies of inter-patches. Additionally, we introduce diffusion and denoising mechanisms to capture the detail locality features of intra-patch. Notably, we design a cross-attention-based denoising decoder that allows for adjustable optimization difficulty in the self-supervised task, facilitating more effective self-supervised pre-training. Furthermore, the entire model is optimized in an auto-regressive manner to obtain transferable representations. Extensive experiments demonstrate that TimeDART achieves state-of-the-art fine-tuning performance compared to the most advanced competitive methods in forecasting tasks. Our code is publicly available at https://github.com/Melmaphother/TimeDART., Comment: 19 pages, 3 figures, ICLR 2025
- Published
- 2024
20. Temporal Reasoning Transfer from Text to Video
- Author
-
Li, Lei, Liu, Yuanxin, Yao, Linli, Zhang, Peiyuan, An, Chenxin, Wang, Lean, Sun, Xu, Kong, Lingpeng, and Liu, Qi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation to the ineffective temporal encoding of visual inputs, our diagnostic study reveals that video representations contain sufficient information for even small probing classifiers to achieve perfect accuracy. Surprisingly, we find that the key bottleneck in Video LLMs' temporal reasoning capability stems from the underlying LLM's inherent difficulty with temporal concepts, as evidenced by poor performance on textual temporal question-answering tasks. Building on this discovery, we introduce the Textual Temporal reasoning Transfer (T3). T3 synthesizes diverse temporal reasoning tasks in pure text format from existing image-text datasets, addressing the scarcity of video samples with complex temporal scenarios. Remarkably, without using any video data, T3 enhances LongVA-7B's temporal understanding, yielding a 5.3 absolute accuracy improvement on the challenging TempCompass benchmark, which enables our model to outperform ShareGPT4Video-8B trained on 28,000 video samples. Additionally, the enhanced LongVA-7B model achieves competitive performance on comprehensive video benchmarks. For example, it achieves a 49.7 accuracy on the Temporal Reasoning task of Video-MME, surpassing powerful large-scale models such as InternVL-Chat-V1.5-20B and VILA1.5-40B. Further analysis reveals a strong correlation between textual and video temporal task performance, validating the efficacy of transferring temporal reasoning abilities from text to video domains., Comment: Project page: https://video-t3.github.io
- Published
- 2024
21. FlexSBDD: Structure-Based Drug Design with Flexible Protein Modeling
- Author
-
Zhang, Zaixi, Wang, Mengdi, and Liu, Qi
- Subjects
Quantitative Biology - Biomolecules - Abstract
Structure-based drug design (SBDD), which aims to generate 3D ligand molecules binding to target proteins, is a fundamental task in drug discovery. Existing SBDD methods typically treat protein as rigid and neglect protein structural change when binding with ligand molecules, leading to a big gap with real-world scenarios and inferior generation qualities (e.g., many steric clashes). To bridge the gap, we propose FlexSBDD, a deep generative model capable of accurately modeling the flexible protein-ligand complex structure for ligand molecule generation. FlexSBDD adopts an efficient flow matching framework and leverages E(3)-equivariant network with scalar-vector dual representation to model dynamic structural changes. Moreover, novel data augmentation schemes based on structure relaxation/sidechain repacking are adopted to boost performance. Extensive experiments demonstrate that FlexSBDD achieves state-of-the-art performance in generating high-affinity molecules and effectively modeling the protein's conformation change to increase favorable protein-ligand interactions (e.g., Hydrogen bonds) and decrease steric clashes., Comment: Accepted by NeurIPS 2024
- Published
- 2024
22. Generalized Protein Pocket Generation with Prior-Informed Flow Matching
- Author
-
Zhang, Zaixi, Zitnik, Marinka, and Liu, Qi
- Subjects
Quantitative Biology - Biomolecules - Abstract
Designing ligand-binding proteins, such as enzymes and biosensors, is essential in bioengineering and protein biology. One critical step in this process involves designing protein pockets, the protein interface binding with the ligand. Current approaches to pocket generation often suffer from time-intensive physical computations or template-based methods, as well as compromised generation quality due to the overlooking of domain knowledge. To tackle these challenges, we propose PocketFlow, a generative model that incorporates protein-ligand interaction priors based on flow matching. During training, PocketFlow learns to model key types of protein-ligand interactions, such as hydrogen bonds. In the sampling, PocketFlow leverages multi-granularity guidance (overall binding affinity and interaction geometry constraints) to facilitate generating high-affinity and valid pockets. Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions make PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA., Comment: Accepted by NeurIPS 2024 as Spotlight
- Published
- 2024
23. Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study
- Author
-
Liu, Qi, Singh, Atul, Liu, Jingbo, Mu, Cun, and Yan, Zheng
- Subjects
Computer Science - Information Retrieval - Abstract
Training Learning-to-Rank models for e-commerce product search ranking can be challenging due to the lack of a gold standard of ranking relevance. In this paper, we decompose ranking relevance into content-based and engagement-based aspects, and we propose to leverage Large Language Models (LLMs) for both label and feature generation in model training, primarily aiming to improve the model's predictive capability for content-based relevance. Additionally, we introduce different sigmoid transformations on the LLM outputs to polarize relevance scores in labeling, enhancing the model's ability to balance content-based and engagement-based relevances and thus prioritize highly relevant items overall. Comprehensive online tests and offline evaluations are also conducted for the proposed design. Our work sheds light on advanced strategies for integrating LLMs into e-commerce product search ranking model training, offering a pathway to more effective and balanced models with improved ranking relevance., Comment: To be published in CIKM 2024 GenAIECommerce Workshop
- Published
- 2024
24. Long or Short or Both? An Exploration on Lookback Time Windows of Behavioral Features in Product Search Ranking
- Author
-
Liu, Qi, Singh, Atul, Liu, Jingbo, Mu, Cun, Yan, Zheng, and Pedersen, Jan
- Subjects
Computer Science - Information Retrieval - Abstract
Customer shopping behavioral features are core to product search ranking models in eCommerce. In this paper, we investigate the effect of lookback time windows when aggregating these features at the (query, product) level over history. By studying the pros and cons of using long and short time windows, we propose a novel approach to integrating these historical behavioral features of different time windows. In particular, we address the criticality of using query-level vertical signals in ranking models to effectively aggregate all information from different behavioral features. Anecdotal evidence for the proposed approach is also provided using live product search traffic on Walmart.com., Comment: Published in ACM SIGIR Workshop on eCommerce 2024
- Published
- 2024
25. Pre-trained Language Model and Knowledge Distillation for Lightweight Sequential Recommendation
- Author
-
Li, Li, Cheng, Mingyue, Liu, Zhiding, Zhang, Hao, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Sequential recommendation models user interests based on historical behaviors to provide personalized recommendation. Previous sequential recommendation algorithms primarily employ neural networks to extract features of user interests, achieving good performance. However, due to the recommendation system datasets sparsity, these algorithms often employ small-scale network frameworks, resulting in weaker generalization capability. Recently, a series of sequential recommendation algorithms based on large pre-trained language models have been proposed. Nonetheless, given the real-time demands of recommendation systems, the challenge remains in applying pre-trained language models for rapid recommendations in real scenarios. To address this, we propose a sequential recommendation algorithm based on a pre-trained language model and knowledge distillation. The key of proposed algorithm is to transfer pre-trained knowledge across domains and achieve lightweight inference by knowledge distillation. The algorithm operates in two stages: in the first stage, we fine-tune the pre-trained language model on the recommendation dataset to transfer the pre-trained knowledge to the recommendation task; in the second stage, we distill the trained language model to transfer the learned knowledge to a lightweight model. Extensive experiments on multiple public recommendation datasets show that the proposed algorithm enhances recommendation accuracy and provide timely recommendation services., Comment: in Chinese language
- Published
- 2024
26. ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models
- Author
-
Huang, Yuqing, Zhang, Rongyang, He, Xuesong, Zhi, Xuyang, Wang, Hao, Li, Xin, Xu, Feiyang, Liu, Deguang, Liang, Huadong, Li, Yi, Cui, Jian, Liu, Zimu, Wang, Shijin, Hu, Guoping, Liu, Guiquan, Liu, Qi, Lian, Defu, and Chen, Enhong
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Physics - Chemical Physics ,Quantitative Biology - Biomolecules - Abstract
There is a growing interest in the role that LLMs play in chemistry which lead to an increased focus on the development of LLMs benchmarks tailored to chemical domains to assess the performance of LLMs across a spectrum of chemical tasks varying in type and complexity. However, existing benchmarks in this domain fail to adequately meet the specific requirements of chemical research professionals. To this end, we propose \textbf{\textit{ChemEval}}, which provides a comprehensive assessment of the capabilities of LLMs across a wide range of chemical domain tasks. Specifically, ChemEval identified 4 crucial progressive levels in chemistry, assessing 12 dimensions of LLMs across 42 distinct chemical tasks which are informed by open-source data and the data meticulously crafted by chemical experts, ensuring that the tasks have practical value and can effectively evaluate the capabilities of LLMs. In the experiment, we evaluate 12 mainstream LLMs on ChemEval under zero-shot and few-shot learning contexts, which included carefully selected demonstration examples and carefully designed prompts. The results show that while general LLMs like GPT-4 and Claude-3.5 excel in literature understanding and instruction following, they fall short in tasks demanding advanced chemical knowledge. Conversely, specialized LLMs exhibit enhanced chemical competencies, albeit with reduced literary comprehension. This suggests that LLMs have significant potential for enhancement when tackling sophisticated tasks in the field of chemistry. We believe our work will facilitate the exploration of their potential to drive progress in chemistry. Our benchmark and analysis will be available at {\color{blue} \url{https://github.com/USTC-StarTeam/ChemEval}}.
- Published
- 2024
27. Bridging the Gap: GRB 230812B -- A Three-Second Supernova-Associated Burst Detected by the GRID Mission
- Author
-
Wang, Chen-Yu, Yin, Yi-Han Iris, Zhang, Bin-Bin, Feng, Hua, Zeng, Ming, Xiong, Shao-Lin, Pan, Xiao-Fan, Yang, Jun, Zhang, Yan-Qiu, Li, Chen, Yan, Zhen-Yu, Wang, Chen-Wei, Zheng, Xu-Tao, Liu, Jia-Cong, Wang, Qi-Dong, Yang, Zi-Rui, Li, Long-Hao, Liu, Qi-Ze, Zhao, Zheng-Yang, Hu, Bo, Liu, Yi-Qi, Lu, Si-Yuan, Luo, Zi-You, Cang, Ji-Rong, Cao, De-Zhi, Han, Wen-Tao, Jia, Li-Ping, Pan, Xing-Yu, Tian, Yang, Xu, Ben-Da, Yang, Xiao, and Zeng, Zhi
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
GRB 230812B, detected by the Gamma-Ray Integrated Detectors (GRID) constellation mission, is an exceptionally bright gamma-ray burst (GRB) with a duration of only 3 seconds. Sitting near the traditional boundary ($\sim$ 2 s) between long and short GRBs, GRB 230812B is notably associated with a supernova (SN), indicating a massive star progenitor. This makes it a rare example of a short-duration GRB resulting from stellar collapse. Our analysis, using a time-evolving synchrotron model, suggests that the burst has an emission radius of approximately $10^{14.5}$~cm. We propose that the short duration of GRB 230812B is due to the combined effects of the central engine's activity time and the time required for the jet to break through the stellar envelope. Our findings provide another case that challenges the conventional view that short-duration GRBs originate exclusively from compact object mergers, demonstrating that a broader range of durations exists for GRBs arising from the collapse of massive stars., Comment: 10 pages, 3 tables, 11 figures
- Published
- 2024
28. CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting
- Author
-
Chen, Runze, Xiao, Mingyu, Luo, Haiyong, Zhao, Fang, Wu, Fan, Xiong, Hao, Liu, Qi, and Song, Meng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses, limited viewpoints, and inconsistent lighting. CSS addresses these challenges through robust geometric priors and advanced illumination modeling, enabling high-quality novel view synthesis under complex, real-world conditions. Our method demonstrates clear improvements over existing approaches, paving the way for more accurate and flexible applications in AR, VR, and large-scale 3D reconstruction.
- Published
- 2024
29. Revisiting the Solution of Meta KDD Cup 2024: CRAG
- Author
-
Ouyang, Jie, Luo, Yucong, Cheng, Mingyue, Wang, Daoyu, Yu, Shuo, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Information Retrieval ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
This paper presents the solution of our team APEX in the Meta KDD CUP 2024: CRAG Comprehensive RAG Benchmark Challenge. The CRAG benchmark addresses the limitations of existing QA benchmarks in evaluating the diverse and dynamic challenges faced by Retrieval-Augmented Generation (RAG) systems. It provides a more comprehensive assessment of RAG performance and contributes to advancing research in this field. We propose a routing-based domain and dynamic adaptive RAG pipeline, which performs specific processing for the diverse and dynamic nature of the question in all three stages: retrieval, augmentation, and generation. Our method achieved superior performance on CRAG and ranked 2nd for Task 2&3 on the final competition leaderboard. Our implementation is available at this link: https://github.com/USTCAGI/CRAG-in-KDD-Cup2024.
- Published
- 2024
30. Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction
- Author
-
Liu, Qi, Tang, Xingyuan, Huang, Jianqiang, Yu, Xiangqian, Jin, Haoran, Chen, Jin, Pu, Yuanhao, Lian, Defu, Qu, Tan, Wang, Zhe, Cheng, Jia, and Lei, Jun
- Subjects
Computer Science - Information Retrieval - Abstract
Natural content and advertisement coexist in industrial recommendation systems but differ in data distribution. Concretely, traffic related to the advertisement is considerably sparser compared to that of natural content, which motivates the development of transferring knowledge from the richer source natural content domain to the sparser advertising domain. The challenges include the inefficiencies arising from the management of extensive source data and the problem of 'catastrophic forgetting' that results from the CTR model's daily updating. To this end, we propose a novel tri-level asynchronous framework, i.e., Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction (E-CDCTR), to transfer comprehensive knowledge of natural content to advertisement CTR models. This framework consists of three key components: Tiny Pre-training Model ((TPM), which trains a tiny CTR model with several basic features on long-term natural data; Complete Pre-training Model (CPM), which trains a CTR model holding network structure and input features the same as target advertisement on short-term natural data; Advertisement CTR model (A-CTR), which derives its parameter initialization from CPM together with multiple historical embeddings from TPM as extra feature and then fine-tunes on advertisement data. TPM provides richer representations of user and item for both the CPM and A-CTR, effectively alleviating the forgetting problem inherent in the daily updates. CPM further enhances the advertisement model by providing knowledgeable initialization, thereby alleviating the data sparsity challenges typically encountered by advertising CTR models. Such a tri-level cross-domain transfer learning framework offers an efficient solution to address both data sparsity and `catastrophic forgetting', yielding remarkable improvements.
- Published
- 2024
31. Near-symmetric multiport beam splitting for high-NOON state preparation on nonlocal metasurface
- Author
-
Tian, Yu, Liu, Qi, Tian, Zhaohua, Gong, Qihuang, and Gu, Ying
- Subjects
Quantum Physics - Abstract
Polarization beam splitting (BS) has been implemented on gradient metasurface with local response for entanglement manipulation and state reconstruction. To realize more degrees of light modulation, nonlocal modes, manifested as wavelength and momentum selectivity, should be applied into metasurface BS. Here, we demonstrate that single nonlocal phase gradient metasurface(NPGM) can function as a series of independent near-symmetric multiport BS,constructed by its momentum-polarization mode subspaces.Then, using any of above BS with simultaneous multiphoton interference, high-photon NOON states are prepared with high success probability and fidelity. For example,four-mode four-photon NOON state is obtained with 34.8% success probability and fidelity of 99.9%, greatly higher than those previously reported.With unique capability of multiphoton interference, this multiport BS on single NPGM can be directly used in the on-chip quantum photonics. Also, the efficient generation of high-photon NOON states with above BS has potential applications in quantum precision measurement., Comment: 4 figures
- Published
- 2024
32. MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion
- Author
-
Liu, Qi, Guo, Jingxiang, Lin, Sixu, Ma, Shuaikang, Zhu, Jinxuan, and Li, Yanjie
- Subjects
Computer Science - Robotics - Abstract
This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped robot. We develop a learning structure called Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion (MASQ), considering each leg as an agent to explore the action space of the quadruped robot, sharing a global critic, and learning collaboratively. Experimental results indicate that MASQ not only speeds up learning convergence but also enhances robustness in real-world settings, suggesting that applying MASQ to single robots such as quadrupeds could surpass traditional single-robot reinforcement learning approaches. Our study provides insightful guidance on integrating MARL with single-robot locomotion learning.
- Published
- 2024
33. Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective
- Author
-
Liu, Qi, Gao, Jianqi, Zhu, Dongjie, Qiao, Zhongjian, Chen, Pengbin, Guo, Jingxiang, and Li, Yanjie
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Multiagent Systems - Abstract
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouse to cooperative multi-agent deep RL, and the first to simultaneously address TAPF based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents. In this study, the physical dynamics of the agents is considered. Experimental results show that our method performs well in various task settings, which means that the target assignment is solved reasonably well and the planned path is almost shortest. Moreover, our method is more time-efficient than baselines.
- Published
- 2024
34. SUMO: Search-Based Uncertainty Estimation for Model-Based Offline Reinforcement Learning
- Author
-
Qiao, Zhongjian, Lyu, Jiafei, Jiao, Kechen, Liu, Qi, and Li, Xiu
- Subjects
Computer Science - Machine Learning - Abstract
The performance of offline reinforcement learning (RL) suffers from the limited size and quality of static datasets. Model-based offline RL addresses this issue by generating synthetic samples through a dynamics model to enhance overall performance. To evaluate the reliability of the generated samples, uncertainty estimation methods are often employed. However, model ensemble, the most commonly used uncertainty estimation method, is not always the best choice. In this paper, we propose a \textbf{S}earch-based \textbf{U}ncertainty estimation method for \textbf{M}odel-based \textbf{O}ffline RL (SUMO) as an alternative. SUMO characterizes the uncertainty of synthetic samples by measuring their cross entropy against the in-distribution dataset samples, and uses an efficient search-based method for implementation. In this way, SUMO can achieve trustworthy uncertainty estimation. We integrate SUMO into several model-based offline RL algorithms including MOPO and Adapted MOReL (AMOReL), and provide theoretical analysis for them. Extensive experimental results on D4RL datasets demonstrate that SUMO can provide more accurate uncertainty estimation and boost the performance of base algorithms. These indicate that SUMO could be a better uncertainty estimator for model-based offline RL when used in either reward penalty or trajectory truncation. Our code is available and will be open-source for further research and development., Comment: Submitted to AAAI2025
- Published
- 2024
35. E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group
- Author
-
Pan, Yue, Lyu, Chen, Yang, Zhenyu, Li, Lantian, Liu, Qi, and Shao, Xiuting
- Subjects
Computer Science - Software Engineering - Abstract
Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have been on the rise in recent years, especially with the advancement propelled by Large Language Models(LLMs). However, traditional strategies for rectifying performance flaws have shown significant limitations at the competitive code efficiency optimization level, and research on this topic is surprisingly scarce. Objective: This study aims to address the research gap in this domain, offering practical solutions to the various challenges encountered. Specifically, we have overcome the constraints of traditional performance error rectification strategies and developed a Language Model (LM) tailored for the competitive code efficiency optimization realm. Method: We introduced E-code, an advanced program synthesis LM. Inspired by the recent success of expert LMs, we designed an innovative structure called the Expert Encoder Group. This structure employs multiple expert encoders to extract features tailored for different input types. We assessed the performance of E-code against other leading models on a competitive dataset and conducted in-depth ablation experiments. Results: Upon systematic evaluation, E-code achieved a 54.98% improvement in code efficiency, significantly outperforming other advanced models. In the ablation experiments, we further validated the significance of the expert encoder group and other components within E-code. Conclusion: The research findings indicate that the expert encoder group can effectively handle various inputs in efficiency optimization tasks, significantly enhancing the model's performance.
- Published
- 2024
36. Evaluation of quantum Fisher information for large system
- Author
-
Liu, Qi
- Subjects
Quantum Physics - Abstract
Quantum Fisher information (QFI) plays a vital role in quantum precision measurement, quantum information, many-body physics, and other domains. Obtaining the QFI from experiment for a quantum state reveals insights such as the limits of estimation accuracy for a certain parameter, the degree of entanglement, and the geometric characteristics of the quantum state. Nonetheless, the measurement complexity of the QFI and its lower bound hinges on the dimension of the quantum state. Consequently, reducing the complexity of measurement is a significant challenge. This paper presents a methodology for evaluating the QFI of high-dimensional systems by transferring information to an auxiliary system and measuring its sub-QFI, while also offering conditions to diminish the dimension of auxiliary system to be measured without affecting the amount of information obtained by it., Comment: 9 pages, 1 figure
- Published
- 2024
37. DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models
- Author
-
Li, Wuchao, Huang, Rui, Zhao, Haijun, Liu, Chi, Zheng, Kai, Liu, Qi, Mou, Na, Zhou, Guorui, Lian, Defu, Song, Yang, Bao, Wentian, Yu, Enyun, and Ou, Wenwu
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (\textbf{Di}ffusion with \textbf{m}ulti-interest \textbf{e}nhanced \textbf{Rec}ommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user's non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM's outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users' time spent and result diversification.
- Published
- 2024
38. RePair: Automated Program Repair with Process-based Feedback
- Author
-
Zhao, Yuze, Huang, Zhenya, Ma, Yixiao, Li, Rui, Zhang, Kai, Jiang, Hao, Liu, Qi, Zhu, Linbo, and Su, Yu
- Subjects
Computer Science - Software Engineering ,Computer Science - Computation and Language - Abstract
The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while simultaneously diminishing the financial burden of manual repairs. Commercial-scale language models (LM) have taken APR to unprecedented levels. However, the emergence reveals that for models fewer than 100B parameters, making single-step modifications may be difficult to achieve the desired effect. Moreover, humans interact with the LM through explicit prompts, which hinders the LM from receiving feedback from compiler and test cases to automatically optimize its repair policies. In this literature, we explore how small-scale LM (less than 20B) achieve excellent performance through process supervision and feedback. We start by constructing a dataset named CodeNet4Repair, replete with multiple repair records, which supervises the fine-tuning of a foundational model. Building upon the encouraging outcomes of reinforcement learning, we develop a reward model that serves as a critic, providing feedback for the fine-tuned LM's action, progressively optimizing its policy. During inference, we require the LM to generate solutions iteratively until the repair effect no longer improves or hits the maximum step limit. The results show that process-based not only outperforms larger outcome-based generation methods, but also nearly matches the performance of closed-source commercial large-scale LMs., Comment: 15 pages, 13 figures
- Published
- 2024
39. A Review of Human-Object Interaction Detection
- Author
-
Wang, Yuxiao, Xiong, Qiwei, Lei, Yu, Xue, Weiying, Liu, Qi, and Wei, Zhenao
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accurate localization of human and object instances, as well as the correct classification of object categories and interaction relationships. This paper systematically summarizes and discusses the recent work in image-based HOI detection. First, the mainstream datasets involved in HOI relationship detection are introduced. Furthermore, starting with two-stage methods and end-to-end one-stage detection approaches, this paper comprehensively discusses the current developments in image-based HOI detection, analyzing the strengths and weaknesses of these two methods. Additionally, the advancements of zero-shot learning, weakly supervised learning, and the application of large-scale language models in HOI detection are discussed. Finally, the current challenges in HOI detection are outlined, and potential research directions and future trends are explored.
- Published
- 2024
40. Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation
- Author
-
Zhang, Changshuo, Shi, Teng, Zhang, Xiao, Liu, Qi, Xie, Ruobing, Xu, Jun, and Wen, Ji-Rong
- Subjects
Computer Science - Information Retrieval - Abstract
Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain sequential recommendation methods typically model user interests by focusing solely on information about domain transitions, often overlooking the valuable insights provided by users' feedback transitions. In this paper, we propose $\text{Transition}^2$, a novel method to model transitions across both domains and types of user feedback. Specifically, $\text{Transition}^2$ introduces a transition-aware graph encoder based on user history, assigning different weights to edges according to the feedback type. This enables the graph encoder to extract historical embeddings that capture the transition information between different domains and feedback types. Subsequently, we encode the user history using a cross-transition multi-head self-attention, incorporating various masks to distinguish different types of transitions. Finally, we integrate these modules to make predictions across different domains. Experimental results on two public datasets demonstrate the effectiveness of $\text{Transition}^2$.
- Published
- 2024
41. Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval
- Author
-
Zhang, Hanqi, Chen, Chong, Mei, Lang, Liu, Qi, and Mao, Jiaxin
- Subjects
Computer Science - Information Retrieval - Abstract
In the information retrieval (IR) area, dense retrieval (DR) models use deep learning techniques to encode queries and passages into embedding space to compute their semantic relations. It is important for DR models to balance both efficiency and effectiveness. Pre-trained language models (PLMs), especially Transformer-based PLMs, have been proven to be effective encoders of DR models. However, the self-attention component in Transformer-based PLM results in a computational complexity that grows quadratically with sequence length, and thus exhibits a slow inference speed for long-text retrieval. Some recently proposed non-Transformer PLMs, especially the Mamba architecture PLMs, have demonstrated not only comparable effectiveness to Transformer-based PLMs on generative language tasks but also better efficiency due to linear time scaling in sequence length. This paper implements the Mamba Retriever to explore whether Mamba can serve as an effective and efficient encoder of DR model for IR tasks. We fine-tune the Mamba Retriever on the classic short-text MS MARCO passage ranking dataset and the long-text LoCoV0 dataset. Experimental results show that (1) on the MS MARCO passage ranking dataset and BEIR, the Mamba Retriever achieves comparable or better effectiveness compared to Transformer-based retrieval models, and the effectiveness grows with the size of the Mamba model; (2) on the long-text LoCoV0 dataset, the Mamba Retriever can extend to longer text length than its pre-trained length after fine-tuning on retrieval task, and it has comparable or better effectiveness compared to other long-text retrieval models; (3) the Mamba Retriever has superior inference speed for long-text retrieval. In conclusion, Mamba Retriever is both effective and efficient, making it a practical model, especially for long-text retrieval.
- Published
- 2024
42. An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation
- Author
-
Wang, Jun, Wu, Likang, Liu, Qi, and Yang, Yu
- Subjects
Computer Science - Machine Learning ,Computer Science - Information Retrieval - Abstract
Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique as they provide an additional advantage in avoiding global explorations that may harm online users' experiences. However, previous studies mainly focus on discrete action and policy spaces, which might have difficulties in handling dramatically growing items efficiently. To mitigate this issue, in this paper, we aim to design an algorithmic framework applicable to continuous policies. To facilitate the control in the low-dimensional but dense user preference space, we propose an \underline{\textbf{E}}fficient \underline{\textbf{Co}}ntinuous \underline{\textbf{C}}ontrol framework (ECoC). Based on a statistically tested assumption, we first propose the novel unified action representation abstracted from normalized user and item spaces. Then, we develop the corresponding policy evaluation and policy improvement procedures. During this process, strategic exploration and directional control in terms of unified actions are carefully designed and crucial to final recommendation decisions. Moreover, beneficial from unified actions, the conservatism regularization for policies and value functions are combined and perfectly compatible with the continuous framework. The resulting dual regularization ensures the successful offline training of RL-based recommendation policies. Finally, we conduct extensive experiments to validate the effectiveness of our framework. The results show that compared to the discrete baselines, our ECoC is trained far more efficiently. Meanwhile, the final policies outperform baselines in both capturing the offline data and gaining long-term rewards.
- Published
- 2024
43. Experimental evaluation of offline reinforcement learning for HVAC control in buildings
- Author
-
Wang, Jun, Li, Linyan, Liu, Qi, and Yang, Yu
- Subjects
Computer Science - Machine Learning - Abstract
Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies focus on exploring solutions in online or off-policy scenarios without discussing in detail the implementation feasibility or effectiveness of dealing with purely offline datasets or trajectories. The lack of these works limits the real-world deployment of RL-based HVAC controllers, especially considering the abundance of historical data. To this end, this paper comprehensively evaluates the strengths and limitations of state-of-the-art offline RL algorithms by conducting analytical and numerical studies. The analysis is conducted from two perspectives: algorithms and dataset characteristics. As a prerequisite, the necessity of applying offline RL algorithms is first confirmed in two building environments. The ability of observation history modeling to reduce violations and enhance performance is subsequently studied. Next, the performance of RL-based controllers under datasets with different qualitative and quantitative conditions is investigated, including constraint satisfaction and power consumption. Finally, the sensitivity of certain hyperparameters is also evaluated. The results indicate that datasets of a certain suboptimality level and relatively small scale can be utilized to effectively train a well-performed RL-based HVAC controller. Specifically, such controllers can reduce at most 28.5% violation ratios of indoor temperatures and achieve at most 12.1% power savings compared to the baseline controller. In summary, this paper presents our well-structured investigations and new findings when applying offline reinforcement learning to building HVAC systems.
- Published
- 2024
44. Towards Few-shot Self-explaining Graph Neural Networks
- Author
-
Peng, Jingyu, Liu, Qi, Yue, Linan, Zhang, Zaixi, Zhang, Kai, and Sha, Yunhao
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
Recent advancements in Graph Neural Networks (GNNs) have spurred an upsurge of research dedicated to enhancing the explainability of GNNs, particularly in critical domains such as medicine. A promising approach is the self-explaining method, which outputs explanations along with predictions. However, existing self-explaining models require a large amount of training data, rendering them unavailable in few-shot scenarios. To address this challenge, in this paper, we propose a Meta-learned Self-Explaining GNN (MSE-GNN), a novel framework that generates explanations to support predictions in few-shot settings. MSE-GNN adopts a two-stage self-explaining structure, consisting of an explainer and a predictor. Specifically, the explainer first imitates the attention mechanism of humans to select the explanation subgraph, whereby attention is naturally paid to regions containing important characteristics. Subsequently, the predictor mimics the decision-making process, which makes predictions based on the generated explanation. Moreover, with a novel meta-training process and a designed mechanism that exploits task information, MSE-GNN can achieve remarkable performance on new few-shot tasks. Extensive experimental results on four datasets demonstrate that MSE-GNN can achieve superior performance on prediction tasks while generating high-quality explanations compared with existing methods. The code is publicly available at https://github.com/jypeng28/MSE-GNN.
- Published
- 2024
45. KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination
- Author
-
Gu, Yin, Liu, Qi, Li, Zhi, and Zhang, Kai
- Subjects
Computer Science - Artificial Intelligence - Abstract
Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI field, which aims to learn an agent to cooperate with an unseen partner in training environments or even novel environments. In recent years, a popular ZSC solution paradigm has been deep reinforcement learning (DRL) combined with advanced self-play or population-based methods to enhance the neural policy's ability to handle unseen partners. Despite some success, these approaches usually rely on black-box neural networks as the policy function. However, neural networks typically lack interpretability and logic, making the learned policies difficult for partners (e.g., humans) to understand and limiting their generalization ability. These shortcomings hinder the application of reinforcement learning methods in diverse cooperative scenarios.We suggest to represent the agent's policy with an interpretable program. Unlike neural networks, programs contain stable logic, but they are non-differentiable and difficult to optimize.To automatically learn such programs, we introduce Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination (KnowPC). We first define a foundational Domain-Specific Language (DSL), including program structures, conditional primitives, and action primitives. A significant challenge is the vast program search space, making it difficult to find high-performing programs efficiently. To address this, KnowPC integrates an extractor and an reasoner. The extractor discovers environmental transition knowledge from multi-agent interaction trajectories, while the reasoner deduces the preconditions of each action primitive based on the transition knowledge.
- Published
- 2024
46. Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
- Author
-
Ye, Zilyu, Liu, Jinxiu, Peng, Ruotian, Cao, Jinjin, Chen, Zhiyang, Zhang, Yiyang, Xuan, Ziwei, Zhou, Mingyuan, Shen, Xiaoqian, Elhoseiny, Mohamed, Liu, Qi, and Qi, Guo-Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recent image generation models excel at creating high-quality images from brief captions. However, they fail to maintain consistency of multiple instances across images when encountering lengthy contexts. This inconsistency is largely due to in existing training datasets the absence of granular instance feature labeling in existing training datasets. To tackle these issues, we introduce Openstory++, a large-scale dataset combining additional instance-level annotations with both images and text. Furthermore, we develop a training methodology that emphasizes entity-centric image-text generation, ensuring that the models learn to effectively interweave visual and textual information. Specifically, Openstory++ streamlines the process of keyframe extraction from open-domain videos, employing vision-language models to generate captions that are then polished by a large language model for narrative continuity. It surpasses previous datasets by offering a more expansive open-domain resource, which incorporates automated captioning, high-resolution imagery tailored for instance count, and extensive frame sequences for temporal consistency. Additionally, we present Cohere-Bench, a pioneering benchmark framework for evaluating the image generation tasks when long multimodal context is provided, including the ability to keep the background, style, instances in the given context coherent. Compared to existing benchmarks, our work fills critical gaps in multi-modal generation, propelling the development of models that can adeptly generate and interpret complex narratives in open-domain environments. Experiments conducted within Cohere-Bench confirm the superiority of Openstory++ in nurturing high-quality visual storytelling models, enhancing their ability to address open-domain generation tasks. More details can be found at https://openstorypp.github.io/
- Published
- 2024
47. Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization
- Author
-
Zhang, Yanghai, Liu, Ye, Wu, Shiwei, Zhang, Kai, Liu, Xukai, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
The rapid increase in multimedia data has spurred advancements in Multimodal Summarization with Multimodal Output (MSMO), which aims to produce a multimodal summary that integrates both text and relevant images. The inherent heterogeneity of content within multimodal inputs and outputs presents a significant challenge to the execution of MSMO. Traditional approaches typically adopt a holistic perspective on coarse image-text data or individual visual objects, overlooking the essential connections between objects and the entities they represent. To integrate the fine-grained entity knowledge, we propose an Entity-Guided Multimodal Summarization model (EGMS). Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently. A gating mechanism then combines visual data for enhanced textual summary generation, while image selection is refined through knowledge distillation from a pre-trained vision-language model. Extensive experiments on public MSMO dataset validate the superiority of the EGMS method, which also prove the necessity to incorporate entity information into MSMO problem., Comment: In ACL-Findings 2024
- Published
- 2024
48. HADES: Detecting Active Directory Attacks via Whole Network Provenance Analytics
- Author
-
Liu, Qi, Bao, Kaibin, Hassan, Wajih Ul, and Hagenmeyer, Veit
- Subjects
Computer Science - Cryptography and Security - Abstract
Due to its crucial role in identity and access management in modern enterprise networks, Active Directory (AD) is a top target of Advanced Persistence Threat (APT) actors. Conventional intrusion detection systems (IDS) excel at identifying malicious behaviors caused by malware, but often fail to detect stealthy attacks launched by APT actors. Recent advance in provenance-based IDS (PIDS) shows promises by exposing malicious system activities in causal attack graphs. However, existing approaches are restricted to intra-machine tracing, and unable to reveal the scope of attackers' traversal inside a network. We propose HADES, the first PIDS capable of performing accurate causality-based cross-machine tracing by leveraging a novel concept called logon session based execution partitioning to overcome several challenges in cross-machine tracing. We design HADES as an efficient on-demand tracing system, which performs whole-network tracing only when it first identifies an authentication anomaly signifying an ongoing AD attack, for which we introduce a novel lightweight authentication anomaly detection model rooted in our extensive analysis of AD attacks. To triage attack alerts, we present a new algorithm integrating two key insights we identified in AD attacks. Our evaluations show that HADES outperforms both popular open source detection systems and a prominent commercial AD attack detector., Comment: 13 pages
- Published
- 2024
49. Accurate and Scalable Detection and Investigation of Cyber Persistence Threats
- Author
-
Liu, Qi, Shoaib, Muhammad, Rehman, Mati Ur, Bao, Kaibin, Hagenmeyer, Veit, and Hassan, Wajih Ul
- Subjects
Computer Science - Cryptography and Security - Abstract
In Advanced Persistent Threat (APT) attacks, achieving stealthy persistence within target systems is often crucial for an attacker's success. This persistence allows adversaries to maintain prolonged access, often evading detection mechanisms. Recognizing its pivotal role in the APT lifecycle, this paper introduces Cyber Persistence Detector (CPD), a novel system dedicated to detecting cyber persistence through provenance analytics. CPD is founded on the insight that persistent operations typically manifest in two phases: the "persistence setup" and the subsequent "persistence execution". By causally relating these phases, we enhance our ability to detect persistent threats. First, CPD discerns setups signaling an impending persistent threat and then traces processes linked to remote connections to identify persistence execution activities. A key feature of our system is the introduction of pseudo-dependency edges (pseudo-edges), which effectively connect these disjoint phases using data provenance analysis, and expert-guided edges, which enable faster tracing and reduced log size. These edges empower us to detect persistence threats accurately and efficiently. Moreover, we propose a novel alert triage algorithm that further reduces false positives associated with persistence threats. Evaluations conducted on well-known datasets demonstrate that our system reduces the average false positive rate by 93% compared to state-of-the-art methods., Comment: 16 pages
- Published
- 2024
50. Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models
- Author
-
Tang, Haoyu, Liu, Ye, Liu, Xukai, Zhang, Kai, Zhang, Yanghai, Liu, Qi, and Chen, Enhong
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Recent advancements in machine learning, particularly in Natural Language Processing (NLP), have led to the development of sophisticated models trained on extensive datasets, yet raising concerns about the potential leakage of sensitive information. In response, regulatory measures such as the European Union's General Data Protection Regulation (GDPR) have driven increasing interest in Machine Unlearning techniques, which enable models to selectively forget specific data entries. Early approaches primarily relied on pre-processing methods, while more recent research has shifted towards training-based unlearning techniques. Despite their effectiveness, most existing methods require access to the original training data, which is often inaccessible. Additionally, directly applying unlearning techniques bear the cost of undermining the model's expressive capabilities. To address these challenges, we introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components: A Knowledge Unlearning Induction module designed to remove specific knowledge through an unlearning loss; A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal; And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update. Experimental results demonstrate the efficacy of our ICU method in unlearning sensitive information while maintaining the model's overall performance, offering a promising solution for privacy-conscious machine learning applications.
- Published
- 2024
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.