Author: "Yu, Bowen" / Database: arXiv - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Bowen"' showing total 68 results

1. Aligning Large Language Models via Self-Steering Optimization

Author: Xiang, Hao, Yu, Bowen, Lin, Hongyu, Lu, Keming, Lu, Yaojie, Han, Xianpei, Sun, Le, Zhou, Jingren, and Lin, Junyang
Subjects: Computer Science - Computation and Language
Abstract: Automated alignment develops alignment systems with minimal human intervention. The key to automated alignment lies in providing learnable and accurate preference signals for preference learning without human annotation. In this paper, we introduce Self-Steering Optimization ($SSO$), an algorithm that autonomously generates high-quality preference signals based on predefined principles during iterative training, eliminating the need for manual annotation. $SSO$ maintains the accuracy of signals by ensuring a consistent gap between chosen and rejected responses while keeping them both on-policy to suit the current policy model's learning capacity. $SSO$ can benefit the online and offline training of the policy model, as well as enhance the training of reward models. We validate the effectiveness of $SSO$ with two foundation models, Qwen2 and Llama3.1, indicating that it provides accurate, on-policy preference signals throughout iterative training. Without any manual annotation or external models, $SSO$ leads to significant performance improvements across six subjective or objective benchmarks. Besides, the preference data generated by $SSO$ significantly enhanced the performance of the reward model on Rewardbench. Our work presents a scalable approach to preference optimization, paving the way for more efficient and effective automated alignment.
Published: 2024

2. A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models

Author: Tang, Qiaoyu, Yu, Le, Yu, Bowen, Lin, Hongyu, Lu, Keming, Lu, Yaojie, Han, Xianpei, and Sun, Le
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Post-training has emerged as a crucial paradigm for adapting large-scale pre-trained models to various tasks, whose effects are fully reflected by delta parameters (i.e., the disparity between post-trained and pre-trained parameters). While numerous studies have explored delta parameter properties via operations like pruning, quantization, low-rank approximation, and extrapolation, a unified framework for systematically examining these characteristics has been lacking. In this paper, we propose a novel perspective based on Riemann sum approximation of the loss function to elucidate delta parameter editing operations. Our analysis categorizes existing methods into three classes based on their post-editing performance: competitive, decreased, and improved, explaining how they are expressed by the Riemann sum approximation term and how they alter the model performance. Extensive experiments on both visual and language models, including ViT, LLaMA 3, Qwen 2, and Mistral, corroborate our theoretical findings. Furthermore, we introduce extensions to existing techniques like DARE and BitDelta, highlighting their limitations in leveraging the properties of delta parameters and reorganizing them into general expressions to enhance the applicability and effectiveness of delta parameter editing in post-trained models.
Published: 2024
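
The delta parameters described in this abstract (the difference between post-trained and pre-trained weights) can be illustrated with a small sketch. The function names and toy values below are hypothetical, and the random drop-and-rescale edit is only a DARE-style illustration on plain Python lists, not the paper's implementation.

```python
import random


def delta_parameters(pretrained, post_trained):
    """Delta parameters: element-wise difference post_trained - pretrained."""
    return [post - pre for pre, post in zip(pretrained, post_trained)]


def drop_and_rescale(delta, drop_rate, rng):
    """DARE-style edit (illustrative): zero each delta with probability
    drop_rate, then rescale the survivors by 1 / (1 - drop_rate) so the
    expected value of each delta is unchanged."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    return [0.0 if rng.random() < drop_rate else d * scale for d in delta]


def apply_delta(pretrained, delta):
    """Rebuild edited weights by adding the (edited) delta to the base weights."""
    return [pre + d for pre, d in zip(pretrained, delta)]


# Toy weights (assumed values, for illustration only).
pretrained = [0.10, -0.20, 0.30, 0.40]
post_trained = [0.12, -0.25, 0.30, 0.43]

delta = delta_parameters(pretrained, post_trained)
edited = drop_and_rescale(delta, drop_rate=0.5, rng=random.Random(0))
merged = apply_delta(pretrained, edited)
```

Each entry of `edited` is either zero or twice the matching entry of `delta`, since the rescale factor is 2 when the drop rate is 0.5.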

3. Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Author: Xia, Tingyu, Yu, Bowen, Dang, Kai, Yang, An, Wu, Yuan, Tian, Yuan, Chang, Yi, and Lin, Junyang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with human instructions. The primary goal during SFT is to select a small yet representative subset of training data from the larger pool, such that fine-tuning with this subset achieves results comparable to or even exceeding those obtained using the entire dataset. However, most existing data selection techniques are designed for small-scale data pools, which fail to meet the demands of real-world SFT scenarios. In this paper, we replicated several self-scoring methods (those that do not rely on external model assistance) on two million-scale datasets, and found that nearly all methods struggled to significantly outperform random selection when dealing with such large-scale data pools. Moreover, our comparisons suggest that, during SFT, diversity in data selection is more critical than simply focusing on high-quality data. We also analyzed the limitations of several current approaches, explaining why they perform poorly on large-scale datasets and why they are unsuitable for such contexts. Finally, we found that filtering data by token length offers a stable and efficient method for improving results. This approach, particularly when training on long text data, proves highly beneficial for relatively weaker base models, such as Llama3.
Published: 2024
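
The two selection strategies contrasted in this abstract, random selection and filtering by token length, can be sketched as follows. The sketch assumes the length filter keeps the longest responses (the abstract implies but does not spell this out), uses whitespace splitting as a stand-in for a real tokenizer, and runs on a made-up four-example pool; all names are hypothetical.

```python
import random


def token_length(example):
    # Whitespace tokenization is an assumed stand-in for a real tokenizer.
    return len(example["response"].split())


def select_random(pool, k, seed=0):
    """Baseline: draw k examples uniformly at random without replacement."""
    return random.Random(seed).sample(pool, k)


def select_by_token_length(pool, k):
    """Keep the k examples with the longest responses (assumed filter direction)."""
    return sorted(pool, key=token_length, reverse=True)[:k]


# Toy data pool (invented for illustration).
pool = [
    {"instruction": "Define SFT.", "response": "Supervised fine-tuning."},
    {"instruction": "Explain DPO.",
     "response": "DPO optimizes a policy directly from preference pairs "
                 "without a separate reward model."},
    {"instruction": "Say hi.", "response": "Hi!"},
    {"instruction": "Summarize RLHF.",
     "response": "RLHF trains a reward model on human preferences and then "
                 "optimizes the policy against it."},
]

longest = select_by_token_length(pool, k=2)
baseline = select_random(pool, k=2)
```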

4. Qwen2.5-Coder Technical Report

Author: Hui, Binyuan, Yang, Jian, Cui, Zeyu, Yang, Jiaxi, Liu, Dayiheng, Zhang, Lei, Liu, Tianyu, Zhang, Jiajun, Yu, Bowen, Dang, Kai, Yang, An, Men, Rui, Huang, Fei, Ren, Xingzhang, Ren, Xuancheng, Zhou, Jingren, and Lin, Junyang
Subjects: Computer Science - Computation and Language
Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes two models: Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B. As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and further pretrained on a vast corpus of over 5.5 trillion tokens. Through meticulous data cleaning, scalable synthetic data generation, and balanced data mixing, Qwen2.5-Coder demonstrates impressive code generation capabilities while retaining general versatility. The model has been evaluated on a wide range of code-related tasks, achieving state-of-the-art (SOTA) performance across more than 10 benchmarks, including code generation, completion, reasoning, and repair, consistently outperforming larger models of the same model size. We believe that the release of the Qwen2.5-Coder series will not only push the boundaries of research in code intelligence but also, through its permissive licensing, encourage broader adoption by developers in real-world applications.
Published: 2024

5. Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Author: Yang, An, Zhang, Beichen, Hui, Binyuan, Gao, Bofei, Yu, Bowen, Li, Chengpeng, Liu, Dayiheng, Tu, Jianhong, Zhou, Jingren, Lin, Junyang, Lu, Keming, Xue, Mingfeng, Lin, Runji, Liu, Tianyu, Ren, Xingzhang, and Zhang, Zhenru
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it's possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in the Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possesses advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
Published: 2024

6. Towards a Unified View of Preference Learning for Large Language Models: A Survey

Author: Gao, Bofei, Song, Feifan, Miao, Yibo, Cai, Zefan, Yang, Zhe, Chen, Liang, Hu, Helan, Xu, Runxin, Dong, Qingxiu, Zheng, Ce, Xiao, Wen, Zhang, Ge, Zan, Daoguang, Lu, Keming, Yu, Bowen, Liu, Dayiheng, Cui, Zeyu, Yang, Jian, Sha, Lei, Wang, Houfeng, Sui, Zhifang, Wang, Peiyi, Liu, Tianyu, and Chang, Baobao
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors in achieving this success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to understand. The relationships between different methods have been under-explored, limiting the development of preference alignment. In light of this, we break down the existing popular alignment strategies into different components and provide a unified framework to study the current alignment strategies, thereby establishing connections among them. In this survey, we decompose all the strategies in preference learning into four components: model, data, feedback, and algorithm. This unified view offers an in-depth understanding of existing alignment algorithms and also opens up possibilities to synergize the strengths of different strategies. Furthermore, we present detailed working examples of prevalent existing algorithms to facilitate a comprehensive understanding for the readers. Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
Comment: 23 pages, 6 figures
Published: 2024

7. Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

Author: Yuan, Chenhan, Huang, Fei, Peng, Ru, Lu, Keming, Yu, Bowen, Zhou, Chang, and Zhou, Jingren
Subjects: Computer Science - Computation and Language
Abstract: Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due to the separate models required. This work proposes Non-disruptive parameters insertion (Otter), inserting extra parameters into the transformer architecture to predict calibration signals along with the original LLM output. Otter offers state-of-the-art performance on multiple demanding tasks while saving up to 86.5% extra space and 98.5% extra time. Furthermore, Otter seamlessly integrates with existing inference engines, requiring only a one-line code change, and the original model response remains accessible after the parameter insertion. Our code is publicly available at https://github.com/chenhan97/Otter.
Comment: 16 pages
Published: 2024

8. Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

Author: Yu, Le, Yu, Bowen, Yu, Haiyang, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language
Abstract: Merging Large Language Models (LLMs) aims to amalgamate multiple homologous LLMs into one with all the capabilities. Ideally, any LLMs sharing the same backbone should be mergeable, irrespective of whether they are Fine-Tuned (FT) with minor parameter changes or Pre-Trained (PT) with substantial parameter shifts. However, existing methods often manually assign the model importance, rendering them feasible only for LLMs with similar parameter alterations, such as multiple FT LLMs. The differing ranges of parameter changes between FT and PT LLMs pose challenges for current solutions in empirically determining the optimal combination. In this paper, we make a pioneering effort to broaden the applicability of merging techniques from FT to PT LLMs. We initially examine the efficacy of current methods in merging FT and PT LLMs, discovering that they struggle to deal with PT LLMs. Subsequently, we introduce an approach based on WeIght DisENtanglement (WIDEN) to effectively extend the merging scope, which first disentangles model weights into magnitude and direction components, and then performs adaptive fusion by considering their respective contributions. In the experiments, we merge Qwen1.5-Chat (an FT LLM with instruction-following skills) with Sailor (a PT LLM with multilingual abilities) across 7B and 14B model scales. Results reveal that: (1) existing solutions usually fail when merging Sailor, either losing both abilities or only retaining instruction-following skills; (2) WIDEN successfully injects the multilingual abilities of Sailor into Qwen1.5-Chat and makes it proficient in Southeast Asian languages, achieving enhancements in the fundamental capabilities. In light of previous research, we also merge multiple 13B FT LLMs and observe that WIDEN achieves a balanced amalgamation of instruction following, mathematical reasoning, and code generation skills.
Comment: 17 pages
Published: 2024
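
The first step named in this abstract, disentangling weights into magnitude and direction components, can be illustrated on a single weight vector. This is a minimal sketch under the assumption that magnitude means the Euclidean norm and direction means the unit vector. The granularity WIDEN actually uses and its adaptive fusion step are not specified in the abstract and are not shown; the function names are hypothetical.

```python
import math


def disentangle(weights):
    """Split a weight vector into its Euclidean norm and unit direction."""
    magnitude = math.sqrt(sum(w * w for w in weights))
    if magnitude == 0.0:
        # A zero vector has no direction; return zeros by convention.
        return 0.0, [0.0] * len(weights)
    return magnitude, [w / magnitude for w in weights]


def recombine(magnitude, direction):
    """Inverse of disentangle: scale the unit direction by the magnitude."""
    return [magnitude * d for d in direction]


magnitude, direction = disentangle([3.0, 4.0])
restored = recombine(magnitude, direction)
```

For the toy vector `[3.0, 4.0]` the magnitude is 5 and the direction is approximately `[0.6, 0.8]`.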

9. Qwen2 Technical Report

Author: Yang, An, Yang, Baosong, Hui, Binyuan, Zheng, Bo, Yu, Bowen, Zhou, Chang, Li, Chengpeng, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Dong, Guanting, Wei, Haoran, Lin, Huan, Tang, Jialong, Wang, Jialin, Yang, Jian, Tu, Jianhong, Zhang, Jianwei, Ma, Jianxin, Yang, Jianxin, Xu, Jin, Zhou, Jingren, Bai, Jinze, He, Jinzheng, Lin, Junyang, Dang, Kai, Lu, Keming, Chen, Keqin, Yang, Kexin, Li, Mei, Xue, Mingfeng, Ni, Na, Zhang, Pei, Wang, Peng, Peng, Ru, Men, Rui, Gao, Ruize, Lin, Runji, Wang, Shijie, Bai, Shuai, Tan, Sinan, Zhu, Tianhang, Li, Tianhao, Liu, Tianyu, Ge, Wenbin, Deng, Xiaodong, Zhou, Xiaohuan, Ren, Xingzhang, Zhang, Xinyu, Wei, Xipin, Ren, Xuancheng, Liu, Xuejing, Fan, Yang, Yao, Yang, Zhang, Yichang, Wan, Yu, Chu, Yunfei, Liu, Yuqiong, Cui, Zeyu, Zhang, Zhenru, Guo, Zhifang, and Fan, Zhihao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials including example code on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.
Comment: 26 pages, 1 figure
Published: 2024

10. LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation

Author: Zhao, Hongke, Zheng, Songming, Wu, Likang, Yu, Bowen, and Wang, Jing
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The explainability of recommendation systems is crucial for enhancing user trust and satisfaction. Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation. However, in existing related studies, fine-tuning LLMs for recommendation tasks incurs high computational costs and alignment issues with existing systems, limiting the application potential of proven proprietary/closed-source LLMs, such as GPT-4. In this work, our proposed strategy, LANE, aligns LLMs with online recommendation systems without additional LLM tuning, reducing costs and improving explainability. This innovative approach addresses key challenges in integrating language models with recommendation systems while fully utilizing the capabilities of powerful proprietary models. Specifically, our strategy operates through several key components: semantic embedding, user multi-preference extraction using zero-shot prompting, semantic alignment, and explainable recommendation generation using Chain of Thought (CoT) prompting. By embedding item titles instead of IDs and utilizing multi-head attention mechanisms, our approach aligns the semantic features of user preferences with those of candidate items, ensuring coherent and user-aligned recommendations. Sufficient experimental results including performance comparison, questionnaire voting, and visualization cases prove that our method can not only ensure recommendation performance, but also provide easy-to-understand and reasonable recommendation logic.
Published: 2024

11. Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Author: Dong, Guanting, Lu, Keming, Li, Chengpeng, Xia, Tingyu, Yu, Bowen, Zhou, Chang, and Zhou, Jingren
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.
Comment: Work in progress
Published: 2024
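
The verification loop this abstract describes (an instruction, code that checks responses, unit tests that check the code, then rejection sampling on execution feedback) can be sketched in a few lines. Everything below is hand-written and hypothetical: in AutoIF the instruction, checker, and unit tests are generated by an LLM, and this sketch does not reproduce the released code.

```python
# Hypothetical instruction: "Answer in at most five words and end with a period."


def check_response(response):
    """Verification function for the hypothetical instruction above."""
    return len(response.split()) <= 5 and response.endswith(".")


# Unit test samples used to validate the checker itself.
UNIT_TESTS = [
    ("Paris is the capital.", True),
    ("The capital of France is the city of Paris.", False),
    ("Paris", False),
]


def checker_is_valid(checker, unit_tests):
    """Keep a checker only if it agrees with every unit test sample."""
    return all(checker(text) == expected for text, expected in unit_tests)


def rejection_sample(candidates, checker):
    """Execution feedback: keep only the responses the checker accepts."""
    return [c for c in candidates if checker(c)]


candidates = [
    "It is Paris.",
    "Paris, of course",
    "The answer to that question is Paris.",
]

checker_ok = checker_is_valid(check_response, UNIT_TESTS)
accepted = rejection_sample(candidates, check_response) if checker_ok else []
```

Accepted responses would become training data, while rejected ones could serve as negative examples for preference-based training.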

12. Towards Scalable Automated Alignment of LLMs: A Survey

Author: Cao, Boxi, Lu, Keming, Lu, Xinyu, Chen, Jiawei, Ren, Mengjie, Xiang, Hao, Liu, Peilin, Lu, Yaojie, He, Ben, Han, Xianpei, Sun, Le, Lin, Hongyu, and Yu, Bowen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment.
Comment: Paper List: https://github.com/cascip/awesome-auto-alignment
Published: 2024

13. Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

Author: Lu, Keming, Yu, Bowen, Huang, Fei, Fan, Yang, Lin, Runji, and Zhou, Chang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF). In this paper, we first discover that interpolating RLHF and SFT model parameters can adjust the trade-off between human preference and basic capabilities, thereby reducing the alignment tax at the cost of alignment reward. Inspired by this, we propose integrating the RL policy and SFT models at each optimization step in RLHF to continuously regulate the training direction, introducing the Online Merging Optimizer. Specifically, we merge gradients with the parameter differences between SFT and pretrained models, effectively steering the gradient towards maximizing rewards in the direction of SFT optimization. We demonstrate that our optimizer works well with different LLM families, such as Qwen and LLaMA, across various model sizes ranging from 1.8B to 8B, various RLHF algorithms like DPO and KTO, and existing model merging methods. It significantly enhances alignment reward while mitigating alignment tax, achieving higher overall performance across 14 benchmarks.
Published: 2024
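
The observation that motivates this paper, that interpolating RLHF and SFT model parameters adjusts the trade-off between alignment reward and basic capabilities, can be illustrated with a toy linear interpolation. The parameter names and values are invented, and the sketch covers only this offline interpolation. The Online Merging Optimizer itself, which merges gradients with parameter differences at each step, is not shown.

```python
def interpolate(sft_params, rlhf_params, alpha):
    """Linear interpolation: alpha = 0 returns the SFT model and
    alpha = 1 returns the RLHF model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {
        name: (1.0 - alpha) * sft_params[name] + alpha * rlhf_params[name]
        for name in sft_params
    }


# Toy scalar "parameters" (assumed values, for illustration only).
sft = {"w": 1.0, "b": 0.0}
rlhf = {"w": 3.0, "b": 2.0}

blended = interpolate(sft, rlhf, alpha=0.25)
```

With `alpha=0.25` the blended model sits a quarter of the way from the SFT weights to the RLHF weights.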

14. Language Models can Evaluate Themselves via Probability Discrepancy

Author: Xia, Tingyu, Yu, Bowen, Wu, Yuan, Chang, Yi, and Zhou, Chang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes.
Comment: ACL 2024 Findings
Published: 2024

15. Polarization-based Metalenses with High Numerical Aperture and Focusing Efficiency Utilizing Silicon-rich Nitride

Author: Khalilian, Alireza, Yu, Bowen, and Yi, Yasha
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: We explore the cutting-edge application of silicon-rich nitride (SRN) in the realm of high numerical aperture (NA) metalens design, focusing on the crucial role of pitch size optimization in amplifying lens efficiency through advanced simulations. Our investigation unveils how the exceptional tunable high refractive index of SRN can be harnessed to achieve significant advancements in metalens performance. By meticulously designing and simulating two innovative SRN-based metalenses (Mk1, with an NA of 0.9, reaching an impressive 75% focusing efficiency with a full width at half maximum (FWHM) of 0.53λ, and Mk2, with an NA of 0.99, achieving a 42% efficiency while maintaining an FWHM of 0.48λ), we demonstrate the critical influence of reduced pitch size on enhancing efficiency. This study not only highlights the unparalleled potential of SRN in optimizing metalens efficiency but also represents a significant leap forward in the field of nanophotonics, offering new pathways for the development of highly efficient flat photonic devices.
Published: 2024

16. Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment

Author: Song, Feifan, Yu, Bowen, Lang, Hao, Yu, Haiyang, Huang, Fei, Wang, Houfeng, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Alignment with human preference prevents large language models (LLMs) from generating misleading or toxic content while requiring high-cost human feedback. Assuming resources of human annotation are limited, two different ways of allocating them can be considered: more diverse PROMPTS or more diverse RESPONSES to be labeled. Nonetheless, a straightforward comparison between their impact is absent. In this work, we first control the diversity of both sides according to the number of samples for fine-tuning, which can directly reflect their influence. We find that instead of numerous prompts, more responses but fewer prompts better trigger LLMs for human alignment. Additionally, the concept of diversity for prompts can be more complex than responses that are typically quantified by single digits. Consequently, a new formulation of prompt diversity is proposed, further implying a linear correlation with the final performance of LLMs after fine-tuning. We also leverage it on data augmentation and conduct experiments to show its effect on different algorithms.
Comment: Accepted by LREC-COLING 2024
Published: 2024

17. Pre-trained Model-based Actionable Warning Identification: A Feasibility Study

Author: Ge, Xiuting, Fang, Chunrong, Zhang, Quanjun, Wu, Daoyuan, Yu, Bowen, Zheng, Qirui, Guo, An, Lin, Shangwei, Zhao, Zhihong, Liu, Yang, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering
Abstract: Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still face the problem of restricted performance due to the direct reliance on a limited number of labeled warnings to develop a classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success on various code-related tasks, could potentially circumvent the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs for AWI. By conducting an extensive evaluation on 10K+ SpotBugs warnings from 10 large-scale and open-source projects, we observe that all studied PTMs are consistently 9.85% to 21.12% better than the state-of-the-art ML-based AWI approaches. Besides, we investigate the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow. Further, we identify the reasons for current PTMs' underperformance on AWI. Based on our findings, we provide several practical guidelines to enhance PTM-based AWI in future work.
Published: 2024

18. SoFA: Shielded On-the-fly Alignment via Priority Rule Following

Author: Lu, Xinyu, Yu, Bowen, Lu, Yaojie, Lin, Hongyu, Yu, Haiyang, Sun, Le, Han, Xianpei, and Li, Yongbin
Subjects: Computer Science - Computation and Language
Abstract: The alignment problem in Large Language Models (LLMs) involves adapting them to the broad spectrum of human values. This requirement challenges existing alignment methods due to diversity of preferences and regulatory standards. This paper introduces a novel alignment paradigm, priority rule following, which defines rules as the primary control mechanism in each dialog, prioritizing them over user instructions. Our preliminary analysis reveals that even the advanced LLMs, such as GPT-4, exhibit shortcomings in understanding and prioritizing the rules. Therefore, we present PriorityDistill, a semi-automated approach for distilling priority following signals from LLM simulations to ensure robust rule integration and adherence. Our experiments show that this method not only effectively minimizes misalignments utilizing only one general rule but also adapts smoothly to various unseen rules, ensuring they are shielded from hijacking and that the model responds appropriately.
Published: 2024

19. Self-Retrieval: Building an Information Retrieval System with One Large Language Model

Author: Tang, Qiaoyu, Chen, Jiawei, Yu, Bowen, Lu, Yaojie, Fu, Cheng, Yu, Haiyang, Lin, Hongyu, Huang, Fei, He, Ben, Han, Xianpei, Sun, Le, and Li, Yongbin
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in how humans access information. Due to the isolated architecture and the limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of IR systems into a single LLM and deeply leverage the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the retrieval corpus into an LLM via a natural language indexing architecture. Then the entire retrieval process is redefined as a procedure of document generation and self-assessment, which can be executed end-to-end using a single large language model. Experimental results demonstrate that Self-Retrieval not only outperforms previous retrieval approaches by a large margin, but can also significantly boost the performance of LLM-driven downstream applications like retrieval-augmented generation.
Published: 2024

20. Passive Aperiodic Optical Phased Array based on Uniform Random Shuffle

Author: Yu, Bowen, Wu, Dachuan, and Yi, Yasha
Subjects: Physics - Applied Physics, Physics - Optics
Abstract: Grating lobes arise from the periodic nature of element spacing in the optical phased array (OPA). Essentially, the phased array performs the Spatial Fourier Transform on light; the steering capability of the main lobe is governed by phase shift variations among waveguides, and the Sidelobe Suppression Ratio (SLSR) correlates with the uniformity of emitter positions. Leveraging this understanding, we have optimized a 1x64 channel passive aperiodic OPA with a uniform random shuffle of the emitter positions. Our conceptual simulations highlight a robust steering capability (18.60° / 10 nm) and SLSR (-13.46 dB @ 0° / -8.27 dB @ ±45°), and initial measurements demonstrate the steering capability (9.8° / 10 nm, with a smaller-phase-shift design) and SLSR (-6.1 dB @ -33.4°) from the preliminary fabrication.
Published: 2024
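
To illustrate the first claim in the abstract above (grating lobes come from periodic emitter spacing), here is a minimal sketch of a one-dimensional array factor. It is not the authors' design: the pitch of two wavelengths, the in-phase emitters, and the uniformly random emitter positions are all assumptions chosen for illustration, and the paper's actual "uniform random shuffle" layout is not specified in the abstract.

```python
import numpy as np

def array_factor(positions, sin_theta, wavelength=1.0):
    """Normalized far-field magnitude |sum_n exp(j*k*x_n*sin(theta))| / N
    for N in-phase emitters located at `positions` (same unit as wavelength)."""
    k = 2.0 * np.pi / wavelength
    phases = np.exp(1j * k * np.outer(sin_theta, positions))
    return np.abs(phases.sum(axis=1)) / len(positions)

n_emitters = 64
pitch = 2.0  # hypothetical emitter pitch, in wavelengths

# Broadside (sin(theta) = 0) and the first grating-lobe direction of the
# periodic array, which satisfies sin(theta) = wavelength / pitch = 0.5.
sin_theta = np.array([0.0, 0.5])

# Periodic layout: every emitter adds in phase again at the grating lobe.
periodic = pitch * np.arange(n_emitters)
af_periodic = array_factor(periodic, sin_theta)

# Aperiodic layout (assumption: positions drawn uniformly over the same
# aperture; a stand-in for the paper's layout that ignores minimum spacing).
rng = np.random.default_rng(0)
aperiodic = np.sort(rng.uniform(0.0, pitch * n_emitters, size=n_emitters))
af_aperiodic = array_factor(aperiodic, sin_theta)

print("periodic :", af_periodic)
print("aperiodic:", af_aperiodic)
```

In the periodic case the second value equals the main-lobe value, which is the grating lobe; breaking the periodicity removes that coherent sum in the same direction, at the cost of a raised sidelobe floor elsewhere.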

21. Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment

Author: Lu, Keming, Yu, Bowen, Zhou, Chang, and Zhou, Jingren
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Considerable efforts have been invested in augmenting the role-playing proficiency of open-source large language models (LLMs) by emulating proprietary counterparts. Nevertheless, we posit that LLMs inherently harbor role-play capabilities, owing to the extensive knowledge of characters and potential dialogues ingrained in their vast training corpora. Thus, in this study, we introduce Ditto, a self-alignment method for role-play. Ditto capitalizes on character knowledge, encouraging an instruction-following LLM to simulate role-play dialogues as a variant of reading comprehension. This method creates a role-play training set comprising 4,000 characters, surpassing the scale of currently available datasets by tenfold regarding the number of roles. Subsequently, we fine-tune the LLM using this self-generated dataset to augment its role-playing capabilities. Upon evaluating our meticulously constructed and reproducible role-play benchmark and the roleplay subset of MT-Bench, Ditto, in various parameter scales, consistently maintains a consistent role identity and provides accurate role-specific knowledge in multi-turn role-play conversations. Notably, it outperforms all open-source role-play baselines, showcasing performance levels comparable to advanced proprietary chatbots. Furthermore, we present the first comprehensive cross-supervision alignment experiment in the role-play domain, revealing that the intrinsic capabilities of LLMs confine the knowledge within role-play. Meanwhile, the role-play styles can be easily acquired with the guidance of smaller models. We open-source related resources at https://github.com/OFA-Sys/Ditto.
Published: 2024

22. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

Author: Yu, Le, Yu, Bowen, Yu, Haiyang, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In this paper, we unveil that Language Models (LMs) can acquire new capabilities by assimilating parameters from homologous models without retraining or GPUs. We first introduce DARE to set most delta parameters (i.e., the disparity between fine-tuned and pre-trained parameters) to zeros without affecting the abilities of Supervised Fine-Tuning (SFT) LMs, which randomly Drops delta parameters with a ratio $p$ And REscales the remaining ones by $1 / (1 - p)$ to approximate the original embeddings. Then, we use DARE as a versatile plug-in to sparsify delta parameters of multiple SFT homologous models for mitigating parameter interference and merge them into a single model by parameter fusing. We experiment with encoder- and decoder-based LMs, showing that: (1) SFT delta parameter value ranges are typically small (within 0.002) with extreme redundancy, and DARE can effortlessly eliminate 90% or even 99% of them; (2) DARE can merge multiple task-specific LMs into one LM with diverse capabilities. Notably, this phenomenon is more pronounced in large-scale LMs, where the merged LM reveals the potential to surpass the performance of any source LM, providing a new discovery. We also utilize DARE to create a merged LM that ranks first among models with 7 billion parameters on the Open LLM Leaderboard., Comment: Accepted at ICML 2024
Published: 2023
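
The drop-and-rescale step described in the abstract above can be sketched in a few lines. This is a minimal illustration built only from the abstract's description (drop delta parameters at ratio p, rescale the rest by 1 / (1 - p)); the function name `dare`, the flat NumPy arrays standing in for model weights, and the synthetic delta range are assumptions, not the authors' released code.

```python
import numpy as np

def dare(pretrained, finetuned, p, rng):
    """Drop each delta parameter with probability p and rescale the
    survivors by 1 / (1 - p), then add the sparse delta back."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    delta = finetuned - pretrained            # delta parameters
    keep = rng.random(delta.shape) >= p       # keep each entry with prob. 1 - p
    sparse_delta = np.where(keep, delta / (1.0 - p), 0.0)
    return pretrained + sparse_delta

rng = np.random.default_rng(0)
# Synthetic stand-ins for model weights (hypothetical data).
pretrained = rng.normal(size=100_000)
finetuned = pretrained + rng.uniform(-0.002, 0.002, size=pretrained.shape)

merged = dare(pretrained, finetuned, p=0.9, rng=rng)
dropped_fraction = np.mean(merged == pretrained)
print(f"fraction of delta parameters dropped: {dropped_fraction:.3f}")
```

Rescaling by 1 / (1 - p) keeps the expected value of each delta parameter unchanged, which is why the sparsified model can approximate the fine-tuned one on average.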

23. Diversify Question Generation with Retrieval-Augmented Style Transfer

Author: Gou, Qi, Xia, Zehua, Yu, Bowen, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Cam-Tu, Nguyen
Subjects: Computer Science - Computation and Language
Abstract: Given a textual passage and an answer, humans are able to ask questions with various expressions, but this ability is still challenging for most question generation (QG) systems. Existing solutions mainly focus on the internal knowledge within the given passage or the semantic word space for diverse content planning. These methods, however, have not considered the potential of external knowledge for expression diversity. To bridge this gap, we propose RAST, a framework for Retrieval-Augmented Style Transfer, where the objective is to utilize the style of diverse templates for question generation. For training RAST, we develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward. Here, the consistency reward is computed by a Question-Answering (QA) model, whereas the diversity reward measures how much the final output mimics the retrieved template. Experimental results show that our method outperforms previous diversity-driven baselines on diversity while being comparable in terms of consistency scores. Our code is available at https://github.com/gouqi666/RAST., Comment: EMNLP2023 camera-ready
Published: 2023

24. Improving Question Generation with Multi-level Content Planning

Author: Xia, Zehua, Gou, Qi, Yu, Bowen, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Nguyen, Cam-Tu
Subjects: Computer Science - Computation and Language
Abstract: This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context. Previous studies have suggested that key phrase selection is essential for question generation (QG), yet it is still challenging to connect such disjointed phrases into meaningful questions, particularly for long context. To mitigate this issue, we propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model which takes the generated full answer as an additional input to generate questions. Here, full answer generation is introduced to connect the short answer with the selected key phrases, thus forming an answer-aware summary to facilitate QG. Both FA-model and Q-model are formalized as simple-yet-effective Phrase-Enhanced Transformers, our joint model for phrase selection and text generation. Experimental results show that our method outperforms strong baselines on two popular QG datasets. Our code is available at https://github.com/zeaver/MultiFactor., Comment: Camera-ready. Accepted by EMNLP 2023 Findings
Published: 2023

25. A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair

Author: Zhang, Quanjun, Zhang, Tongke, Zhai, Juan, Fang, Chunrong, Yu, Bowen, Sun, Weisong, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering
Abstract: Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of Software Engineering (SE) tasks, such as Automated Program Repair (APR), code summarization, and code completion. For example, ChatGPT, the latest black-box LLM, has been investigated by numerous recent research studies and has shown impressive performance in various tasks. However, there exists a potential risk of data leakage since these LLMs are usually closed-source with unknown specific training details, e.g., pre-training datasets. In this paper, we seek to review the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives. We first introduce EvalGPTFix, a new benchmark with buggy and the corresponding fixed programs from competitive programming problems starting from 2023, after the training cutoff point of ChatGPT. The results on EvalGPTFix show that ChatGPT is able to fix 109 out of 151 buggy programs using the basic prompt within 35 independent rounds, outperforming state-of-the-art LLMs CodeT5 and PLBART by 27.5% and 62.4% prediction accuracy. We also investigate the impact of three types of prompts, i.e., problem description, error feedback, and bug localization, leading to additional 34 fixed bugs. Besides, we provide additional discussion from the interactive nature of ChatGPT to illustrate the capacity of a dialog-based repair workflow with 9 additional fixed bugs. Inspired by the findings, we further pinpoint various challenges and opportunities for advanced SE study equipped with such LLMs (e.g., ChatGPT) in the near future. More importantly, our work calls for more research on the reevaluation of the achievements obtained by existing black-box LLMs across various SE tasks, not limited to ChatGPT on APR., Comment: add EvalGPTFix URL
Published: 2023

26. Quantifying and mitigating the impact of label errors on model disparity metrics

Author: Adebayo, Julius, Hall, Melissa, Yu, Bowen, and Chern, Bobbie
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Errors in labels obtained via human annotation adversely affect a model's performance. Existing approaches propose ways to mitigate the effect of label error on a model's downstream accuracy, yet little is known about its impact on a model's disparity metrics. Here we study the effect of label error on a model's disparity metrics. We empirically characterize how varying levels of label error, in both training and test data, affect these disparity metrics. We find that group calibration and other metrics are sensitive to train-time and test-time label error -- particularly for minority groups. This disparate effect persists even for models trained with noise-aware algorithms. To mitigate the impact of training-time label error, we present an approach to estimate the influence of a training input's label on a model's group disparity metric. We empirically assess the proposed approach on a variety of datasets and find significant improvement, compared to alternative approaches, in identifying training inputs that improve a model's disparity metric. We complement the approach with an automatic relabel-and-finetune scheme that produces updated models with, provably, improved group calibration error., Comment: Conference paper at ICLR 2023
Published: 2023

27. Qwen Technical Report

Author: Bai, Jinze, Bai, Shuai, Chu, Yunfei, Cui, Zeyu, Dang, Kai, Deng, Xiaodong, Fan, Yang, Ge, Wenbin, Han, Yu, Huang, Fei, Hui, Binyuan, Ji, Luo, Li, Mei, Lin, Junyang, Lin, Runji, Liu, Dayiheng, Liu, Gao, Lu, Chengqiang, Lu, Keming, Ma, Jianxin, Men, Rui, Ren, Xingzhang, Ren, Xuancheng, Tan, Chuanqi, Tan, Sinan, Tu, Jianhong, Wang, Peng, Wang, Shijie, Wang, Wei, Wu, Shengguang, Xu, Benfeng, Xu, Jin, Yang, An, Yang, Hao, Yang, Jian, Yang, Shusheng, Yao, Yang, Yu, Bowen, Yuan, Hongyi, Yuan, Zheng, Zhang, Jianwei, Zhang, Xingxuan, Zhang, Yichang, Zhang, Zhenru, Zhou, Chang, Zhou, Jingren, Zhou, Xiaohuan, and Zhu, Tianhang
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models., Comment: 59 pages, 5 figures
Published: 2023

28. GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction

Author: Zhang, Quanjun, Fang, Chunrong, Zhang, Tongke, Yu, Bowen, Sun, Weisong, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering
Abstract: Automated program repair (APR) aims to fix software bugs without human intervention and template-based APR has been widely investigated with promising results. However, it is challenging for template-based APR to select the appropriate donor code, which is an important repair ingredient for generating candidate patches. Inappropriate donor code may cause plausible but incorrect patch generation even with correct fix patterns, limiting the repair performance. In this paper, we aim to revisit template-based APR, and propose GAMMA, to directly leverage large pre-trained language models for donor code generation. Our main insight is that instead of retrieving donor code in the local buggy file, we can directly predict the correct code tokens based on the context code snippets and repair patterns by a cloze task. Specifically, (1) GAMMA revises a variety of fix templates from state-of-the-art template-based APR techniques (i.e., TBar) and transforms them into mask patterns. (2) GAMMA adopts a pre-trained language model to predict the correct code for masked code as a fill-in-the-blank task. The experimental results demonstrate that GAMMA correctly repairs 82 bugs on Defects4J-v1.2, which achieves 20.59% (14 bugs) and 26.15% (17 bugs) improvement over the previous state-of-the-art template-based approach TBar and learning-based one Recoder. Furthermore, GAMMA repairs 45 bugs and 22 bugs from the additional Defects4J-v2.0 and QuixBugs, indicating the generalizability of GAMMA in addressing the dataset overfitting issue. We also prove that adopting other pre-trained language models can provide substantial advancement, e.g., CodeBERT-based and ChatGPT-based GAMMA is able to fix 80 and 67 bugs on Defects4J-v1.2, indicating the scalability of GAMMA. Overall, our study highlights the promising future of adopting pre-trained models to generate correct patches on top of fix patterns., Comment: Accepted to 38th IEEE/ACM International Conference on Automated Software Engineering (ASE2023)
Published: 2023
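
The improvement figures quoted in the abstract above are relative to each baseline's own bug count, which the abstract does not state directly. As a quick check, the sketch below back-computes the implied baseline counts from the 82 repaired bugs and the quoted bug differences; the helper name `implied_baseline` is hypothetical, and the baseline counts are inferred here rather than taken from the paper.

```python
def implied_baseline(gamma_fixed, extra_bugs):
    """Return (baseline bug count, relative improvement in percent),
    assuming the improvement is measured relative to the baseline."""
    baseline = gamma_fixed - extra_bugs
    return baseline, 100.0 * extra_bugs / baseline

# Numbers taken from the abstract: 82 bugs fixed, +14 over TBar, +17 over Recoder.
tbar_count, tbar_gain = implied_baseline(82, 14)
recoder_count, recoder_gain = implied_baseline(82, 17)

print(f"TBar:    {tbar_count} bugs, improvement {tbar_gain:.2f}%")
print(f"Recoder: {recoder_count} bugs, improvement {recoder_gain:.2f}%")
```

Under that reading, the quoted percentages are consistent with the quoted bug differences.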

29. Nanodevice-Enabled Near-Field Thermal Radiation between Sub-Wavelength Surfaces

Author: Luo, Xiao, Salihoglu, Hakan, Wang, Zexiao, Li, Zhuo, Kim, Hyeonggyun, Li, Jiayu, Yu, Bowen, Du, Shen, and Shen, Sheng
Subjects: Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics
Abstract: With the continuous advancement of nanotechnology, nanodevices have become crucial components in computing, sensing and energy conversion applications. However, the structures of nanodevices typically possess sub-wavelength dimensions and separations, which pose significant challenges for understanding energy transport phenomena in nanodevices. Here, based on a judiciously designed thermal nanodevice, we report the first measurement of near-field energy transport between two coplanar sub-wavelength structures over temperature bias up to ~190 K. Our experimental results demonstrate a remarkable 20-fold enhancement in heat transfer beyond blackbody radiation. In contrast with the well-established near-field interactions between two semi-infinite bodies, the sub-wavelength confinements in nanodevices lead to the increased polariton scattering and the reduction of supporting modes and therefore a lower heat flow at a given separation. Our work unveils exciting opportunities for the rational design of nanodevices, particularly for on-chip near-field energy transport, with important implications for the development of efficient nanodevices for energy harvesting and thermal management., Comment: 35 pages, 19 figures, 2 tables
Published: 2023

30. Pre-trained Model-based Automated Software Vulnerability Repair: How Far are We?

Author: Zhang, Quanjun, Fang, Chunrong, Yu, Bowen, Sun, Weisong, Zhang, Tongke, and Chen, Zhenyu
Subjects: Computer Science - Software Engineering
Abstract: Various approaches are proposed to help under-resourced security researchers to detect and analyze software vulnerabilities. It is still incredibly time-consuming and labor-intensive for security researchers to fix vulnerabilities. The time lag between reporting and fixing a vulnerability causes software systems to suffer from significant exposure to possible attacks. Recently, some techniques have proposed applying pre-trained models to fix security vulnerabilities and have proved their success in improving repair accuracy. However, the effectiveness of existing pre-trained models has not been systematically analyzed, and little is known about their advantages and disadvantages. To bridge this gap, we perform the first extensive study on applying various pre-trained models to vulnerability repair. The results show that studied pre-trained models consistently outperform the state-of-the-art technique VRepair with a prediction accuracy of 32.94% to 44.96%. We also investigate the impact of major phases in the vulnerability repair workflow. Surprisingly, a simplistic approach adopting transfer learning improves the prediction accuracy of pre-trained models by 9.40% on average. Besides, we provide additional discussion to illustrate the capacity and limitations of pre-trained models. Finally, we further pinpoint various practical guidelines for advancing pre-trained model-based vulnerability repair. Our study highlights the promising future of adopting pre-trained models to patch real-world vulnerabilities., Comment: Accepted to IEEE Transactions on Dependable and Secure Computing 2023 (TDSC'23)
Published: 2023

31. A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment

Author: Zhao, Yingxiu, Yu, Bowen, Hui, Binyuan, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Zhang, Nevin L.
Subjects: Computer Science - Computation and Language
Abstract: Training large language models (LLMs) with open-domain instruction data has yielded remarkable success in aligning to end tasks and human preferences. Extensive research has highlighted the importance of the quality and diversity of instruction data. However, the impact of data complexity, as a crucial metric, remains relatively unexplored from three aspects: (1) where the sustainability of performance improvements with increasing complexity is uncertain; (2) whether the improvement brought by complexity merely comes from introducing more training tokens; and (3) where the potential benefits of incorporating instructions from easy to difficult are not yet fully understood. In this paper, we propose Tree-Instruct to systematically enhance the instruction complexity in a controllable manner. By adding a specified number of nodes to instructions' semantic trees, this approach not only yields new instruction data from the modified tree but also allows us to control the difficulty level of modified instructions. Our preliminary experiments reveal the following insights: (1) Increasing complexity consistently leads to sustained performance improvements of LLMs. (2) Under the same token budget, a few complex instructions outperform diverse yet simple instructions. (3) Curriculum instruction tuning might not yield the anticipated results; focusing on increasing complexity appears to be the key., Comment: LREC-Coling 2024
Published: 2023

32. Wider and Deeper LLM Networks are Fairer LLM Evaluators

Author: Zhang, Xinghua, Yu, Bowen, Yu, Haiyang, Lv, Yangyu, Liu, Tingwen, Huang, Fei, Xu, Hongbo, and Li, Yongbin
Subjects: Computer Science - Computation and Language
Abstract: Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neurons, with each neuron being the same LLM. In this paper, we draw upon the extensive research on deep neural networks to explore whether deeper and wider networks can lead to fairer evaluations. Specifically, inspired by the observation that different neurons in a neural network are responsible for detecting different concepts, we first adaptively generate as many neuron roles as possible for each evaluation sample. Each perspective corresponds to the role of a specific LLM neuron in the first layer. In subsequent layers, we follow the idea that higher layers in deep networks are responsible for more comprehensive features, each layer receives representations from all neurons in the previous layer, integrating the locally learned evaluation information to obtain a more comprehensive evaluation result. Interestingly, this network design resembles the process of academic paper reviewing. To validate the effectiveness of our method, we construct the largest and most diverse English evaluation benchmark LLMEval$^2$ for LLM evaluators, comprising 15 tasks, 8 abilities, and 2,553 samples. Experimental results demonstrate that a wider network (involving many reviewers) with 2 layers (one round of discussion) performs the best, improving kappa correlation coefficient from 0.28 to 0.34. We also leverage WideDeep to aid in the assessment of Chinese LLMs, which has accelerated the evaluation time by 4.6 times, resulting in a 60% cost saving. WideDeep achieves a remarkable 93% agreement level among humans., Comment: Work in Progress
Published: 2023

33. PolyLM: An Open Source Polyglot Large Language Model

Author: Wei, Xiangpeng, Wei, Haoran, Lin, Huan, Li, Tianhao, Zhang, Pei, Ren, Xingzhang, Li, Mei, Wan, Yu, Cao, Zhiwei, Xie, Binbin, Hu, Tianxiang, Li, Shangjie, Hui, Binyuan, Yu, Bowen, Liu, Dayiheng, Yang, Baosong, Huang, Fei, and Xie, Jun
Subjects: Computer Science - Computation and Language
Abstract: Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, along with the instruction data and multilingual benchmark, are available at: https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation.
Published: 2023

34. Preference Ranking Optimization for Human Alignment

Author: Song, Feifan, Yu, Bowen, Li, Minghao, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Wang, Houfeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) often contain misleading content, emphasizing the need to align them with human values to ensure secure AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment. However, it encompasses two main drawbacks: (1) RLHF exhibits complexity, instability, and sensitivity to hyperparameters in contrast to SFT. (2) Despite massive trial-and-error, multiple sampling is reduced to pair-wise contrast, thus lacking contrasts from a macro perspective. In this paper, we propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to directly fine-tune LLMs for human alignment. PRO extends the pair-wise contrast to accommodate preference rankings of any length. By iteratively contrasting candidates, PRO instructs the LLM to prioritize the best response while progressively ranking the remaining responses. In this manner, PRO effectively transforms human alignment into aligning the probability ranking of n responses generated by the LLM with the preference ranking of humans towards these responses. Experiments have shown that PRO outperforms baseline algorithms, achieving comparable results to ChatGPT and human responses through automatic-based, reward-based, GPT-4, and human evaluations., Comment: Accepted by AAAI 2024
Published: 2023
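
One way to picture "extending the pair-wise contrast to rankings of any length" from the abstract above is a list-wise loss that contrasts each response against all lower-ranked ones in turn. The sketch below uses a Plackett-Luce style objective as a stand-in; the abstract does not give PRO's exact objective, so the function name, the per-response `scores` (for example, policy log-likelihoods), and the loss form are assumptions for illustration only.

```python
import math

def listwise_ranking_loss(scores):
    """Plackett-Luce style loss for responses ordered best-first.

    At step k the k-th response is contrasted against itself and every
    lower-ranked response: loss_k = -log softmax(scores[k:])[0].
    """
    loss = 0.0
    for k in range(len(scores) - 1):
        tail = scores[k:]
        m = max(tail)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - scores[k]
    return loss

# Hypothetical scores for three responses, listed in human preference order.
well_ordered = listwise_ranking_loss([2.0, 1.0, 0.0])
mis_ordered = listwise_ranking_loss([0.0, 1.0, 2.0])
print(f"scores agree with ranking:    {well_ordered:.4f}")
print(f"scores disagree with ranking: {mis_ordered:.4f}")
```

With only two responses this reduces to the usual pair-wise logistic contrast, which is the sense in which a list-wise loss generalizes the pair-wise case.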

35. Unified Language Representation for Question Answering over Text, Tables, and Images

Author: Yu, Bowen, Fu, Cheng, Yu, Haiyang, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language
Abstract: When trying to answer complex questions, people often rely on multiple sources of information, such as visual, textual, and tabular data. Previous approaches to this problem have focused on designing input features or model structure in the multi-modal space, which is inflexible for cross-modal reasoning or data-efficient training. In this paper, we call for an alternative paradigm, which transforms the images and tables into unified language representations, so that we can simplify the task into a simpler textual QA problem that can be solved using three steps: retrieval, ranking, and generation, all within a language space. This idea takes advantage of the power of pre-trained language models and is implemented in a framework called Solar. Our experimental results show that Solar outperforms all existing methods by 10.6-32.3 pts on two datasets, MultimodalQA and MMCoQA, across ten different metrics. Additionally, Solar achieves the best performance on the WebQA leaderboard., Comment: Findings of ACL 2023
Published: 2023

36. Causal Document-Grounded Dialogue Pre-training

Author: Zhao, Yingxiu, Yu, Bowen, Yu, Haiyang, Li, Bowen, Li, Jinyang, Wang, Chao, Huang, Fei, Li, Yongbin, and Zhang, Nevin L.
Subjects: Computer Science - Computation and Language
Abstract: The goal of document-grounded dialogue (DocGD) is to generate a response by grounding the evidence in a supporting document in accordance with the dialogue context. This process involves four variables that are causally connected. Recently, task-specific pre-training has greatly boosted performances on many downstream tasks. Existing DocGD methods, however, continue to rely on general pre-trained language models without a specifically tailored pre-training approach that explicitly captures the causal relationships. To tackle this issue, we are the first to present a causally-complete dataset construction strategy for building million-level DocGD pre-training corpora. To better capture causality, we further propose a causally-perturbed pre-training strategy, which introduces causal perturbations on the variables and optimizes the overall causal effect. Experiments on three benchmark datasets demonstrate that our causal pre-training achieves considerable and consistent improvements under fully-supervised, low-resource, few-shot, and zero-shot settings., Comment: EMNLP 2023 main
Published: 2023

37. Domain Incremental Lifelong Learning in an Open World

Author: Dai, Yi, Lang, Hao, Zheng, Yinhe, Yu, Bowen, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Lifelong learning (LL) is an important ability for NLP models to learn new tasks continuously. Architecture-based approaches are reported to be effective implementations for LL models. However, it is non-trivial to extend previous approaches to domain incremental LL scenarios since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose Diana: a DynamIc Architecture-based lifeloNg leArning model that tries to learn a sequence of tasks with a prompt-enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performances and maintain instance-level prompts to learn knowledge shared across input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana., Comment: ACL2023 Findings Long Paper. arXiv admin note: substantial text overlap with arXiv:2208.14602
Published: 2023

38. ID-MixGCL: Identity Mixup for Graph Contrastive Learning

Author: Zhang, Gehang, Yu, Bowen, Cao, Jiangxia, Zhang, Xinghua, Sheng, Jiawei, Zhou, Chuan, and Liu, Tingwen
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Graph contrastive learning (GCL) has recently achieved substantial advancements. Existing GCL approaches compare two different "views" of the same graph in order to learn node/graph representations. The underlying assumption of these studies is that the graph augmentation strategy is capable of generating several different graph views such that the graph views are structurally different but semantically similar to the original graphs, and thus the ground-truth labels of the original and augmented graph/nodes can be regarded identical in contrastive learning. However, we observe that this assumption does not always hold. For instance, the deletion of a super-node within a social network can exert a substantial influence on the partitioning of communities for other nodes. Similarly, any perturbation to nodes or edges in a molecular graph will change the labels of the graph. Therefore, we believe that augmenting the graph, accompanied by an adaptation of the labels used for the contrastive loss, will facilitate the encoder to learn a better representation. Based on this idea, we propose ID-MixGCL, which allows the simultaneous interpolation of input nodes and corresponding identity labels to obtain soft-confidence samples, with a controllable degree of change, leading to the capture of fine-grained representations from self-supervised training on unlabeled graphs. Experimental results demonstrate that ID-MixGCL improves performance on graph classification and node classification tasks, as demonstrated by significant improvements on the Cora, IMDB-B, IMDB-M, and PROTEINS datasets compared to state-of-the-art techniques, by 3-29% absolute points., Comment: 10 pages, 7 figures, accepted by IEEE BigData 2023
Published: 2023

39. API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Author: Li, Minghao, Zhao, Yingxiu, Yu, Bowen, Song, Feifan, Li, Hangyu, Yu, Haiyang, Li, Zhoujun, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question., Comment: EMNLP 2023
Published: 2023

40. Towards Generalized Open Information Extraction

Author: Yu, Bowen, Zhang, Zhenyu, Li, Jingyang, Yu, Haiyang, Liu, Tingwen, Sun, Jian, Li, Yongbin, and Wang, Bin
Subjects: Computer Science - Computation and Language
Abstract: Open Information Extraction (OpenIE) facilitates the open-domain discovery of textual facts. However, the prevailing solutions evaluate OpenIE models on in-domain test sets aside from the training corpus, which certainly violates the initial task principle of domain-independence. In this paper, we propose to advance OpenIE towards a more realistic scenario: generalizing over unseen target domains with different data distributions from the source training domains, termed Generalized OpenIE. For this purpose, we first introduce GLOBE, a large-scale human-annotated multi-domain OpenIE benchmark, to examine the robustness of recent OpenIE models to domain shifts, and the relative performance degradation of up to 70% implies the challenges of generalized OpenIE. Then, we propose DragonIE, which explores a minimalist graph expression of textual fact: directed acyclic graph, to improve the OpenIE generalization. Extensive experiments demonstrate that DragonIE beats the previous methods in both in-domain and out-of-domain settings by as much as 6.0% in F1 score absolutely, but there is still ample room for improvement.
Comment: EMNLP 2022
Published: 2022

41. Semi-Supervised Lifelong Language Learning

Author: Zhao, Yingxiu, Zheng, Yinhe, Yu, Bowen, Tian, Zhiliang, Lee, Dongkyu, Sun, Jian, Yu, Haiyang, Li, Yongbin, and Zhang, Nevin L.
Subjects: Computer Science - Computation and Language
Abstract: Lifelong learning aims to accumulate knowledge and alleviate catastrophic forgetting when learning tasks sequentially. However, existing lifelong language learning methods only focus on the supervised learning setting. Unlabeled data, which can be easily accessed in real-world scenarios, are underexplored. In this paper, we explore a novel setting, semi-supervised lifelong language learning (SSLL), where a model learns sequentially arriving language tasks with both labeled and unlabeled data. We propose an unlabeled data enhanced lifelong learner to explore SSLL. Specifically, we dedicate task-specific modules to alleviate catastrophic forgetting and design two modules to exploit unlabeled data: (1) a virtual supervision enhanced task solver is constructed on a teacher-student framework to mine the underlying knowledge from unlabeled data; and (2) a backward augmented learner is built to encourage knowledge transfer from newly arrived unlabeled data to previous tasks. Experimental results on various language tasks demonstrate our model's effectiveness and superiority over competitive baselines under the new setting SSLL.
Comment: EMNLP Findings 2022 Long Paper
Published: 2022

42. Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

Author: Zhao, Yingxiu, Zheng, Yinhe, Tian, Zhiliang, Gao, Chang, Yu, Bowen, Yu, Haiyang, Li, Yongbin, Sun, Jian, and Zhang, Nevin L.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong learning (PCLL), to enhance generative replay by incorporating tasks' statistics. PCLL captures task-specific distributions with a conditional variational autoencoder, conditioned on natural language prompts to guide the pseudo-sample generation. Moreover, it leverages a distillation process to further consolidate past knowledge by alleviating the noise in pseudo samples. Experiments on natural language understanding tasks of ToD systems demonstrate that PCLL significantly outperforms competitive baselines in building LL models.
Comment: EMNLP 2022 Long Paper (Main Track)
Published: 2022

43. Central recirculation zone in a V-shaped premixed swirling flame

Author: Wang, Qiuxiao, Ren, Yongzhi, Gu, Mingming, Yu, Bowen, Feng, Xiaoxing, Qi, Fei, and Xia, Xi
Subjects: Physics - Fluid Dynamics
Abstract: This paper presents an experimental study on the emergence of the central recirculation zone (CRZ) in a V-shaped premixed swirling flame, using simultaneous measurement of particle image velocimetry (PIV) and CH* chemiluminescence. The results show that either increasing the Reynolds number ($Re$) or decreasing the equivalence ratio ($\Phi$) would facilitate the emergence of CRZ. Further analysis demonstrates that the CRZ characteristics and its emergence are strongly influenced by the inner shear layer (ISL) surrounding the CRZ, while the swirl intensity remains unchanged. Dimensional analysis is performed to understand the underlying mechanism, suggesting the CRZ emergence is controlled by a non-dimensional parameter, $Re_s = |\gamma|_{\max} D / \nu_s$, defined based on the maximum ISL intensity ($|\gamma|_{\max}$), the exit diameter ($D$), and the kinematic viscosity ($\nu_s$) of the burnt gas. By estimating the temperature and viscosity with a simple heat-loss model, we show in the $|\gamma|_{\max} D$-$\nu_s$ regime diagram that the cases with and without CRZ are separated by a single boundary line, corresponding to a critical $Re_s$ of about 424. This verifies the applicability of the proposed $Re_s$ criterion to lean-premixed V-shaped swirling flames under various conditions. Unlike most previous works that attribute the CRZ of swirling flames to vortex breakdown, the present work reveals the non-negligible effect of the ISL, especially the CRZ suppression when the ISL is weakened by flame heating.
Published: 2022

44. Accurate Direct Measurements of Far-Field Thermal Infrared Emission and its Dynamics

Author: Liu, Xiu, Salihoglu, Hakan, Luo, Xiao, Yun, Hyeong Seok, Jing, Lin, Yu, Bowen, and Shen, Sheng
Subjects: Physics - Instrumentation and Detectors, Physics - Optics
Abstract: Accurate direct measurements of far-field thermal infrared emission are becoming increasingly important because conventional methods, which rely on indirect assessments such as reflectance/transmittance, are inaccurate or even unfeasible for characterizing state-of-the-art devices with novel spectra, directionalities, and polarizations. The direct collection of the far-field emission from these tiny devices is also challenging because of their shrinking footprints and uncontrollable radiation noise from their surroundings. Here, we demonstrate a microscopic lock-in FTIR system that realizes significant improvement in signal-to-noise ratio (SNR) by combining a microscope and a lock-in amplifier with an FTIR. The lock-in FTIR is ultrasensitive, with a specific detectivity 10^6 times higher than commercial ones, to overcome the optical loss and background noise during the emission light collection. Based on an analytical model of the signal detection process, we also explore the combination of modulated Joule heating and global heating to fulfill the potential of our system for noise reduction. Our findings show that, compared to previous studies, more than 3 times lower temperatures are sufficient to generate a measurable signal. Under a heating temperature of around 125 °C, we can achieve an SNR of about 23.7, which is far above the true-signal-threshold (SNR of about 3.0). Furthermore, the system can respond fast enough (up to 175 kHz) to record spectral-resolved dynamics of microdevices in the frequency domain. The measurable frequency range can be extended up to MHz or even GHz level by a high-speed circuit model. We believe the system together with the analytical signal processing can be beneficial for next-generation thermal infrared material and device exploration, boosting the applications in lighting, sensing, imaging, and energy harvesting on a small scale.
Comment: 19 pages, 4 figures
Published: 2022

45. Electrically Driven Thermal Infrared Metasurface with Narrowband Emission

Author: Liu, Xiu, Jing, Lin, Luo, Xiao, Yu, Bowen, Du, Shen, Wang, Zexiao, Kim, Hyeonggyun, Zhong, Yibai, and Shen, Sheng
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Metasurfaces consisting of an array of planar sub-wavelength structures have shown great potential in controlling thermal infrared radiation, including intensity, coherence, and polarization. These capabilities together with the two-dimensional nature make thermal metasurfaces an ultracompact multifunctional platform for infrared light manipulation. Integrating the functionalities, such as amplitude, phase (spectrum and directionality), and polarization, on a single metasurface offers fascinating device responses. However, it remains a significant challenge to concurrently optimize the optical, electrical, and thermal responses of a thermal metasurface in a small footprint. In this work, we develop a center-contacted electrode line design for a thermal infrared metasurface based on a gold nanorod array, which allows local Joule heating to electrically excite the emission without undermining the localized surface plasmonic resonance. The narrowband emission of thermal metasurfaces and their robustness against temperature nonuniformity demonstrated in this work have important implications for the applications in infrared imaging, sensing, and energy harvesting.
Comment: 9 pages, 4 figures
Published: 2022

46. Photonic integrated circuit with multiple waveguide layers for broadband high-efficient on-chip 3-D optical phased arrays in light detection and ranging applications

Author: Wu, Dachuan, Yu, Bowen, Kakdarvishi, Venus, and Yi, Yasha
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Traditional photonic integrated circuit (PIC) inherits the mature CMOS fabrication process from the electronic integrated circuit (IC) industry. However, this process also limits the PIC structure to a single-waveguide-layer configuration. In this work, we explore the possibility of the multi-waveguide-layer PIC by proposing and demonstrating a true 3-D optical phased array (OPA) device, with the light exiting from the edge of the device, based on a multi-layer Si3N4/SiO2 platform. The multi-waveguide-layer configuration offers the possibility of utilizing edge couplers at both the input and the emitting ends to achieve broadband high efficiency. This uniqueness provides the potential for a more extended detection range in the Lidar application. The device has been studied by numerical simulation, and proof-of-concept samples have been fabricated and tested with a CMOS-compatible process. To the best of our knowledge, this is the first experimental proof-of-concept of a true 3-D OPA with a multi-waveguide-layer configuration all over the device.
Published: 2022

47. Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

Author: Zhang, Zhenyu, Yu, Bowen, Yu, Haiyang, Liu, Tingwen, Fu, Cheng, Li, Jingyang, Tang, Chengguang, Sun, Jian, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Multimedia
Abstract: Building document-grounded dialogue systems has received growing interest, as documents convey a wealth of human knowledge and commonly exist in enterprises. In this setting, comprehending and retrieving information from documents is a challenging research problem. Previous work ignores the visual property of documents and treats them as plain text, resulting in incomplete modality. In this paper, we propose a Layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents (VRDs), so as to generate accurate responses in dialogue systems. LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents, becoming the largest VRD-based information extraction dataset to the best of our knowledge. We also develop benchmark methods that extend the token-based language model to consider layout features like humans. Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
Comment: Accepted to ACM Multimedia (MM) Industry Track 2022
Published: 2022

48. A Survey on Neural Open Information Extraction: Current Status and Future Directions

Author: Zhou, Shaowen, Yu, Bowen, Sun, Aixin, Long, Cheng, Li, Jingyang, Yu, Haiyang, Sun, Jian, and Li, Yongbin
Subjects: Computer Science - Computation and Language
Abstract: Open Information Extraction (OpenIE) facilitates domain-independent discovery of relational facts from large corpora. The technique is well suited to many open-world natural language understanding scenarios, such as automatic knowledge base construction, open-domain question answering, and explicit reasoning. Thanks to the rapid development of deep learning technologies, numerous neural OpenIE architectures have been proposed and have achieved considerable performance improvements. In this survey, we provide an extensive overview of the state-of-the-art neural OpenIE models, their key design decisions, strengths, and weaknesses. Then, we discuss limitations of current solutions and the open issues in the OpenIE problem itself. Finally, we list recent trends that could help expand its scope and applicability, setting up promising directions for future research in OpenIE. To the best of our knowledge, this paper is the first review on this specific topic.
Comment: Accepted by IJCAI22 survey track
Published: 2022

49. Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid

Author: Cao, Huanqi, Tang, Shizhi, Zhu, Qianchao, Yu, Bowen, and Chen, Wenguang
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Partial differential equation (PDE) solvers are extensively utilized across numerous scientific and engineering fields. However, achieving high performance and scalability often necessitates intricate and low-level programming, particularly when leveraging deterministic sparsity patterns in structured grids. In this paper, we propose an innovative domain-specific language (DSL), Mat2Stencil, with its compiler, for PDE solvers on structured grids. Mat2Stencil introduces a structured sparse matrix abstraction, facilitating modular, flexible, and easy-to-use expression of solvers across a broad spectrum, encompassing components such as Jacobi or Gauss-Seidel preconditioners, incomplete LU or Cholesky decompositions, and multigrid methods built upon them. Our DSL compiler subsequently generates matrix-free code consisting of generalized stencils through multi-stage programming. The code allows spatial loop-carried dependence in the form of quasi-affine loops, in addition to the Jacobi-style stencil's embarrassingly parallel on spatial dimensions. We further propose a novel automatic parallelization technique for the spatially dependent loops, which offers a compile-time deterministic task partitioning for threading, calculates necessary inter-thread synchronization automatically, and generates an efficient multi-threaded implementation with fine-grained synchronization. Implementing 4 benchmarking programs, 3 of them being the pseudo-applications in NAS Parallel Benchmarks with $6.3\%$ lines of code and 1 being matrix-free High Performance Conjugate Gradients with $16.4\%$ lines of code, we achieve up to $1.67\times$ and on average $1.03\times$ performance compared to manual implementations.
Published: 2022

50. Profiling and Evolution of Intellectual Property

Author: Yu, Bowen, Shao, Yingxia, and Li, Ang
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding. However, the increase in the number and category of information data will also increase the cost of information acquisition. For technology-based enterprises or users, in addition to general papers, patents, etc., policies related to technology or the development of their industries should also belong to a type of scientific and technological resources. The cost and difficulty of acquiring users. Extracting valuable science and technology policy resources from a huge amount of data with mixed contents and providing accurate and fast retrieval will help to break down information barriers and reduce the cost of information acquisition, which has profound social significance and social utility. This article focuses on the difficulties and problems in the field of science and technology policy, and introduces related technologies and developments.
Comment: There are some problems in the conclusions and analysis of this paper, and we need to withdraw this paper
Published: 2022
