Author: "Yu, Bowen" / Topic: computer science - artificial intelligence - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Yu, Bowen"' showing total 17 results

Start Over Author "Yu, Bowen" Topic computer science - artificial intelligence

17 results on '"Yu, Bowen"'

1. Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Author: Xia, Tingyu, Yu, Bowen, Dang, Kai, Yang, An, Wu, Yuan, Tian, Yuan, Chang, Yi, and Lin, Junyang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Supervised fine-tuning (SFT) is crucial for aligning Large Language Models (LLMs) with human instructions. The primary goal during SFT is to select a small yet representative subset of training data from the larger pool, such that fine-tuning with this subset achieves results comparable to or even exceeding those obtained using the entire dataset. However, most existing data selection techniques are designed for small-scale data pools, which fail to meet the demands of real-world SFT scenarios. In this paper, we replicated several self-scoring methods those that do not rely on external model assistance on two million scale datasets, and found that nearly all methods struggled to significantly outperform random selection when dealing with such large-scale data pools. Moreover, our comparisons suggest that, during SFT, diversity in data selection is more critical than simply focusing on high quality data. We also analyzed the limitations of several current approaches, explaining why they perform poorly on large-scale datasets and why they are unsuitable for such contexts. Finally, we found that filtering data by token length offers a stable and efficient method for improving results. This approach, particularly when training on long text data, proves highly beneficial for relatively weaker base models, such as Llama3.
Published: 2024

2. Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

Author: Yang, An, Zhang, Beichen, Hui, Binyuan, Gao, Bofei, Yu, Bowen, Li, Chengpeng, Liu, Dayiheng, Tu, Jianhong, Zhou, Jingren, Lin, Junyang, Lu, Keming, Xue, Mingfeng, Lin, Runji, Liu, Tianyu, Ren, Xingzhang, and Zhang, Zhenru
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it's possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in the Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possess advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
Published: 2024

3. Qwen2 Technical Report

Author: Yang, An, Yang, Baosong, Hui, Binyuan, Zheng, Bo, Yu, Bowen, Zhou, Chang, Li, Chengpeng, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Dong, Guanting, Wei, Haoran, Lin, Huan, Tang, Jialong, Wang, Jialin, Yang, Jian, Tu, Jianhong, Zhang, Jianwei, Ma, Jianxin, Yang, Jianxin, Xu, Jin, Zhou, Jingren, Bai, Jinze, He, Jinzheng, Lin, Junyang, Dang, Kai, Lu, Keming, Chen, Keqin, Yang, Kexin, Li, Mei, Xue, Mingfeng, Ni, Na, Zhang, Pei, Wang, Peng, Peng, Ru, Men, Rui, Gao, Ruize, Lin, Runji, Wang, Shijie, Bai, Shuai, Tan, Sinan, Zhu, Tianhang, Li, Tianhao, Liu, Tianyu, Ge, Wenbin, Deng, Xiaodong, Zhou, Xiaohuan, Ren, Xingzhang, Zhang, Xinyu, Wei, Xipin, Ren, Xuancheng, Liu, Xuejing, Fan, Yang, Yao, Yang, Zhang, Yichang, Wan, Yu, Chu, Yunfei, Liu, Yuqiong, Cui, Zeyu, Zhang, Zhenru, Guo, Zhifang, and Fan, Zhihao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials including example code on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors., Comment: 26 pages, 1 figure
Published: 2024

4. Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

Author: Dong, Guanting, Lu, Keming, Li, Chengpeng, Xia, Tingyu, Yu, Bowen, Zhou, Chang, and Zhou, Jingren
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF., Comment: Work in progress
Published: 2024

5. Towards Scalable Automated Alignment of LLMs: A Survey

Author: Cao, Boxi, Lu, Keming, Lu, Xinyu, Chen, Jiawei, Ren, Mengjie, Xiang, Hao, Liu, Peilin, Lu, Yaojie, He, Ben, Han, Xianpei, Sun, Le, Lin, Hongyu, and Yu, Bowen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into 4 major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable automated alignment and discuss the essential factors that make automated alignment technologies feasible and effective from the fundamental role of alignment., Comment: Paper List: https://github.com/cascip/awesome-auto-alignment
Published: 2024

6. Language Models can Evaluate Themselves via Probability Discrepancy

Author: Xia, Tingyu, Yu, Bowen, Wu, Yuan, Chang, Yi, and Zhou, Chang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes., Comment: ACL 2024 Findings
Published: 2024

7. Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment

Author: Song, Feifan, Yu, Bowen, Lang, Hao, Yu, Haiyang, Huang, Fei, Wang, Houfeng, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Alignment with human preference prevents large language models (LLMs) from generating misleading or toxic content while requiring high-cost human feedback. Assuming resources of human annotation are limited, there are two different ways of allocating considered: more diverse PROMPTS or more diverse RESPONSES to be labeled. Nonetheless, a straightforward comparison between their impact is absent. In this work, we first control the diversity of both sides according to the number of samples for fine-tuning, which can directly reflect their influence. We find that instead of numerous prompts, more responses but fewer prompts better trigger LLMs for human alignment. Additionally, the concept of diversity for prompts can be more complex than responses that are typically quantified by single digits. Consequently, a new formulation of prompt diversity is proposed, further implying a linear correlation with the final performance of LLMs after fine-tuning. We also leverage it on data augmentation and conduct experiments to show its effect on different algorithms., Comment: Accepted by LREC-COLING 2024
Published: 2024

8. Self-Retrieval: Building an Information Retrieval System with One Large Language Model

Author: Tang, Qiaoyu, Chen, Jiawei, Yu, Bowen, Lu, Yaojie, Fu, Cheng, Yu, Haiyang, Lin, Hongyu, Huang, Fei, He, Ben, Han, Xianpei, Sun, Le, and Li, Yongbin
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way to humans accessing information. Due to the isolated architecture and the limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of IR systems into a single LLM and deeply leverage the capabilities of LLMs during IR process. Specifically, Self-retrieval internalizes the corpus to retrieve into a LLM via a natural language indexing architecture. Then the entire retrieval process is redefined as a procedure of document generation and self-assessment, which can be end-to-end executed using a single large language model. Experimental results demonstrate that Self-Retrieval not only significantly outperforms previous retrieval approaches by a large margin, but also can significantly boost the performance of LLM-driven downstream applications like retrieval augumented generation.
Published: 2024

9. Preference Ranking Optimization for Human Alignment

Author: Song, Feifan, Yu, Bowen, Li, Minghao, Yu, Haiyang, Huang, Fei, Li, Yongbin, and Wang, Houfeng
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) often contain misleading content, emphasizing the need to align them with human values to ensure secure AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment. However, it encompasses two main drawbacks: (1) RLHF exhibits complexity, instability, and sensitivity to hyperparameters in contrast to SFT. (2) Despite massive trial-and-error, multiple sampling is reduced to pair-wise contrast, thus lacking contrasts from a macro perspective. In this paper, we propose Preference Ranking Optimization (PRO) as an efficient SFT algorithm to directly fine-tune LLMs for human alignment. PRO extends the pair-wise contrast to accommodate preference rankings of any length. By iteratively contrasting candidates, PRO instructs the LLM to prioritize the best response while progressively ranking the rest responses. In this manner, PRO effectively transforms human alignment into aligning the probability ranking of n responses generated by LLM with the preference ranking of humans towards these responses. Experiments have shown that PRO outperforms baseline algorithms, achieving comparable results to ChatGPT and human responses through automatic-based, reward-based, GPT-4, and human evaluations., Comment: Accepted by AAAI 2024
Published: 2023

10. Domain Incremental Lifelong Learning in an Open World

Author: Dai, Yi, Lang, Hao, Zheng, Yinhe, Yu, Bowen, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Lifelong learning (LL) is an important ability for NLP models to learn new tasks continuously. Architecture-based approaches are reported to be effective implementations for LL models. However, it is non-trivial to extend previous approaches to domain incremental LL scenarios since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose \textbf{Diana}: a \underline{d}ynam\underline{i}c \underline{a}rchitecture-based lifelo\underline{n}g le\underline{a}rning model that tries to learn a sequence of tasks with a prompt-enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performances and maintain instance-level prompts to learn knowledge shared across input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana}., Comment: ACL2023 Findings Long Paper. arXiv admin note: substantial text overlap with arXiv:2208.14602
Published: 2023

11. ID-MixGCL: Identity Mixup for Graph Contrastive Learning

Author: Zhang, Gehang, Yu, Bowen, Cao, Jiangxia, Zhang, Xinghua, Sheng, Jiawei, Zhou, Chuan, and Liu, Tingwen
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Graph contrastive learning (GCL) has recently achieved substantial advancements. Existing GCL approaches compare two different ``views'' of the same graph in order to learn node/graph representations. The underlying assumption of these studies is that the graph augmentation strategy is capable of generating several different graph views such that the graph views are structurally different but semantically similar to the original graphs, and thus the ground-truth labels of the original and augmented graph/nodes can be regarded identical in contrastive learning. However, we observe that this assumption does not always hold. For instance, the deletion of a super-node within a social network can exert a substantial influence on the partitioning of communities for other nodes. Similarly, any perturbation to nodes or edges in a molecular graph will change the labels of the graph. Therefore, we believe that augmenting the graph, accompanied by an adaptation of the labels used for the contrastive loss, will facilitate the encoder to learn a better representation. Based on this idea, we propose ID-MixGCL, which allows the simultaneous interpolation of input nodes and corresponding identity labels to obtain soft-confidence samples, with a controllable degree of change, leading to the capture of fine-grained representations from self-supervised training on unlabeled graphs. Experimental results demonstrate that ID-MixGCL improves performance on graph classification and node classification tasks, as demonstrated by significant improvements on the Cora, IMDB-B, IMDB-M, and PROTEINS datasets compared to state-of-the-art techniques, by 3-29% absolute points., Comment: 10 pages, 7 figures, accepted by IEEE BigData 2023
Published: 2023

12. API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Author: Li, Minghao, Zhao, Yingxiu, Yu, Bowen, Song, Feifan, Li, Hangyu, Yu, Haiyang, Li, Zhoujun, Huang, Fei, and Li, Yongbin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question., Comment: EMNLP 2023
Published: 2023

13. Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

Author: Zhao, Yingxiu, Zheng, Yinhe, Tian, Zhiliang, Gao, Chang, Yu, Bowen, Yu, Haiyang, Li, Yongbin, Sun, Jian, and Zhang, Nevin L.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong learning (PCLL), to enhance generative replay by incorporating tasks' statistics. PCLL captures task-specific distributions with a conditional variational autoencoder, conditioned on natural language prompts to guide the pseudo-sample generation. Moreover, it leverages a distillation process to further consolidate past knowledge by alleviating the noise in pseudo samples. Experiments on natural language understanding tasks of ToD systems demonstrate that PCLL significantly outperforms competitive baselines in building LL models., Comment: EMNLP2022 Long Paper (Main Track)
Published: 2022

14. Profiling and Evolution of Intellectual Property

Author: Yu, Bowen, Shao, Yingxia, and Li, Ang
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding. However, the increase in the number and category of information data will also increase the cost of information acquisition. For technology-based enterprises or users, in addition to general papers, patents, etc., policies related to technology or the development of their industries should also belong to a type of scientific and technological resources. The cost and difficulty of acquiring users. Extracting valuable science and technology policy resources from a huge amount of data with mixed contents and providing accurate and fast retrieval will help to break down information barriers and reduce the cost of information acquisition, which has profound social significance and social utility. This article focuses on the difficulties and problems in the field of science and technology policy, and introduces related technologies and developments., Comment: There are some problems in the conclusions and analysis of this paper, and we need to withdraw this paper
Published: 2022

15. Web Page Content Extraction Based on Multi-feature Fusion

Author: Yu, Bowen, Du, Junping, and Shao, Yingxia
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
Abstract: With the rapid development of Internet technology, people have more and more access to a variety of web page resources. At the same time, the current rapid development of deep learning technology is often inseparable from the huge amount of Web data resources. On the other hand, NLP is also an important part of data processing technology, such as web page data extraction. At present, the extraction technology of web page text mainly uses a single heuristic function or strategy, and most of them need to determine the threshold manually. With the rapid growth of the number and types of web resources, there are still problems to be solved when using a single strategy to extract the text information of different pages. This paper proposes a web page text extraction algorithm based on multi-feature fusion. According to the text information characteristics of web resources, DOM nodes are used as the extraction unit to design multiple statistical features, and high-order features are designed according to heuristic strategies. This method establishes a small neural network, takes multiple features of DOM nodes as input, predicts whether the nodes contain text information, makes full use of different statistical information and extraction strategies, and adapts to more types of pages. Experimental results show that this method has a good ability of web page text extraction and avoids the problem of manually determining the threshold., Comment: Part of the content is for other purposes
Published: 2022

16. Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Author: Cranshaw, Justin, Elwany, Emad, Newman, Todd, Kocielnik, Rafal, Yu, Bowen, Soni, Sandeep, Teevan, Jaime, and Monroy-Hernández, Andrés
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Although information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides fast, efficient scheduling through structured workflows. Users interact with the system via email, delegating their scheduling needs to the system as if it were a human personal assistant. Common scheduling scenarios are broken down using well-defined workflows and completed as a series of microtasks that are automated when possible and executed by a human otherwise. Unusual scenarios fall back to a trained human assistant who executes them as unstructured macrotasks. We describe the iterative approach we used to develop Calendar.help, and share the lessons learned from scheduling thousands of meetings during a year of real-world deployments. Our findings provide insight into how complex information tasks can be broken down into repeatable components that can be executed efficiently to improve productivity., Comment: 10 pages
Published: 2017
Full Text: View/download PDF

17. API-Bank: A Benchmark for Tool-Augmented LLMs

Author: Li, Minghao, Song, Feifan, Yu, Bowen, Yu, Haiyang, Li, Zhoujun, Huang, Fei, and Li, Yongbin
Subjects: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
Abstract: Recent research has shown that Large Language Models (LLMs) can utilize external tools to improve their contextual processing abilities, moving away from the pure language modeling paradigm and paving the way for Artificial General Intelligence. Despite this, there has been a lack of systematic evaluation to demonstrate the efficacy of LLMs using tools to respond to human instructions. This paper presents API-Bank, the first benchmark tailored for Tool-Augmented LLMs. API-Bank includes 53 commonly used API tools, a complete Tool-Augmented LLM workflow, and 264 annotated dialogues that encompass a total of 568 API calls. These resources have been designed to thoroughly evaluate LLMs' ability to plan step-by-step API calls, retrieve relevant APIs, and correctly execute API calls to meet human needs. The experimental results show that GPT-3.5 emerges the ability to use the tools relative to GPT3, while GPT-4 has stronger planning performance. Nevertheless, there remains considerable scope for further improvement when compared to human performance. Additionally, detailed error analysis and case studies demonstrate the feasibility of Tool-Augmented LLMs for daily use, as well as the primary challenges that future research needs to address.
Published: 2023

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

17 results on '"Yu, Bowen"'

1. Rethinking Data Selection at Scale: Random Selection is Almost All You Need

2. Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

3. Qwen2 Technical Report

4. Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models

5. Towards Scalable Automated Alignment of LLMs: A Survey

6. Language Models can Evaluate Themselves via Probability Discrepancy

7. Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment

8. Self-Retrieval: Building an Information Retrieval System with One Large Language Model

9. Preference Ranking Optimization for Human Alignment

10. Domain Incremental Lifelong Learning in an Open World

11. ID-MixGCL: Identity Mixup for Graph Contrastive Learning

12. API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

13. Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

14. Profiling and Evolution of Intellectual Property

15. Web Page Content Extraction Based on Multi-feature Fusion

16. Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

17. API-Bank: A Benchmark for Tool-Augmented LLMs

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

17 results on '"Yu, Bowen"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources