Author: "Liu, Zhengying" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Liu, Zhengying"' showing total 374 results

1. FormalAlign: Automated Alignment Evaluation for Autoformalization

Author: Lu, Jianqiao, Wan, Yingjia, Huang, Yinya, Xiong, Jing, Liu, Zhengying, and Guo, Zhijiang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Formal Languages and Automata Theory, Computer Science - Machine Learning
Abstract: Autoformalization aims to convert informal mathematical proofs into machine-verifiable formats, bridging the gap between natural and formal languages. However, ensuring semantic alignment between the informal and formalized statements remains challenging. Existing approaches heavily rely on manual verification, hindering scalability. To address this, we introduce FormalAlign, the first automated framework designed for evaluating the alignment between natural and formal languages in autoformalization. FormalAlign trains on both the autoformalization sequence generation task and the representational alignment between input and output, employing a dual loss that combines a pair of mutually enhancing autoformalization and alignment tasks. Evaluated across four benchmarks augmented by our proposed misalignment strategies, FormalAlign demonstrates superior performance. In our experiments, FormalAlign outperforms GPT-4, achieving an Alignment-Selection Score 11.58% higher on FormL4-Basic (99.21% vs. 88.91%) and 3.19% higher on MiniF2F-Valid (66.39% vs. 64.34%). This effective alignment evaluation significantly reduces the need for manual verification. Both the dataset and code can be accessed via https://github.com/rookie-joe/FormalAlign., Comment: 23 pages, 13 tables, 3 figures
Published: 2024
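
The "11.58% higher" and "3.19% higher" figures in the abstract above appear to be relative gains over the GPT-4 score rather than differences in percentage points. A minimal Python sketch of that arithmetic follows; the function name and dictionary keys are hypothetical, and only the four scores come from the abstract.

```python
def relative_gain_percent(new_score: float, baseline: float) -> float:
    """Relative improvement of new_score over baseline, in percent."""
    return (new_score - baseline) / baseline * 100.0


# Scores quoted in the abstract: (FormalAlign, GPT-4).
# The keys are illustrative labels, not official benchmark identifiers.
scores = {
    "Basic": (99.21, 88.91),
    "MiniF2F-Valid": (66.39, 64.34),
}

for name, (ours, gpt4) in scores.items():
    rel = relative_gain_percent(ours, gpt4)
    print(f"{name}: {rel:.2f}% relative gain, {ours - gpt4:.2f} points absolute")
```

Under this reading the two relative gains round to 11.58% and 3.19%, matching the abstract, while the absolute gaps are 10.30 and 2.05 percentage points.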

2. ToolACE: Winning the Points of LLM Function Calling

Author: Liu, Weiwen, Huang, Xu, Zeng, Xingshan, Hao, Xinlong, Yu, Shuai, Li, Dexun, Wang, Shuai, Gan, Weinan, Liu, Zhengying, Yu, Yuanqing, Wang, Zezhong, Wang, Yuxian, Ning, Wu, Hou, Yutai, Wang, Bin, Wu, Chuhan, Wang, Xinzhi, Liu, Yong, Wang, Yasheng, Tang, Duyu, Tu, Dandan, Shang, Lifeng, Jiang, Xin, Tang, Ruiming, Lian, Defu, Liu, Qun, and Chen, Enhong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE., Comment: 21 pages, 22 figures
Published: 2024

3. FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

Author: Lin, Xiaohan, Cao, Qingxing, Huang, Yinya, Wang, Haiming, Lu, Jianqiao, Liu, Zhengying, Song, Linqi, and Liang, Xiaodan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-crafted rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as another line of rigorous verification, are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms a given code to be verified into Isabelle, and then conducts verification via neural automated theorem proving with an LLM. The joint paradigm leverages the rigorous yet abundant formulated and organized rules in Isabelle and is also convenient for introducing and adjusting cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes that are formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that FVEL with FVELER fine-tuned Llama3-8B solves 17.39% (69 -> 81) more problems, and Mistral-7B 12% (75 -> 84) more problems in SV-COMP. The proportion of proof errors is also reduced. Project page: https://fveler.github.io/.
Published: 2024

4. Process-Driven Autoformalization in Lean 4

Author: Lu, Jianqiao, Wan, Yingjia, Liu, Zhengying, Huang, Yinya, Xiong, Jing, Liu, Chengwu, Shen, Jianhao, Jin, Hui, Zhang, Jipeng, Wang, Haiming, Yang, Zhicheng, Tang, Jing, and Guo, Zhijiang
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Logic in Computer Science
Abstract: Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark, Formalization for Lean 4 (FormL4), designed to evaluate the autoformalization capabilities of large language models (LLMs). This benchmark encompasses a comprehensive assessment of questions, answers, formal statements, and proofs. Additionally, we introduce a Process-Supervised Verifier (PSV) model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data. Furthermore, when fine-tuned with data containing detailed process information, PSV can leverage the data more effectively, leading to more significant improvements in autoformalization for Lean 4. Our dataset and code are available at https://github.com/rookie-joe/PDA., Comment: 32 pages, 1 figure, 15 tables
Published: 2024

5. Proving Theorems Recursively

Author: Wang, Haiming, Xin, Huajian, Liu, Zhengying, Li, Wenda, Huang, Yinya, Lu, Jianqiao, Yang, Zhicheng, Tang, Jing, Yin, Jian, Li, Zhenguo, and Liang, Xiaodan
Subjects: Computer Science - Artificial Intelligence
Abstract: Recent advances in automated theorem proving leverage language models to explore expanded search spaces by step-by-step proof generation. However, such approaches are usually based on short-sighted heuristics (e.g., log probability or value function scores) that potentially lead to suboptimal or even distracting subgoals, preventing us from finding longer proofs. To address this challenge, we propose POETRY (PrOvE Theorems RecursivelY), which proves theorems in a recursive, level-by-level manner in the Isabelle theorem prover. Unlike previous step-by-step methods, POETRY searches for a verifiable sketch of the proof at each level and focuses on solving the current level's theorem or conjecture. Detailed proofs of intermediate conjectures within the sketch are temporarily replaced by a placeholder tactic called sorry, deferring their proofs to subsequent levels. This approach allows the theorem to be tackled incrementally by outlining the overall theorem at the first level and then solving the intermediate conjectures at deeper levels. Experiments are conducted on the miniF2F and PISA datasets and significant performance gains are observed in our POETRY approach over state-of-the-art methods. POETRY on miniF2F achieves an average proving success rate improvement of 5.1%. Moreover, we observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26., Comment: 21 pages, 5 figures, 3 tables
Published: 2024
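
The level-by-level idea in the abstract above (a verifiable sketch whose intermediate conjectures are deferred with `sorry`) can be illustrated with a small analog. POETRY itself works in Isabelle; the sketch below uses Lean 4 instead, and the theorem and its names are hypothetical examples, not taken from the paper.

```lean
-- Level 1: a proof sketch. The two intermediate conjectures are stated
-- but deferred with `sorry`, so the overall outline already type-checks
-- (Lean only emits "declaration uses sorry" warnings).
theorem sketch_example (a b : Nat) : a + b + 0 = b + a := by
  have h1 : a + b + 0 = a + b := by sorry
  have h2 : a + b = b + a := by sorry
  rw [h1, h2]

-- Level 2: each deferred conjecture is proved on its own.
theorem sketch_example_h1 (a b : Nat) : a + b + 0 = a + b :=
  Nat.add_zero (a + b)

theorem sketch_example_h2 (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Once every deferred conjecture has a proof, the `sorry` placeholders at the first level can be replaced by those proofs to obtain a complete derivation.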

6. ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Author: Lin, Xiaohan, Cao, Qingxing, Huang, Yinya, Yang, Zhicheng, Liu, Zhengying, Li, Zhenguo, and Liang, Xiaodan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses with the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets: axioms, library, and problem based on their proving depth. We conduct extensive experiments to investigate whether current LMs can generate theorems in the library and benefit the proving of the problem theorems. The results demonstrate that high-quality ATG data facilitates models' performance on downstream ATP. However, there is still room for current LMs to develop better ATG and generate more advanced and human-like theorems. We hope the new ATG challenge can shed some light on advanced complex theorem proving.
Published: 2024

7. MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Author: Huang, Yinya, Lin, Xiaohan, Liu, Zhengying, Cao, Qingxing, Xin, Huajian, Wang, Haiming, Li, Zhenguo, Song, Linqi, and Liang, Xiaodan
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Formal Languages and Automata Theory, Computer Science - Machine Learning, Computer Science - Programming Languages
Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectiveness of intermediate steps guidance. However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. MUSTARD synthesizes data in three stages: (1) It samples a few mathematical concept seeds as the problem category. (2) Then, it prompts a generative language model with the sampled concepts to obtain both the problems and their step-wise formal solutions. (3) Lastly, the framework utilizes a proof assistant (e.g., Lean Prover) to filter the valid proofs. With the proposed MUSTARD, we present a theorem-and-proof benchmark MUSTARDSAUCE with 5,866 valid data points. Each data point contains an informal statement, an informal proof, and a translated formal proof that passes the prover validation. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data. We further apply the MUSTARDSAUCE for fine-tuning smaller language models. The fine-tuned Llama 2-7B achieves a 15.41% average relative performance gain in automated theorem proving, and 8.18% in math word problems. Codes and data are available at https://github.com/Eleanor-H/MUSTARD.
Published: 2024

8. A Survey of Reasoning with Foundation Models

Author: Sun, Jiankai, Zheng, Chuanyang, Xie, Enze, Liu, Zhengying, Chu, Ruihang, Qiu, Jianing, Xu, Jiaqi, Ding, Mingyu, Li, Hongyang, Geng, Mengzhe, Wu, Yue, Wang, Wenhai, Chen, Junsong, Yin, Zhangyue, Ren, Xiaozhe, Fu, Jie, He, Junxian, Yuan, Wu, Liu, Qi, Liu, Xihui, Li, Yu, Dong, Hao, Cheng, Yu, Zhang, Ming, Heng, Pheng Ann, Dai, Jifeng, Luo, Ping, Wang, Jingdong, Wen, Ji-Rong, Qiu, Xipeng, Guo, Yike, Xiong, Hui, Liu, Qun, and Li, Zhenguo
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI., Comment: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models
Published: 2023

9. Large Language Models as Automated Aligners for benchmarking Vision-Language Models

Author: Ji, Yuanfeng, Ge, Chongjian, Kong, Weikai, Xie, Enze, Liu, Zhengying, Li, Zhengguo, and Luo, Ping
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the advancements in Large Language Models (LLMs), Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks. However, existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets to measure task-specific performance, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence. In this work, we address the limitations via Auto-Bench, which delves into exploring LLMs as proficient aligners, measuring the alignment between VLMs and human intelligence and value through automatic data curation and assessment. Specifically, for data curation, Auto-Bench utilizes LLMs (e.g., GPT-4) to automatically generate a vast set of question-answer-reasoning triplets via prompting on visual symbolic representations (e.g., captions, object locations, instance relationships, etc.). The curated data closely matches human intent, owing to the extensive world knowledge embedded in LLMs. Through this pipeline, a total of 28.5K human-verified and 3,504K unfiltered question-answer-reasoning triplets have been curated, covering 4 primary abilities and 16 sub-abilities. We subsequently engage LLMs like GPT-3.5 to serve as judges, implementing the quantitative and qualitative automated assessments to facilitate a comprehensive evaluation of VLMs. Our validation results reveal that LLMs are proficient in both evaluation data curation and model assessment, achieving an average agreement rate of 85%. We envision Auto-Bench as a flexible, scalable, and comprehensive benchmark for evaluating the evolving sophisticated VLMs.
Published: 2023

10. Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

Author: Chen, Kai, Wang, Chunwei, Yang, Kuo, Han, Jianhua, Hong, Lanqing, Mi, Fei, Xu, Hang, Liu, Zhengying, Huang, Wenyong, Li, Zhenguo, Yeung, Dit-Yan, Shang, Lifeng, Jiang, Xin, and Liu, Qun
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. This becomes particularly evident when LLMs inadvertently generate harmful or toxic content, either unintentionally or because of intentional inducement. Existing alignment methods usually direct LLMs toward the favorable outcomes by utilizing human-annotated, flawless instruction-response pairs. Conversely, this study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content to learn the reasons for mistakes and how to avoid them. In this case, mistakes are repurposed into valuable data for alignment, effectively helping to avoid the production of erroneous responses. Without external models or human annotations, our method leverages a model's intrinsic ability to discern undesirable mistakes and improves the safety of its generated responses. Experimental results reveal that our method outperforms existing alignment approaches in enhancing model safety while maintaining the overall utility., Comment: Accepted by ICLR 2024
Published: 2023

11. TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

Author: Xiong, Jing, Shen, Jianhao, Yuan, Ye, Wang, Haiming, Yin, Yichun, Liu, Zhengying, Li, Lin, Guo, Zhijiang, Cao, Qingxing, Huang, Yinya, Zheng, Chuanyang, Liang, Xiaodan, Zhang, Ming, and Liu, Qun
Subjects: Computer Science - Computation and Language
Abstract: Automated theorem proving (ATP) has become an appealing domain for exploring the reasoning ability of the recent successful generative language models. However, current ATP benchmarks mainly focus on symbolic inference, but rarely involve the understanding of complex number combination reasoning. In this work, we propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas and its capability to manipulate, group, and factor number terms. We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system. We then automatically generate additional examples from the annotated samples to expand the dataset. Furthermore, we develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability. Our extensive experiments show that our proposed TRIGO poses a new challenge for advanced generative LMs, including GPT-4, which is pre-trained on a considerable amount of open-source formal theorem-proving language data, and provides a new tool to study generative LMs' ability on both formal and mathematical reasoning., Comment: Accepted by EMNLP 2023. Code is available at https://github.com/menik1126/TRIGO
Published: 2023

12. LEGO-Prover: Neural Theorem Proving with Growing Libraries

Author: Wang, Haiming, Xin, Huajian, Zheng, Chuanyang, Li, Lin, Liu, Zhengying, Cao, Qingxing, Huang, Yinya, Xiong, Jing, Shi, Han, Xie, Enze, Yin, Jian, Li, Zhenguo, Liao, Heng, and Liang, Xiaodan
Subjects: Computer Science - Artificial Intelligence
Abstract: Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.
Published: 2023

13. Lyra: Orchestrating Dual Correction in Automated Theorem Proving

Author: Zheng, Chuanyang, Wang, Haiming, Xie, Enze, Liu, Zhengying, Sun, Jiankai, Xin, Huajian, Shen, Jianhao, Li, Zhenguo, and Li, Yu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to utilize predefined prover tools (e.g., Sledgehammer) for guiding the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with the prover to refine formal proof conjectures with prover error messages. Compared to the previous refinement framework, the proposed Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts. Our method has achieved state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%) and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-process for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with environment) could provide a promising avenue for future research in this field., Comment: Accepted to TMLR: https://openreview.net/forum?id=9Z0yB8rmQ2
Published: 2023

14. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Author: Yu, Longhui, Jiang, Weisen, Shi, Han, Yu, Jincheng, Liu, Zhengying, Zhang, Yu, Kwok, James T., Li, Zhenguo, Weller, Adrian, and Liu, Weiyang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory for solving mathematical problems due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release all the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use., Comment: To appear at ICLR 2024 (Spotlight). Project Page: https://meta-math.github.io/
Published: 2023

15. FIMO: A Challenge Formal Dataset for Automated Theorem Proving

Author: Liu, Chengwu, Shen, Jianhao, Xin, Huajian, Liu, Zhengying, Yuan, Ye, Wang, Haiming, Ju, Wei, Zheng, Chuanyang, Yin, Yichun, Li, Lin, Zhang, Ming, and Liu, Qun
Subjects: Computer Science - Artificial Intelligence
Abstract: We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes., Comment: Added a hyperlink to the dataset made accessible on GitHub
Published: 2023

16. Forward-Backward Reasoning in Large Language Models for Mathematical Verification

Author: Jiang, Weisen, Shi, Han, Yu, Longhui, Liu, Zhengying, Zhang, Yu, Li, Zhenguo, and Kwok, James T.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It is based on forward reasoning and cannot further improve performance by sampling more reasoning chains when saturated. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and ask the LLM to answer a backward question created by a simple template, i.e., to predict the masked number when a candidate answer is provided. Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification. Extensive experiments on six standard mathematical data sets and three LLMs show that FOBAR achieves state-of-the-art performance. In particular, FOBAR outperforms Self-Consistency, which uses forward reasoning alone, demonstrating that combining forward and backward reasoning is better. In addition, FOBAR performs better than existing verification methods, showing the effectiveness of the simple template used in backward reasoning and the proposed combination. Extensions to non-mathematical problems are also discussed and validated empirically., Comment: Accepted by Findings of ACL 2024
Published: 2023
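
The abstract above describes majority voting over sampled answers (forward reasoning) combined with a backward check that tries to recover a masked number given a candidate answer. The Python sketch below illustrates one way such a combination could work. It is not the paper's method: the product rule used to merge the two scores, the function names, and the toy counts are all assumptions, and the LLM calls are replaced by precomputed tallies.

```python
from collections import Counter


def forward_vote_share(answers):
    """Self-Consistency style: fraction of sampled chains voting for each answer."""
    counts = Counter(answers)
    total = len(answers)
    return {ans: n / total for ans, n in counts.items()}


def backward_share(backward_hits):
    """Normalize, per candidate, how often the masked number was recovered."""
    total = sum(backward_hits.values())
    if total == 0:
        # No backward evidence at all: treat every candidate equally.
        return {ans: 1 / len(backward_hits) for ans in backward_hits}
    return {ans: n / total for ans, n in backward_hits.items()}


def combine(answers, backward_hits):
    """Pick the candidate with the best product of forward and backward shares.

    The product is an illustrative assumption, not the paper's formula.
    """
    fwd = forward_vote_share(answers)
    bwd = backward_share(backward_hits)
    scores = {ans: fwd[ans] * bwd.get(ans, 0.0) for ans in fwd}
    return max(scores, key=scores.get)


# Toy data: five forward samples, and backward-check successes per candidate.
sampled_answers = ["18", "18", "20", "20", "20"]
backward_successes = {"18": 4, "20": 1}

majority = Counter(sampled_answers).most_common(1)[0][0]
print("forward-only choice:", majority)
print("forward + backward choice:", combine(sampled_answers, backward_successes))
```

In this toy case the forward-only vote picks "20", while the combined score prefers "18" because the backward check recovers the masked number far more often for that candidate.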

17. Progressive-Hint Prompting Improves Reasoning in Large Language Models

Author: Zheng, Chuanyang, Liu, Zhengying, Xie, Enze, Li, Zhenguo, and Li, Yu
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%)., Comment: Accepted to ICML AI4MATH 2024
Published: 2023

18. Learning to Prove Trigonometric Identities

Author: Liu, Zhou, Li, Yujun, Liu, Zhengying, Li, Lin, and Li, Zhenguo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Automatic theorem proving with deep learning methods has attracted attention recently. In this paper, we construct an automatic proof system for trigonometric identities. We define the normalized form of trigonometric identities, design a set of rules for the proof and put forward a method which can generate theoretically infinite trigonometric identities. Our goal is not only to complete the proof, but to complete the proof in as few steps as possible. For this reason, we design a model to learn proof data generated by random BFS (rBFS), and it is proved theoretically and experimentally that the model can outperform rBFS after a simple imitation learning. After further improvement through reinforcement learning, we get AutoTrig, which can give proof steps for identities in almost as few steps as BFS (theoretically shortest method), with a time cost of only one-thousandth. In addition, AutoTrig also beats Sympy, Matlab and humans in the synthetic dataset, and performs well in many generalization tasks.
Published: 2022

19. Lessons learned from the NeurIPS 2021 MetaDL challenge: Backbone fine-tuning without episodic meta-learning dominates for few-shot learning image classification

Author: Baz, Adrian El, Ullah, Ihsan, Alcobaça, Edesio, Carvalho, André C. P. L. F., Chen, Hong, Ferreira, Fabio, Gouk, Henry, Guan, Chaoyu, Guyon, Isabelle, Hospedales, Timothy, Hu, Shell, Huisman, Mike, Hutter, Frank, Liu, Zhengying, Mohr, Felix, Öztürk, Ekrem, van Rijn, Jan N., Sun, Haozhe, Wang, Xin, and Zhu, Wenwu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Neural and Evolutionary Computing
Abstract: Although deep neural networks are capable of achieving performance superior to humans on various tasks, they are notorious for requiring large amounts of data and computing resources, restricting their success to domains where such resources are available. Metalearning methods can address this problem by transferring knowledge from related tasks, thus reducing the amount of data and computing resources needed to learn new tasks. We organize the MetaDL competition series, which provides opportunities for research groups all over the world to create and experimentally assess new meta-(deep)learning solutions for real problems. In this paper, authored collaboratively between the competition organizers and the top-ranked participants, we describe the design of the competition, the datasets, the best experimental results, as well as the top-ranked methods in the NeurIPS 2021 challenge, which attracted 15 active teams who made it to the final phase (by outperforming the baseline), making over 100 code submissions during the feedback phase. The solutions of the top participants have been open-sourced. The lessons learned include that learning good representations is essential for effective transfer learning.
Comment: version 2 is the correct version, including supplementary material at the end
Published: 2022

20. Advances in MetaDL: AAAI 2021 challenge and workshop

Author: Baz, Adrian El, Guyon, Isabelle, Liu, Zhengying, van Rijn, Jan, Treguer, Sebastien, and Vanschoren, Joaquin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: To stimulate advances in metalearning using deep learning techniques (MetaDL), we organized in 2021 a challenge and an associated workshop. This paper presents the design of the challenge and its results, and summarizes presentations made at the workshop. The challenge focused on few-shot learning classification tasks of small images. Participants' code submissions were run in a uniform manner, under tight computational constraints. This put pressure on solution designs to use existing architecture backbones and/or pre-trained networks. Winning methods featured various classifiers trained on top of the second-to-last layer of popular CNN backbones, fine-tuned on the meta-training data (not necessarily in an episodic manner), then trained on the labeled support and tested on the unlabeled query sets of the meta-test data.
Comment: Proceedings of Machine Learning Research, PMLR, 2021
Published: 2022

21. Winning solutions and post-challenge analyses of the ChaLearn AutoDL challenge 2019

Author: Liu, Zhengying, Pavao, Adrien, Xu, Zhen, Escalera, Sergio, Ferreira, Fabio, Guyon, Isabelle, Hong, Sirui, Hutter, Frank, Ji, Rongrong, Junior, Julio C. S. Jacques, Li, Ge, Lindauer, Marius, Luo, Zhipeng, Madadi, Meysam, Nierhoff, Thomas, Niu, Kangning, Pan, Chunguang, Stoll, Danny, Treguer, Sebastien, Wang, Jin, Wang, Peng, Wu, Chenglin, Xiong, Youcheng, Zela, Arber, and Zhang, Yang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings, but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks, with limited time and computational resources, pushing solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matching data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an ever-lasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service".
Comment: The first three authors contributed equally; This is only a draft version
Published: 2022

22. Exposure to 6-PPD quinone causes ferroptosis activation associated with induction of reproductive toxicity in Caenorhabditis elegans

Author: Liu, Zhengying, Bian, Qian, and Wang, Dayong
Published: 2024
Full Text: View/download PDF

23. Reconceptualising transport-related social exclusion in rural China

Author: Liu, Qiyang, Ma, Tianyu, and Liu, Zhengying
Published: 2024
Full Text: View/download PDF

24. Uncovering spatial and social gaps in rural mobility via mobile phone big data

Author: Liu, Zhengying, Zhao, Pengjun, Liu, Qiyang, He, Zhangyuan, and Kang, Tingting
Published: 2023
Full Text: View/download PDF

25. Polystyrene nanoparticles strengthen high glucose toxicity associated with alteration in insulin signaling pathway in C. elegans

Author: Zhuang, Ziheng, Liu, Tianwen, Liu, Zhengying, and Wang, Dayong
Published: 2024
Full Text: View/download PDF

26. Polyethylene nanoplastics cause reproductive toxicity associated with activation of both estrogenic hormone receptor NHR-14 and DNA damage checkpoints in C. elegans

Author: Liu, Zhengying, Hua, Xin, Zhao, Yue, Bian, Qian, and Wang, Dayong
Published: 2024
Full Text: View/download PDF

27. AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data

Author: Egele, Romain, Balaprakash, Prasanna, Vishwanath, Venkatram, Guyon, Isabelle, and Liu, Zhengying
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: Developing high-performing predictive models for large tabular data sets is a challenging task. The state-of-the-art methods are based on expert-developed model ensembles from different supervised learning methods. Recently, automated machine learning (AutoML) is emerging as a promising approach to automate predictive model development. Neural architecture search (NAS) is an AutoML approach that generates and evaluates multiple neural network architectures concurrently and improves the accuracy of the generated models iteratively. A key issue in NAS, particularly for large data sets, is the large computation time required to evaluate each generated architecture. While data-parallel training is a promising approach that can address this issue, its use within NAS is difficult. For different data sets, the data-parallel training settings such as the number of parallel processes, learning rate, and batch size need to be adapted to achieve high accuracy and reduction in training time. To that end, we have developed AgEBO-Tabular, an approach to combine aging evolution (AgE), a parallel NAS method that searches over neural architecture space, and an asynchronous Bayesian optimization method for tuning the hyperparameters of the data-parallel training simultaneously. We demonstrate the efficacy of the proposed method to generate high-performing neural network models for large tabular benchmark data sets. Furthermore, we demonstrate that the automatically discovered neural network models using our method outperform the state-of-the-art AutoML ensemble models in inference speed by two orders of magnitude while reaching similar accuracy values.
Published: 2020

28. A facile and scalable approach to tear-and-use polyethylene (PE) tape with adjustable hydrophobicity for water transferring

Author: Huang, Yanhao, Chen, Libo, Zhang, Ruiyan, Liu, Lei, Liu, Zhengying, Yang, Wei, Wang, Feng, and Yang, Mingbo
Published: 2023
Full Text: View/download PDF

29. Exploring the spatial characteristics of the human mobility network in rural settings of China's Greater Bay Area

Author: Liu, Zhengying, Zhao, Pengjun, Liu, Qiyang, Cui, Yanzhe, Yang, Yuan, Liu, Juan, Li, Buhui, and Li, Jingwei
Published: 2023
Full Text: View/download PDF

30. LEAP nets for power grid perturbations

Author: Donnot, Benjamin, Donon, Balthazar, Guyon, Isabelle, Liu, Zhengying, Marot, Antoine, Panciatici, Patrick, and Schoenauer, Marc
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We propose a novel neural network embedding approach to model power transmission grids, in which high voltage lines are disconnected and reconnected with one another from time to time, either accidentally or willfully. We call our architecture LEAP net, for Latent Encoding of Atypical Perturbation. Our method implements a form of transfer learning, making it possible to train on a few source domains, then generalize to new target domains, without learning on any example of that domain. We evaluate the viability of this technique to rapidly assess curative actions that human operators take in emergency situations, using real historical data from the French high voltage power grid.
Published: 2019

31. Effect of molecular chain mobility on the electrothermal stability of polyethylene/multi-walled carbon nanotube electrothermal composites

Author: Du, Tianlong, Zhang, Ganghong, Bao, Ruiying, Chen, Jun, Liu, Zhengying, and Yang, Wei
Published: 2023
Full Text: View/download PDF

32. A modal shift due to a free within-destination tourist bus scheme: Multimodality and transport equity implications

Author: Liu, Qiyang, Liu, Zhengying, An, Zihao, Zhao, Pengjun, and Zhao, Dongyi
Published: 2023
Full Text: View/download PDF

33. The effects of urban land use on energy-related CO2 emissions in China

Author: Kang, Tingting, Wang, Han, He, Zhangyuan, Liu, Zhengying, Ren, Yang, and Zhao, Pengjun
Published: 2023
Full Text: View/download PDF

34. The living environment and intravillage activity-travel: A conceptual framework based on participant observation in Guangdong, China

Author: Liu, Qiyang, Liu, Zhengying, Yu, Zhao, and Zhao, Pengjun
Published: 2023
Full Text: View/download PDF

35. Tailoring crosslinking networks to fabricate photocurable polyurethane acrylate (PUA) dielectric elastomer with balanced electromechanical performance

Author: Xiong, Lulu, Li, Delong, Yang, Yongfei, Ye, Xiaoxiao, Huang, Yu, Xu, E., Xia, Chuanhui, Yang, Mingbo, Liu, Zhengying, Cui, Xudong, Wang, Feng, and Huang, Yanhao
Published: 2023
Full Text: View/download PDF

36. Transgenerational Response of Germline Nuclear Hormone Receptor Genes to Nanoplastics at Predicted Environmental Doses in Caenorhabditis elegans

Author: Liu, Zhengying, Wang, Yuxing, Bian, Qian, and Wang, Dayong
Published: 2024
Full Text: View/download PDF

37. Transport inequities through the lens of environmental racism: Rural-urban migrants under Covid-19

Author: Liu, Qiyang, Liu, Zhengying, Kang, Tingting, Zhu, Le, and Zhao, Pengjun
Published: 2022
Full Text: View/download PDF

38. Perceived accessibility and mental health consequences of COVID-19 containment policies

Author: Liu, Qiyang, Liu, Zhengying, Lin, Siyi, and Zhao, Pengjun
Published: 2022
Full Text: View/download PDF

39. Study on the improved electromechanical properties of composited dielectric elastomer by tailoring three-dimensional segregated multi-walled carbon nanotube (MWCNT) network

Author: Huang, Yanhao, Xu, Zewang, Shi, Xiaohui, Zheng, Shaodi, Wu, Xiaotian, Liu, Zhengying, Bao, Ruiying, Yang, Wei, and Yang, Mingbo
Published: 2022
Full Text: View/download PDF

40. Faster and better: A polymeric chaperone binder for microenvironment management in thick battery electrodes

Author: Jing, Lei, Ji, Yuan, Feng, Lanxiang, Fu, Xuewei, He, Xuewei, He, Yan, Zhu, Zhiwei, Sun, Xiaorong, Liu, Zhengying, Yang, Mingbo, Yang, Wei, and Wang, Yu
Published: 2022
Full Text: View/download PDF

41. Investigating access to periodic markets in rural China

Author: Liu, Zhengying, Liu, Qiyang, Zhao, Pengjun, Tang, Junqing, and Gong, Zhaoya
Published: 2022
Full Text: View/download PDF

42. Construction of dual conductive network in paper-based composites towards flexible degradable dual-mode sensor

Author: Zheng, Shaodi, Du, Ronghuan, Wang, Ning, Cao, Minghui, Zhang, Yunxiu, Jiang, Yuanping, Liu, Zhengying, Yang, Wei, Yang, Mingbo, and Xia, Xiaochao
Published: 2021
Full Text: View/download PDF

43. Shaping urban form for solar energy self-sufficiency city

Author: Zhao, Pengjun, Jin, Yanxiu, Zhang, Haoran, Liu, Zhaoru, Yu, Qing, Liu, Zhengying, Guo, Zhiling, Yan, Da, Shibasaki, Ryosuke, and Yan, Jinyue
Published: 2024
Full Text: View/download PDF

44. Optimization of the Thermally Conductive Low-k Polymer Dielectrics Based on Multisource Free-Volume Effects

Author: Xu, Weidi, Wang, Ziyang, Cao, Hong, Zhou, Ling, Jiang, Niu, Ke, Kai, Liu, Zhengying, Yang, Wei, and Yang, Mingbo
Published: 2024
Full Text: View/download PDF

45. Heterogeneity in physical activity participation of older adults: A latent class analysis

Author: Liu, Zhengying, Kemperman, Astrid, Timmermans, Harry, and Yang, Dongfeng
Published: 2021
Full Text: View/download PDF

46. Electrolyte permeation and ion diffusion enhanced architectures for high performance all-solid-state flexible supercapacitors

Author: Wu, Xiaotian, Zheng, Shaodi, Huang, Yanhao, Xu, Zewang, Liu, Zhengying, Yang, Wei, and Yang, Mingbo
Published: 2021
Full Text: View/download PDF

47. Highly sensitive pressure sensor with broad linearity via constructing a hollow structure in polyaniline/polydimethylsiloxane composite

Author: Zheng, Shaodi, Jiang, Yuanping, Wu, Xiaotian, Xu, Zewang, Liu, Zhengying, Yang, Wei, and Yang, Mingbo
Published: 2021
Full Text: View/download PDF

48. Correlates of frequency of outdoor activities of older adults: Empirical evidence from Dalian, China

Author: Liu, Zhengying, Kemperman, Astrid, and Timmermans, Harry
Published: 2021
Full Text: View/download PDF

49. MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Author: Yu, Longhui, Jiang, Weisen, Shi, Han, Yu, Jincheng, Liu, Zhengying, Zhang, Yu, Kwok, Tin Yau, Li, Zhenguo, Weller, Adrian, and Liu, Weiyang
Published: 2024

50. Construction of “core–shell” structure for improved thermal conductivity and mechanical properties of polyamide 6 composites

Author: Liu, Renpeng, Han, Hui, Wu, Xiaotian, Liu, Zhengying, Yang, Wei, and Yang, Mingbo
Published: 2021
Full Text: View/download PDF
