
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

Authors:
Gao, Bofei
Cai, Zefan
Xu, Runxin
Wang, Peiyi
Zheng, Ce
Lin, Runji
Lu, Keming
Liu, Dayiheng
Zhou, Chang
Xiao, Wen
Hu, Junjie
Liu, Tianyu
Chang, Baobao
Publication Year: 2024

Abstract

Mathematical verifiers have recently achieved success on mathematical reasoning tasks by validating the correctness of solutions generated by policy models. However, existing verifiers are trained with binary classification labels, which are not informative enough for the model to assess solutions accurately. To mitigate this insufficiency of binary labels, we introduce step-wise natural language feedback as rationale labels, i.e., the correctness of each step together with a detailed explanation. In this paper, we propose Math-Minos, a natural-language-feedback-enhanced verifier built by constructing automatically generated training data and a two-stage training paradigm for effective training and efficient inference. Our experiments reveal that even a small set of natural language feedback can significantly boost the performance of the verifier in both verification and reinforcement learning. We have released the code and data for further exploration.

Comment: 15 pages
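The abstract contrasts binary correctness labels with step-wise natural-language rationale labels (per-step correctness plus an explanation). The sketch below illustrates that contrast with a hypothetical data schema; the class and field names are my own and are not taken from the Math-Minos paper or its released data format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema contrasting the two label types described in the
# abstract. Names are illustrative, not the paper's actual format.

@dataclass
class BinaryLabel:
    solution_is_correct: bool          # the only signal in a binary-trained verifier

@dataclass
class StepRationale:
    step_text: str                     # one reasoning step from the solution
    is_correct: bool                   # step-level correctness
    explanation: str                   # natural-language justification for the judgment

@dataclass
class RationaleLabel:
    steps: List[StepRationale] = field(default_factory=list)

    @property
    def solution_is_correct(self) -> bool:
        # A solution is judged correct only if every step is correct.
        return all(s.is_correct for s in self.steps)


if __name__ == "__main__":
    label = RationaleLabel(steps=[
        StepRationale("Let x = 3 + 4 = 7.", True,
                      "Addition is performed correctly."),
        StepRationale("Then x^2 = 14.", False,
                      "7 squared is 49, not 14; the step confuses squaring with doubling."),
    ])
    print(label.solution_is_correct)   # False: the error is localized and explained
```

The intuition the sketch captures is that a rationale label pinpoints which step fails and why, whereas a binary label only states that some step failed somewhere.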

Details

Database: arXiv
Publication Type: Report
Accession number: edsarx.2406.14024
Document Type: Working Paper