
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Authors:
Xiang, Violet
Snell, Charlie
Gandhi, Kanishk
Albalak, Alon
Singh, Anikait
Blagden, Chase
Phung, Duy
Rafailov, Rafael
Lile, Nathan
Mahan, Dakota
Castricato, Louis
Fränken, Jan-Philipp
Haber, Nick
Finn, Chelsea
Publication Year:
2025

Abstract

We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT. We present empirical evidence from state-of-the-art models exhibiting behaviors consistent with in-context search, and explore methods for producing Meta-CoT via process supervision, synthetic data generation, and search algorithms. We then outline a concrete pipeline for training a model to produce Meta-CoTs, incorporating instruction tuning with linearized search traces and reinforcement learning post-training. Finally, we discuss open research questions, including scaling laws, the role of verifiers, and the potential for discovering novel reasoning algorithms. This work provides a theoretical and practical roadmap to enable Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in artificial intelligence.
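To make the training pipeline in the abstract concrete, the sketch below illustrates one way a search trace over intermediate reasoning steps could be linearized into a single instruction-tuning target. This is a minimal illustration only: the class and function names (SearchNode, linearize_trace) and the tag tokens are hypothetical and are not the authors' implementation or format.

```python
# Minimal sketch (not from the paper): linearizing a toy search trace into a
# single training string, in the spirit of the pipeline described in the
# abstract. All names and tag formats here are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchNode:
    """One step explored during search over intermediate reasoning steps."""
    thought: str
    is_solution: bool = False
    children: List["SearchNode"] = field(default_factory=list)


def linearize_trace(node: SearchNode, depth: int = 0) -> List[str]:
    """Depth-first flattening of the search tree, keeping dead ends and
    backtracks so a model can learn *how* the final chain of thought was found,
    not just the final chain itself."""
    lines = [f"<step depth={depth}> {node.thought}"]
    if node.is_solution:
        lines.append("<solution reached>")
        return lines
    for child in node.children:
        lines.extend(linearize_trace(child, depth + 1))
    lines.append(f"<backtrack to depth={depth}>")
    return lines


if __name__ == "__main__":
    # Toy example: one dead-end branch, then the branch that reaches a solution.
    root = SearchNode(
        "Try factoring the quadratic.",
        children=[
            SearchNode("Guess integer roots of x^2 + x + 1 = 0."),  # dead end
            SearchNode(
                "Apply the quadratic formula instead.",
                children=[
                    SearchNode("Roots are complex: (-1 ± i*sqrt(3))/2.",
                               is_solution=True)
                ],
            ),
        ],
    )
    # The joined string would serve as an instruction-tuning target before
    # reinforcement-learning post-training, per the pipeline in the abstract.
    print("\n".join(linearize_trace(root)))
```

Keeping the failed branches and backtracking markers in the linearized string is the point of the exercise: the supervision signal covers the search process itself, which is what distinguishes a Meta-CoT target from an ordinary CoT target.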

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2501.04682
Document Type:
Working Paper