← All papers
First page of Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai

cs.AI Apr 8, 2025 · v1
Posttrains Lean 4 LLM theorem provers with continual training and reinforcement learning using the Lean 4 compiler as reward.
Recent advances in automated theorem proving (ATP) through LLMs have highlighted the potential of formal reasoning with Lean 4 codes. However, ATP has not yet be revolutionized by the recent posttraining scaling as demonstrated by Open AI O1/O3 and Deepseek R1. In this work, we investigate the entire posttraining of ATP, aiming to align it with breakthroughs in reasoning models in natural languages. To begin, we continual train current ATP models with a hybrid dataset, which consists of numerous statement-proof pairs, and additional data aimed at incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. Next, we explore reinforcement learning with the use of outcome reward returned by Lean 4 compiler. Through our designed continual training and reinforcement learning processes, we have successfully improved existing formal provers, including both DeepSeek-Prover-v1.5 and Goedel-Prover, achieving state-of-the-art performance in the field of whole-proof generation. For example, we achieve a 59.8% pass rate (pass@32) on MiniF2F. This is an on-going project and we will progressively update our findings, release our data and training details.

Automated theorem proving in Lean 4 has not yet been advanced by posttraining scaling techniques (reinforcement learning with compiler feedback) that have proven effective for natural language reasoning.

The authors continually train existing ATP models with a hybrid dataset combining statement-proof pairs and data incorporating cognitive behaviors that emulate human reasoning and hypothesis refinement. They then apply reinforcement learning using GRPO with outcome rewards from the Lean 4 compiler (success/fail binary signal). The approach is applied on top of both DeepSeek-Prover-v1.5 and Goedel-Prover.

Leanabell-Prover achieves 59.8% pass@32 on MiniF2F-test, establishing state-of-the-art for whole-proof generation. RL training improves both baseline models, with the Goedel-Prover variant showing the largest gains.

MethodSample BudgetMiniF2F-test
DeepSeek-Prover-v1.5-Base640042.2%
DeepSeek-Prover-v1.5-RL640050.0%
Goedel-Prover3257.6%
Leanabell-Prover-GD-RL3259.8%
Performance on MiniF2F-test