StepFun-Prover Preview: Let's Think and Verify Step by Step
Automated theorem provers for Lean 4 require extensive sampling to find proofs, limiting practical utility. Achieving high pass@1 rates with minimal interaction remains challenging.
The authors present StepFun-Prover Preview, an LLM for formal theorem proving through tool-integrated reasoning. A reinforcement learning pipeline incorporating tool-based interactions enables the model to emulate human-like problem-solving by iteratively refining proofs based on real-time Lean 4 REPL feedback. The training framework consists of a cold start phase compiling open-source formal math datasets, followed by tool-integrated GRPO where the model learns to interact with the Lean environment, plus iterative RL-SFT cycles.
On miniF2F-test, StepFun-Prover-Preview-32B achieves 70.0% pass@1, outperforming DeepSeek-Prover-V2-671B (61.9%) and Kimina-Prover-72B (63.9%). The 7B variant achieves 66.0%. Performance scales with generation length, reaching 70.0% at 20,480 tokens versus 58.3% at 4,096.
| Model | Pass@1 |
|---|---|
| DeepSeek-Prover-V2-7B | 58.6% |
| DeepSeek-Prover-V2-671B | 61.9% |
| Kimina-Prover-8B | 61.1% |
| Kimina-Prover-72B | 63.9% |
| StepFun-Prover-Preview-7B | 66.0% |
| StepFun-Prover-Preview-32B | 70.0% |
