Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, Thomas Hanwen Zhu

cs.CL Dec 19, 2025 · v1

theorem-proving-ml benchmark ai-agents

Read PDF arXiv abstract Code

TL;DR

Trains a formal theorem-proving model via agentic reinforcement learning that interacts with Lean to prove undergraduate-and-beyond problems.

Abstract

Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Problem

Formal theorem proving in Lean remains challenging and computationally expensive for undergraduate-level problems and beyond, and has not yet benefited from posttraining scaling techniques like reinforcement learning from experience.

Approach

Seed-Prover 1.5 is trained via large-scale agentic reinforcement learning with extensive Lean compiler interactions. An agentic prover incrementally constructs proofs step by step using multiple tools and caches successfully compiled lemmas. A test-time scaling workflow bridges natural and formal languages by leveraging advances in natural language proving. The model accumulates experience during RL, improving both capability and efficiency.

Results

Seed-Prover 1.5 solves 88% of PutnamBench (undergraduate-level), 80% of Fate-H (graduate-level), and 33% of Fate-X (PhD-level) problems. It solved 11 of 12 Putnam 2025 problems within 9 hours and all 6 IMO 2025 problems (5 within 16.5 hours).

Method	Budget	Putnam	Fate-H	Fate-X
Goedel-Prover-V2-32B	pass@64	86/660	2/100	0/100
Seed-Prover 1.0	18 H20 days/prob	331/660	35/100	9/100
Seed-Prover 1.5	pass@8 + TTS	581/660	80/100	33/100

Performance comparison on formal theorem proving benchmarks

Papers With

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience