Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs
Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai
cs.AI
Jun 24, 2025 · v1
TL;DR
Prover Agent couples LLMs with the Lean proof assistant and auxiliary-lemma generation for automated theorem proving on MiniF2F and PutnamBench.
Abstract
We present Prover Agent, a novel AI agent for automated theorem proving that integrates large language models (LLMs) with a formal proof assistant, Lean. Prover Agent coordinates an informal reasoning LLM, a formal prover model, and feedback from Lean while also generating auxiliary lemmas. These auxiliary lemmas are not limited to subgoals in the formal proof but can also include special cases or potentially useful facts derived from the assumptions, which help in discovering a viable proof strategy. It achieves an 88.1% success rate on MiniF2F and solves 25 problems on the PutnamBench with a smaller sample budget than previous approaches, establishing a new state-of-the-art on both benchmarks among methods using small language models (SLMs). We also present theoretical analyses and case studies that illustrate how these generated lemmas contribute to solving challenging problems. Our code is publicly available at
https://github.com/kAIto47802/Prover-Agent.
Problem
Automated theorem proving systems struggle with complex formal proofs that require discovering non-obvious proof strategies, particularly when useful intermediate lemmas are not subgoals of the main proof but rather special cases or auxiliary facts.
Approach
Prover Agent is an AI agent that integrates an informal reasoning LLM, a formal prover model, and feedback from Lean. It coordinates these components while also generating auxiliary lemmas that are not limited to subgoals in the formal proof but can include special cases or potentially useful facts derived from assumptions. These auxiliary lemmas help discover viable proof strategies that direct proof search might miss.
Results
Prover Agent achieves an 88.1% success rate on MiniF2F and solves 25 problems on PutnamBench with a budget of 1600 attempts per problem.