← All papers
First page of BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

Ran Xin, Chenguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, Kai Shen

cs.AI Feb 5, 2025 · v3
BFS-Prover scales best-first tree search with expert iteration and DPO for LLM theorem proving in Lean4.
Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating the underlying large proof search spaces. While the existing approaches primarily rely on value functions and/or Monte Carlo Tree Search (MCTS), the potential of simpler methods like Best-First Tree Search (BFS) remains underexplored. In this paper, we investigate whether BFS can achieve competitive performance in large-scale theorem proving tasks. We present BFS-Prover, a scalable expert iteration framework, featuring three key innovations. First, we implement strategic data filtering at each expert iteration round, excluding problems solvable via beam search node expansion to focus on harder cases. Second, we improve the sample efficiency of BFS through Direct Preference Optimization (DPO) applied to state-tactic pairs automatically annotated with compiler error feedback, refining the LLM's policy to prioritize productive expansions. Third, we employ length normalization in BFS to encourage exploration of deeper proof paths. BFS-Prover achieves a state-of-the-art score of $72.95\%$ on the MiniF2F test set and therefore challenges the perceived necessity of complex tree search methods, demonstrating that BFS can achieve competitive performance when properly scaled. To facilitate further research and development in this area, we have open-sourced our model at https://huggingface.co/ByteDance-Seed/BFS-Prover-V1-7B.

LLM theorem proving in Lean4 is dominated by value functions and Monte Carlo Tree Search; the potential of simpler best-first search is underexplored.

BFS-Prover is a scalable expert-iteration framework over best-first tree search with three additions: strategic per-round data filtering of beam-solvable problems, Direct Preference Optimization on state-tactic pairs annotated with compiler error feedback, and length normalization to encourage deeper proof paths. The policy is a fine-tuned Qwen2.5-Math-7B over Mathlib data via LeanDojo.

Figure 1: An overview of BFS-Prover . The figure illustrates two key components: a) Best-First Tree Search (BFS) mechanism for automatic theorem proving, showing how the system explores proof paths by expanding nodes based on accumulated probabilities, with different node types (proof finish, error, intermediate, and unexplored) represented via color coding. The identified proof path demonstrates

BFS-Prover reaches a state-of-the-art 72.95% on the MiniF2F test set, challenging the assumed need for complex tree search and showing that properly scaled BFS is competitive.

Figure 4: Evolution of proof accuracies with increasing number of proof search passes on the MiniF2F test. We compare two training methods: (blue) only using SFT on all state-tactic pairs accumulated in the expert iteration pipeline of BFS-Prover ; and (red) , on top of (blue) , perform another round of DPO refinement using negative tactic examples described in Section 2.3 . It can be seen that DP