← All papers

Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs

David Yin, Jing Gao

cs.LO Feb 16, 2025 · v1

dataset theorem-proving-ml tactic-automation mathlib

Read PDF arXiv abstract Code

TL;DR

Generates 4.7M Lean theorems with proofs from Mathlib4 by exploring proof state-transition graphs to train a prover.

Abstract

Large Language Models (LLMs) have demonstrated significant potential in generating mathematical proofs. However, a persistent challenge is that LLMs occasionally make mistakes, while even a minor mistake can invalidate an entire proof. Proof assistants like Lean offer a great remedy. They are designed for verifying each step of a proof in a formal language, and in recent years researchers have created AI models to generate proofs in their languages. However, the scarcity of large-scale datasets of Lean proofs restrict the performance of such Automated Theorem Proving (ATP) models. We developed LeanNavigator, a novel method for generating a large-scale dataset of Lean theorems and proofs by finding new ways to prove existing Lean theorems. By leveraging an interactive Lean client and an efficient method for proof step generation, LeanNavigator efficiently produces new theorems with corresponding proofs. Applying this approach to Mathlib4, we generated 4.7 million theorems totaling 1 billion tokens, surpassing previous datasets by more than an order of magnitude. Using this extensive dataset, we trained an AI model that outperforms the state-of-the-art ReProver model in theorem-proving tasks. These results confirm our hypothesis and demonstrate the critical role of large datasets in improving the performance of automated theorem provers.

Problem

AI models for automated theorem proving in Lean are limited by the scarcity of large-scale training datasets. Existing datasets like ReProver contain only around 112K theorems.

Approach

LeanNavigator generates millions of Lean theorems with proofs by exploring state transition graphs. Starting from known theorems, it applies tactics to create new proof states, then traverses the resulting graph to extract new theorems. Each theorem accumulates an average of 2035 intermediate states (compared to ReProver's 21.69), producing a dataset of 4.7 million theorems with 1 billion tokens.

Results

LeanNavigator generates 4.7M theorems (vs. ReProver's 112K) with 1B tokens (vs. 57M). A flan-t5-base model trained on this data achieves 39/117 on MIL and 104/493 on MiniF2F, outperforming ReProver's 30/117 and 99/493 respectively.

Model	MIL	MiniF2F
LeanNavigator flan-t5 base	39/117	104/493
LeanNavigator flan-t5 small	25/117	52/493
ReProver	30/117	99/493

Theorem proving benchmark results

Papers With

Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs