ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao
cs.LG
Jan 30, 2025 · v1
TL;DR
ProofAug augments LLM proofs with automation tools at multiple granularities; a Lean 4 version improves a prover's miniF2F pass@1.
Abstract
The synergy between deep learning models and traditional automation tools, such as built-in tactics of the proof assistant and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers(NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version) with 2100 queries to the model per problem (In contrast, the previous SOTA in Isabelle, Subgoal-XL, only achieves 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at
https://github.com/haoxiongliu/ProofAug.
Problem
Previous neural theorem provers apply automation tools either only when explicitly invoked or at a single granularity, failing to fully exploit their power for proof synthesis.
Approach
ProofAug equips LLMs with automation methods at multiple granularities through fine-grained structure analysis of model-generated proof proposals. It finds the Maximal Compatible Semi-Proof from a failed attempt, then applies automation tools to fill gaps at different levels. It includes an Efficient Recursive Proving (ERP) module and integrates as a plug-and-play module with any tree-search algorithm. Both Isabelle and Lean 4 versions are implemented.
Results
On miniF2F-test using deepseek-math-7b-base with Isabelle, ProofAug achieves 66.0% (curated) with 2100 queries per problem, compared to the previous SOTA of 56.1% using 16384 queries. The Lean 4 version improves Kimina-Prover-Preview-Distill-1.5B pass@1 from 44.3% to 50.4%.
| Method | Model | Budget | Score |
|---|
| Subgoal-XL | Llama-8B | 16384 | 56.1% |
| LEGO-Prover | mixed GPTs | 100 | 50.0% |
| ProofAug | dsmath-7b | 100 | 52.5% |
| ProofAug+Mixed+ERP | dsmath-7b | 2100 | 66.0% |
ProofAug results on miniF2F-test (Isabelle)