ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Haoxiong Liu, Jiacheng Sun, Zhenguo Li, Andrew C Yao

cs.LG Jan 30, 2025 · v1

theorem-proving-ml tactic-automation benchmark

Read PDF arXiv abstract Code

TL;DR

ProofAug augments LLM proofs with automation tools at multiple granularities; a Lean 4 version improves a prover's miniF2F pass@1.

Abstract

The synergy between deep learning models and traditional automation tools, such as built-in tactics of the proof assistant and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers(NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% after curation of the dataset (61.9% for the original version) with 2100 queries to the model per problem (In contrast, the previous SOTA in Isabelle, Subgoal-XL, only achieves 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that can improve the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at https://github.com/haoxiongliu/ProofAug.

Problem

Previous neural theorem provers apply automation tools either only when explicitly invoked or at a single granularity, failing to fully exploit their power for proof synthesis.

Approach

ProofAug equips LLMs with automation methods at multiple granularities through fine-grained structure analysis of model-generated proof proposals. It finds the Maximal Compatible Semi-Proof from a failed attempt, then applies automation tools to fill gaps at different levels. It includes an Efficient Recursive Proving (ERP) module and integrates as a plug-and-play module with any tree-search algorithm. Both Isabelle and Lean 4 versions are implemented.

Results

On miniF2F-test using deepseek-math-7b-base with Isabelle, ProofAug achieves 66.0% (curated) with 2100 queries per problem, compared to the previous SOTA of 56.1% using 16384 queries. The Lean 4 version improves Kimina-Prover-Preview-Distill-1.5B pass@1 from 44.3% to 50.4%.

Method	Model	Budget	Score
Subgoal-XL	Llama-8B	16384	56.1%
LEGO-Prover	mixed GPTs	100	50.0%
ProofAug	dsmath-7b	100	52.5%
ProofAug+Mixed+ERP	dsmath-7b	2100	66.0%

ProofAug results on miniF2F-test (Isabelle)

Papers With

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis