ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction

Murari Ambati

cs.AI May 30, 2025 · v1

theorem-proving-ml ai-agents verification dataset

Read PDF arXiv abstract

TL;DR

Uses Lean 4 as a backend proof verifier and reward source in an RL theorem-proving loop, evaluating on miniF2F and a Lean mathlib-derived dataset.

Abstract

We propose ProofNet++, a neuro-symbolic framework that enhances automated theorem proving by combining large language models (LLMs) with formal proof verification and self-correction mechanisms. Current LLM-based systems suffer from hallucinated logical steps and unverifiable reasoning. ProofNet++ mitigates these limitations by integrating symbolic proof tree supervision, a reinforcement learning loop using verifiers as reward functions, and an iterative self-correction module. Our experiments on miniF2F, Lean's mathlib, and HOL Light show that ProofNet++ significantly improves proof accuracy, correctness, and formal verifiability over prior models. We provide theoretical analysis of the convergence and stability of the verifier-guided RL framework and release our datasets and codebase for future research.

Problem

LLM-based theorem proving systems suffer from hallucinated logical steps and unverifiable reasoning. Current approaches lack mechanisms to detect and correct errors in generated proofs before final output.

Approach

ProofNet++ is a neuro-symbolic framework combining a transformer-based LLM (fine-tuned on proof corpora) with symbolic proof tree supervision, a reinforcement learning loop using formal verifiers as reward functions, and an iterative self-correction module. The symbolic reasoning interface bridges autoregressive LLM outputs and formal proof states. The system targets Lean, Isabelle/HOL, and HOL Light environments.

Results

On miniF2F, ProofNet++ achieves 68.4% Formal Proof Success Rate (FPSR) and 81.2% Proof Path Correctness (PPC) with average error distance of 3.2 steps. On mathlib-extract, it reaches 74.9% FPSR and 88.0% PPC. On HOL Light, it achieves 63.5% FPSR and 76.5% PPC.

Dataset	FPSR (%)	PPC (%)	EDPT	Latency (ms)
miniF2F	68.4	81.2	3.2	198
mathlib-extract	74.9	88.0	2.4	176
HOL Light Testbed	63.5	76.5	4.0	214

ProofNet++ evaluation results

Papers With

ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction