← All papers
First page of ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction

ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction

Murari Ambati

cs.AI May 30, 2025 · v1
Uses Lean 4 as a backend proof verifier and reward source in an RL theorem-proving loop, evaluating on miniF2F and a Lean mathlib-derived dataset.
We propose ProofNet++, a neuro-symbolic framework that enhances automated theorem proving by combining large language models (LLMs) with formal proof verification and self-correction mechanisms. Current LLM-based systems suffer from hallucinated logical steps and unverifiable reasoning. ProofNet++ mitigates these limitations by integrating symbolic proof tree supervision, a reinforcement learning loop using verifiers as reward functions, and an iterative self-correction module. Our experiments on miniF2F, Lean's mathlib, and HOL Light show that ProofNet++ significantly improves proof accuracy, correctness, and formal verifiability over prior models. We provide theoretical analysis of the convergence and stability of the verifier-guided RL framework and release our datasets and codebase for future research.

LLM-based theorem proving systems suffer from hallucinated logical steps and unverifiable reasoning. Current approaches lack mechanisms to detect and correct errors in generated proofs before final output.

ProofNet++ is a neuro-symbolic framework combining a transformer-based LLM (fine-tuned on proof corpora) with symbolic proof tree supervision, a reinforcement learning loop using formal verifiers as reward functions, and an iterative self-correction module. The symbolic reasoning interface bridges autoregressive LLM outputs and formal proof states. The system targets Lean, Isabelle/HOL, and HOL Light environments.

On miniF2F, ProofNet++ achieves 68.4% Formal Proof Success Rate (FPSR) and 81.2% Proof Path Correctness (PPC) with average error distance of 3.2 steps. On mathlib-extract, it reaches 74.9% FPSR and 88.0% PPC. On HOL Light, it achieves 63.5% FPSR and 76.5% PPC.

DatasetFPSR (%)PPC (%)EDPTLatency (ms)
miniF2F68.481.23.2198
mathlib-extract74.988.02.4176
HOL Light Testbed63.576.54.0214
ProofNet++ evaluation results