← All papers

Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques

Sangjun Han, Taeil Hur, Youngmi Hur, Kathy Sangkyung Lee, Myungyoon Lee, Hyojae Lim

cs.LO Feb 5, 2025 · v1

theorem-proving-ml benchmark ai-agents

Read PDF arXiv abstract Code

TL;DR

Combines ChatGPT with the Lean prover and simple search to generate verifiable formal proofs on the miniF2F dataset.

Abstract

The challenge of formal proof generation has a rich history, but with modern techniques, we may finally be at the stage of making actual progress in real-life mathematical problems. This paper explores the integration of ChatGPT and basic searching techniques to simplify generating formal proofs, with a particular focus on the miniF2F dataset. We demonstrate how combining a large language model like ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation. Despite its simplicity, our best-performing Lean-based model surpasses all known benchmarks with a 31.15% pass rate. We extend our experiments to include other datasets and employ alternative language models, showcasing our models' comparable performance in diverse settings and allowing for a more nuanced analysis of our results. Our findings offer insights into AI-assisted formal proof generation, suggesting a promising direction for future research in formal mathematical proof.

Problem

Generating formal proofs in Lean automatically remains difficult, and it is unclear how effectively a general-purpose LLM like ChatGPT can be combined with simple search techniques to produce verified proofs.

Approach

The authors integrate ChatGPT with basic searching techniques (including feedback-based algorithms) for generating Lean proofs on the miniF2F dataset. They use Lean's verifier as an oracle to confirm correctness, iterating with different feedback strategies (including a Bad(O)-based feedback algorithm that communicates error information back to the LLM).

Results

The best-performing Lean-based model achieves a 31.15% pass rate on miniF2F, surpassing all previously known benchmarks at the time. The approach demonstrates comparable performance across different datasets and alternative language models.

Papers With

Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques