← All papers

Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs

Beibei Xiong, Hangyu Lv, Junqi Liu, Yisen Wang, Shaoshi Chen, Jianlin Wang, Zhengfeng Yang, Lihong Zhi

cs.LG May 6, 2026 · v1

theorem-proving-ml combinatorics

Read PDF arXiv abstract

TL;DR

WZ-LLM turns Wilf-Zeilberger plans into Lean 4 proof sketches discharged by an LLM prover.

Abstract

Automating formal proofs of combinatorial identities is challenging for LLM-based provers, as long-horizon proof planning is required and unconstrained search quickly explodes. Symbolic methods such as the Wilf-Zeilberger (WZ) method can achieve a mechanized proof of combinatorial identities by constructing special auxiliary functions and demonstrating that they satisfy specific recurrence relations. We propose WZ-LLM, a neuro-symbolic framework that turns WZ proof plans into executable proof sketches in Lean 4 and uses an LLM-based prover to discharge the resulting machine-checkable subgoals. We also train a dedicated WZ-Prover via a Lean-kernel-verified bootstrapping loop with expert-verified iteration, followed by DAPO-based refinement. Experiments show that WZ-LLM achieves a 34% proof success rate on LCI-Test (100 classic combinatorial identities), outperforming strong baselines such as DeepSeek-V3 and Goedel-Prover-V2, and delivering consistent gains on CombiBench and PutnamBench-Comb. These results indicate that our framework provides two complementary strengths: improved direct proving for identities beyond the scope of WZ, and substantially higher end-to-end success when WZ sketches guide a specialized prover.

Problem

Automating formal proofs of combinatorial identities in Lean 4 is difficult for LLM-based provers because long-horizon proof planning is required and unconstrained search quickly explodes. Data scarcity and the complexity of boundary conditions further limit existing methods.

Approach

The authors propose WZ-LLM, a neuro-symbolic framework that translates Wilf-Zeilberger (WZ) proof plans into executable proof sketches in Lean 4 and uses an LLM-based prover to discharge the resulting machine-checkable subgoals. The system constructs WZ certificates, decomposes identities into structured obligations (recurrence verification, boundary conditions, initial values), and compiles them into Lean proof sketches. A dedicated WZ-Prover is trained via a Lean-kernel-verified bootstrapping loop with expert-verified iteration, followed by DAPO-based reinforcement learning refinement.

Figure 1 : Overview of WZ-LLM. (A) Symbolic decomposition. Given combinatorial-identity statements, the symbolic engine checks WZ applicability and, when applicable, constructs a WZ proof sketch that is decomposed into sub-goals. Statements that are not amenable to the WZ method are labeled as WZ-uncovered targets. All sub-goals and uncovered targets form a shared pool of proving tasks. (B) Model

Results

WZ-LLM achieves a 34% proof success rate on LCI-Test (100 classic combinatorial identities) at pass@32, outperforming DeepSeek-V3 (1/100), Goedel-Prover-V2 (9/100), DeepSeek-Prover-V2 (6/100), and Kimina-Prover-Distill (6/100). The WZ-Prover alone solves 29/46 WZ-applicable identities via sketch guidance, and 5 additional WZ-uncovered identities are proved directly.

Method	Model size	Sample budget	LCI-Test
DeepSeek-V3	685B	pass@32	1/100
Goedel-Prover-V2	8B	pass@32	9/100
DeepSeek-Prover-V2	7B	pass@32	6/100
WZ-Prover	8B	pass@32	12/100
WZ-Sketch + WZ-Prover	8B	pass@32	29/100
WZ-LLM	8B	pass@32	34/100

Main results on LCI-Test

Papers With

Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs