A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation

Beibei Xiong, Hangyu Lv, Haojia Shan, Jianlin Wang, Zhengfeng Yang, Lihong Zhi

cs.AI Feb 25, 2025 · v1

benchmark dataset theorem-proving-ml combinatorics

Read PDF arXiv abstract

TL;DR

Builds LeanComb, a Lean benchmark of combinatorial identities, plus a 260K-theorem Lean dataset generated by an automated theorem generator.

Abstract

Large language models (LLMs) have significantly advanced formal theorem proving, yet the scarcity of high-quality training data constrains their capabilities in complex mathematical domains. Combinatorics, a cornerstone of mathematics, provides essential tools for analyzing discrete structures and solving optimization problems. However, its inherent complexity makes it particularly challenging for automated theorem proving (ATP) for combinatorial identities. To address this, we manually construct LeanComb, combinatorial identities benchmark in Lean, which is, to our knowledge, the first formalized theorem proving benchmark built for combinatorial identities. We develop an Automated Theorem Generator for Combinatorial Identities, ATG4CI, which combines candidate tactics suggested by a self-improving large language model with a Reinforcement Learning Tree Search approach for tactic prediction. By utilizing ATG4CI, we generate a LeanComb-Enhanced dataset comprising 260K combinatorial identities theorems, each with a complete formal proof in Lean, and experimental evaluations demonstrate that models trained on this dataset can generate more effective tactics, thereby improving success rates in automated theorem proving for combinatorial identities.

Problem

Automated theorem proving for combinatorial identities is limited by scarcity of high-quality formalized training data. No benchmark existed specifically for combinatorial identities in Lean.

Approach

The authors construct LeanComb, the first formalized benchmark for combinatorial identity theorem proving in Lean. They develop ATG4CI, an Automated Theorem Generator that combines candidate tactics from a self-improving LLM with Reinforcement Learning Tree Search for tactic prediction. The system generates theorems with complete formal proofs iteratively.

Results

ATG4CI produces LeanComb-Enhanced, a dataset of 260K combinatorial identity theorems each with a complete Lean proof. Models trained on this dataset generate more effective tactics and improve success rates on combinatorial identity proving.

Category	Training	Test
Generalized Binomial Coefficient	42	16
Absorption Identity	35	12
Binomial Inversion	46	6
Alternating Binomial Sums	23	6
Inclusion-Exclusion	27	5

LeanComb benchmark categories

Papers With

A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation