← All papers

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics

Xinyu Liu, Zixuan Xie, Amir Moeini, Claire Chen, Shuze Daniel Liu, Yu Meng, Aidong Zhang, Shangtong Zhang

cs.LO Jan 30, 2026 · v1

benchmark mathlib theorem-proving-ml

Read PDF arXiv abstract

TL;DR

Builds MathlibLemma, an LLM pipeline mining and proving folklore lemmas, with a 4028-statement Lean benchmark.

Abstract

While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians like \LaTeX{} or Maple. To address this, we introduce MathlibLemma, a modular LLM-based pipeline for automated folklore-lemma mining: the discovery, formalization, and proving of reusable intermediate facts that mathematicians often take for granted but that are not always present in formal libraries. At its core, MathlibLemma proactively mines the missing connective tissue of mathematics. The pipeline produces a verified library of folklore-style lemmas, including 1,506 Lean-checked proofs that pass a proof-bypass screen; a small curated pilot subset has also been merged into Mathlib, providing external evidence that selected outputs can meet expert library standards. Leveraging this pipeline, we further construct the MathlibLemma benchmark, a suite of 4,028 non-trivial type-checked Lean statements spanning a broad range of mathematical domains. By transforming the role of LLMs from passive consumers to active contributors, this work takes a step toward AI-assisted expansion of formal mathematical libraries.

Problem

The absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians. These reusable intermediate facts are often taken for granted but not present in formal libraries.

Approach

MathlibLemma is a modular LLM-based pipeline for automated folklore-lemma mining: discovering, formalizing, and proving reusable intermediate facts. The pipeline produces a verified library of folklore-style lemmas by proactively mining the missing connective tissue of mathematics, using LLMs to generate candidate lemmas and then verifying them in Lean. The system transforms the role of LLMs from passive consumers to active contributors to formal libraries.

Results

The pipeline produces 1,506 Lean-checked proofs that pass a proof-bypass screen. A curated pilot subset has been merged into Mathlib, providing external evidence that selected outputs meet expert library standards. The MathlibLemma benchmark contains 4,028 non-trivial type-checked Lean statements spanning a broad range of mathematical domains.

Papers With

MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics