MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
Xinyu Liu, Zixuan Xie, Amir Moeini, Claire Chen, Shuze Daniel Liu, Yu Meng, Aidong Zhang, Shangtong Zhang
cs.LO
Jan 30, 2026 · v1
TL;DR
Builds MathlibLemma, an LLM pipeline mining and proving folklore lemmas, with a 4028-statement Lean benchmark.
Abstract
While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians like \LaTeX{} or Maple. To address this, we introduce MathlibLemma, a modular LLM-based pipeline for automated folklore-lemma mining: the discovery, formalization, and proving of reusable intermediate facts that mathematicians often take for granted but that are not always present in formal libraries. At its core, MathlibLemma proactively mines the missing connective tissue of mathematics. The pipeline produces a verified library of folklore-style lemmas, including 1,506 Lean-checked proofs that pass a proof-bypass screen; a small curated pilot subset has also been merged into Mathlib, providing external evidence that selected outputs can meet expert library standards. Leveraging this pipeline, we further construct the MathlibLemma benchmark, a suite of 4,028 non-trivial type-checked Lean statements spanning a broad range of mathematical domains. By transforming the role of LLMs from passive consumers to active contributors, this work takes a step toward AI-assisted expansion of formal mathematical libraries.
Problem
The absence of many folklore lemmas in Mathlib remains a persistent barrier that limits Lean's usability as an everyday tool for mathematicians. These reusable intermediate facts are often taken for granted but not present in formal libraries.
Approach
MathlibLemma is a modular LLM-based pipeline for automated folklore-lemma mining: discovering, formalizing, and proving reusable intermediate facts. The pipeline produces a verified library of folklore-style lemmas by proactively mining the missing connective tissue of mathematics, using LLMs to generate candidate lemmas and then verifying them in Lean. The system transforms the role of LLMs from passive consumers to active contributors to formal libraries.
Results
The pipeline produces 1,506 Lean-checked proofs that pass a proof-bypass screen. A curated pilot subset has been merged into Mathlib, providing external evidence that selected outputs meet expert library standards. The MathlibLemma benchmark contains 4,028 non-trivial type-checked Lean statements spanning a broad range of mathematical domains.