Automated Formalization via Conceptual Retrieval-Augmented LLMs

Wangyue Lu, Lun Du, Sirui Li, Ke Weng, Haozhe Sun, Hengyu Liu, Minghe Yu, Tiancheng Zhang, Ge Yu

cs.AI Aug 9, 2025 · v1

autoformalization mathlib theorem-proving-ml

Read PDF arXiv abstract Code

TL;DR

CRAMF builds a concept-definition knowledge base from Mathlib4 to retrieval-augment LLM autoformalization of math into Lean 4.

Abstract

Interactive theorem provers (ITPs) require manual formalization, which is labor-intensive and demands expert knowledge. While automated formalization offers a potential solution, it faces two major challenges: model hallucination (e.g., undefined predicates, symbol misuse, and version incompatibility) and the semantic gap caused by ambiguous or missing premises in natural language descriptions. To address these issues, we propose CRAMF, a Concept-driven Retrieval-Augmented Mathematical Formalization framework. CRAMF enhances LLM-based autoformalization by retrieving formal definitions of core mathematical concepts, providing contextual grounding during code generation. However, applying retrieval-augmented generation (RAG) in this setting is non-trivial due to the lack of structured knowledge bases, the polymorphic nature of mathematical concepts, and the high precision required in formal retrieval. We introduce a framework for automatically constructing a concept-definition knowledge base from Mathlib4, the standard mathematical library for the Lean 4 theorem prover, indexing over 26,000 formal definitions and 1,000+ core mathematical concepts. To address conceptual polymorphism, we propose contextual query augmentation with domain- and application-level signals. In addition, we design a dual-channel hybrid retrieval strategy with reranking to ensure accurate and relevant definition retrieval. Experiments on miniF2F, ProofNet, and our newly proposed AdvancedMath benchmark show that CRAMF can be seamlessly integrated into LLM-based autoformalizers, yielding consistent improvements in translation accuracy, achieving up to 62.1% and an average of 29.9% relative improvement.

Problem

Automated formalization faces two challenges: model hallucination (undefined predicates, symbol misuse, version incompatibility) and semantic gaps caused by ambiguous or missing premises in natural language descriptions.

Approach

CRAMF is a Concept-driven Retrieval-Augmented Mathematical Formalization framework that retrieves formal definitions of core mathematical concepts from a structured knowledge base to provide contextual grounding during code generation. The knowledge base is automatically constructed from Mathlib4, indexing over 26,000 formal definitions and 1,000+ core concepts. Contextual query augmentation addresses conceptual polymorphism, and a dual-channel hybrid retrieval strategy with reranking ensures accurate definition retrieval.

Figure 2: Overview of the CRAMF framework. (a) Construction of the concept-definition knowledge base via back-translation and concept extraction from Mathlib definitions. (b) Integration into autoformalization: extracted concepts retrieve relevant definitions to guide formal code generation.

Results

On miniF2F, ProofNet, and a new AdvancedMath benchmark, CRAMF achieves up to 62.1% and an average of 29.9% relative improvement in translation accuracy when integrated into LLM-based autoformalizers.

Papers With

Automated Formalization via Conceptual Retrieval-Augmented LLMs