First page of Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure

Seiji Hattori, Takuya Matsuzaki, Makoto Fujiwara

cs.CL Sep 10, 2025 · v1

Read PDF arXiv abstract Code

TL;DR

Translates formal proofs from the Lean proof assistant library into natural language via LLM informalization and recursive summarization.

Abstract

This paper proposes a natural language translation method for machine-verifiable formal proofs that leverages the informalization (verbalization of formal language proof steps) and summarization capabilities of LLMs. For evaluation, it was applied to formal proof data created in accordance with natural language proofs taken from an undergraduate-level textbook, and the quality of the generated natural language proofs was analyzed in comparison with the original natural language proofs. Furthermore, we will demonstrate that this method can output highly readable and accurate natural language proofs by applying it to existing formal proof library of the Lean proof assistant.

Problem

Formal proofs in theorem provers like Lean are machine-verifiable but difficult for humans to read. Translating them to natural language while preserving accuracy and readability is challenging, especially for multi-step proofs with complex dependency structures.

Approach

The method leverages LLM informalization (verbalization of formal proof steps) and recursive summarization along the proof structure. Individual proof steps are informalized into natural language, then the proof's dependency graph is traversed to recursively summarize sub-proofs into coherent narratives. A premise library and template system provide context for the informalization. The method is evaluated on formal proofs in Lean created from undergraduate textbook proofs.

Results

With both template and premise library, the method achieves 89.05% correct translations, with 5.15% misinformation, 3.87% unnecessary mentions, and only 0.40% untranslated expressions. Without the premise library, correctness drops to 83.09%. The method produces highly readable natural language proofs that closely match the structure of original textbook proofs.

Config	Correct	Misinformation	Insufficient	Unnecessary	Untranslated
Template + Premise	89.05	5.15	3.87	1.50	0.40
Template only	83.09	8.05	3.46	4.51	0.89
Premise only	53.95	2.50	24.40	18.80	0.40

Translation quality by configuration

Papers With

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure