← All papers

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

Noor Islam S. Mohammad, Tamim Sheikh

cs.AI Jun 15, 2026 · v1 cs.LG

autoformalization benchmark

Read PDF arXiv abstract

TL;DR

Releases DriftBench, 2,183 NL/Lean 4 pairs over six mathlib4 subfields, to certify autoformalization faithfulness.

Abstract

Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by \emph{faithfulness}: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce \emph{Bidirectional Provability Fingerprinting} (\bpf{}), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) \emph{Counterfactual Probe Generation} (\cpg{}), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the \emph{Equivalence Spectrum}, a continuous faithfulness score that replaces brittle binary verdicts; (iii) \emph{Adaptive Probe Budget Allocation} (\apba{}), an information-theoretic budget router; and (iv) \emph{Faithfulness-Guided Decoding} (\fgd{}), which uses \bpf{} signals as a reward during autoformalization. We prove a \emph{drift detection theorem} and a \emph{PAC-faithfulness} result establishing that the equivalence class of a natural language statement is learnable from $\mathcal{O}(\log(1/δ)/\varepsilon)$ probes under mild assumptions. We release \driftbench{}, a benchmark of $2{,}183$ NL/Lean~4 pairs with controlled drift labels across six subfields of mathlib4. \bpf{}\,+\,\cpg{} detects $89.6\%$ of drifted formalizations at a $3.0\%$ false-positive rate-against $41.2\%$ for typecheck and $63.3\%$ for LLM-judge baselines, and \fgd{} reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by $47\%$. https://pmlrbd.github.io/BPF/

Problem

Autoformalization is bottlenecked by faithfulness rather than fluency: a formal statement can typecheck and be provable yet encode a different theorem than the source intended.

Approach

Bidirectional Provability Fingerprinting characterizes each candidate by its forward and backward consequence neighborhoods and matches them against probes derived from the natural-language statement. Added components include counterfactual probe generation, a continuous equivalence-spectrum score, adaptive probe budget allocation, and faithfulness-guided decoding, with drift-detection and PAC-faithfulness theorems. The work targets Lean 4 and mathlib4.

Results

DriftBench contains 2,183 NL/Lean 4 pairs with controlled drift labels. BPF+CPG detects 89.6% of drifted formalizations at a 3.0% false-positive rate, versus 41.2% for typecheck and 63.3% for an LLM-judge, and faithfulness-guided decoding cuts a state-of-the-art autoformalizer's drift rate by 47%.

Papers With

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements