FormaRL: Enhancing Autoformalization with no Labeled Data

Yanxing Huang, Xinling Jin, Sijie Liang, Peng Li, Yang Liu

cs.AI Aug 26, 2025 · v1

autoformalization theorem-proving-ml dataset

Read PDF arXiv abstract Code

TL;DR

FormaRL uses the Lean compiler's syntax check plus an LLM consistency check as RL reward for autoformalization into Lean.

Abstract

Autoformalization is one of the central tasks in formal verification, while its advancement remains hindered due to the data scarcity and the absence efficient methods. In this work we propose \textbf{FormaRL}, a simple yet efficient reinforcement learning framework for autoformalization which only requires a small amount of unlabeled data. FormaRL integrates syntax check from Lean compiler and consistency check from large language model to calculate the reward, and adopts GRPO algorithm to update the formalizer. We also curated a proof problem dataset from undergraduate-level math materials, named \textbf{uproof}, in the hope to facilitate the exploration of autoformalization and theorem proving in advanced math. Experiments show that FormaRL can increase the pass@1 autoformalization accuracy of Qwen2.5-Coder-7B-Instruct by 4 $\sim$ 6x (4.04\% $\to$ 26.15\% on ProofNet and 2.4\% $\to$ 9.6\% on uproof) with merely 859 unlabeled data. And on uproof our method also achieved a strong improvement in out-of-distribution performance compared to existing open-source state-of-the-art autoformalizers on both pass@1 accuracy (6.2\% $\to$ 9.6\%) and pass@16 accuracy (24.4\% $\to$ 33.6\%). Training code of FormaRL is open-sourced at https://github.com/THUNLP-MT/FormaRL.

Problem

Autoformalization advancement is hindered by data scarcity and the absence of efficient training methods, since labeled formal-informal parallel data is rare.

Approach

FormaRL is a reinforcement learning framework for autoformalization requiring only unlabeled data. It integrates syntax checking from the Lean compiler and consistency checking from a large language model to compute rewards, then uses GRPO to update the formalizer. The authors also curate uproof, a proof problem dataset from undergraduate-level math materials.

Results

FormaRL increases Qwen2.5-Coder-7B-Instruct pass@1 autoformalization accuracy by 4-6x (4.04% to 26.15% on ProofNet, 2.4% to 9.6% on uproof) using only 859 unlabeled examples. On uproof it achieves state-of-the-art out-of-distribution performance at both pass@1 (9.6%) and pass@16 (33.6%).

Papers With

FormaRL: Enhancing Autoformalization with no Labeled Data