← All papers

RLMEval: Evaluating Research-Level Neural Theorem Proving

Auguste Poiroux, Antoine Bosselut, Viktor Kunčak

cs.CL Oct 29, 2025 · v1

benchmark theorem-proving-ml autoformalization formalization

Read PDF arXiv abstract

TL;DR

Builds an evaluation suite of 613 theorems drawn from six real Lean Blueprint formalization projects.

Abstract

Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. We introduce RLMEval, an evaluation suite for these tasks, focusing on research-level mathematics from real-world Lean formalization projects. RLMEval targets the evaluation of neural theorem proving and proof autoformalization on challenging research-level theorems by leveraging real Lean Blueprint formalization projects. Our evaluation of state-of-the-art models on RLMEval, comprising 613 theorems from 6 Lean projects, reveals a significant gap: progress on existing benchmarks does not readily translate to these more realistic settings, with the best model achieving only a 10.3 % pass rate. RLMEval provides a new, challenging benchmark designed to guide and accelerate progress in automated reasoning for formal mathematics.

Problem

Despite impressive results on curated benchmarks, the practical impact of LLMs on research-level neural theorem proving and proof autoformalization remains limited, and existing benchmarks do not reflect real-world difficulty.

Approach

RLMEval is an evaluation suite for neural theorem proving and proof autoformalization that targets research-level mathematics from real Lean formalization projects. It leverages real Lean Blueprint formalization projects to construct challenging evaluation sets, comprising 613 theorems from 6 Lean projects.

Results

The best model achieves only a 10.3% pass rate on RLMEval, revealing a significant gap between performance on existing benchmarks and more realistic research-level settings.

Papers With

RLMEval: Evaluating Research-Level Neural Theorem Proving