Combining Textual and Structural Information for Premise Selection in Lean

Job Petrovčič, David Eliecer Narvaez Denis, Ljupčo Todorovski

cs.LG Oct 24, 2025 · v1

theorem-proving-ml premise-selection benchmark

Read PDF arXiv abstract Code

TL;DR

Combines text embeddings of Lean formalizations with a GNN over a dependency graph for premise selection on the LeanDojo benchmark.

Abstract

Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state-premise and premise-premise relations. On the LeanDojo Benchmark, our method outperforms the ReProver language-based baseline by over 25\% across standard retrieval metrics. These results suggest that relational information is beneficial for premise selection.

Problem

Premise selection in large formal libraries like Lean's Mathlib is a key bottleneck, and existing language-based methods treat premises in isolation, ignoring the dependency structure connecting them.

Approach

The authors present a graph-augmented approach that combines dense text embeddings of Lean formalizations (using ByT5) with graph neural networks over a heterogeneous dependency graph capturing state-premise and premise-premise relations. The GNN propagates structural information to refine text embeddings before scoring premise relevance.

Results

On the LeanDojo Benchmark, the method outperforms the ReProver language-based baseline by over 25% across standard retrieval metrics (Recall@1, Recall@10, MRR). Ablation suggests the improvement may partly stem from different training paradigm choices rather than purely from graph structure utilization.

Papers With

Combining Textual and Structural Information for Premise Selection in Lean