← All papers
First page of LeanExplore: A search engine for Lean 4 declarations

LeanExplore: A search engine for Lean 4 declarations

Justin Asher

cs.SE Jun 4, 2025 · v1
LeanExplore is a semantic search engine indexing Lean 4 declarations across Batteries, Init, Lean, Mathlib, PhysLean, and Std.
The expanding Lean 4 ecosystem poses challenges for navigating its vast libraries. This paper introduces LeanExplore, a search engine for Lean 4 declarations. LeanExplore enables users to semantically search for statements, both formally and informally, across select Lean 4 packages (including Batteries, Init, Lean, Mathlib, PhysLean, and Std). This search capability is powered by a hybrid ranking strategy, integrating scores from a multi-source semantic embedding model (capturing conceptual meaning from formal Lean code, docstrings, AI-generated informal translations, and declaration titles), BM25+ for keyword-based lexical relevance, and a PageRank-based score reflecting declaration importance and interconnectedness. The search engine is accessible via a dedicated website (https://www.leanexplore.com/) and a Python API (https://github.com/justincasher/lean-explore). Furthermore, the database can be downloaded, allowing users to self-host the service. LeanExplore integrates easily with LLMs via the model context protocol (MCP), enabling users to chat with an AI assistant about Lean declarations or utilize the search engine for building theorem-proving agents. This work details LeanExplore's architecture, data processing, functionalities, and its potential to enhance Lean 4 workflows and AI-driven mathematical research

The expanding Lean 4 ecosystem (including Mathlib with tens of thousands of declarations) makes it difficult for users to find relevant definitions, lemmas, and theorems across libraries.

LeanExplore is a search engine for Lean 4 declarations that uses a hybrid ranking strategy integrating a multi-source semantic embedding model (capturing meaning from Lean code, docstrings, AI-generated informal translations, and declaration titles), BM25+ for keyword-based relevance, and PageRank-based scores reflecting declaration importance. The system supports Batteries, Init, Lean, Mathlib, PhysLean, and Std, and integrates with LLMs via the Model Context Protocol (MCP).

In a comparative study using 300 queries evaluated by an LLM judge, LeanExplore outperforms LeanSearch and Moogle on typical Mathlib search tasks. The system is available via a website, Python API, and downloadable database for self-hosting.