← All papers
First page of The Network Structure of Mathlib

The Network Structure of Mathlib

Xinze Li, Nanyun Peng, Simone Severini, Patrick Shafto

cs.LO Apr 26, 2026 · v1
Network analysis of Lean 4's Mathlib as a multilayer dependency graph (308k declarations).
The ongoing development of Lean 4's Mathlib has produced a macroscopic structural complexity that interweaves logical, mathematical, and infrastructural dependencies. We present a network analysis of this library, extracting its dependency structure into a multilayer graph of 308,129 declarations, 8.4 million edges, and 7,563 modules. By introducing graph decompositions that isolate explicit edges from those synthesized by the compiler or driven by proofs, we quantify the structural properties of formalized mathematics. Our analysis reveals three findings. First, taxonomies designed by humans diverge from logical structures, exhibiting a 50.9% coupling across namespaces. Second, developers utilize a median of 1.6% of the imported scope. Third, formalization compresses semantic hierarchies, with network centrality capturing language infrastructure rather than mathematical relevance.

Lean 4's Mathlib has grown to a scale where its macroscopic structural complexity interweaves logical, mathematical, and infrastructural dependencies. No systematic network analysis of the full library's dependency structure has been performed.

The authors extract Mathlib's dependency structure into a multilayer graph comprising a declaration-level graph (308,129 nodes, 8.4M edges), a module import graph (7,563 modules), and namespace graphs. They introduce graph decompositions that separate explicit edges from compiler-synthesized or proof-driven ones. Analysis uses standard network tools (NetworkX, Louvain community detection, power-law fitting) applied to the Lean-specific structure. The dataset is publicly available on HuggingFace.

Three main findings emerge. Human-designed taxonomies (namespaces) diverge from logical dependency structures, with 50.9% of declaration edges crossing namespace boundaries. Developers use a median of only 1.6% of imported scope. Network centrality captures language infrastructure (typeclass machinery) rather than mathematically relevant declarations. Transitive redundancy in module imports is 17.5%, and 74.2% of declaration edges are compiler-synthesized.

MetricModuleDeclarationNamespace
Nodes7,563308,12910,097
Edges23,5708,436,366332,081
DAG depth153848
Cross-namespace edges37.1%50.9%85.8%
Synthesized edges--74.2%--
Import utilization (median)----1.6%
Cross-level summary of structural findings