The Network Structure of Mathlib
Lean 4's Mathlib has grown to a scale where its macroscopic structural complexity interweaves logical, mathematical, and infrastructural dependencies. No systematic network analysis of the full library's dependency structure has been performed.
The authors extract Mathlib's dependency structure into a multilayer graph comprising a declaration-level graph (308,129 nodes, 8.4M edges), a module import graph (7,563 modules), and namespace graphs. They introduce graph decompositions that separate explicit edges from compiler-synthesized or proof-driven ones. Analysis uses standard network tools (NetworkX, Louvain community detection, power-law fitting) applied to the Lean-specific structure. The dataset is publicly available on HuggingFace.
Three main findings emerge. Human-designed taxonomies (namespaces) diverge from logical dependency structures, with 50.9% of declaration edges crossing namespace boundaries. Developers use a median of only 1.6% of imported scope. Network centrality captures language infrastructure (typeclass machinery) rather than mathematically relevant declarations. Transitive redundancy in module imports is 17.5%, and 74.2% of declaration edges are compiler-synthesized.
| Metric | Module | Declaration | Namespace |
|---|---|---|---|
| Nodes | 7,563 | 308,129 | 10,097 |
| Edges | 23,570 | 8,436,366 | 332,081 |
| DAG depth | 153 | 84 | 8 |
| Cross-namespace edges | 37.1% | 50.9% | 85.8% |
| Synthesized edges | -- | 74.2% | -- |
| Import utilization (median) | -- | -- | 1.6% |
