← All papers
First page of Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda

cs.LG Mar 25, 2025 · v1
Formalizes generalization error bounds via Rademacher complexity and Dudley's entropy integral in Lean 4 on top of Mathlib.
Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from the training error -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical VC-dimension theory. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes (where measurability of suprema is straightforward in Mathlib) to separable topological index sets via a reduction to a countable dense subset. As worked applications of the abstract theorem, we mechanize standard empirical Rademacher bounds for linear predictors under $\ell_2$ and $\ell_1$ regularizations, and we also formalize a Dudley-type entropy integral bound based on covering numbers and a chaining construction.

Generalization error bounds based on Rademacher complexity are central to statistical learning theory, but they lack machine-checked verification despite their theoretical importance.

The authors formalize the generalization error bound by Rademacher complexity in Lean 4, building on Mathlib's measure-theoretic probability theory. The development provides a mechanically-checked pipeline from empirical and expected Rademacher complexity definitions, through formal symmetrization and bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes to separable topological index sets via reduction to a countable dense subset.

The formalization covers the full pipeline from definitions to bounds, including empirical Rademacher bounds for linear predictors under L2 and L1 regularizations, and a Dudley-type entropy integral bound based on covering numbers and a chaining construction.