SITA: A Framework for Structure-to-Instance Theorem Autoformalization
Chenyi Li, Wanli Ma, Zichen Wang, Zaiwen Wen
cs.AI
Nov 13, 2025 · v1
TL;DR
SITA autoformalizes instantiations of abstract math structures into Lean definitions and theorems using Lean's typeclass mechanism.
Abstract
While large language models (LLMs) have shown progress in mathematical reasoning, they still face challenges in formalizing theorems that arise from instantiating abstract structures in concrete settings. With the goal of autoformalizing mathematical results at the research level, we develop a framework for structure-to-instance theorem autoformalization (SITA), which systematically bridges the gap between abstract mathematical theories and their concrete applications in the Lean proof assistant. Formalized abstract structures are treated as modular templates that contain definitions, assumptions, operations, and theorems. These templates serve as reusable guides for the formalization of concrete instances. Given a specific instantiation, we generate corresponding Lean definitions and instance declarations, integrate them using Lean's typeclass mechanism, and construct verified theorems by checking structural assumptions. We incorporate LLM-based generation with feedback-guided refinement to ensure both automation and formal correctness. Experiments on a dataset of optimization problems demonstrate that SITA effectively formalizes diverse instances grounded in abstract structures.
Problem
LLMs struggle to formalize theorems that arise from instantiating abstract mathematical structures in concrete settings. Bridging the gap between abstract theories and their concrete applications in Lean requires a systematic approach, which current autoformalization methods do not provide.
Approach
SITA (Structure-to-Instance Theorem Autoformalization) is a framework that systematically bridges abstract mathematical theories and concrete applications in Lean. The pipeline transforms natural language descriptions of concrete problems into verified Lean files by aligning underlying concepts with formalized abstract structures. It comprises three stages: skeleton construction, feedback-guided completion, and proof verification. The framework enables modular proof reuse by connecting concrete instances to their abstract parent structures.
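The structure-to-instance idea can be illustrated with a toy Lean 4 sketch. It is not taken from the paper: the class `CommOp`, its fields, and the theorem `swap_outer` are hypothetical names chosen for illustration, and the example uses only core Lean rather than the optimization templates the authors work with. It shows an abstract template (a class bundling an operation with an assumption), a theorem proved once at the abstract level, a concrete instance whose assumption is discharged by an existing lemma, and the instantiated theorem obtained by reuse through typeclass resolution.

```lean
-- Abstract template (hypothetical): an operation together with an assumption.
class CommOp (α : Type) where
  op : α → α → α
  op_comm : ∀ a b : α, op a b = op b a

namespace CommOp

-- Theorem proved once, for every type that satisfies the template.
theorem swap_outer {α : Type} [CommOp α] (a b c : α) :
    op (op a b) c = op c (op b a) := by
  rw [op_comm a b, op_comm (op b a) c]

end CommOp

-- Concrete instance: addition on Nat. The structural assumption is
-- checked by supplying the core lemma Nat.add_comm.
instance : CommOp Nat where
  op := fun x y => x + y
  op_comm := Nat.add_comm

-- Instantiated theorem: the abstract proof is reused via the instance.
example (a b c : Nat) : (a + b) + c = c + (b + a) :=
  CommOp.swap_outer a b c
```

In this sketch the only instance-specific proof obligation is `op_comm`; once it is verified, every theorem stated for `CommOp` transfers to `Nat` without being reproved, which is the kind of modular reuse the framework aims to automate.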
Results
Experiments on diverse optimization problems (logistic regression, LASSO, gradient descent convergence) demonstrate that SITA correctly generates formal definitions, instance declarations, and verified proofs linking concrete cases to their abstract algorithmic frameworks.