Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem
Gabriel Rongyang Lau
cs.AI
May 19, 2026 · v1
TL;DR
Case study formalizing an Aristotle API proof attempt for the IMO Grasshopper problem in Lean 4.
Abstract
AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem, originally posed as IMO 2009 Problem 6. The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the global counting step needed to show that these membership facts produce at least n distinct forbidden values, contradicting the cardinality assumption |M| < n; the Lean source itself does not reduce the main theorem to a separately encoded counting lemma. This case study gives an inspectable example of a central limitation in AI-assisted formalization, namely that local proof search can succeed while the global combinatorial bookkeeping required for a theorem remains unresolved. The paper contributes a reproducible Lean artifact and a precise analysis of its verified and unverified proof content.
Problem
AI-assisted theorem proving can generate Lean developments for olympiad-level problems, but it is difficult to distinguish verified content from unverified placeholders. The Grasshopper problem (IMO 2009 Problem 6) is a challenging combinatorial theorem that resists automated formalization.
Approach
The author reports a Lean 4 formalization attempt using the Aristotle API. The generated artifact states a generalized version of the Grasshopper theorem and contains four verified helper lemmas implementing components of a maximality and adjacent-swap exchange strategy. The verified lemmas establish that the final partial sum equals the total sum, that an adjacent transposition affects only one intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces forbidden-set membership. The main theorem remains closed by a single sorry.
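To make the adjacent-swap observation concrete, the following is a minimal Lean 4 sketch that needs no Mathlib. The function `landings` is a hypothetical illustration introduced here; it is not the artifact's partial-sum definition, which this summary does not reproduce.

```lean
-- Hypothetical helper (not from the Aristotle artifact): the list of landing
-- points, i.e. the partial sums of the jump lengths taken in the given order.
def landings (jumps : List Nat) : List Nat :=
  (jumps.foldl
    (fun (acc : Nat × List Nat) j => (acc.1 + j, acc.2 ++ [acc.1 + j]))
    (0, [])).2

#eval landings [1, 2, 3]  -- [1, 3, 6]
#eval landings [2, 1, 3]  -- [2, 3, 6]: swapping jumps 1 and 2 changes only landing 1
#eval landings [1, 3, 2]  -- [1, 4, 6]: swapping jumps 2 and 3 changes only landing 2
```

In every order the last landing is 6, the total of the jump lengths, and each adjacent swap changes exactly one intermediate landing. These are small instances of the facts the helper lemmas are described as establishing in general; the sketch checks examples only and proves nothing.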
Results
Four helper lemmas (PS_last, PS_swap, PS_swap_eq, maximizer_swap_in_M) are fully verified by the Lean kernel. The main theorem grasshopper is incomplete, closed by one unresolved sorry. The unresolved gap is the global counting argument showing that the membership facts produce at least n distinct forbidden values, contradicting |M| < n.
| Lean component | Mathematical role | Status |
|---|---|---|
| PS_last | Final partial sum equals total sum | Verified |
| PS_swap | Adjacent transposition affects only one partial sum | Verified |
| PS_swap_eq | Computes changed partial sum after swap | Verified |
| maximizer_swap_in_M | Maximality implies forbidden-set membership | Verified |
| grasshopper | Main theorem | Incomplete (one sorry) |
Verification status of the Aristotle-generated Lean development
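One way to audit a status table of this kind in Lean 4 is the `#print axioms` command, which lists the axioms a declaration depends on; a proof closed with `sorry` shows up through `sorryAx`. The two toy declarations below are hypothetical stand-ins, not the artifact's lemmas, and the sketch assumes only core Lean 4.

```lean
-- Hypothetical stand-in for a fully proved helper lemma.
theorem helper_demo (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Hypothetical stand-in for a main theorem left open with `sorry`.
theorem main_demo (a b : Nat) : a * b = b * a := by
  sorry

#print axioms helper_demo  -- the reported axioms do not include `sorryAx`
#print axioms main_demo    -- the reported axioms include `sorryAx`
```

Run against the artifact itself, the same command applied to `grasshopper` would be expected to report `sorryAx`, given the one unresolved sorry described above, while the four helper lemmas should not.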