LeanTutor: Towards a Verified AI Mathematical Proof Tutor
Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade
cs.LG
Jan 24, 2026 · v1
TL;DR
An autoformalizer/proof-checker module uses Lean to verify student proof steps, paired with the PeanoBench Lean/NL proof dataset.
Abstract
This paper considers the development of an AI-based provably-correct mathematical proof tutor. While Large Language Models (LLMs) allow seamless communication in natural language, they are error prone. Theorem provers such as Lean allow for provable-correctness, but these are hard for students to learn. We present a proof-of-concept system (LeanTutor) by combining the complementary strengths of LLMs and theorem provers. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. To evaluate the system, we introduce PeanoBench, a dataset of 371 Peano Arithmetic proofs in human-written natural language and formal language, derived from the Natural Numbers Game.
Problem
LLMs allow natural language communication but are error-prone in mathematical reasoning, while theorem provers like Lean provide provable correctness but are hard for students to learn. There is no system combining both for mathematical tutoring.
Approach
LeanTutor combines the complementary strengths of LLMs and theorem provers in three modules: an autoformalizer/proof-checker that translates student input to Lean and verifies it, a next-step generator that suggests proof continuations, and a natural language feedback generator. To evaluate the system, the authors introduce PeanoBench, a dataset of 371 Peano Arithmetic proofs in both natural and formal language, derived from the Natural Numbers Game.
Results
The paper presents a proof-of-concept system demonstrating that combining LLMs with Lean 4 verification can provide provably correct mathematical tutoring, evaluated on 371 proofs in PeanoBench.