← All papers
First page of LLM2SMT: Building an SMT Solver with Zero Human-Written Code

LLM2SMT: Building an SMT Solver with Zero Human-Written Code

Mikoláš Janota, Mirek Olšák

cs.LO Mar 6, 2026 · v1
An LLM-built DPLL(T)-style SMT solver for QF_UF emits Lean proofs for unsatisfiable instances, certified with Lean's grind tactic.
Whether LLMs can reason or write software is widely debated, but whether they can write software that itself reasons is largely unexplored. We present a case study in which an LLM coding agent builds a complete DPLL(T)-style SMT solver for QF_UF with zero human-written code. The solver implements the Nieuwenhuis-Oliveras congruence closure algorithm, includes preprocessing, and emits Lean proofs for unsatisfiable instances. We describe the development process and key challenges, and show that the resulting solver is competitive on SMT-LIB benchmarks.

Whether LLMs can build software that itself performs automated reasoning (as opposed to merely writing code or generating proofs) is largely unexplored.

An LLM coding agent (Claude Code with Sonnet 4.6) builds a complete DPLL(T)-style SMT solver for QF_UF with zero human-written code. The solver implements the Nieuwenhuis-Oliveras congruence closure algorithm, includes preprocessing and theory propagation, and emits Lean proofs for unsatisfiable instances.

Figure 1 : Comparison with and without theory propagation.

The solver (llm2smt) solves 7,468 of 7,500 QF_UF benchmarks from SMT-LIB, competitive with z3 (7,500) and cvc5 (7,494). It certifies 285 unsatisfiable instances with Lean proofs. No human-written code was involved.

SolverSolvedWins
z37,5007,491
cvc57,4947,308
llm2smt7,4686,486
llm2smt-no-prepro7,3556,150
llm2smt-no-th-prop7,4786,593
SMT-LIB QF_UF benchmark results