← All papers
First page of WybeCoder: Verified Imperative Code Generation

WybeCoder: Verified Imperative Code Generation

Fabian Gloeckle, Mantas Baksys, Darius Feher, Kunhao Zheng, Amaury Hayat, Sean B. Holden, Gabriel Synnaeve, Peter O'Hearn

cs.SE Mar 31, 2026 · v1
Builds on a framework combining VC generation, SMT solving, and interactive Lean proofs to co-evolve verified imperative code, invariants, and proofs.
Recent progress in large language models (LLMs) has substantially advanced automatic code generation and formal theorem proving, yet software verification has not seen comparable gains. To address this gap, we propose WybeCoder, an agentic code verification framework that enables prove-as-you-generate development, in which code, invariants, and proofs co-evolve. WybeCoder builds on a recent framework that combines automatic verification condition generation and SMT solving with interactive proofs in Lean. To enable systematic evaluation, we translate two benchmarks for functional verification in Lean, Verina and Clever, into equivalent imperative code specifications. On complex algorithms such as Heapsort, we observe consistent performance improvements as we scale our approach, synthesizing dozens of valid invariants and dispatching dozens of subgoals, ultimately producing hundreds of lines of verified code and overcoming plateaus reported in previous work. Our best system solves 74% of Verina tasks and 62% of Clever tasks at moderate compute budgets, substantially surpassing previous evaluations and paving the way for the automated construction of large-scale datasets of verified imperative code.

Automatic code generation and formal theorem proving have advanced significantly with LLMs, but software verification of imperative code has not seen comparable gains, leaving a gap between generated code and verified correctness.

WybeCoder is an agentic code verification framework enabling prove-as-you-generate development where code, invariants, and proofs co-evolve. It builds on a framework combining automatic verification condition generation and SMT solving with interactive proofs in Lean. The system translates functional verification benchmarks (Verina and Clever) into equivalent imperative code specifications for systematic evaluation.

The best WybeCoder system solves 74% of Verina tasks and 62% of Clever tasks at moderate compute budgets, synthesizing dozens of valid invariants and dispatching dozens of subgoals, producing hundreds of lines of verified code for complex algorithms such as Heapsort.