Alignment Contracts for Agentic Security Systems
Isaac David, Marco Guarnieri, Arthur Gervais
cs.CR
Apr 30, 2026 · v1
TL;DR
A Lean 4 artifact checks the formal core theorems behind alignment contracts for agentic security systems.
Abstract
Agentic security systems increasingly combine LLM planners with tools that can discover, validate, and report vulnerabilities. This creates an asymmetric control problem: the system should retain strong offensive capability inside an authorized engagement, while the same capabilities must be denied outside scope. Existing guardrails provide useful policy controls, but they do not make this boundary a first-class formal contract over observable effects. We introduce alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines scope, allowed and forbidden effects, resource budgets, and disclosure policies. We give the language finite-trace semantics, characterize satisfaction as a safety property with finite violation witnesses, develop refinement and one-way composition rules for modular contract engineering, and show that admissibility checking is decidable. We instantiate the framework for web-focused agentic security workflows and show how the same structure extends to other effect profiles. Under an explicit Effect Observability Assumption, where all $\SigmaEff$-effects are mediated, the soundness theorem quantifies over the agent model and gives guarantees for mediated $\SigmaEff$-effects, including enforcement soundness for monitor-realized traces. We also state an assumption-lifted adaptation result and formalize limits through undecidability transfer and observability-boundary theorems. A Lean 4 artifact checks the formal core theorems used by the paper.
Problem
Agentic security systems combine LLM planners with offensive tools that can discover and exploit vulnerabilities. The same capabilities must be permitted inside an authorized engagement but denied outside scope, creating an asymmetric control problem that existing guardrails do not formalize as a first-class contract over observable effects.
Approach
The paper introduces alignment contracts, a framework for specifying and enforcing behavioral constraints over observable effect traces. A contract defines allowed/forbidden effect signatures, budget predicates, data-flow declarations, and a resolution function. Enforcement uses a reference monitor that mediates all tool calls. The framework is partially formalized in Lean 4, with two trust models (full and partial visibility) and proofs of integrity and disclosure properties under the event model.
Results
The Lean formalization covers the core contract type, the reference-monitor enforcement mechanism, and integrity/disclosure theorems under the full-visibility trust model. Under partial visibility, integrity is still provable but disclosure requires declared flows. Payload semantics and timing channels remain outside the event model and are handled only as mitigations at the agentic layer.