Coherent Rollout Oracles for Finite-Horizon Sequential Decision Problems
Nishant Shukla
quant-ph
Apr 28, 2026 · v1
TL;DR
Machine-checks the main reversible-circuit results in Lean 4.
Abstract
Coherent quantum rollout for sequential decision problems requires a unitary simulator: randomness must live in explicit quantum registers, and basis-state selectors must be mapped to actions reversibly. With branch-dependent valid actions, this mapping is totalized coherent rank-select over an entangled $N$-bit validity mask: return the position of the $r$-th valid bit, or a sentinel if $r$ is out of range. We give the first reversible-circuit complexity analysis of this primitive. For selector width $w = \lceil \log_2(N+1) \rceil$, rank-select admits an $O(Nw)$-gate low-ancilla bounded-span scan, proved gate-optimal in its model, and an $O(N\log w)$-gate low-ancilla blocked construction when long-range gates are available; across all bounded-fan-in layouts, the unconditional gate lower bound is $Ω(N)$. Composing rank-select with reversible transition and predicate-evaluation circuits gives an explicit polynomial-size coherent rollout oracle for finite-horizon planning problems satisfying these primitive assumptions. The resulting oracle satisfies the access model of the best-arm pipeline of Wang et al., yielding $\widetilde{O}(\sqrt{k}/\varepsilon)$ coherent oracle calls against the standard classical $Ω(k/\varepsilon^2)$ arm-pull lower bound. We give a bounded-influence lifting theorem that extends this lower-bound construction from a base configuration to an exponential family of configurations. We instantiate the construction on SIR epidemic intervention, with a stochastic placement-game sanity check, and machine-check the main results in Lean 4. Code and proofs:
https://github.com/BinRoot/b01t/tree/main/demos/rollout.
Problem
Coherent quantum rollout for sequential decision problems requires a unitary simulator implementing reversible rank-select over an entangled validity mask. No prior reversible-circuit complexity analysis of this primitive existed.
Approach
The authors provide the first reversible-circuit complexity analysis of coherent rank-select. They give an O(Nw)-gate low-ancilla bounded-span scan (proved gate-optimal in its model) and an O(N log w)-gate blocked construction. Composing rank-select with reversible transition circuits yields a polynomial-size coherent rollout oracle for finite-horizon planning. Main results are machine-checked in Lean 4.
Results
The resulting oracle satisfies the access model of the best-arm pipeline of Wang et al., yielding O-tilde(sqrt(k)/epsilon) coherent oracle calls versus the classical Omega(k/epsilon^2) lower bound. A bounded-influence lifting theorem extends the lower bound to an exponential family of configurations. The construction is instantiated on SIR epidemic intervention with a stochastic placement-game sanity check.