Kill switches: the button that stops a running agent fleet.
Every team running agents in production will, eventually, want a button that stops the fleet. The hard parts are not the button — they are what the button must actually stop (in-flight calls, queued work, scheduled retries, fan-out children), how that stop propagates faster than work fans in, and the quarterly drill that proves it works before the day you need it.
What "stop" must actually halt.
Pressing a kill switch sounds like one action; for an agent fleet it is four, and any one of them being skipped means the button is theater. The minimum surface a real kill switch covers:
- In-flight LLM and tool calls — cancel them, do not wait for them to return. A 60-second tool call after the switch flipped is 60 seconds of bleeding you signed off on.
- Queued tasks — decide drain-vs-discard up front. Drain means the queue still runs to completion; discard means you accept the partial-task tax in exchange for an actual stop. Most safety-driven incidents want discard, with the journal sufficient to resume.
- Scheduled retries and timers: every at(future_time) that was set before the flip must be cancelled, not "expire harmlessly later." The retry chain from idempotency-and-retries is the part most likely to fire ten minutes after you thought you stopped.
- Webhooks and side effects already in flight: for actions the agent already dispatched to downstream systems, the kill switch cannot recall them; what it can do is record exactly what fired (so a compensating-action sweep is possible) and prevent any further dispatch.
The bar: after the switch is flipped, no new effect leaves the system. The journal from durable-state-and-resumability tells you exactly which effects had already left.
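As a rough illustration of the four surfaces, the sketch below models a fleet inside a single asyncio process. The Fleet class, its field names, and the demo scenario are all hypothetical; a real fleet would spread this state across workers and a durable journal, and this sketch takes the discard (not drain) option for queued work.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Hypothetical in-process stand-in for a fleet's four stop surfaces."""
    in_flight: set = field(default_factory=set)     # asyncio.Task objects for LLM/tool calls
    queue: list = field(default_factory=list)       # task ids not yet started
    timers: list = field(default_factory=list)      # asyncio.TimerHandle for scheduled retries
    dispatched: list = field(default_factory=list)  # journal of effects that already left
    halted: bool = False

    def dispatch(self, effect: str) -> bool:
        """Send an effect downstream unless the switch has been flipped."""
        if self.halted:
            return False
        self.dispatched.append(effect)
        return True

    async def kill(self) -> dict:
        self.halted = True                            # block any further dispatch first
        # Cancel in-flight calls; cancel() returns False for tasks already finished.
        cancelled = sum(1 for task in self.in_flight if task.cancel())
        await asyncio.gather(*self.in_flight, return_exceptions=True)
        discarded = list(self.queue)                  # discard queued work
        self.queue.clear()
        for handle in self.timers:                    # cancel scheduled retries
            handle.cancel()
        return {
            "cancelled": cancelled,
            "discarded": discarded,
            "already_dispatched": list(self.dispatched),  # cannot be recalled, only recorded
        }


async def demo() -> dict:
    fleet = Fleet()
    loop = asyncio.get_running_loop()
    fired = []

    async def slow_tool_call():
        await asyncio.sleep(60)
        fleet.dispatch("late-write")

    fleet.dispatch("email-1")                         # effect that left before the flip
    fleet.in_flight.add(asyncio.create_task(slow_tool_call()))
    fleet.queue.extend(["task-2", "task-3"])
    fleet.timers.append(loop.call_later(0.05, fired.append, "retry"))
    await asyncio.sleep(0)                            # let the slow call start
    report = await fleet.kill()
    await asyncio.sleep(0.1)                          # wait past the retry's scheduled time
    report["retry_fired"] = bool(fired)
    report["dispatch_after_flip"] = fleet.dispatch("email-2")
    return report


report = asyncio.run(demo())
```

The ordering inside kill() is a design choice in this sketch: setting halted first means that anything still running while cancellation propagates can no longer emit an effect.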
Four scopes, four buttons.
One global off-switch is too blunt for daily use and too slow to reach for in a real incident. The graded ladder pairs with the one in incident-response-for-agents — each level needs its own switch, evaluated in the loop before any side effect:
- Capability — freeze a single tool (the dangerous write tool, say) while reads keep flowing; diagnosis continues, damage stops.
- Run: halt a single run_id on its next loop tick; the cheapest surgical stop.
- Tenant or feature: halt every run for one tenant or one feature flag; the right level when one cohort is misbehaving and the rest are healthy.
- Global — stop the world. The button you almost never want to need, that must work the one time you do.
Each level is its own button because the on-call's job at 3am is to pick the smallest sufficient hammer, not to debate what to escalate. The flags powering these scopes are themselves a use case for feature-flags-for-agents: per-request, fail-closed evaluation, observable flip log.
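One way to picture the four scopes is as a single check that reports which level, if any, blocks a given effect. This is a minimal sketch under assumed names (KillState, EffectContext, blocking_scope); it models only the tenant half of "tenant or feature", and a feature scope could be added as one more set.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class EffectContext:
    """What the loop knows about the effect it is about to emit (hypothetical shape)."""
    run_id: str
    tenant: str
    tool: str


@dataclass
class KillState:
    """One switch per scope; every name here is illustrative."""
    global_stop: bool = False
    tenants: set = field(default_factory=set)
    runs: set = field(default_factory=set)
    tools: set = field(default_factory=set)


def blocking_scope(state: KillState, ctx: EffectContext) -> Optional[str]:
    """Return the widest scope that blocks this effect, or None if it may proceed."""
    if state.global_stop:
        return "global"
    if ctx.tenant in state.tenants:
        return "tenant"
    if ctx.run_id in state.runs:
        return "run"
    if ctx.tool in state.tools:
        return "capability"
    return None


# Freeze only the write tool: reads keep flowing for the same run and tenant.
state = KillState(tools={"db_write"})
read_call = EffectContext(run_id="run-1", tenant="acme", tool="db_read")
write_call = EffectContext(run_id="run-1", tenant="acme", tool="db_write")
read_result = blocking_scope(state, read_call)
write_result = blocking_scope(state, write_call)
```

Returning the scope name rather than a bare boolean gives the flip log something to record about why an effect was refused.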
Propagation: the stop must outrun the fan-in.
A distributed agent fleet has workers across processes, machines, and regions, all reading the kill state from somewhere. The hazard is that the stop signal arrives at one worker while another has already started its next tool call; the longer the propagation tail, the more "post-flip" effects you eat. The two properties that keep propagation honest:
- Evaluate the flag in the loop, before every effect — not at run start, not at process start. A worker that cached the kill state at process boot is a worker that ignores the switch you just flipped.
- Fail closed on the flag store: if the flag service is unreachable, the worker treats the kill switch as engaged and stops, rather than assuming it is off. The alternative is a kill switch that becomes useless during exactly the kind of outage that makes you want to reach for it.
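The two properties above can be sketched in a few lines. Here read_flag stands in for whatever client call reads the kill state, and FlagStoreUnavailable is a hypothetical exception; neither is a real library API.

```python
from typing import Callable, Iterable


class FlagStoreUnavailable(Exception):
    """Raised by the hypothetical flag client when the store cannot be reached."""


def kill_engaged(read_flag: Callable[[], bool]) -> bool:
    """Fail closed: any failure to read the flag counts as 'stop'."""
    try:
        return bool(read_flag())
    except Exception:
        return True


def run_loop(steps: Iterable[Callable[[], str]], read_flag: Callable[[], bool]) -> list:
    """Execute steps, re-reading the flag before every effect (never cached)."""
    emitted = []
    for step in steps:
        if kill_engaged(read_flag):
            break
        emitted.append(step())
    return emitted


reads = {"n": 0}


def flips_after_two_reads() -> bool:
    # Simulates an operator flipping the switch while the run is in progress.
    reads["n"] += 1
    return reads["n"] > 2


def unreachable() -> bool:
    raise FlagStoreUnavailable("timeout")


steps = [lambda i=i: f"effect-{i}" for i in range(5)]
stopped_midway = run_loop(steps, flips_after_two_reads)
stopped_on_outage = run_loop(steps, unreachable)
```

In this sketch the run that sees the flip stops after two effects, and the run that cannot reach the store emits none. Reading the flag on every iteration costs a lookup per effect; that cost is the price of a short propagation tail.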
"Just redeploy" or "kill the pods" is not a kill switch for a durable agent — a resumable run comes back on the next worker and continues exactly the harmful loop. The stop must be a state-aware halt that the resume path also respects, or you have built a runaway that survives its own kill.
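A state-aware halt might look like the sketch below, where the halt is recorded in the same durable store the resume path reads. DurableStore and resume are hypothetical names, and a plain in-memory object stands in for real durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class DurableStore:
    """Hypothetical stand-in for storage that survives worker restarts."""
    halted_runs: set = field(default_factory=set)
    cursors: dict = field(default_factory=dict)   # run_id -> index of the next step


def resume(store: DurableStore, run_id: str, plan: list) -> list:
    """Continue a run from its saved cursor, unless a halt is recorded for it."""
    executed = []
    i = store.cursors.get(run_id, 0)
    while i < len(plan):
        if run_id in store.halted_runs:   # checked on resume and before each step
            break
        executed.append(plan[i])
        i += 1
        store.cursors[run_id] = i
    return executed


store = DurableStore(cursors={"run-7": 1})        # "fetch" already ran before the incident
plan = ["fetch", "write", "notify"]

store.halted_runs.add("run-7")                    # operator flips the run-level switch
after_pod_kill = resume(store, "run-7", plan)     # a fresh worker picks the run up

store.halted_runs.discard("run-7")                # operator unhalts after the fix
after_unhalt = resume(store, "run-7", plan)
```

Because the halt lives next to the cursor, killing the pod changes nothing: the replacement worker reads the same halt and executes no steps until the run is explicitly unhalted.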
The tabletop drill: an untested kill switch does not exist.
The day you need the switch is not the day to discover that the flag store has been unreachable for a month, or that the worker pool deployed last quarter is reading from a different config namespace, or that the discard-on-stop path leaves orphaned rows in the side-effect ledger. Run the drill on the calendar, not on the incident:
- Quarterly, in staging on production-shape traffic, pull the lever at each of the four scope levels.
- Measure: time-to-stop per worker, count of effects emitted after the flip, queue state at T+0 and T+60s, journal consistency after recovery.
- Capture the run as a permanent regression test — same scenario, replayable, in CI — so a refactor that breaks propagation fails the gate before it ships.
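The drill measurements can be reduced to a small report computed from an event log. This is a sketch only: the Event shape, the assumption of a shared clock across workers, and the sample numbers are all illustrative, not measurements from a real drill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    worker: str
    t: float      # seconds on a clock assumed to be shared across workers
    kind: str     # "effect" or "stopped"


def drill_report(flip_t: float, events: list) -> dict:
    """Count effects emitted after the flip and compute per-worker time-to-stop."""
    post_flip_effects = sum(1 for e in events if e.kind == "effect" and e.t > flip_t)
    time_to_stop = {
        e.worker: round(e.t - flip_t, 3) for e in events if e.kind == "stopped"
    }
    return {
        "post_flip_effects": post_flip_effects,
        "time_to_stop": time_to_stop,
        "slowest_worker_s": max(time_to_stop.values(), default=None),
    }


# Illustrative log: the switch is flipped at t=100.0.
events = [
    Event("w1", 99.5, "effect"),
    Event("w1", 100.2, "stopped"),
    Event("w2", 100.4, "effect"),    # emitted after the flip: counts against the drill
    Event("w2", 100.9, "stopped"),
]
drill = drill_report(100.0, events)
```

A CI gate could then assert that post_flip_effects and slowest_worker_s stay under budgets the team chooses, so a refactor that lengthens the propagation tail fails before it ships.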
After the switch: what state is the system in, and how do you resume.
The kill switch is a halt, not a teardown. Recovery is the second half of the discipline and is far easier when you decided what "after" looks like before you pressed the button:
- The journal is intact — every effect that landed is recorded with its idempotency key; every effect that did not is recoverable from the run's plan. A clean stop produces a clean diff between "intended" and "landed."
- Compensating actions are explicit, not heuristic — refunds, retractions, customer notifications come off a list, not from on-call intuition. The side-effect ledger is the list.
- Resume requires a green release — if the cause was a bad rollout-and-versioning triple, roll it back before unhalting. Resuming on the same broken contract reproduces the incident on cue.
- One reason is enough — a guardrail-metric regression or a write-rate anomaly is a flip, not a debate. Investigate from the safe state.
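The "intended" versus "landed" diff and the explicit compensation list can be sketched as a single reconciliation step. The function name, the action names, and the compensation table are hypothetical; the sketch only lists candidates for compensation and leaves the decision to run them with the operator.

```python
def reconcile(intended: list, landed: set, compensations: dict) -> dict:
    """Split a halted run's plan into landed effects and not-yet-sent effects.

    intended: ordered (idempotency_key, action) pairs from the run's plan.
    landed: idempotency keys the journal recorded as having left the system.
    compensations: action -> compensating action; an explicit table, no heuristics.
    """
    done = [(key, action) for key, action in intended if key in landed]
    pending = [(key, action) for key, action in intended if key not in landed]
    candidates = [
        (key, compensations[action]) for key, action in done if action in compensations
    ]
    return {"landed": done, "pending": pending, "compensate": candidates}


intended = [
    ("k1", "charge_card"),
    ("k2", "send_email"),
    ("k3", "update_crm"),
]
landed = {"k1", "k2"}                      # the journal shows k3 never left
compensations = {
    "charge_card": "refund_card",
    "send_email": "send_retraction",
}
recovery = reconcile(intended, landed, compensations)
```

On resume, only the pending entries need to run, and their idempotency keys make a repeated attempt safe if the journal turns out to have missed a landed effect.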
The kill switch that you build before you need it, drill on the calendar, and pair with a documented recovery path is the one that buys you a postmortem instead of a press release. Build it before the day you cannot.