
Playbook · Coding & Computer-Use Agents

IDE agents.

Coding agents that live in the editor. The loop is the same as a CLI coding agent's, but the interaction surface, undo expectations, and trust threshold all differ. In Cursor, Continue, Cline, the GitHub Copilot agent surface, and the JetBrains AI Assistant, the inner loop is the localize-edit-verify pattern from coding-agent architecture, but every interaction is wrapped in a UI the user is watching in real time. That changes what counts as good behavior.

STEP 1

The core loop is the same; the surface is different.

An IDE agent is still a localize-edit-verify loop: it reads the repository, proposes a change, applies it, and ideally checks itself against tests, the type checker, or a linter. Everything in the coding-agent-architecture essay applies — the agent-computer interface is still the load-bearing surface, localization still decides outcomes, the verifier still earns its keep.
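The loop can be sketched as a small driver over four hooks. This is a minimal sketch under stated assumptions: `LoopHooks`, `runLoop`, and the `maxAttempts` cap are hypothetical names, not part of any editor's API, and the hooks stand in for model calls and editor operations. Verifier output (test failures, type errors, lint findings) is fed into the next proposal until it comes back empty.

```typescript
// Hypothetical hooks; a real agent would back these with model calls and editor APIs.
interface LoopHooks {
  localize(goal: string): string[]; // candidate file paths for the change
  proposeEdit(goal: string, files: string[], feedback: string[]): string; // returns a patch
  apply(patch: string): void; // should go through the editor, not raw disk writes
  verify(): string[]; // diagnostics from tests, type checker, or linter; empty means pass
}

// Localize once, then alternate edit and verify until the verifier is satisfied.
function runLoop(
  goal: string,
  hooks: LoopHooks,
  maxAttempts = 3,
): { ok: boolean; attempts: number } {
  const files = hooks.localize(goal);
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const patch = hooks.proposeEdit(goal, files, feedback);
    hooks.apply(patch);
    feedback = hooks.verify(); // failures become input to the next proposal
    if (feedback.length === 0) {
      return { ok: true, attempts: attempt };
    }
  }
  return { ok: false, attempts: maxAttempts };
}
```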

What is different is what surrounds the loop. The user is sitting in front of the editor, watching every cursor move, every diff hunk, every "applying…" indicator. The agent does not get to be a confident background process the way a CLI coding agent does — it has an audience.

STEP 2

Undo is the contract.

Every change an IDE agent applies must be Ctrl-Z-able. This is not negotiable. In the editor, undo is the user's safety primitive; an agent that bypasses it (by writing files directly through a process the editor does not track) breaks the user's basic mental model of "I can always back out." Users then stop trusting the agent: first they press Ctrl-Z and find the agent's change still there, then they uninstall.

Apply every change through the same editor API the user's own typing goes through. If you cannot, at minimum stage all changes as a single transactional undo group so one Ctrl-Z reverts the whole agent turn. Never write to disk in a way that bypasses the editor's tracked history — that path looks faster and trains users to fear your tool.
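To make the single-undo-group idea concrete, here is a minimal sketch assuming a hypothetical in-memory `EditorBuffer` with a snapshot-based undo stack; a real extension would route the same edits through its host editor's edit API instead. Every edit from one agent turn lands as one group, so a single `undo()` restores the text the user had before the turn.

```typescript
// Hypothetical in-memory editor model; not any real editor's API.
interface TextEdit {
  start: number; // offset into the current text
  end: number; // exclusive end offset
  text: string; // replacement text
}

class EditorBuffer {
  text: string;
  private undoStack: string[] = []; // one snapshot per undo group

  constructor(text: string) {
    this.text = text;
  }

  // Apply every edit from one agent turn as a single undo group.
  applyGroup(edits: TextEdit[]): void {
    // Sort by descending start so each replacement leaves earlier offsets untouched.
    const sorted = [...edits].sort((a, b) => b.start - a.start);
    for (let i = 1; i < sorted.length; i++) {
      if (sorted[i].end > sorted[i - 1].start) {
        throw new Error("overlapping edits in one group");
      }
    }
    this.undoStack.push(this.text); // snapshot once, before any edit lands
    let next = this.text;
    for (const e of sorted) {
      next = next.slice(0, e.start) + e.text + next.slice(e.end);
    }
    this.text = next;
  }

  // One undo reverts the whole group; returns false when there is nothing to undo.
  undo(): boolean {
    const prev = this.undoStack.pop();
    if (prev === undefined) return false;
    this.text = prev;
    return true;
  }
}
```

Snapshotting the whole buffer keeps the sketch short; a production editor would more likely record inverse edits, but the contract the user sees is the same: one agent turn, one Ctrl-Z.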

STEP 3

Three distinct interaction modes — pick one per feature, don't blend.

The IDE-agent space has converged on three modes, and most user confusion comes from mixing them:

Inline completion. Sub-second, character-level, low ceremony. The user is typing; the agent finishes the line or block. Acceptance is a Tab key. Latency budget: tens to a couple hundred milliseconds.
Diff suggestion. The agent proposes a multi-line or multi-file change as a reviewable diff. The user accepts, rejects, or edits before accepting. Latency budget: seconds.
In-place rewrite / agentic edit. The agent takes a goal ("refactor this to use the new API"), plans, edits, possibly runs a test, and presents a finished change. Latency budget: tens of seconds to minutes.

Pick the right mode per feature. Inline completion that occasionally pauses for 10 seconds to plan is a broken inline completion. An in-place rewrite that streams character-by-character invites the user to "correct" it mid-flight and end up fighting the agent for the cursor. Each mode has its own UX contract; blending them produces UIs that feel wrong without anyone being able to say why.
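One way to keep the modes from blending is to give each feature an explicit contract it is checked against. The sketch assumes hypothetical names (`InteractionMode`, `ModeContract`, `CONTRACTS`, `withinBudget`); the 200 ms ceiling follows the "couple hundred milliseconds" budget above, while the 10 s and 5 min ceilings are illustrative stand-ins for "seconds" and "minutes", not measured numbers.

```typescript
// Hypothetical per-feature contract; names and ceilings are illustrative only.
type InteractionMode = "inline" | "diff" | "agentic";

interface ModeContract {
  maxLatencyMs: number; // ceiling after which a result counts as late
  acceptance: "tab" | "accept-reject-diff" | "review-finished-change"; // how the user takes it
}

const CONTRACTS: Record<InteractionMode, ModeContract> = {
  inline: { maxLatencyMs: 200, acceptance: "tab" },
  diff: { maxLatencyMs: 10_000, acceptance: "accept-reject-diff" },
  agentic: { maxLatencyMs: 300_000, acceptance: "review-finished-change" },
};

// True when a result that took `elapsedMs` still honors its mode's latency budget.
function withinBudget(mode: InteractionMode, elapsedMs: number): boolean {
  return elapsedMs <= CONTRACTS[mode].maxLatencyMs;
}
```

An inline completion that misses its budget is usually best dropped, since the user has already typed past it; a diff suggestion or agentic edit that runs long should show progress rather than quietly switch to a different mode.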

STEP 4

The state an IDE agent needs that a CLI agent does not.

A CLI coding agent operates on a checkpointed working tree. An IDE agent operates on a live, mutating workspace, and it needs more state to do the job correctly:

Cursor position and selection. "Refactor this" without knowing what "this" is reduces to "refactor something somewhere."
Open buffers, not just on-disk files. The user has unsaved changes in three other files; the agent must reason about the buffer contents, not the stale disk contents.
Undo history. "Undo what you just did" implies the agent knows what it did — the undo stack is part of the agent's working memory in an IDE.
Lint and type-check state. Live diagnostics already running in the editor are cheap signal; not feeding them to the agent throws away a free verifier.

An IDE agent that operates on disk files only — the way an early CLI coding agent does — will reliably step on the user's in-flight work and produce errors that have nothing to do with the model's capability.
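The extra state can be made explicit as a snapshot the agent receives each turn. This sketch assumes hypothetical shapes (`WorkspaceState`, `OpenBuffer`, `readForAgent`, `errorsFor`) rather than any particular editor's extension API, and zero-based line numbers; the point is that reads go through open buffers first and that live diagnostics are passed along as verifier signal.

```typescript
// Hypothetical shapes for the editor state an IDE agent should carry.
interface Pos {
  line: number; // zero-based
  character: number;
}

interface OpenBuffer {
  path: string;
  text: string; // live contents, including unsaved edits
  isDirty: boolean; // true when the buffer differs from disk
}

interface Diagnostic {
  path: string;
  line: number; // zero-based
  severity: "error" | "warning";
  message: string;
}

interface WorkspaceState {
  activePath: string;
  cursor: Pos;
  selection: { start: Pos; end: Pos } | null; // what "this" refers to
  buffers: Map<string, OpenBuffer>;
  agentUndoGroups: string[]; // ids of undo groups this agent created, newest last
  diagnostics: Diagnostic[]; // lint and type-check results the editor already computed
}

// Read what the user actually sees: an open buffer wins over the possibly stale disk copy.
function readForAgent(
  ws: WorkspaceState,
  path: string,
  readDisk: (p: string) => string,
): string {
  const buf = ws.buffers.get(path);
  return buf !== undefined ? buf.text : readDisk(path);
}

// Errors the editor already knows about for one file, formatted with one-based lines.
function errorsFor(ws: WorkspaceState, path: string): string[] {
  return ws.diagnostics
    .filter((d) => d.path === path && d.severity === "error")
    .map((d) => `${path}:${d.line + 1}: ${d.message}`);
}
```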

STEP 5

Failure modes specific to the IDE surface.

Four shapes show up in production and are worth designing against:

Overwriting unsaved buffers. The agent reads disk, the user has unsaved changes in the buffer, the agent writes back through the editor — and the user's edits are silently lost. Always read through the editor, not through the filesystem, when a buffer exists.
Breaking the user's selection. An "in-place rewrite" that drops the user's selection mid-rewrite leaves them looking at code they cannot easily re-select to refine. Preserve or restore the selection when the rewrite finishes.
Cursor jumps on accept. An accepted suggestion that dumps the cursor to the end of the file (or to column 0 of a refactored hunk) breaks flow. Restore the cursor to a sensible position — usually wherever the user was about to type next.
Silent partial applies. The agent proposes a five-hunk diff; two hunks fail to apply (the file moved between propose and accept); the editor shows three green checks and no error. The user thinks the change landed. Hunk-apply failures must surface, not silently degrade.
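Several of these failure modes share a fix at apply time. A context check before each hunk guards against clobbering text that changed after the proposal, an explicit report keeps partial applies from being silent, and mapping the cursor through the accepted edits keeps it from jumping. The sketch assumes hypothetical names (`Hunk`, `applyHunks`, `mapCursor`), offset-based hunks that do not overlap, and a per-hunk rather than all-or-nothing policy.

```typescript
// Hypothetical hunk format: replace `expected` at offset `start` with `replacement`.
interface Hunk {
  start: number;
  expected: string; // buffer text the agent saw when it proposed the change
  replacement: string;
}

interface AppliedEdit {
  start: number; // offsets refer to the text before any hunk was applied
  end: number;
  newLength: number;
}

interface ApplyReport {
  text: string;
  applied: AppliedEdit[];
  failed: { index: number; reason: string }[];
}

// Apply hunks against the live buffer text and record every failure.
function applyHunks(text: string, hunks: Hunk[]): ApplyReport {
  const applied: AppliedEdit[] = [];
  const failed: { index: number; reason: string }[] = [];
  // Work from the highest offset down so earlier offsets stay valid.
  const order = hunks.map((_, i) => i).sort((a, b) => hunks[b].start - hunks[a].start);
  let out = text;
  for (const i of order) {
    const h = hunks[i];
    const end = h.start + h.expected.length;
    if (out.slice(h.start, end) !== h.expected) {
      failed.push({ index: i, reason: `context mismatch at offset ${h.start}` });
      continue; // leave this region untouched; the caller must surface the failure
    }
    out = out.slice(0, h.start) + h.replacement + out.slice(end);
    applied.push({ start: h.start, end, newLength: h.replacement.length });
  }
  applied.sort((a, b) => a.start - b.start);
  failed.sort((a, b) => a.index - b.index);
  return { text: out, applied, failed };
}

// Carry a cursor offset through applied edits (non-overlapping, sorted by start).
function mapCursor(offset: number, edits: AppliedEdit[]): number {
  let delta = 0;
  for (const e of edits) {
    if (e.end <= offset) {
      delta += e.newLength - (e.end - e.start); // an edit fully before the cursor shifts it
    } else if (e.start < offset) {
      return e.start + delta + e.newLength; // cursor was inside the edit: park it at the end
    }
  }
  return offset + delta;
}
```

Surfacing `report.failed` to the user (for example, "3 of 5 hunks applied; 2 failed because the file changed") is what turns a silent partial apply into a visible one. Aborting the whole apply when any hunk fails is an equally reasonable policy; which to prefer is a product decision.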

The IDE is a high-trust surface — the user is watching, the action is in-line with their own work, and the cost of a regression shows up in their next pull-request review. Treat the interaction layer with at least as much care as the model and tool layer; in an editor, the surface is half the product.