Coding & Computer-Use Agents

Agents that read code, write code, run tools, and drive a computer — patterns, harnesses, and pitfalls.

Coding Agent Architecture

The localize-edit-verify loop that makes a coding agent more than a code generator: the agent-computer interface, why agentic coding beats pipeline coding, and where the loop fails.
Repo Navigation & Code Context

Code search vs. embeddings, symbol-level indexing, context budgeting over a large source tree, and why confidently wrong localization is the most expensive failure of code retrieval.
Patch Generation & Test-Driven Loops

Structured diffs and hunk-apply failures, test-driven self-correction, regression guarding, and the three honest liars in the loop: flakes, overfit, and the deleted assertion.
Computer-Use & GUI Agents

Pixel vs. DOM grounding, the action space, the screenshot loop, and the multiplicative latency and reliability tax that makes GUI control a last resort.
Browser Agents

Driving a real browser as a tool: DOM vs. pixel observation, login and auth state, the well-trodden failure modes, and when to step up to a full GUI agent.
IDE Agents

Coding agents that live in the editor: the loop is the same as in a CLI coding agent, but the interaction surface, undo expectations, and trust threshold all differ.
Sandboxing & Safe Execution

Containerized execution, network and filesystem isolation, capability scoping, and designing for blast radius when an agent runs untrusted, attacker-influenced code.
Evaluating Coding Agents

The SWE-bench family, pass@k vs. resolve rate, harness sensitivity, documented training-data contamination, and why scores on a private eval set built after the model's training cutoff are the only numbers to trust.