Coding & Computer-Use Agents

Agents that read code, write code, run tools, and drive a computer — patterns, harnesses, and pitfalls.

  1. Coding Agent Architecture
    The localize-edit-verify loop that makes a coding agent more than a code generator: the agent-computer interface, why agentic beats pipeline coding, and where the loop fails.
  2. Repo Navigation & Code Context
    Code search vs. embeddings, symbol-level indexing, context budgeting over a large tree, and why confident wrong localization is the expensive failure of code retrieval.
  3. Patch Generation & Test-Driven Loops
    Structured diffs and hunk-apply failures, test-driven self-correction, regression guarding, and the three honest liars in the loop: flakes, overfit, and the deleted assertion.
  4. Computer-Use & GUI Agents
    Pixel vs. DOM grounding, the action space, the screenshot loop, and the multiplicative latency and reliability tax that makes GUI control a last resort.
  5. Browser Agents
    Driving a real browser as a tool: DOM versus pixel observation, login and auth state, the well-trodden failure modes, and when to step up to a full GUI agent.
  6. IDE Agents
    Coding agents that live in the editor — the loop is the same as a CLI coding agent, but the interaction surface, undo expectations, and trust threshold are all different.
  7. Sandboxing & Safe Execution
    Containerized execution, network and filesystem isolation, capability scoping, and designing for blast radius when an agent runs untrusted, attacker-influenced code.
  8. Evaluating Coding Agents
    The SWE-bench family, pass@k vs. resolve rate, harness sensitivity, documented contamination, and why a private post-cutoff eval set is the only number to trust.