"Make the failing test pass." Hand that prompt to four coding agents pointed at the same repo, as of late May 2026, and you get four different theories of what should happen next: Claude Code spins up a task-list plan, Codex CLI starts a sandboxed run, Cursor drafts a multi-file diff and waits for your review, Aider goes architect→editor→one commit. The feature lists barely diverge. The traces look like four different species — because the architecture is the product, not the tool catalog.
At a glance
Four agents, four answers to the same question: where does the trust boundary sit, and who decides the next step? The table below sets the basics; the cross-cutting comparison later on this page shows where each one leans hardest on the axes that actually differ.
| Project | Released | Primary niche | Deployment shape |
|---|---|---|---|
| Claude Code | 2025 | Terminal harness with a typed tool catalog | CLI on your machine, OS-sandboxed |
| Codex CLI | 2025 | Sandboxed, shell-first coding agent | CLI, workspace-scoped sandbox on by default |
| Cursor Agent | 2023 (Agent mode later) | IDE-mediated agent with reviewable diffs | Editor app; agents run local / worktree / SSH / cloud |
| Aider | 2023 | Git-first pair-programmer in the terminal | CLI in a local git repo; no sandbox |
Snapshot: 2026-05-28. Behavior observed the preceding week; verify in your own environment.
Claude Code — deep dive
The harness model
Claude Code is a terminal harness, not an IDE plugin or a hosted service. The model drives a fixed, typed tool catalog — Read, Edit, Bash, Glob, Grep — where each tool has a declared schema instead of being a free-text shell string. That typing is deliberate: a structured Edit that names a file and an exact old/new string is checkable and reversible in a way that sed -i piped through a shell is not. The harness still has a shell (the Bash tool), but the shell is one tool among several rather than the whole interface, which is the architectural choice that distinguishes it from Codex CLI.
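To make the "checkable and reversible" claim concrete, here is a minimal sketch of a typed edit call. It is not Claude Code's implementation: `EditCall`, `apply_edit`, and `invert` are hypothetical names, and the file system is modeled as an in-memory dict.

```python
from dataclasses import dataclass


class EditError(ValueError):
    """A specific, actionable failure instead of an opaque shell error."""


@dataclass(frozen=True)
class EditCall:
    """Hypothetical schema: one file, one exact old/new string pair."""
    path: str
    old_string: str
    new_string: str


def apply_edit(files: dict, call: EditCall) -> dict:
    """Return a new file map with the edit applied.

    The call is rejected unless old_string occurs exactly once, so a stale
    or ambiguous edit fails loudly instead of changing the wrong text.
    """
    if call.path not in files:
        raise EditError(f"no such file: {call.path}")
    if not call.old_string:
        raise EditError("old_string must not be empty")
    text = files[call.path]
    count = text.count(call.old_string)
    if count != 1:
        raise EditError(
            f"old_string matched {count} times in {call.path}; expected exactly 1"
        )
    updated = dict(files)
    updated[call.path] = text.replace(call.old_string, call.new_string, 1)
    return updated


def invert(call: EditCall) -> EditCall:
    """The reverse edit is the same call with old and new swapped."""
    return EditCall(call.path, call.new_string, call.old_string)
```

The check happens before anything changes, and undo is just another call of the same shape. A `sed -i` string offers neither property without extra tooling.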
The trust boundary is the operating system. Claude Code runs on your machine inside an OS sandbox (bubblewrap on Linux, seatbelt on macOS), gated by permission modes that scale from cautious to autonomous: default prompts on writes and commands, acceptEdits lets file edits through but still asks before commands, plan forbids mutation entirely, dontAsk stops the prompts, and bypassPermissions removes the sandbox guardrails outright. You choose how much rope per session rather than accepting one fixed posture.
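One way to picture the permission modes is as a lookup from (mode, action kind) to a decision. The sketch below covers only the three modes whose per-action behavior is spelled out above; the action kinds, the `POLICY` table, and `decide` are illustrative assumptions, not Claude Code's real configuration format.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Simplified action kinds, assumed for this sketch.
READ, EDIT, COMMAND = "read", "edit", "command"

# Mode names come from the text; the table itself is an illustration.
# dontAsk and bypassPermissions are left out because the text does not
# describe their behavior per action kind.
POLICY = {
    "default": {READ: Decision.ALLOW, EDIT: Decision.ASK, COMMAND: Decision.ASK},
    "acceptEdits": {READ: Decision.ALLOW, EDIT: Decision.ALLOW, COMMAND: Decision.ASK},
    # Assumption: in plan mode every command is treated as potentially mutating.
    "plan": {READ: Decision.ALLOW, EDIT: Decision.DENY, COMMAND: Decision.DENY},
}


def decide(mode: str, action: str) -> Decision:
    """Look up the decision; unknown modes or actions fail closed."""
    try:
        return POLICY[mode][action]
    except KeyError:
        return Decision.DENY
```

Failing closed on anything unrecognized is a design choice of this sketch: a typo in a mode name should never widen access.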
Deferred tools, skills, and MCP
The catalog is small on purpose, but it is not the ceiling. Beyond the core tools, Claude Code exposes deferred tools — capabilities whose full schemas are fetched on demand via a search step rather than loaded into context up front — plus reusable Skills and a full MCP client for third-party servers. That structure is a direct application of sound tool-design principles: a typed, minimal default surface keeps the model's choices legible, while deferral keeps a large catalog from drowning the context window. Typed schemas also make tool-error recovery tractable — a malformed call fails against a schema with a specific message the model can act on, rather than returning an opaque shell error.
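The deferral idea and the schema-error point can be sketched together. Everything here is hypothetical: the tool names, the `STUBS` and `FULL_SCHEMAS` tables, and the three functions stand in for whatever a real harness does, and a schema is reduced to a field-to-type map.

```python
from typing import Any

# Hypothetical full schemas, standing in for what a harness would fetch on demand.
FULL_SCHEMAS = {
    "notebook_edit": {"path": str, "cell": int, "source": str},
    "web_fetch": {"url": str},
}

# Only these one-line stubs sit in context up front.
STUBS = {
    "notebook_edit": "edit one cell of a notebook",
    "web_fetch": "fetch a url and return its text",
}

loaded: dict = {}  # schemas pulled in so far


def search_tools(query: str) -> list:
    """Search step: match query words against names and one-line summaries."""
    words = query.lower().split()
    return sorted(
        name for name, summary in STUBS.items()
        if any(w in name or w in summary for w in words)
    )


def load_tool(name: str) -> dict:
    """Fetch the full schema only when a tool is actually needed."""
    if name not in loaded:
        loaded[name] = FULL_SCHEMAS[name]
    return loaded[name]


def validate(name: str, args: dict) -> list:
    """Return specific, field-level errors the model can act on."""
    schema = load_tool(name)
    errors = []
    for field, kind in schema.items():
        if field not in args:
            errors.append(f"{name}: missing required field '{field}'")
        elif not isinstance(args[field], kind):
            got = type(args[field]).__name__
            errors.append(f"{name}: field '{field}' must be {kind.__name__}, got {got}")
    for field in args:
        if field not in schema:
            errors.append(f"{name}: unknown field '{field}'")
    return errors
```

Until `load_tool` runs, a tool costs one summary line of context. When a call is malformed, the error names the field and the expected type.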
Models are Anthropic-first. You can point Claude Code at a local or third-party model, but only through a LiteLLM-style proxy — it is possible, not first-class — which matters if model portability is a hard requirement for your team.
The task-list loop
The loop's defining feature is an explicit, persisted task list. Faced with a multi-step job, Claude Code writes out the steps as durable state, marks them in progress and done as it works, and reads that list back as the source of truth for what to do next. This is the classic agent loop with its plan externalized: instead of holding the plan implicitly in a chain of ReAct-style reasoning turns, the harness commits it to a structure that survives a long run and stays auditable. The payoff shows on large jobs where an implicit plan would drift; the cost is ceremony on a one-line fix. Nothing commits to git unless you ask — the task list is the work record, not the version history.
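A persisted task list can be as simple as a JSON file that the loop rereads before each step. This is a sketch of the pattern, not Claude Code's storage format: the file layout, the status names, and every function name here are assumptions.

```python
import json
from pathlib import Path
from typing import Optional

STATUSES = ("pending", "in_progress", "done")


def write_plan(path: Path, steps: list) -> None:
    """Persist the plan as durable state, one record per step."""
    tasks = [{"id": i, "text": s, "status": "pending"} for i, s in enumerate(steps, 1)]
    path.write_text(json.dumps(tasks, indent=2))


def read_plan(path: Path) -> list:
    return json.loads(path.read_text())


def set_status(path: Path, task_id: int, status: str) -> None:
    """Update one task and write the whole list back."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    tasks = read_plan(path)
    for task in tasks:
        if task["id"] == task_id:
            task["status"] = status
            break
    else:
        raise KeyError(task_id)
    path.write_text(json.dumps(tasks, indent=2))


def next_task(path: Path) -> Optional[dict]:
    """Read the list back as the source of truth for what to do next.

    Resume anything in progress first, then take the first pending task.
    """
    tasks = read_plan(path)
    for wanted in ("in_progress", "pending"):
        for task in tasks:
            if task["status"] == wanted:
                return task
    return None
```

Because the next step is derived from the file rather than from conversational memory, a long run can be interrupted and resumed, and the file doubles as an audit trail.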
Codex CLI — deep dive
The workspace sandbox and approval modes
Codex CLI's trust boundary is a workspace-scoped OS sandbox that is on by default — the agent can read, edit, and run inside the working directory, but stepping outside it or reaching the network requires explicit consent. Three approval modes set the posture. Auto, the default, allows read/edit/run inside the workspace and asks before anything outside it or networked. Read-only is consultative: the agent plans and proposes but does not mutate, which is the mode you use to interrogate a codebase before letting it touch anything. Full Access drops the prompts entirely, including network, for trusted batch runs. The escalation is the safety story — the sandbox is the floor, and you raise the ceiling deliberately.
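The three postures reduce to a small decision function. The sketch below is an illustration of the rules as described above, not Codex CLI's code: the mode strings, the `inside` and `decide` helpers, and the choice to limit Read-only to reads inside the workspace are all assumptions. Paths are taken to be absolute POSIX paths.

```python
import posixpath
from pathlib import PurePosixPath

MODES = ("auto", "read-only", "full-access")


def inside(workspace: str, target: str) -> bool:
    """True if target is the workspace or lies beneath it, after collapsing '..'."""
    ws = PurePosixPath(posixpath.normpath(workspace))
    tgt = PurePosixPath(posixpath.normpath(target))
    return tgt == ws or ws in tgt.parents


def decide(mode: str, action: str, target: str, workspace: str,
           network: bool = False) -> str:
    """Return 'allow', 'ask', or 'deny' for an action of kind read/edit/run."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "full-access":
        return "allow"  # prompts dropped entirely, including network
    if mode == "read-only":
        # Assumption: only non-networked reads inside the workspace proceed.
        ok = action == "read" and not network and inside(workspace, target)
        return "allow" if ok else "deny"
    # auto: free inside the workspace, consent needed outside it or on the network
    if network or not inside(workspace, target):
        return "ask"
    return "allow"
```

Note that path normalization happens before the containment check, so a target like `/repo/../etc/hosts` is treated as outside the workspace. A real sandbox would also need to resolve symlinks, which this sketch does not attempt.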
Shell-first tool surface
Where Claude Code leads with typed tools, Codex CLI leads with the shell. The agent's primary surface is running commands, and file edits, builds, and tests all flow through that single channel. The advantage is zero impedance with the existing developer toolchain — anything you would type at a prompt, the agent can run — and the constraint is that error recovery leans on parsing command output rather than reading a typed failure. Codex CLI also speaks MCP, with the standing rule that destructive tools always require approval no matter the mode, and it benefits from structured tool I/O where a server provides it. Local models are a first-class option here: set model_provider to "oss" and run against Ollama.
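The trade-off of a single command channel shows up in what the agent gets back: an exit code and text. A rough sketch of that observation step follows, using only the standard library. `Observation`, `run`, and the string-matching heuristic are hypothetical and not taken from Codex CLI.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Observation:
    exit_code: int
    tail: str  # last lines of combined output, all the agent has to go on


def run(argv: list, max_lines: int = 20, timeout: float = 60.0) -> Observation:
    """Run one command and keep only the exit code and the tail of its output."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    lines = (proc.stdout + proc.stderr).splitlines()
    return Observation(proc.returncode, "\n".join(lines[-max_lines:]))


def looks_like_missing_module(obs: Observation) -> bool:
    """Recovery here means pattern-matching text, not reading a typed failure."""
    return obs.exit_code != 0 and "ModuleNotFoundError" in obs.tail


if __name__ == "__main__":
    obs = run([sys.executable, "-c", "import not_a_real_module_xyz"])
    print(obs.exit_code, looks_like_missing_module(obs))
```

Anything runnable from a prompt fits through `run`, which is the reach advantage. The cost is that each failure class needs its own output heuristic, like the one above.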
PR-as-output
Codex CLI edits the working directory and is oriented toward producing a pull request. The natural end state of a run is a reviewable branch of changes rather than a chat transcript — the agent does the work in the sandbox, and the deliverable is the diff you ship. That orientation pairs with the approval modes: you let it run in Auto, then review the resulting PR the way you would a colleague's, which keeps a human gate at the point where the work leaves the sandbox.
Cursor Agent — deep dive
IDE-mediated FS and accept-per-file
Cursor's trust boundary is the editor itself. In Agent mode the model does not write to disk directly; its edits stage as reviewable diffs that you accept file by file before they are committed to the filesystem. The IDE is the sandbox — not an OS-level jail like Codex CLI's, but a human-in-the-loop gate that makes "the agent edited fourteen files" a reviewable event rather than a fait accompli. For developers who want to keep their hand on the wheel, the diff queue is the whole point.
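The accept-per-file gate can be modeled as a staging area that sits between the agent and the disk. This is a toy model rather than Cursor's implementation: `DiffQueue` is a hypothetical class and the disk is a plain dict.

```python
import difflib


class DiffQueue:
    """Agent edits are staged here; nothing reaches 'disk' until accepted."""

    def __init__(self, disk: dict):
        self.disk = disk    # stands in for the real filesystem
        self.staged = {}    # path -> proposed new contents

    def propose(self, path: str, new_text: str) -> None:
        self.staged[path] = new_text

    def diff(self, path: str) -> str:
        """Render the staged change as a unified diff for review."""
        old = self.disk.get(path, "")
        new = self.staged[path]
        return "".join(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        ))

    def accept(self, path: str) -> None:
        """The human gate: only now does the change land."""
        self.disk[path] = self.staged.pop(path)

    def reject(self, path: str) -> None:
        self.staged.pop(path)
```

With this shape, an agent that touches fourteen files produces fourteen pending entries, and each one is a separate decision.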
The plan-then-diff loop
Agent mode acts: it plans a change, edits across many files, and runs commands. That is what separates it from plain chat, which only answers. The loop is a textbook plan-and-execute shape: the agent forms a plan, materializes it as a concrete multi-file diff, and then hands that diff back for human acceptance before it lands. The plan is real work, but the diff is the artifact, and the accept step is where your judgment enters the loop. Cursor supports MCP, exposing roughly the first 40 tools to a given agent, and runs local models via an OpenAI-compatible base URL, such as the one a local Ollama server exposes.
Multi-file edits and the Agents Window
Cursor 3 (April 2026) reframed the editor as an "agent execution runtime." The Agents Window runs multiple agents in parallel across distinct targets — your local checkout, separate git worktrees, an SSH host, or the cloud — so one agent refactors in a worktree while another investigates on a remote box, each producing its own reviewable diff stream. The IDE stops being where one human types and becomes the console where several agents work concurrently and report back through the same accept-per-file gate.
Aider — deep dive
Git as the protocol (auto-commit per edit)
Aider has no sandbox. It runs directly in your local git repository, and git history is the safety net — every successful edit is auto-committed as its own granular commit with a generated message. There is no diff queue and no permission prompt; if a change is wrong, you reach for git revert or git reset rather than an accept/reject gate. The trust model is "the repo is checkpointed after every step, so undo is always one git command away." That makes Aider exceptionally fast to iterate with and exceptionally dependent on you working in a clean git tree.
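The checkpoint-after-every-edit pattern comes down to two git invocations per change and one for undo. The sketch below is not Aider's code: `commit_edit` and `undo_last` are hypothetical helpers, the commit message is passed in rather than generated by a model, and the runner is injectable so the command sequence can be inspected without touching a real repository.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def git(args: Sequence[str]) -> None:
    """Default runner: invoke git in the current directory, raising on failure."""
    subprocess.run(["git", *args], check=True)


def commit_edit(path: str, message: str, run: Runner = git) -> None:
    """Checkpoint one successful edit as its own granular commit."""
    run(["add", "--", path])
    run(["commit", "-m", message, "--", path])


def undo_last(run: Runner = git) -> None:
    """Undo is a new commit that reverses the last one, so history stays intact."""
    run(["revert", "--no-edit", "HEAD"])
```

Limiting the commit to the edited path is why a clean tree matters: unrelated uncommitted changes would otherwise blur what each checkpoint contains.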
The architect/editor pair
Aider's loop splits reasoning from editing across two models. An architect model reasons about the change and describes what to do; a cheaper, faster editor model turns that description into concrete diffs. The two-pass design measurably cuts multi-file edit errors — the strong model is not also burdened with emitting byte-exact patch syntax, and the cheap model is not asked to reason about architecture. You can run both roles on any LLM, including fully local models via Ollama or LM Studio, which makes Aider the most model-portable agent in this lineup.
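The two-pass split is easiest to see as a data flow: prose goes from the first model to the second, and only the second ever emits patch syntax. In this sketch the models are plain callables, and the prompt wording, `TwoPassResult`, and `architect_then_editor` are illustrative assumptions rather than Aider's internals.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]  # prompt in, completion out


@dataclass
class TwoPassResult:
    plan: str   # the architect's prose description of the change
    patch: str  # the editor's concrete edits


def architect_then_editor(request: str, architect: Model, editor: Model) -> TwoPassResult:
    """Pass 1 reasons about the change; pass 2 turns that reasoning into edits."""
    plan = architect(
        "Describe the change needed in prose. Do not write patch syntax.\n\n" + request
    )
    patch = editor(
        "Turn this description into exact edits. Do not redesign it.\n\n" + plan
    )
    return TwoPassResult(plan=plan, patch=patch)
```

Because each role is just a callable, the two can be bound to different providers or to local models without changing the pipeline.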
Edit-blocks and the repo-map
Aider's surface is not a tool catalog. It builds a repo-map — a compact summary of the repository's structure and key symbols — to give the model orientation without loading every file, and the editor model emits changes in a SEARCH/REPLACE edit-block format that Aider applies directly to disk. This is why MCP is not native to Aider: its world model is repo-map plus diffs, not a tool-catalog host. You can wire external capability around it, but Aider itself is not an MCP client the way the other three are, and that is an honest architectural difference rather than a missing feature.
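A SEARCH/REPLACE edit-block names a file, then gives the exact text to find and the text to put in its place. The parser below is a simplified sketch that assumes the basic marker layout (file path line, `<<<<<<< SEARCH`, `=======`, `>>>>>>> REPLACE`). It ignores fencing, new-file creation, and the fuzzy matching a real tool may apply, and `parse_blocks` and `apply_blocks` are hypothetical names.

```python
import re

# One block: a path line, then the SEARCH and REPLACE sections.
BLOCK = re.compile(
    r"^(?P<path>[^\n]+)\n"
    r"<<<<<<< SEARCH\n"
    r"(?P<search>.*?)"
    r"^=======\n"
    r"(?P<replace>.*?)"
    r"^>>>>>>> REPLACE$",
    re.DOTALL | re.MULTILINE,
)


def parse_blocks(reply: str) -> list:
    """Extract (path, search, replace) triples from a model reply."""
    return [
        (m["path"].strip(), m["search"], m["replace"])
        for m in BLOCK.finditer(reply)
    ]


def apply_blocks(files: dict, reply: str) -> dict:
    """Apply every block to an in-memory file map; reject edits that do not match."""
    out = dict(files)
    for path, search, replace in parse_blocks(reply):
        if path not in out:
            raise ValueError(f"{path}: file is not in the working set")
        if not search or search not in out[path]:
            raise ValueError(f"{path}: SEARCH text not found; edit rejected")
        out[path] = out[path].replace(search, replace, 1)
    return out
```

The format carries no tool name and no arguments, only files and text, which is why this surface maps onto diffs rather than onto a tool-catalog host.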
Cross-cutting comparison
Sandbox & filesystem trust
The four sit at four distinct points on the trust spectrum. Claude Code and Codex CLI both put the boundary at the operating system — Claude Code with a sandbox plus tiered permission modes you dial per session, Codex CLI with a workspace-scoped sandbox that is on by default and escalated through three approval modes. Cursor moves the boundary up into the application layer: the IDE mediates every write as a per-file diff you accept, so the gate is human review rather than a kernel jail. Aider removes the boundary entirely and substitutes git — no sandbox, no prompt, just an auto-commit after each step so undo is always a revert away. The practical reading: Codex CLI is safest by default for untrusted runs, Cursor is safest for "I want to see every change," Claude Code is the most tunable, and Aider trades isolation for raw speed in a repo you trust.
Planning loop shape
Each agent externalizes a different part of the loop. Claude Code externalizes the plan: a persisted task list it reads back as the source of truth, which keeps long jobs coherent at the cost of ceremony on short ones. Codex CLI externalizes the run: it works autonomously inside the sandbox toward a PR, so the loop is "execute, then review the branch" rather than "plan, then approve steps." Cursor externalizes the review: the plan-then-diff shape makes the human acceptance of a concrete multi-file diff a first-class loop step. Aider externalizes the roles: the architect/editor split puts reasoning and patch-emission in separate models, a two-pass that cuts multi-file errors. None of these is merely a ReAct variant with cosmetic differences — the place each one chooses to make state explicit is the place it expects the hard problems to live.
Tool catalog vs the shell
The tool surface is where "capability" and "architecture" come apart most clearly. Claude Code leads with a typed catalog — structured Read/Edit/Grep plus deferred tools, Skills, and full MCP — so the model's actions are schema-checked. Codex CLI leads with the shell: one command channel for everything, plus MCP, trading typed safety for toolchain reach. Cursor wraps tools in the IDE and exposes MCP up to roughly the first 40 tools per agent, mediating through the editor. Aider declines a tool catalog altogether — its surface is the repo-map plus SEARCH/REPLACE edit-blocks, and MCP is not native because its world model is diffs, not a tool host. All four can edit a repo competently, which is exactly the point: the catalog is not the capability. For teams standardizing on shared MCP servers and interoperable agents, three of the four are MCP hosts and Aider is the odd one out, and each handles the resulting context budget differently — deferred tools, shell output, a 40-tool cap, or a compact repo-map.
Commit & output policy
| Agent | What lands | When |
|---|---|---|
| Claude Code | Commits / PRs only on request | When you ask — never automatically |
| Codex CLI | Working-dir edits, oriented toward a PR | After a sandboxed run, for review |
| Cursor Agent | Per-file diffs to the working tree | On your accept, file by file |
| Aider | One git commit per successful edit | Automatically, immediately |
When to pick which
| Use case | Pick Claude Code if… | Pick Codex CLI if… | Pick Cursor if… | Pick Aider if… |
|---|---|---|---|---|
| Greenfield feature, many steps | You want a persisted task-list plan that keeps a long multi-step build coherent and auditable. | You want it to run autonomously in a sandbox and hand you a reviewable PR. | You want to watch the plan materialize as diffs and accept changes file by file. | You want fast iteration with each step auto-committed, so undo is one git command away. |
| Large monorepo | You want typed Read/Grep/Glob plus deferred tools to navigate breadth without flooding context. | You want shell-native search and builds against the existing toolchain inside a sandbox. | You want IDE indexing and parallel agents across worktrees via the Agents Window. | You accept that a repo-map gives orientation but very large trees stress it more than the others. |
| Fully local / offline | You can stand up a LiteLLM-style proxy; local models are possible but not first-class. | You want local models first-class via model_provider="oss" and Ollama. | You point it at a local model through an OpenAI-compatible base URL such as Ollama. | You want the most model-portable option: any LLM, including local via Ollama or LM Studio. |
| Heavy review discipline / regulated | You want tiered permission modes and a plan mode that forbids mutation while you investigate. | You want a default-on sandbox where destructive and networked actions always prompt. | You want every write to pass a human accept-per-file gate before it touches disk. | Less ideal — auto-commit lands changes without a review gate, though git history records each one. |
| Quick one-file fix | Workable, but the task-list ceremony is overhead for a trivial change. | Workable, though spinning up the sandboxed run is heavier than the task warrants. | Great — edit, see the single diff, accept, done. | Great — describe the fix, get one commit, move on. |
FAQ
What's the actual difference between Claude Code and Codex CLI?
Both are terminal coding agents that edit your repo under an OS sandbox, but they make opposite tool-surface bets: Claude Code leads with a small typed tool catalog (Read/Edit/Bash/Glob/Grep) plus deferred tools, Skills, and MCP, and externalizes its plan as a persisted task list, while Codex CLI leads with the shell, escalates through three approval modes, and orients toward producing a PR. Claude Code is Anthropic-model-first; Codex CLI treats local models as first-class via Ollama.
Does Cursor's Agent mode replace its chat?
No — they do different jobs. Chat answers questions; Agent mode acts, planning a change, editing across many files, and running commands, then staging the result as diffs you accept per file. You use chat to understand code and Agent mode to change it.
Can Aider use MCP servers?
Not natively. Aider's world model is a repo-map plus SEARCH/REPLACE edit-blocks rather than a tool-catalog host, so it is not an MCP client the way Claude Code, Codex CLI, and Cursor are. You can wire external capability around it, but MCP is not a built-in feature.
Which coding agent is best for a large monorepo?
It depends on how you want to navigate breadth: Claude Code's typed Grep/Glob plus deferred tools and Cursor's IDE indexing with parallel worktree agents both handle large trees well, and Codex CLI leans on shell-native search inside its sandbox. Aider's repo-map gives orientation but is the most stressed by very large repositories, so it is the weakest fit at extreme scale.
Do these agents send my code to the cloud?
By default yes — all four call a hosted model API, so prompts and the code context they include leave your machine unless you configure a local model. The sandbox and IDE boundaries control what the agent can do to your filesystem, not whether the model itself runs in the cloud. Running fully local (see the next question) is how you keep code on-device.
Can I run any of these against a local model?
Yes, with different effort. Codex CLI (model_provider="oss" + Ollama), Aider (any LLM, including local via Ollama or LM Studio), and Cursor (an OpenAI-compatible base URL, for example one served by Ollama) all support local models directly. Claude Code can too, but only through a LiteLLM-style proxy, which makes it possible rather than first-class.
Further reading
On this wiki:
- What Is an Agent? — the perceive-decide-act definition that all four of these tools instantiate, and the baseline for telling an agent apart from a fancy autocomplete.
- Tool Calling Explained — how a model turns intent into a structured tool invocation, the mechanism beneath Claude Code's typed catalog and Codex CLI's shell channel alike.
- Tool Design Antipatterns — the failure modes a typed, minimal catalog avoids and a sprawling shell-everything surface invites.
Project sources:
- Claude Code documentation — tools, permission modes, Skills, and MCP setup.
- OpenAI Codex — Codex CLI approval modes, sandbox behavior, and local-model configuration.
- Cursor — Agent mode, the Agents Window, MCP support, and model configuration.
- Aider — architect/editor mode, edit-blocks, repo-map, and git workflow (source at github.com/Aider-AI/aider).