Single-agent vs. multi-agent orchestration.
"Use multiple agents" is the most over-applied agentic design decision. This essay covers the real reasons to split (context isolation, parallelism, capability boundaries), the supervisor/worker and hand-off topologies, the coordination tax nobody budgets for, and a decision framework with explicit anti-patterns.
What a multi-agent system actually is.
A multi-agent system is several agents — each with its own prompt, tools, and context window — coordinated to solve one task. It is decomposition along the state-ownership and context axes from the landscape essay: instead of one agent holding everything in one history, work is partitioned so each agent sees only its slice. The coordination structure is the architecture.
Critically, multi-agent is not "smarter." Three agents run by the same base model do not exceed that model's capability ceiling. What you gain is structural: isolated contexts, parallel execution, and enforceable capability boundaries. What you pay is a coordination tax. The decision is an engineering tradeoff, not a capability upgrade — and treating it as the latter is the field's most common architectural mistake.
The legitimate reasons to split.
- Context isolation. A long subtask (deep code search, multi-doc research) would otherwise flood the main agent's window with detail it does not need. A sub-agent does the work in its own window and returns a compact result. This is the strongest and most common real justification.
- Parallelism. Genuinely independent subtasks run concurrently across sub-agents — a latency win a single sequential agent cannot match.
- Capability / permission boundaries. A high-privilege tool (prod writes, payments) lives behind a narrowly scoped agent that does only that, with its own guardrails. Decomposition becomes a security boundary you can audit.
- Distinct expertise with distinct tools/prompts. When subtasks need genuinely different tool sets and instructions, separate agents are cleaner than one bloated prompt — though a router into specialized handlers often suffices without full multi-agent autonomy.
Notice every reason is structural (context, parallelism, security, separation), never "the agents will reason better together." If your justification is the latter, you do not yet have a multi-agent justification.
Two topologies.
Supervisor / worker (orchestrator-workers). A supervisor decomposes the task, delegates subtasks to workers, and synthesizes their results. Workers do not talk to each other; all coordination flows through the supervisor. This is the workhorse topology — Anthropic's research system uses a lead agent spawning subagents that explore in parallel and report back. Control and observability are centralized; the supervisor is the bottleneck and the single failure point.
# Supervisor / worker plan = supervisor.decompose(task) results = parallel_map( lambda sub: worker_for(sub.kind).run(sub), # isolated context each plan.independent_subtasks, ) return supervisor.synthesize(task, results) # reconcile, not concat
Hand-off / network. Agents pass control peer-to-peer: a triage agent hands the conversation to a specialist, which may hand off again. Flexible and natural for conversational flows, but control is decentralized, loops and ping-pong are easy, and end-to-end observability is hard. Constrain hand-offs to a small validated graph; never allow an arbitrary mesh.
The coordination tax nobody budgets for.
Multi-agent's costs are real, recurring, and routinely underestimated:
- Token blow-up. Every hand-off re-serializes context. Multi-agent systems commonly burn several times the tokens of an equivalent single agent — a measured, expected cost, not a bug.
- Error compounding. Mistakes propagate and amplify across hand-offs. A small misread by the supervisor becomes a wholly wrong subtask spec a worker executes faithfully.
- Lossy context transfer. Workers see only what the supervisor passed. Under-specify the hand-off and the worker solves the wrong problem confidently — the dominant multi-agent failure mode.
- Debugging is distributed-systems debugging. Failures live in the seams between agents. You need cross-agent tracing with a correlation ID or you cannot reconstruct what happened.
- Synthesis is a hard task itself. Merging worker outputs, resolving contradictions, and deduplicating is non-trivial reasoning the supervisor must actually do — concatenation produces incoherent results.
The signature anti-pattern: a multi-agent system where the agents mostly talk to each other rather than do work. If your trace is dominated by inter-agent coordination messages and most agents could be a function call or a single prompt, you have paid the full coordination tax for none of the structural benefit. Collapse it back to one agent.
A decision framework.
Default to a single agent. It has one context, one trace, one failure surface, and is dramatically cheaper to build, debug, and run. Escalate deliberately:
- Single ReAct agent with a good tool set → handles the large majority of real workloads. Start here, always.
- Add a router when tool selection degrades — cheaper and simpler than autonomous agents, and it captures most "specialization" value.
- Add sub-agents for context isolation when a subtask demonstrably pollutes the main window — the cleanest, best-justified multi-agent move.
- Go to a supervisor/worker topology only when subtasks are genuinely parallelizable or need separate capability boundaries, and you can afford the token and debugging tax.
- Use peer hand-off only for conversational triage flows, over a small explicitly validated graph with loop guards.
Anti-patterns to name and avoid: "agent per role" mirroring an org chart (humans coordinate via shared understanding; agents do not); deep agent hierarchies (error compounds geometrically with depth); free-form agent meshes (unbounded loops, no observability); and multi-agent for tasks a single ReAct loop already handles (pure overhead).
The honest tradeoff: multi-agent buys context isolation, parallelism, and auditable capability boundaries at the cost of a multiplicative token bill, compounding errors, lossy hand-offs, and distributed-systems debugging. Reach for it when the structural benefit is concrete and measured — never because more agents sound more capable. They are not.