Single-Agent vs. Multi-Agent Orchestration

D8
Deep Dive · Architectures & Patterns

"Use multiple agents" is the most over-applied agentic design decision. This essay covers the real reasons to split (context isolation, parallelism, capability boundaries), the supervisor/worker and hand-off topologies, the coordination tax nobody budgets for, and a decision framework with explicit anti-patterns.

STEP 1

What a multi-agent system actually is.

A multi-agent system is several agents — each with its own prompt, tools, and context window — coordinated to solve one task. It is decomposition along the state-ownership and context axes from the landscape essay: instead of one agent holding everything in one history, work is partitioned so each agent sees only its slice. The coordination structure is the architecture.

Critically, multi-agent is not "smarter." Three agents running on the same base model do not exceed that model's capability ceiling. What you gain is structural: isolated contexts, parallel execution, and enforceable capability boundaries. What you pay is a coordination tax. The decision is an engineering tradeoff, not a capability upgrade, and treating it as the latter is the field's most common architectural mistake.
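
One way to picture "its own prompt, tools, and context window" is as a plain record per agent. The sketch below is illustrative only: the `Agent` class, its fields, and the `observe` method are hypothetical names chosen for this example, not part of any framework.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Hypothetical agent record; every name here is illustrative."""
    name: str
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)  # this agent's private context

    def observe(self, message: str) -> None:
        # Only this agent's history grows; other agents never see the message.
        self.history.append(message)


researcher = Agent(
    "researcher",
    "Find sources.",
    tools={"search": lambda query: f"results for {query}"},
)
writer = Agent("writer", "Draft the summary.")

researcher.observe("raw search dump: 40 pages of notes")
writer.observe("three-bullet digest from researcher")
```

The writer never holds the raw dump or the search tool. That partition, rather than any added intelligence, is what the architecture provides.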

STEP 2

The legitimate reasons to split.

  • Context isolation. A long subtask (deep code search, multi-doc research) would otherwise flood the main agent's window with detail it does not need. A sub-agent does the work in its own window and returns a compact result. This is the strongest and most common real justification.
  • Parallelism. Genuinely independent subtasks run concurrently across sub-agents — a latency win a single sequential agent cannot match.
  • Capability / permission boundaries. A high-privilege tool (prod writes, payments) lives behind a narrowly scoped agent that does only that, with its own guardrails. Decomposition becomes a security boundary you can audit.
  • Distinct expertise with distinct tools/prompts. When subtasks need genuinely different tool sets and instructions, separate agents are cleaner than one bloated prompt — though a router into specialized handlers often suffices without full multi-agent autonomy.

Notice every reason is structural (context, parallelism, security, separation), never "the agents will reason better together." If your justification is the latter, you do not yet have a multi-agent justification.
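
Context isolation, the first reason above, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `run_subagent` is a hypothetical function, and a plain substring match stands in for whatever model-driven work a real sub-agent would do.

```python
def run_subagent(task: str, documents: list[str]) -> str:
    """Hypothetical sub-agent: works in its own history, returns a compact result."""
    private_history = [f"task: {task}"]
    for doc in documents:
        # Per-document detail stays in the sub-agent's window.
        private_history.append(f"read: {doc}")
    # Stand-in for real reasoning: a case-insensitive substring match.
    matches = [doc for doc in documents if task.lower() in doc.lower()]
    return f"{len(matches)} of {len(documents)} documents mention '{task}'"


main_history = ["user: which design docs discuss retries?"]
docs = ["Retries and backoff policy", "Logging format", "Queue retries under load"]

# The main agent's window grows by one line, however many documents were read.
main_history.append(run_subagent("retries", docs))
```

The sub-agent's `private_history` is discarded when it returns; only the summary crosses the boundary.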

STEP 3

Two topologies.

Supervisor / worker (orchestrator-workers). A supervisor decomposes the task, delegates subtasks to workers, and synthesizes their results. Workers do not talk to each other; all coordination flows through the supervisor. This is the workhorse topology: Anthropic's research system, for example, uses a lead agent that spawns sub-agents, which explore in parallel and report back. Control and observability are centralized; the cost is that the supervisor becomes both the bottleneck and a single point of failure.

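# Supervisor / worker (supervisor and worker_for are supplied by the caller)
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task, supervisor, worker_for):
    plan = supervisor.decompose(task)
    with ThreadPoolExecutor() as pool:
        # Each worker runs in its own isolated context; results keep input order.
        results = list(pool.map(
            lambda sub: worker_for(sub.kind).run(sub),
            plan.independent_subtasks,
        ))
    return supervisor.synthesize(task, results)  # reconcile, not concat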

Hand-off / network. Agents pass control peer-to-peer: a triage agent hands the conversation to a specialist, which may hand off again. Flexible and natural for conversational flows, but control is decentralized, loops and ping-pong are easy, and end-to-end observability is hard. Constrain hand-offs to a small validated graph; never allow an arbitrary mesh.
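
A "small validated graph" can be as simple as an allow-list of edges plus a loop guard. The sketch below is one possible shape, not a prescribed implementation: the agent names, `ALLOWED_HANDOFFS`, `MAX_HANDOFFS`, and `validate_route` are all hypothetical.

```python
# Hypothetical hand-off graph: agent names and edges are illustrative only.
ALLOWED_HANDOFFS: dict[str, set[str]] = {
    "triage": {"billing", "tech_support"},
    "billing": {"tech_support"},
    "tech_support": {"billing"},
}
MAX_HANDOFFS = 3  # assumed loop-guard budget, not a recommended value


def validate_route(route: list[str]) -> None:
    """Raise ValueError if a route runs too long, revisits an agent, or leaves the graph."""
    if len(route) - 1 > MAX_HANDOFFS:
        raise ValueError(f"more than {MAX_HANDOFFS} hand-offs")
    if len(set(route)) != len(route):
        raise ValueError("route revisits an agent (ping-pong)")
    for src, dst in zip(route, route[1:]):
        if dst not in ALLOWED_HANDOFFS.get(src, set()):
            raise ValueError(f"hand-off {src} -> {dst} is not in the allowed graph")


validate_route(["triage", "billing", "tech_support"])  # passes silently
```

Even though `billing` and `tech_support` may hand off to each other, the revisit check rejects a route that bounces between them. In a live system the same checks would run before each hand-off rather than over a finished route.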

STEP 4

The coordination tax nobody budgets for.

Multi-agent's costs are real, recurring, and routinely underestimated:

  • Token blow-up. Every delegation and hand-off re-serializes context into a fresh window. Multi-agent systems commonly burn several times the tokens of an equivalent single agent; treat this as an expected cost to budget for, not a bug.
  • Error compounding. Mistakes propagate and amplify across hand-offs. A small misread by the supervisor becomes a wholly wrong subtask spec a worker executes faithfully.
  • Lossy context transfer. Workers see only what the supervisor passed. Under-specify the hand-off and the worker solves the wrong problem confidently — the dominant multi-agent failure mode.
  • Debugging is distributed-systems debugging. Failures live in the seams between agents. You need cross-agent tracing with a correlation ID or you cannot reconstruct what happened.
  • Synthesis is a hard task itself. Merging worker outputs, resolving contradictions, and deduplicating is non-trivial reasoning the supervisor must actually do — concatenation produces incoherent results.
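
The correlation-ID point can be made concrete with a toy trace log. This is a sketch under assumptions: `TraceEvent`, `log_event`, and `events_for` are hypothetical helpers, and an in-memory list stands in for a real tracing backend.

```python
import uuid
from dataclasses import dataclass


@dataclass
class TraceEvent:
    correlation_id: str  # shared by every agent working on the same request
    agent: str
    kind: str            # e.g. "delegate", "tool_call", "result"
    detail: str


TRACE: list[TraceEvent] = []  # in-memory stand-in for a tracing backend


def log_event(correlation_id: str, agent: str, kind: str, detail: str) -> None:
    TRACE.append(TraceEvent(correlation_id, agent, kind, detail))


def events_for(correlation_id: str) -> list[TraceEvent]:
    """Reconstruct one request's path across agents, in logged order."""
    return [event for event in TRACE if event.correlation_id == correlation_id]


request_id = str(uuid.uuid4())
other_id = str(uuid.uuid4())
log_event(request_id, "supervisor", "delegate", "search the billing repo")
log_event(other_id, "supervisor", "delegate", "unrelated request")
log_event(request_id, "worker-1", "tool_call", "grep 'invoice'")
log_event(request_id, "worker-1", "result", "3 matches")
```

Filtering on `request_id` yields the supervisor's delegation followed by the worker's steps, even though events from other requests are interleaved in the log.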

The signature anti-pattern: a multi-agent system where the agents mostly talk to each other rather than do work. If your trace is dominated by inter-agent coordination messages and most agents could be a function call or a single prompt, you have paid the full coordination tax for none of the structural benefit. Collapse it back to one agent.
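
A rough way to spot this anti-pattern is to measure what share of a trace is coordination rather than work. The heuristic below is an assumption for illustration: the event kinds, the `COORDINATION_KINDS` set, and any threshold you compare against are hypothetical and would need tuning against real traces.

```python
# Hypothetical classification of trace event kinds.
COORDINATION_KINDS = {"delegate", "handoff", "status"}


def coordination_ratio(trace_kinds: list[str]) -> float:
    """Fraction of trace events that are inter-agent coordination, not work."""
    if not trace_kinds:
        return 0.0
    coordinating = sum(kind in COORDINATION_KINDS for kind in trace_kinds)
    return coordinating / len(trace_kinds)


chatty = ["delegate", "status", "handoff", "status", "tool_call", "delegate"]
busy = ["delegate", "tool_call", "tool_call", "tool_call", "result"]

chatty_ratio = coordination_ratio(chatty)  # 5 of 6 events are coordination
busy_ratio = coordination_ratio(busy)      # 1 of 5 events is coordination
```

A trace that looks like `chatty` is a candidate for collapsing back into one agent; one that looks like `busy` is at least spending its budget on work.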

STEP 5

A decision framework.

Default to a single agent. It has one context, one trace, one failure surface, and is dramatically cheaper to build, debug, and run. Escalate deliberately:

  • Single ReAct agent with a good tool set → handles the large majority of real workloads. Start here, always.
  • Add a router when tool selection degrades — cheaper and simpler than autonomous agents, and it captures most "specialization" value.
  • Add sub-agents for context isolation when a subtask demonstrably pollutes the main window — the cleanest, best-justified multi-agent move.
  • Go to a supervisor/worker topology only when subtasks are genuinely parallelizable or need separate capability boundaries, and you can afford the token and debugging tax.
  • Use peer hand-off only for conversational triage flows, over a small explicitly validated graph with loop guards.
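
The escalation ladder above can be written down as a small decision function. This is a sketch, not a rule engine: the `Workload` fields and the `recommend` function are hypothetical, and the order in which conditions are checked is an assumption about how to break ties.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical checklist; each flag should be backed by a measurement."""
    tool_selection_degrades: bool = False
    subtask_pollutes_context: bool = False
    parallel_subtasks: bool = False
    needs_capability_boundary: bool = False
    can_afford_coordination_tax: bool = False
    conversational_triage: bool = False


def recommend(w: Workload) -> str:
    if w.conversational_triage:
        return "peer hand-off over a small validated graph with loop guards"
    if (w.parallel_subtasks or w.needs_capability_boundary) and w.can_afford_coordination_tax:
        return "supervisor/worker"
    if w.subtask_pollutes_context:
        return "single agent + context-isolating sub-agent"
    if w.tool_selection_degrades:
        return "single agent + router"
    return "single ReAct agent"  # the default when nothing forces escalation


default_choice = recommend(Workload())
```

Note that parallelizable subtasks alone do not trigger the supervisor/worker branch; the budget flag has to be set as well, mirroring the "and you can afford the tax" condition.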

Anti-patterns to name and avoid: "agent per role" mirroring an org chart (humans coordinate via shared understanding; agents do not); deep agent hierarchies (error compounds geometrically with depth); free-form agent meshes (unbounded loops, no observability); and multi-agent for tasks a single ReAct loop already handles (pure overhead).
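
The "compounds geometrically with depth" claim follows from simple arithmetic if each delegation level is assumed to succeed independently with the same probability. Both the independence assumption and the 0.95 figure below are illustrative, not measurements.

```python
def chain_success(per_level: float, depth: int) -> float:
    """Probability that every level of a delegation chain gets its part right,
    assuming independent levels with equal success probability."""
    return per_level ** depth


# Illustrative per-level success rate of 0.95 at increasing hierarchy depth.
by_depth = {depth: chain_success(0.95, depth) for depth in (1, 2, 4, 8)}
```

Under these assumptions a depth-4 hierarchy succeeds end to end about 81% of the time and a depth-8 hierarchy about 66% of the time, even though each level looks reliable in isolation.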

The honest tradeoff: multi-agent buys context isolation, parallelism, and auditable capability boundaries at the cost of a multiplicative token bill, compounding errors, lossy hand-offs, and distributed-systems debugging. Reach for it when the structural benefit is concrete and measured — never because more agents sound more capable. They are not.