AI Blog

LangGraph vs CrewAI vs Claude Managed Agents vs OpenAI Agents SDK: Four Architectures of the Orchestration Layer

Four orchestration frameworks let you wire up the same workflow — and the feature lists nearly match. The thing that decides which one survives production is invisible there: where your agent's state actually lives.

By Agentic AI Wiki 21 min read

Wire up the same three-step workflow — research, draft, review — in all four of these and the code looks nearly identical: name some agents, give them tools, hit run. Then the first run crashes halfway and they diverge violently. LangGraph replays from its last checkpoint; Anthropic's managed runtime never lost the thread, because the state was never on your machine; CrewAI and the OpenAI Agents SDK start from nothing. As of late May 2026 the feature matrices almost match. The thing that decides which one you can actually operate in production is invisible on the feature list: where your agent's state lives.

At a glance

Four frameworks, four answers to the same question — what does the orchestration layer have to remember, and who is responsible for not losing it. The table sets the basics; the matrix below it shows where each one leans hardest across the axes that actually differ.

Framework Released / maintainer Primary niche Where it runs
LangGraph 2024, LangChain Inc. Explicit graph state machine, durable execution Your infra (or LangGraph Platform)
CrewAI 2024, CrewAI Inc. Role-based multi-agent crews + Flows Your infra
Claude Managed Agents April 2026 (beta), Anthropic Managed server-side autonomous harness Anthropic infra (Environment can self-host)
OpenAI Agents SDK 2025, OpenAI Minimal code-first agents + handoffs Your infra

Snapshot: 2026-05-28. These frameworks change fast; verify against current docs.

Orchestration feature matrix Heatmap comparing LangGraph, CrewAI, Claude Managed Agents, and OpenAI Agents SDK across six axes: State durability, Control-flow, Multi-agent, Streaming, Human-in-loop, and Hosting. Strength indicated by fill color from light (weak) to dark orange-red (strong). Orchestration feature matrix State durability Control- flow Multi- agent Streaming Human- in-loop Hosting LangGraph Checkpointed Graph Subgraphs Yes Yes Self / Platform CrewAI In-memory Roles + tasks Native crews Partial Limited Self Claude Managed Server- held Server loop Single harness Yes Steer via events Managed OpenAI Agents SDK Session Code + handoffs Handoffs Yes Guardrails Self Weak Medium Strong
Where each framework leans hardest. The axes converge everywhere except state durability and who operates the runtime.

LangGraph — deep dive

LangGraph architecture Nodes in a cyclic graph read and write a single explicit state object you own; a pluggable checkpointer persists that state to memory, SQLite, or Postgres for durable execution and replay. LangGraph — Compiled Graph START Node A read + write Node B route / branch Node C tool call / LLM END conditional edge (loop) Explicit State Object TypedDict / Pydantic — you own the schema nodes READ & WRITE channel reducers merge updates the one inspectable source of truth Checkpointer durable · replayable in-memory / SQLite / Postgres persists every state snapshot Human-in-the-loop / interrupt
LangGraph runs a graph of nodes over a state object you own; a checkpointer snapshots that state after every step.

Graph nodes & edges

LangGraph asks you to draw the control flow as a graph. Nodes are functions — call a model, run a tool, transform data — and edges decide what runs next. Edges can be conditional (branch on the state) and cyclic (loop back), so the structure is a state machine, not a linear pipeline. This is the same perceive-decide-act cycle described in The Agent Loop, but made explicit: instead of an opaque while-loop inside a model harness, the loop is a diagram you can read, test, and reason about edge by edge. A supervisor-routes-to-specialists pattern, or the classic plan-and-execute split, both fall out naturally as graph topologies.

The state object

Every node reads from and writes to a single typed state object that you define. There is no hidden conversation buffer the framework controls behind your back — the state is the schema you wrote, and node return values are merged into it by reducers you can customize. That makes LangGraph the only framework here where the question "what does my agent know right now?" has a concrete, inspectable answer at every step. It also draws the line between this short-term working state and the durable knowledge stores discussed in short-vs-long-term memory: the graph state is the per-run scratchpad; long-term memory is a separate store a node reads into it.

The checkpointer & durability

The state object is the headline; the checkpointer is why it matters in production. LangGraph snapshots the full state after every node via a pluggable checkpointer — in-memory for tests, SQLite for a single box, Postgres for a fleet. Because each step is committed before the next runs, the runtime supports durable execution and replay: crash the process mid-run and it resumes from the last successful step rather than restarting from zero. That same checkpoint boundary powers human-in-the-loop interrupts (pause at a node, wait for approval, resume) and token-level streaming. Durability is not an add-on you bolt on later; it is the default consequence of owning a checkpointed state object.

CrewAI — deep dive

CrewAI architecture A crew of role-based agents runs an ordered task list with context threaded implicitly between agents; Flows add an event-driven path with explicit state when needed. Crew independent framework — no vendor lock-in Process sequential | hierarchical (manager LLM) Researcher role-based agent goal + backstory + tools Writer role-based agent goal + backstory + tools Reviewer role-based agent goal + backstory + tools context threaded implicitly — NO external state store Tasks ordered list · expected_output · assigned agent crew.kickoff() → final output Flows event-driven, explicit state @start / @listen decorators opt-in stateful path opt-in Memory short-term · long-term · entity optional — not default pluggable storage backends Tools LangChain-compatible + CrewAI native
CrewAI composes role-based agents into a crew; Flows add an explicit, event-driven state layer on top.

The crew / agent / task model

CrewAI's core abstraction is a Crew: a set of agents, each declared with a role, a goal, and a backstory, working through a list of tasks. You describe who the agents are rather than wiring the control flow by hand — the framework turns those role declarations plus task descriptions into a running collaboration. It is multi-agent by construction, the same supervisor-and-specialists shape covered in the supervisor-worker pattern, except you reach it by declaring roles instead of drawing a graph.

Sequential vs hierarchical process, plus Flows

A crew runs under a process: sequential (tasks run in order, each output feeding the next) or hierarchical (a manager agent delegates and reviews). For deterministic, event-driven control — conditional branching, explicit state, looping — CrewAI adds Flows, a separate orchestration layer where state is first-class rather than implied. The mental model splits cleanly: Crews when you want emergent role-based collaboration, Flows when you want a controlled pipeline. Many real systems nest a crew inside a flow.

Where context is threaded implicitly

In a plain crew, state is mostly implicit: context flows from one task's output into the next agent's prompt, threaded by the framework rather than stored in a schema you own. That is ergonomic for prototypes and is exactly why durability is weaker here than LangGraph's checkpointer model — there is no built-in snapshot to resume from unless you adopt Flows and manage the state yourself. CrewAI also speaks native MCP and A2A, so crews can call external tools and talk to other agents over open protocols. One correction worth stating plainly: CrewAI was historically built on LangChain, but the project is independent of LangChain now — its current README describes it as a fully standalone framework.

Claude Managed Agents — deep dive

Claude Managed Agents architecture The client submits a goal and streams results via SSE; Anthropic's server runs the agent loop and holds all session state — a single managed harness, not a multi-agent framework. CLIENT Submit goal POST /v1/agents/sessions Steer via events human-in-the-loop / cancel Poll / Stream (SSE) event stream — non-blocking Single harness NOT multi-agent-first one loop · env can self-host Anthropic Server Agent Loop plan → act → observe managed by Anthropic Claude model calls State lives here sessions on Anthropic infra resumable · history tool outputs + context client holds NO state Tool Execution bash · web · file · custom sandboxed environment Event / Result Stream SSE events → client poll / stream tool_use · text · done · error submit · steer stream (SSE)
The client submits a goal and streams events; Anthropic's server runs the loop and holds the Session state.

The client / server split

Claude Managed Agents (launched April 2026, in beta) inverts the ownership question entirely. You run a thin client; Anthropic runs the loop. The framework's concepts are Agent, Environment, Session, and Events: you submit a goal and stream the run, but the orchestration happens server-side on Anthropic's infrastructure. This is distinct from the Claude Agent SDK, which runs the loop in your environment — Managed Agents is the hosted counterpart, where the runtime is the vendor's responsibility, not yours.

The server-side loop & tool execution

The control flow is a server-side autonomous loop. A Session stores conversation history, container state, and outputs on Anthropic infrastructure, so a run resumes cleanly after a pause — the thread is never on your machine to lose. You observe and steer through Events: the run streams over SSE, and you can interrupt or redirect it mid-flight by sending events back. The Environment — where tools and code actually execute — can be a self-hosted sandbox on your own infra even while the loop itself stays managed, which keeps tool execution close to your data without making you operate the orchestrator.

What you give up and gain

You gain durability for free: no checkpointer to configure, no Postgres to operate, no resume logic to write. You give up two things. First, this is a single autonomous harness, not a multi-agent framework — sub-agents are not a headline primitive, so do not pick it expecting first-class crews or handoffs. Second, because Session state lives on Anthropic infrastructure, it is not ZDR- or HIPAA-eligible; the same property that makes it never-lose-the-thread durable also means the state physically leaves your boundary. That trade is the whole pitch: hand over operation of the runtime, and in return never write resume code again.

OpenAI Agents SDK — deep dive

OpenAI Agents SDK architecture A thin Runner loop drives an Agent with tools, hands off to peer agents, and applies input/output guardrails; session state is process-local and in-memory unless you persist it yourself. Runner Loop Agent instructions (prompt) model: GPT / Claude… tools list handoffs list Tools function_tool decorator web / file search custom fns hosted MCP tools Peer Agent handoff target own tools + instructions runner re-enters loop with new active agent Guardrails input check output check run in parallel raise tripwire handoff Session State process-local · in-memory by default NO built-in checkpoint or replay persist it yourself for durability Lifecycle hooks on_tool_start / on_agent_end Model adapter OpenAI models (default) Claude / Gemini via LiteLLM adapter Tracing built-in spans → dashboard agent · tool · handoff chain
A thin runner loop in your process: agents with tools, handoffs between them, guardrails, and a process-local Sessions store.

Agents + tools

The OpenAI Agents SDK is the most minimal of the four: a lightweight library you run in your own process. An agent is a model plus instructions plus a set of tools, and a small runner loop drives the model-calls-tool-evaluates cycle as plain code. There is no graph DSL and no role declarations to learn — control flow is the Python you already write, with the SDK supplying just enough loop, tracing, and structure to keep it honest.

Handoffs

Multi-agent is first-class here through handoffs: one agent transfers control to another, passing full context along. A triage agent hands off to a specialist; the specialist takes over the conversation. This is the lightest-weight multi-agent primitive of the four — closer to a function call than to a graph or a managed crew — and it maps directly onto the trade-offs in single vs multi-agent: start with one agent, split into handoffs only when a clear specialization earns the seam.

Guardrails & sessions (and running Claude via LiteLLM)

Safety and human-in-the-loop come from guardrails (input/output validators that can halt a run) and tool-approval patterns, with tracing built in for observability. State lives in a process-local Sessions abstraction — in-memory by default, with pluggable backends — and crucially there is no built-in checkpoint or replay: durability is your responsibility. Crash mid-run and, unless you persisted the session yourself, you start over. One pragmatic strength: the bundled LiteLLM adapter runs non-OpenAI models, so you can drive Claude with LitellmModel(model="anthropic/claude-opus-4-7", ...), or swap in Gemini, Bedrock, Azure, or Ollama without leaving the SDK.

Cross-cutting comparison

Where state lives & durability

Where state lives & durability Four-column comparison of state durability: LangGraph checkpoints state you own (most durable), CrewAI keeps state in-memory with optional explicit state via Flows, Claude Managed Agents holds sessions server-side on Anthropic infrastructure, OpenAI Agents SDK uses process-local in-memory sessions with no built-in checkpoint. Where state lives & durability LangGraph Explicit checkpointed state object you own; durable, replayable CrewAI In-memory in the crew; Flows add explicit state Claude Managed Agents Server-held sessions on Anthropic infra; resumable OpenAI Agents SDK Process-local session; in-memory by default; no built-in checkpoint
The headline axis: four different homes for your agent's state, with four different durability stories.

This is the axis that separates the four, and it is the one the feature lists hide. LangGraph hands you an explicit state object you own and snapshots after every step, so durability and replay are the default — you can lose the process and not the run. Claude Managed Agents reaches the same never-lose-the-thread durability from the opposite direction: the Session lives on Anthropic's servers, so the state is durable precisely because it was never yours to hold (and, for the same reason, it leaves your data boundary). CrewAI keeps state implicit in the crew — context threaded between tasks, with no built-in snapshot — unless you adopt Flows and manage it explicitly. The OpenAI Agents SDK keeps a process-local Sessions store that is in-memory by default with no built-in checkpoint, so durability is entirely on you. Two frameworks make resume free; two make it your homework. Knowing which is which is the difference between a demo and something you can operate.

Control-flow model

Control-flow model Four-column comparison of control-flow models: LangGraph uses a graph of nodes with conditional edges, CrewAI uses role and task declaration with Flows for events, Claude Managed Agents uses a server-side autonomous loop, OpenAI Agents SDK uses plain code with handoffs. Control-flow model LangGraph Graph of nodes + conditional edges CrewAI Role + task declaration (Crews); Flows for events Claude Managed Agents Server-side autonomous loop OpenAI Agents SDK Plain code + handoffs
From most explicit to most autonomous: a drawn graph, declared roles, plain code, and a managed loop.

The four sit on a spectrum from drawn to delegated. LangGraph is the most explicit: you author the control flow as a graph of nodes and conditional, cyclic edges, and every branch is visible and testable. The OpenAI Agents SDK is explicit in a different idiom — the control flow is plain code in your process, with handoffs as the one structured branch. CrewAI is declarative: you describe roles and tasks and let the sequential or hierarchical process decide ordering, reaching for Flows when you need deterministic branching back. Claude Managed Agents is the most delegated: the loop runs autonomously on the server and you steer it by streaming events rather than by authoring its structure. Whether the loop terminates cleanly — the concern in planning and termination — is something you design into a LangGraph edge, observe through OpenAI SDK tracing, or trust the managed runtime to handle.

Multi-agent stance

Multi-agent stance Four-column comparison of multi-agent approaches: LangGraph supports subgraphs and supervisor patterns, CrewAI is multi-agent by construction with native crews, Claude Managed Agents uses a single autonomous harness not designed for multi-agent, OpenAI Agents SDK enables multi-agent via handoffs between agents. Multi-agent stance LangGraph Subgraphs / supervisor patterns CrewAI Native crews — multi-agent by construction Claude Managed Agents Single autonomous harness — NOT multi-agent-first OpenAI Agents SDK Handoffs between agents
Three first-class multi-agent stances and one deliberately single harness.

Three of the four treat multi-agent as first-class; one deliberately does not. CrewAI is multi-agent by construction — a crew is a team of role-based agents, and that is the whole point of the abstraction. LangGraph makes it first-class through subgraphs and supervisor patterns, composing agents as nodes within a larger graph. The OpenAI Agents SDK does it through handoffs, the lightest-weight transfer-of-control primitive of the three. Claude Managed Agents is the exception: it is a single autonomous harness, not a multi-agent framework, so sub-agents are not a headline primitive — pick it for one capable agent, not a crew. Before reaching for any of the multi-agent three, it is worth checking multi-agent: when and why and the topologies that follow, because the cheapest multi-agent system is often the single agent you did not split.

Where it runs & who operates it

The state question collapses into an operations question: if the state lives on your infra, you operate the runtime; if it lives on the vendor's, they do. Three of the four run in your environment and put durability, scaling, and uptime on your team; Claude Managed Agents moves the runtime to Anthropic and the data boundary with it.

Framework Runs on You operate?
LangGraph Your infra (or LangGraph Platform / LangSmith Deployment) Yes — unless you use the managed Platform
CrewAI Your infra Yes
Claude Managed Agents Anthropic infra (Environment can be self-hosted) No — Anthropic operates the loop
OpenAI Agents SDK Your infra Yes

When to pick which

Use case Pick LangGraph if… Pick CrewAI if… Pick Claude Managed if… Pick OpenAI Agents SDK if…
Durable long-running workflow You want a checkpointed state object that resumes from the last step after a crash, on infra you control. You can accept Flows-managed state and explicit persistence, or your runs are short enough that restart is cheap. You want durability for free and are fine with Session state living on Anthropic's servers. Not the natural fit — there is no built-in checkpoint, so you must persist sessions yourself.
Quick role-based prototype Overkill — drawing a graph is more ceremony than a prototype needs. You want to declare a few agents with roles and tasks and watch them collaborate with minimal wiring. Workable for a single-agent prototype, but you are not modeling a team of roles. You want a handful of agents and handoffs in plain code, no new DSL to learn.
Don't want to run infra No — you operate the runtime (unless you adopt the managed Platform). No — CrewAI runs in your environment. Yes — Anthropic runs the loop and holds the state; you stream a thin client. No — the runner loop lives in your process.
Minimal, code-first Heavier than you want; the graph abstraction is the opposite of minimal. You prefer declarations over code, so this leans away from code-first. No — the loop is managed, not code you author. You want a thin library that is mostly your own Python, with handoffs, guardrails, and tracing added.
Complex branching control flow You need conditional and cyclic edges you can author and test explicitly — this is LangGraph's home turf. You reach for Flows for event-driven branching on top of crews. You are comfortable letting the server-side loop decide and steering it via events. You will express branches as plain code and handoffs, accepting that complex graphs get unwieldy.

FAQ

What's the difference between LangGraph and the OpenAI Agents SDK?

Both run in your own infrastructure, but they sit at opposite ends of the explicit-state spectrum. LangGraph gives you an explicit graph of nodes and edges over a checkpointed state object you own, with built-in durable execution and replay — crash mid-run and it resumes from the last step. The OpenAI Agents SDK is a thinner library: control flow is plain code, multi-agent happens through handoffs, and state is a process-local Sessions store that is in-memory by default with no built-in checkpoint, so durability is your responsibility. Reach for LangGraph when you need a durable, inspectable state machine; reach for the OpenAI Agents SDK when you want minimal code-first agents and will handle persistence yourself.

Is CrewAI built on LangChain?

Not anymore. CrewAI was historically built on LangChain, but the project is independent now — its current README describes it as a completely standalone framework with no LangChain dependency. It also has native MCP and A2A support for talking to external tools and other agents over open protocols.

Do I have to self-host LangGraph, or is there a managed version?

You can self-host LangGraph anywhere you run Python, choosing a checkpointer (in-memory, SQLite, Postgres) to match your durability needs. There is also a managed option — historically branded "LangGraph Platform," now offered under the "LangSmith Deployment" umbrella — that runs the orchestration for you. The exact product name has shifted over time, so check the current LangChain docs for the label, but the managed path does exist.

Can I use Claude or other non-OpenAI models with the OpenAI Agents SDK?

Yes. The SDK bundles a LiteLLM adapter, so you can run Claude with something like LitellmModel(model="anthropic/claude-opus-4-7", ...), as well as Gemini, Bedrock, Azure, or Ollama. The "OpenAI" in the name refers to who maintains the library, not a lock-in to OpenAI models.

When should I use a framework at all instead of a plain while loop?

For a single agent calling a couple of tools with no durability or multi-agent needs, a plain while loop around a model call is honestly fine — see The Agent Loop. Reach for a framework when you need one of the things a loop does not give you for free: durable checkpointing and replay, explicit branching control flow, first-class multi-agent coordination, or a managed runtime you do not operate. The framework earns its weight exactly when "I'll just add persistence later" stops being a small change. See Agent Frameworks for the broader trade-off.

Which framework is best for long-running, durable workflows?

Two options give you durability out of the box, from opposite directions. LangGraph checkpoints an owned state object after every step and resumes from the last successful step after a crash — durable on infrastructure you control. Claude Managed Agents keeps the Session on Anthropic's servers, so it resumes cleanly after pauses without you writing any resume logic, at the cost of the state leaving your boundary (and not being ZDR/HIPAA-eligible). CrewAI's plain crews and the OpenAI Agents SDK both leave durability to you, so they are weaker fits unless you persist state yourself.

Further reading

On this wiki:

  • Agent Frameworks — what a framework adds over raw model calls, and when the abstraction earns its weight versus a plain loop.
  • The Agent Loop — the perceive-decide-act cycle every one of these four wraps, made explicit so you can see what LangGraph draws, the OpenAI SDK codes, and Anthropic runs server-side.
  • Supervisor-Worker Pattern — the coordination shape CrewAI builds in by default and LangGraph composes from subgraphs, with the trade-offs of delegating to specialists.

Project sources: