LangGraph vs CrewAI vs Claude Managed Agents vs OpenAI Agents SDK: Four Architectures of the Orchestration Layer

Wire up the same three-step workflow — research, draft, review — in all four of these and the code looks nearly identical: name some agents, give them tools, hit run. Then the first run crashes halfway and they diverge violently. LangGraph replays from its last checkpoint; Anthropic's managed runtime never lost the thread, because the state was never on your machine; CrewAI and the OpenAI Agents SDK start from nothing. As of late May 2026 the feature matrices almost match. The thing that decides which one you can actually operate in production is invisible on the feature list: where your agent's state lives.

At a glance

Four frameworks, four answers to the same question — what does the orchestration layer have to remember, and who is responsible for not losing it. The table sets the basics; the matrix below it shows where each one leans hardest across the axes that actually differ.

Framework	Released / maintainer	Primary niche	Where it runs
LangGraph	2024, LangChain Inc.	Explicit graph state machine, durable execution	Your infra (or LangGraph Platform)
CrewAI	2024, CrewAI Inc.	Role-based multi-agent crews + Flows	Your infra
Claude Managed Agents	April 2026 (beta), Anthropic	Managed server-side autonomous harness	Anthropic infra (Environment can self-host)
OpenAI Agents SDK	2025, OpenAI	Minimal code-first agents + handoffs	Your infra

Snapshot: 2026-05-28. These frameworks change fast; verify against current docs.

Where each framework leans hardest. The axes converge everywhere except state durability and who operates the runtime.

LangGraph — deep dive

LangGraph runs a graph of nodes over a state object you own; a checkpointer snapshots that state after every step.

Graph nodes & edges

LangGraph asks you to draw the control flow as a graph. Nodes are functions — call a model, run a tool, transform data — and edges decide what runs next. Edges can be conditional (branch on the state) and cyclic (loop back), so the structure is a state machine, not a linear pipeline. This is the same perceive-decide-act cycle described in The Agent Loop, but made explicit: instead of an opaque while-loop inside a model harness, the loop is a diagram you can read, test, and reason about edge by edge. A supervisor-routes-to-specialists pattern, or the classic plan-and-execute split, both fall out naturally as graph topologies.

The state object

Every node reads from and writes to a single typed state object that you define. There is no hidden conversation buffer the framework controls behind your back — the state is the schema you wrote, and node return values are merged into it by reducers you can customize. That makes LangGraph the only framework here where the question "what does my agent know right now?" has a concrete, inspectable answer at every step. It also draws the line between this short-term working state and the durable knowledge stores discussed in short-vs-long-term memory: the graph state is the per-run scratchpad; long-term memory is a separate store a node reads into it.

The checkpointer & durability

The state object is the headline; the checkpointer is why it matters in production. LangGraph snapshots the full state after every node via a pluggable checkpointer — in-memory for tests, SQLite for a single box, Postgres for a fleet. Because each step is committed before the next runs, the runtime supports durable execution and replay: crash the process mid-run and it resumes from the last successful step rather than restarting from zero. That same checkpoint boundary powers human-in-the-loop interrupts (pause at a node, wait for approval, resume) and token-level streaming. Durability is not an add-on you bolt on later; it is the default consequence of owning a checkpointed state object.

CrewAI — deep dive

CrewAI composes role-based agents into a crew; Flows add an explicit, event-driven state layer on top.

The crew / agent / task model

CrewAI's core abstraction is a Crew: a set of agents, each declared with a role, a goal, and a backstory, working through a list of tasks. You describe who the agents are rather than wiring the control flow by hand — the framework turns those role declarations plus task descriptions into a running collaboration. It is multi-agent by construction, the same supervisor-and-specialists shape covered in the supervisor-worker pattern, except you reach it by declaring roles instead of drawing a graph.

Sequential vs hierarchical process, plus Flows

A crew runs under a process: sequential (tasks run in order, each output feeding the next) or hierarchical (a manager agent delegates and reviews). For deterministic, event-driven control — conditional branching, explicit state, looping — CrewAI adds Flows, a separate orchestration layer where state is first-class rather than implied. The mental model splits cleanly: Crews when you want emergent role-based collaboration, Flows when you want a controlled pipeline. Many real systems nest a crew inside a flow.

Where context is threaded implicitly

In a plain crew, state is mostly implicit: context flows from one task's output into the next agent's prompt, threaded by the framework rather than stored in a schema you own. That is ergonomic for prototypes and is exactly why durability is weaker here than LangGraph's checkpointer model — there is no built-in snapshot to resume from unless you adopt Flows and manage the state yourself. CrewAI also speaks native MCP and A2A, so crews can call external tools and talk to other agents over open protocols. One correction worth stating plainly: CrewAI was historically built on LangChain, but the project is independent of LangChain now — its current README describes it as a fully standalone framework.

Claude Managed Agents — deep dive

The client submits a goal and streams events; Anthropic's server runs the loop and holds the Session state.

The client / server split

Claude Managed Agents (launched April 2026, in beta) inverts the ownership question entirely. You run a thin client; Anthropic runs the loop. The framework's concepts are Agent, Environment, Session, and Events: you submit a goal and stream the run, but the orchestration happens server-side on Anthropic's infrastructure. This is distinct from the Claude Agent SDK, which runs the loop in your environment — Managed Agents is the hosted counterpart, where the runtime is the vendor's responsibility, not yours.

The server-side loop & tool execution

The control flow is a server-side autonomous loop. A Session stores conversation history, container state, and outputs on Anthropic infrastructure, so a run resumes cleanly after a pause — the thread is never on your machine to lose. You observe and steer through Events: the run streams over SSE, and you can interrupt or redirect it mid-flight by sending events back. The Environment — where tools and code actually execute — can be a self-hosted sandbox on your own infra even while the loop itself stays managed, which keeps tool execution close to your data without making you operate the orchestrator.

What you give up and gain

You gain durability for free: no checkpointer to configure, no Postgres to operate, no resume logic to write. You give up two things. First, this is a single autonomous harness, not a multi-agent framework — sub-agents are not a headline primitive, so do not pick it expecting first-class crews or handoffs. Second, because Session state lives on Anthropic infrastructure, it is not ZDR- or HIPAA-eligible; the same property that makes it never-lose-the-thread durable also means the state physically leaves your boundary. That trade is the whole pitch: hand over operation of the runtime, and in return never write resume code again.

OpenAI Agents SDK — deep dive

A thin runner loop in your process: agents with tools, handoffs between them, guardrails, and a process-local Sessions store.

Agents + tools

The OpenAI Agents SDK is the most minimal of the four: a lightweight library you run in your own process. An agent is a model plus instructions plus a set of tools, and a small runner loop drives the model-calls-tool-evaluates cycle as plain code. There is no graph DSL and no role declarations to learn — control flow is the Python you already write, with the SDK supplying just enough loop, tracing, and structure to keep it honest.

Handoffs

Multi-agent is first-class here through handoffs: one agent transfers control to another, passing full context along. A triage agent hands off to a specialist; the specialist takes over the conversation. This is the lightest-weight multi-agent primitive of the four — closer to a function call than to a graph or a managed crew — and it maps directly onto the trade-offs in single vs multi-agent: start with one agent, split into handoffs only when a clear specialization earns the seam.

Guardrails & sessions (and running Claude via LiteLLM)

Safety and human-in-the-loop come from guardrails (input/output validators that can halt a run) and tool-approval patterns, with tracing built in for observability. State lives in a process-local Sessions abstraction — in-memory by default, with pluggable backends — and crucially there is no built-in checkpoint or replay: durability is your responsibility. Crash mid-run and, unless you persisted the session yourself, you start over. One pragmatic strength: the bundled LiteLLM adapter runs non-OpenAI models, so you can drive Claude with LitellmModel(model="anthropic/claude-opus-4-7", ...), or swap in Gemini, Bedrock, Azure, or Ollama without leaving the SDK.

Cross-cutting comparison

Where state lives & durability

The headline axis: four different homes for your agent's state, with four different durability stories.

This is the axis that separates the four, and it is the one the feature lists hide. LangGraph hands you an explicit state object you own and snapshots after every step, so durability and replay are the default — you can lose the process and not the run. Claude Managed Agents reaches the same never-lose-the-thread durability from the opposite direction: the Session lives on Anthropic's servers, so the state is durable precisely because it was never yours to hold (and, for the same reason, it leaves your data boundary). CrewAI keeps state implicit in the crew — context threaded between tasks, with no built-in snapshot — unless you adopt Flows and manage it explicitly. The OpenAI Agents SDK keeps a process-local Sessions store that is in-memory by default with no built-in checkpoint, so durability is entirely on you. Two frameworks make resume free; two make it your homework. Knowing which is which is the difference between a demo and something you can operate.

Control-flow model

From most explicit to most autonomous: a drawn graph, declared roles, plain code, and a managed loop.

The four sit on a spectrum from drawn to delegated. LangGraph is the most explicit: you author the control flow as a graph of nodes and conditional, cyclic edges, and every branch is visible and testable. The OpenAI Agents SDK is explicit in a different idiom — the control flow is plain code in your process, with handoffs as the one structured branch. CrewAI is declarative: you describe roles and tasks and let the sequential or hierarchical process decide ordering, reaching for Flows when you need deterministic branching back. Claude Managed Agents is the most delegated: the loop runs autonomously on the server and you steer it by streaming events rather than by authoring its structure. Whether the loop terminates cleanly — the concern in planning and termination — is something you design into a LangGraph edge, observe through OpenAI SDK tracing, or trust the managed runtime to handle.

Multi-agent stance

Three first-class multi-agent stances and one deliberately single harness.

Three of the four treat multi-agent as first-class; one deliberately does not. CrewAI is multi-agent by construction — a crew is a team of role-based agents, and that is the whole point of the abstraction. LangGraph makes it first-class through subgraphs and supervisor patterns, composing agents as nodes within a larger graph. The OpenAI Agents SDK does it through handoffs, the lightest-weight transfer-of-control primitive of the three. Claude Managed Agents is the exception: it is a single autonomous harness, not a multi-agent framework, so sub-agents are not a headline primitive — pick it for one capable agent, not a crew. Before reaching for any of the multi-agent three, it is worth checking multi-agent: when and why and the topologies that follow, because the cheapest multi-agent system is often the single agent you did not split.

Where it runs & who operates it

The state question collapses into an operations question: if the state lives on your infra, you operate the runtime; if it lives on the vendor's, they do. Three of the four run in your environment and put durability, scaling, and uptime on your team; Claude Managed Agents moves the runtime to Anthropic and the data boundary with it.

Framework	Runs on	You operate?
LangGraph	Your infra (or LangGraph Platform / LangSmith Deployment)	Yes — unless you use the managed Platform
CrewAI	Your infra	Yes
Claude Managed Agents	Anthropic infra (Environment can be self-hosted)	No — Anthropic operates the loop
OpenAI Agents SDK	Your infra	Yes

When to pick which

Use case	Pick LangGraph if…	Pick CrewAI if…	Pick Claude Managed if…	Pick OpenAI Agents SDK if…
Durable long-running workflow	You want a checkpointed state object that resumes from the last step after a crash, on infra you control.	You can accept Flows-managed state and explicit persistence, or your runs are short enough that restart is cheap.	You want durability for free and are fine with Session state living on Anthropic's servers.	Not the natural fit — there is no built-in checkpoint, so you must persist sessions yourself.
Quick role-based prototype	Overkill — drawing a graph is more ceremony than a prototype needs.	You want to declare a few agents with roles and tasks and watch them collaborate with minimal wiring.	Workable for a single-agent prototype, but you are not modeling a team of roles.	You want a handful of agents and handoffs in plain code, no new DSL to learn.
Don't want to run infra	No — you operate the runtime (unless you adopt the managed Platform).	No — CrewAI runs in your environment.	Yes — Anthropic runs the loop and holds the state; you stream a thin client.	No — the runner loop lives in your process.
Minimal, code-first	Heavier than you want; the graph abstraction is the opposite of minimal.	You prefer declarations over code, so this leans away from code-first.	No — the loop is managed, not code you author.	You want a thin library that is mostly your own Python, with handoffs, guardrails, and tracing added.
Complex branching control flow	You need conditional and cyclic edges you can author and test explicitly — this is LangGraph's home turf.	You reach for Flows for event-driven branching on top of crews.	You are comfortable letting the server-side loop decide and steering it via events.	You will express branches as plain code and handoffs, accepting that complex graphs get unwieldy.

FAQ

What's the difference between LangGraph and the OpenAI Agents SDK?

Both run in your own infrastructure, but they sit at opposite ends of the explicit-state spectrum. LangGraph gives you an explicit graph of nodes and edges over a checkpointed state object you own, with built-in durable execution and replay — crash mid-run and it resumes from the last step. The OpenAI Agents SDK is a thinner library: control flow is plain code, multi-agent happens through handoffs, and state is a process-local Sessions store that is in-memory by default with no built-in checkpoint, so durability is your responsibility. Reach for LangGraph when you need a durable, inspectable state machine; reach for the OpenAI Agents SDK when you want minimal code-first agents and will handle persistence yourself.

Is CrewAI built on LangChain?

Not anymore. CrewAI was historically built on LangChain, but the project is independent now — its current README describes it as a completely standalone framework with no LangChain dependency. It also has native MCP and A2A support for talking to external tools and other agents over open protocols.

Do I have to self-host LangGraph, or is there a managed version?

You can self-host LangGraph anywhere you run Python, choosing a checkpointer (in-memory, SQLite, Postgres) to match your durability needs. There is also a managed option — historically branded "LangGraph Platform," now offered under the "LangSmith Deployment" umbrella — that runs the orchestration for you. The exact product name has shifted over time, so check the current LangChain docs for the label, but the managed path does exist.

Can I use Claude or other non-OpenAI models with the OpenAI Agents SDK?

Yes. The SDK bundles a LiteLLM adapter, so you can run Claude with something like LitellmModel(model="anthropic/claude-opus-4-7", ...), as well as Gemini, Bedrock, Azure, or Ollama. The "OpenAI" in the name refers to who maintains the library, not a lock-in to OpenAI models.

When should I use a framework at all instead of a plain while loop?

For a single agent calling a couple of tools with no durability or multi-agent needs, a plain while loop around a model call is honestly fine — see The Agent Loop. Reach for a framework when you need one of the things a loop does not give you for free: durable checkpointing and replay, explicit branching control flow, first-class multi-agent coordination, or a managed runtime you do not operate. The framework earns its weight exactly when "I'll just add persistence later" stops being a small change. See Agent Frameworks for the broader trade-off.

Which framework is best for long-running, durable workflows?

Two options give you durability out of the box, from opposite directions. LangGraph checkpoints an owned state object after every step and resumes from the last successful step after a crash — durable on infrastructure you control. Claude Managed Agents keeps the Session on Anthropic's servers, so it resumes cleanly after pauses without you writing any resume logic, at the cost of the state leaving your boundary (and not being ZDR/HIPAA-eligible). CrewAI's plain crews and the OpenAI Agents SDK both leave durability to you, so they are weaker fits unless you persist state yourself.