Strategic constraint: agents must support model choices from open weights (served via Ollama or vLLM) through frontier (Claude, GPT, Gemini), and the architecture must never be tied to one provider. That requirement immediately narrows the runtime layer to options that are multi-provider native. This leaf works through the decision: which runtime, what it gives you, what you give up, and when the calculus flips toward a first-party SDK such as the Claude Agent SDK instead.
The recommendation: LangGraph as the runtime, paired with a model gateway underneath (Cloudflare AI Gateway, OpenRouter, or LiteLLM; see the model gateway category). LangGraph is multi-provider native (provider switching is a config change to the init_chat_model call), mature (v1.0 late 2025, 90k+ stars), MCP-aware, and pairs cleanly with both LangSmith and OTel-based audit. The alternatives are the Vercel AI SDK for TypeScript-first shops and Pydantic AI for type-first Python teams. The Claude Agent SDK is the right choice only when you have committed to Claude as the single model, which, by the architecture's stated constraint, this design has not.
This is a member-tier decision leaf on know.2nth.ai. Reference content (the landscape, categories, options per category) stays open at /tools/.
The model-portability constraint cuts the candidate list immediately: any first-party SDK that locks to one provider is out. Among multi-provider options, the deciding factor is maturity at scale (state, multi-agent orchestration, MCP support, ecosystem) versus ergonomics (DX, type safety, smaller mental model).
| Runtime | Verdict |
|---|---|
| LangGraph — pick | Multi-provider native via init_chat_model. Mature (v1.0 late 2025, 90k+ stars). MCP-aware as of late 2025. Best-in-class for stateful multi-agent orchestration when you grow into it. Strong observability story (LangSmith native; Langfuse via OTel exporter). Pairs cleanly with LiteLLM as the model gateway. Python ecosystem is the broader hireable pool. |
| Vercel AI SDK v6 — alt for TS shops | TypeScript-first, native multi-provider, lovely DX, ToolLoopAgent for production. Narrower scope than LangGraph for complex orchestration. Right pick if the team is TS-only. |
| Pydantic AI — alt for type-first Python | Multi-provider; Pydantic-typed everywhere. Smaller community; less mature for complex multi-agent. Right pick if your stance is "types over flexibility." |
| CrewAI | Role-based, fast adoption (60%+ Fortune 500 by Jan 2026). Multi-provider. Coarser permission model than LangGraph; less granular hook surface for layering Cerbos / Langfuse. Outgrown by teams that need fine-grained control. |
| Claude Agent SDK — rejected here | First-party for Claude. Beautiful primitives (allowed_tools, PreToolUse hooks, MCP-native, Skills, prompt caching). Claude-only. The right pick when the team has committed to Claude as the single model; ruled out by this architecture's model-portability constraint. |
| OpenAI Agents SDK — rejected here | OpenAI-only. Same shape of disqualification as Claude Agent SDK, opposite direction. |
| Raw API + DIY loop | Maximum control; you rebuild the loop, context compaction, tool-result feeding, MCP plumbing. Worth it if you have unusual constraints; otherwise the LangGraph-style runtime saves months of work. |
The deciding factor. If model portability is a stated requirement, any first-party SDK that locks to one provider is disqualified before the analysis starts. That leaves multi-provider runtimes. Among those, LangGraph's combination of maturity, multi-agent depth, and MCP support wins for production workloads. Vercel AI SDK wins for TypeScript-only shops. Pydantic AI wins for teams that prize type safety over flexibility.
The architecture lets you reverse this later: swapping LangGraph for Vercel AI SDK is roughly one day per agent, since MCP, Cerbos, Langfuse, and Inngest all sit downstream of the runtime and don't care which one is on top.
A runtime is narrow by design. Knowing exactly what LangGraph gives you (and what it doesn't) is the difference between using it cleanly and fighting it.
What LangGraph gives you:
The agent loop: model call, tool-call parsing, tool execution, result feedback, state transitions. Multi-step graphs are a first-class primitive.
Provider switching: init_chat_model("claude-opus-4-7"), or pass "gpt-5" or "ollama:llama-4" instead. Switch with a config change, not a refactor.
Connects to MCP servers directly. Points at the mcp-gateway Worker URL like any other MCP source.
Built-in human-in-the-loop interrupts. State checkpointing for replay + persistence. Pair with Inngest for cross-process pauses.
Traces flow into LangSmith out of the box; OTel exporter sends them to Langfuse instead if that's the audit backend.
What it leaves to other layers:
Model routing across providers + cost / cache. Layered in via the model gateway (Cloudflare AI Gateway + LiteLLM) below LangGraph.
Audit / proof of work. LangGraph emits traces; Langfuse stores them via the OTel GenAI exporter.
Permission policy beyond the framework allowlist. Layered in via Cerbos, called from a custom node or pre-tool hook.
Cross-process human-in-the-loop approval. LangGraph's interrupts are in-process; for durable approvals (HTML form, hours-long wait, retries), Inngest sits alongside.
Human-facing surface for the same actions. An HTML form on Cloudflare Pages that fires the same Inngest event — identical code path.
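The permission-policy layer mentioned above (a check called from a custom node or pre-tool hook) can be sketched in plain Python. This is a minimal illustration, not Cerbos's or LangGraph's API: `PolicyCheck`, `guarded`, `demo_policy`, and the `issue_refund` tool are all hypothetical names, and the policy function stands in for a call to a real policy decision point.

```python
from typing import Any, Callable, Dict

# Stand-in for a policy decision point such as Cerbos:
# (principal, tool name, tool arguments) -> allowed?
PolicyCheck = Callable[[str, str, Dict[str, Any]], bool]


class ToolDenied(PermissionError):
    """Raised when the policy check rejects a tool call."""


def guarded(tool_name: str, tool: Callable[..., Any], check: PolicyCheck, principal: str):
    """Wrap a tool so the policy check runs before every call."""

    def wrapper(**kwargs: Any) -> Any:
        if not check(principal, tool_name, kwargs):
            raise ToolDenied(f"{principal} may not call {tool_name}")
        return tool(**kwargs)

    return wrapper


# Hypothetical policy: refunds above 100 are denied, everything else is allowed.
def demo_policy(principal: str, tool_name: str, args: Dict[str, Any]) -> bool:
    return not (tool_name == "issue_refund" and args.get("amount", 0) > 100)


# Hypothetical tool used only for the demonstration.
def issue_refund(amount: int) -> str:
    return f"refunded {amount}"


refund = guarded("issue_refund", issue_refund, demo_policy, principal="agent:support")
```

Calling `refund(amount=40)` goes through, while `refund(amount=500)` raises `ToolDenied` before the tool runs. Keeping the check outside the tool body is what lets the policy engine change without touching the runtime.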
A multi-provider runtime keeps the model itself optional, which is the whole point. But the layer isn't free. Six implications worth saying out loud.
The whole reason for picking LangGraph (or Vercel AI SDK / Pydantic AI) is that init_chat_model("<provider>:<model>") swaps the model. Going from Claude to GPT to a self-hosted Llama via Ollama is a config change, not a code change. The architecture's downstream layers (MCP, Cerbos, Langfuse, Inngest) are all provider-neutral too.
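As a minimal sketch of "config change, not a code change", the model identifier can live in an environment variable and be passed straight to `init_chat_model`. The variable name `AGENT_MODEL` and the helper names are assumptions for illustration; the default string is the Ollama example used earlier in this leaf. The LangChain import is deferred so the config helpers work even where `langchain` is not installed.

```python
import os

# Hypothetical variable name; the default id is the Ollama example from this leaf.
MODEL_ENV_VAR = "AGENT_MODEL"
DEFAULT_MODEL_ID = "ollama:llama-4"


def resolve_model_id(env=None):
    """Return the configured "<provider>:<model>" string."""
    env = os.environ if env is None else env
    return env.get(MODEL_ENV_VAR, DEFAULT_MODEL_ID)


def split_model_id(model_id):
    """Split "<provider>:<model>" into its parts, rejecting malformed ids."""
    provider, sep, name = model_id.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>:<model>', got {model_id!r}")
    return provider, name


def build_chat_model(env=None):
    """Construct the chat model; needs langchain plus the provider's package."""
    from langchain.chat_models import init_chat_model  # lazy: optional dependency

    return init_chat_model(resolve_model_id(env))
```

With this shape, moving an agent from one provider to another means setting `AGENT_MODEL` to a different `<provider>:<model>` string and installing that provider's integration package; the agent code itself stays untouched.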
If you actually use the portability — routing one task to Claude, another to GPT, another to Llama — you need an eval harness that runs across providers. Different models have different tool-calling reliability, different context-window costs, different refusal patterns. Budget time for an eval suite from day one.
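A cross-provider eval harness can start very small. The sketch below scores tool-selection accuracy for several models against the same cases; the two "models" are stubs standing in for real provider calls, and every name here (`EvalCase`, `run_suite`, the stub functions) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected_tool: str  # the tool the model should choose for this prompt


# A "model" here is any callable mapping a prompt to the tool name it picked.
ModelFn = Callable[[str], str]


def run_suite(models: Dict[str, ModelFn], cases: List[EvalCase]) -> Dict[str, float]:
    """Return the fraction of cases each model routed to the expected tool."""
    if not cases:
        raise ValueError("eval suite needs at least one case")
    scores = {}
    for name, model in models.items():
        passed = sum(1 for case in cases if model(case.prompt) == case.expected_tool)
        scores[name] = passed / len(cases)
    return scores


# Stub models standing in for real provider calls.
def always_search(prompt: str) -> str:
    return "search"


def keyword_router(prompt: str) -> str:
    return "calculator" if any(ch.isdigit() for ch in prompt) else "search"


CASES = [
    EvalCase("What is 17 * 23?", "calculator"),
    EvalCase("Who maintains LangGraph?", "search"),
]

SCORES = run_suite({"stub-a": always_search, "stub-b": keyword_router}, CASES)
```

Here `SCORES` is `{"stub-a": 0.5, "stub-b": 1.0}`. In a real suite each stub would be replaced by a function that calls one provider through the runtime, and the same cases would also track cost and refusals.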
LangGraph models agents as state graphs — nodes for steps, edges for transitions, interrupts for human gates. Worth the learning curve at production scale (multi-agent orchestration, replay-debuggable workflows) but slower to start than a one-loop SDK. Expect a week of ramp-up per engineer.
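To make the mental model concrete, here is a toy state graph in plain Python: nodes are functions, edges are the node names they return, and a human gate is an exception that pauses the run. This is deliberately not LangGraph's API, only an illustration of the shape; the refund workflow and all names are invented for the example.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, object]
# Each node returns (updated_state, next_node_name); None as the next name ends the run.
Node = Callable[[State], Tuple[State, Optional[str]]]


class Interrupt(Exception):
    """Raised by a node that needs a human decision before continuing."""

    def __init__(self, node: str, state: State):
        super().__init__(f"paused at {node}")
        self.node = node
        self.state = state


def run(nodes: Dict[str, Node], start: str, state: State) -> State:
    """Walk the graph from `start` until a node returns no successor."""
    current: Optional[str] = start
    while current is not None:
        state, current = nodes[current](state)
    return state


def draft(state: State):
    return {**state, "draft": f"Refund {state['amount']}"}, "approve"


def approve(state: State):
    if "approved" not in state:
        raise Interrupt("approve", state)  # human gate: stop until a decision exists
    return state, ("send" if state["approved"] else None)


def send(state: State):
    return {**state, "sent": True}, None


NODES = {"draft": draft, "approve": approve, "send": send}

final = None
try:
    run(NODES, "draft", {"amount": 40})
except Interrupt as pause:
    # Resume from the paused node once the human decision is recorded.
    final = run(NODES, pause.node, {**pause.state, "approved": True})
```

The run pauses at `approve`, then resumes from that node with the saved state. A production runtime adds what the toy lacks: persisted checkpoints, typed state, and replay.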
LangGraph emits traces to LangSmith by default; that's the slick path. For Langfuse (self-hosted, OSS), use the OpenTelemetry GenAI exporter and accept slightly less polish in the UI in exchange for owning the trace store. The reversal cost is low — pick later if you need to.
LangGraph's init_chat_model can route to provider-direct (e.g. Anthropic SDK, OpenAI SDK) or to a unified endpoint (LiteLLM, OpenRouter, Cloudflare AI Gateway). Use the unified endpoint — centralised auth + caching + rate limits + cost metering matter more than the marginal latency of a hop. The model gateway is its own layer in the architecture, deliberately.
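Pointing the runtime at a unified endpoint can be sketched as follows, assuming the gateway exposes an OpenAI-compatible API. The environment variable names, the default URL, and the helper names are all hypothetical; `base_url` and `api_key` are forwarded by `init_chat_model` to the OpenAI chat model class. The import is deferred so the config helper works without `langchain` installed.

```python
import os

# Hypothetical names and defaults; adjust to whatever your gateway exposes.
GATEWAY_URL_VAR = "MODEL_GATEWAY_URL"
GATEWAY_KEY_VAR = "MODEL_GATEWAY_KEY"
DEFAULT_GATEWAY_URL = "http://localhost:4000/v1"


def gateway_kwargs(env=None):
    """Collect the connection settings that point a client at the gateway."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get(GATEWAY_URL_VAR, DEFAULT_GATEWAY_URL),
        "api_key": env.get(GATEWAY_KEY_VAR, "unset"),
    }


def build_gateway_model(model_name: str, env=None):
    """Build a chat model that talks to the gateway's OpenAI-compatible endpoint.

    Requires langchain and langchain-openai to be installed.
    """
    from langchain.chat_models import init_chat_model  # lazy: optional dependency

    # Extra keyword arguments are forwarded to the provider's chat model class.
    return init_chat_model(f"openai:{model_name}", **gateway_kwargs(env))
```

Under this assumption the agent only ever holds the gateway's key, and the gateway maps `model_name` to whichever upstream provider is configured there.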
The Claude Agent SDK is genuinely the best fit when (and only when) Claude is the only model. If a pilot proves that's the right call — Claude wins on every task that matters — switching to Claude Agent SDK is ~1 day per agent. The lock-in becomes acceptable because it bought first-party features (Skills, MCP, prompt caching, sub-agents). Don't pre-commit; reserve the option.
Linked tersely. The SDK moves fast — verify the version against the date on this leaf.