agents · LangGraph · Skill Leaf

Pick the path. Don't ask the model.

LangGraph is the graph-based orchestration framework from LangChain Inc. Built around a StateGraph primitive with explicit nodes, edges, and shared state — you draw the agent flow, the framework runs it. MIT-licensed, Python and TypeScript SDKs, native checkpointing, time-travel debugging, human-in-the-loop interrupts, deep LangSmith observability. The de-facto choice in 2026 when you want full control over agent behaviour rather than letting an LLM decide what happens next. Used in production at Klarna, Replit, Elastic, AppFolio, and across the broader LangChain ecosystem.

v1.0+ · production-ready · MIT licensed · Python · TypeScript · 17,000+ stars · LangSmith integrated

An agent runtime where the graph is the program.

LangGraph is a low-level orchestration framework for building stateful, multi-step agent workflows. Released in early 2024 by LangChain Inc. and shipped as a sibling library to LangChain itself, it takes a deliberately different approach from the "chain" abstraction that made LangChain famous.

The thesis: most production agent failures are control-flow failures, not reasoning failures. The LLM picks the wrong tool, calls it twice, loops forever, or misses a step. LangGraph's answer is to lift the flow out of the prompt and into the framework. You define a StateGraph — nodes, edges, shared state — and the graph runs deterministically. The LLM still reasons inside nodes; the framework decides which node runs next.

That single architectural choice cascades into the things LangGraph does well: checkpointing (the state at every node is persisted, so you can resume, inspect, or fork from any step), time-travel debugging (rewind state to a previous node and re-run), human-in-the-loop interrupts (pause the graph, ask a human, resume), and fan-out / fan-in patterns (parallel branches that merge cleanly). All of which are awkward to bolt onto a "let the agent decide" framework.
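Here's the fan-out / fan-in shape compressed into a runnable sketch (the primitives get unpacked below; the node bodies are stubs, and the only non-obvious piece is the reducer that lets parallel writes merge instead of clobbering each other):

import operator
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END

class FanState(TypedDict):
    # reducer: concurrent branch writes are appended, not overwritten
    findings: Annotated[list, operator.add]

def research_a(state: FanState):
    return {"findings": ["angle A"]}   # stub for a real sub-agent

def research_b(state: FanState):
    return {"findings": ["angle B"]}   # stub for a real sub-agent

def merge(state: FanState):
    return {"findings": [f"report over {len(state['findings'])} findings"]}

g = StateGraph(FanState)
g.add_node("research_a", research_a)
g.add_node("research_b", research_b)
g.add_node("merge", merge)
g.add_edge(START, "research_a")    # two edges out of START = parallel fan-out
g.add_edge(START, "research_b")
g.add_edge("research_a", "merge")  # merge runs once both branches finish
g.add_edge("research_b", "merge")
g.add_edge("merge", END)
print(g.compile().invoke({"findings": []}))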

LangGraph is MIT-licensed and lives across langchain-ai/langgraph (Python) and langchain-ai/langgraphjs (TypeScript). The Python repo crossed 17,000 GitHub stars in 2026 and ships ~weekly releases. The framework reached its 1.0 stable release in late 2025, signalling production-readiness; v1.x has been the recommended track since.

Why "graph" not "chain" not "agent"

LangChain's original abstraction was the chain — linear pipelines of LLM calls. The agent abstraction (ReAct loops) added autonomy but lost predictability. LangGraph's graph sits between: as flexible as an agent (cycles, branches, dynamic routing) but as predictable as a chain (you can read the graph and know what it does). For most production workloads, that middle position is structurally where the work lives.

Five primitives carry almost everything.

State, Nodes, Edges, Checkpointer, and Interrupts. Master those five and 90% of LangGraph clicks. The rest is integration: which LLM, which tool layer, which deployment surface.

State is the single shared object every node reads from and writes to. Defined as a TypedDict in Python (or a Zod schema in TS), it's the contract between nodes. Nodes are plain functions that take state and return a partial state update. Edges connect nodes — either statically (always go from A to B) or conditionally (a function inspects state and returns the next node name).

The minimal LangGraph in Python:

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_anthropic import ChatAnthropic

class State(TypedDict):
    # add_messages is a reducer: node returns are appended, not overwritten
    messages: Annotated[list, add_messages]

llm = ChatAnthropic(model="claude-sonnet-4-6")  # build once, reuse across turns

def chatbot(state: State):
    # a node is a plain function: state in, partial state update out
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

# Run it
result = app.invoke({"messages": [("user", "Hello")]})

Conditional edges are where LangGraph earns its keep. A function reads state and decides which node runs next. This is how you build routing, retries, and tool-use loops without the LLM picking from a freeform menu:

from langgraph.prebuilt import ToolNode

def route(state: State) -> str:
    last = state["messages"][-1]
    if last.tool_calls:  # the AIMessage asked for one or more tools
        return "tools"
    return END

graph.add_node("tool_node", ToolNode(tools))  # `tools` is your list of LangChain tools
graph.add_conditional_edges("chatbot", route, {
    "tools": "tool_node",
    END: END,
})
graph.add_edge("tool_node", "chatbot")  # back to the LLM with the tool results

Checkpointing. Every node executes inside a transaction. If you compile the graph with a Checkpointer (in-memory, SQLite, Postgres, or Redis backends), the state at every step is persisted. That gives you free wins:

  • Resume after crash — pick up where the graph left off
  • Time travel — rewind to any prior state and re-run from there
  • Multi-turn conversations — same thread_id keeps history
  • Audit trail — every step is recorded, with input and output

from langgraph.checkpoint.postgres import PostgresSaver

# from_conn_string is a context manager; call setup() once to create the tables
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()
    app = graph.compile(checkpointer=checkpointer)

    # Same thread_id keeps state across calls
    config = {"configurable": {"thread_id": "user-42"}}
    app.invoke({"messages": [("user", "Hi")]}, config)
    app.invoke({"messages": [("user", "What did I just say?")]}, config)

Human-in-the-loop interrupts. Pause the graph mid-execution, let a human review or edit state, then resume. This is the load-bearing feature for any agent that touches money, sends external messages, or has compliance constraints:

from langgraph.types import interrupt, Command

def approve_payment(state: State):
    # pause the run here and surface this payload to the caller
    decision = interrupt({"amount": state["amount"]})
    if decision["approved"]:
        return {"status": "approved"}
    return {"status": "rejected"}

# Interrupts require a checkpointer. The first invoke runs until interrupt()
# fires, then pauses with state persisted. Resume with a Command:
app.invoke(Command(resume={"approved": True}), config)

The structural advantage over "agent decides" frameworks

In a freeform agent, "did the model approve a $50,000 transfer?" is a prompt-engineering question. In LangGraph, it's an architectural fact — the approval node either ran or it didn't, the interrupt either fired or it didn't, and you can read the graph to know which is true. Predictable control flow makes audit and compliance tractable. That's the productivity argument for the extra boilerplate.
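
That readability is literal: the compiled graph can render itself, so the control flow can sit in a design review or an audit pack.

# Mermaid source for the compiled graph; draw_ascii() and draw_mermaid_png()
# variants exist behind optional dependencies
print(app.get_graph().draw_mermaid())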

LangChain inheritance, LangSmith observability, LangGraph Platform.

LangGraph alone is a graph runtime. Plug it into the LangChain ecosystem and you get the largest tool integration library in agent-land. Plug LangSmith on top and you get the most mature observability story in the field. Wire in LangGraph Platform (the commercial managed runtime) and you get hosted deployment with checkpointing, scaling, and monitoring as managed services.

Library · runtime

LangGraph

The MIT-licensed graph runtime itself. Python and TypeScript SDKs. Self-host anywhere — Docker, Cloud Run, Lambda, Hetzner droplets, your laptop.

Library · tools

LangChain

The sibling library. 700+ tool integrations, retrievers, document loaders, output parsers, agents. LangGraph nodes typically use LangChain tools internally.

SaaS · observability

LangSmith

Commercial observability for LangChain & LangGraph runs. Traces, evaluations, prompt management, dataset curation. Free tier available; paid tiers for production volume.

SaaS · runtime

LangGraph Platform

Managed runtime for LangGraph apps. Hosted checkpointing, autoscaling, threads API, Studio (visual graph debugger). Self-host or cloud-host options.

The honest commercial picture. LangGraph the library is genuinely free, MIT-licensed software. The free tier of LangSmith covers small teams. At production scale — high throughput, long retention, multi-seat — LangSmith and LangGraph Platform are the revenue model. Most teams that adopt LangGraph end up paying LangChain Inc. for one or both. That's the trade compared with Google ADK + Vertex AI Agent Engine (different vendor, similar pattern) or self-hosting LangGraph + a roll-your-own observability layer (more work, no SaaS bill).

Where LangGraph wins, where it doesn't.

LangGraph and Google ADK are the two strongest "build production agents" frameworks in 2026. They make different bets. The right choice depends less on quality — both are excellent — and more on ecosystem alignment, deployment surface, and how much explicit control you want.

Dimension | LangGraph | Google ADK
Default control style | Explicit graph — you draw it | Hierarchical agents — LLM routes
Cloud alignment | Vendor-agnostic; runs anywhere | GCP-first; managed surface on Vertex AI
Tool ecosystem | LangChain — the largest in agent-land | MCP + LangchainTool / CrewaiTool adapters
Observability | LangSmith (paid) — the most mature | OpenTelemetry GenAI + Cloud Trace
Voice / multimodal | Possible, not native | Native via Gemini Live API
Checkpointing & time-travel | Native, with multiple backends | Sessions + Memory Bank (managed)
Boilerplate | Higher — you write the graph | Lower — agents wire themselves up
Languages | Python, TypeScript | Python, TypeScript, Go, Java
A2A interop | Community wrapper | First-party (to_a2a())
Best fit | Predictable flows, compliance-heavy work, multi-vendor stacks | GCP deployments, Gemini-first, voice agents, faster prototypes

The combine-them pattern most production teams use

LangGraph and ADK aren't mutually exclusive. The pattern that's emerging in 2026: LangGraph for the orchestration "control plane" (where you want explicit edges and audit), ADK or specialist frameworks for the "data plane" (Gemini-Live voice agents, Vertex-managed sub-agents). The two talk via A2A. LangGraph's strength as the orchestrator is the explicit graph; ADK's strength as a sub-agent is the managed runtime. Use both for what each is best at.
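
What the control-plane side of that split looks like: a hedged sketch in which one LangGraph node delegates to a remote sub-agent. The endpoint and payload shape are hypothetical (a real integration would go through an A2A client against the agent's published card), but the graph-side pattern is the point: the sub-agent is just another node.

import httpx

def voice_subagent(state: State):
    # hypothetical HTTP surface for an ADK-hosted sub-agent
    resp = httpx.post(
        "https://voice-agent.internal.example/run",
        json={"input": state["messages"][-1].content},
        timeout=60.0,
    )
    resp.raise_for_status()
    # the reply re-enters shared state like any other node's output
    return {"messages": [("assistant", resp.json()["output"])]}

graph.add_node("voice_subagent", voice_subagent)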

Where LangGraph earns its keep.

Six patterns where LangGraph has shipped at scale, drawn from the LangChain Inc. case studies, public reference architectures, and production engineering blogs.

  • Customer support routing — Klarna's LangGraph-based AI assistant handles ~2.3M conversations and saved an estimated $40M annually (LangChain case study). Graph encodes routing logic, escalation thresholds, and human handoff rules.
  • Code agent orchestration — Replit Agent uses LangGraph to manage multi-step coding flows: plan, execute, test, fix, retry. Checkpointing means the agent can resume after a build fails or a sandbox times out.
  • Document processing pipelines — multi-stage extract / classify / enrich / validate flows, where each step needs explicit retry rules and the final output triggers downstream business logic (see the retry sketch after this list).
  • Compliance-heavy automation — banking, insurance, healthcare. Human-in-the-loop interrupts at known decision points; the audit trail comes from the checkpointer; the graph itself is the policy document reviewers can read.
  • Research / report generation — multi-agent fan-out: parallel sub-agents research different angles, results merge into a single report. ParallelAgent-style patterns natively expressed as graph branches.
  • Internal developer agents — PR review, ticket triage, on-call assist. LangGraph's checkpointing means the agent can pause, ask the engineer for clarification, and resume without losing context.
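
The retry sketch promised above: a toy extract / validate pipeline where the conditional edge, not the prompt, enforces the retry budget. Node bodies are stubs and the field names are illustrative.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DocState(TypedDict):
    raw: str
    fields: dict
    attempts: int
    valid: bool
    status: str

def extract(state: DocState):
    # stub for a real LLM extraction call
    return {"fields": {"invoice_no": state["raw"].strip() or None},
            "attempts": state["attempts"] + 1}

def validate(state: DocState):
    return {"valid": state["fields"].get("invoice_no") is not None}

def route(state: DocState) -> str:
    if state["valid"]:
        return "enrich"
    if state["attempts"] >= 3:   # explicit retry budget, not a prompt hint
        return "dead_letter"
    return "extract"

def enrich(state: DocState):
    return {"status": "enriched"}      # stub for downstream business logic

def dead_letter(state: DocState):
    return {"status": "needs_human"}   # park for manual review

g = StateGraph(DocState)
g.add_node("extract", extract)
g.add_node("validate", validate)
g.add_node("enrich", enrich)
g.add_node("dead_letter", dead_letter)
g.add_edge(START, "extract")
g.add_edge("extract", "validate")
g.add_conditional_edges("validate", route)  # router returns node names directly
g.add_edge("enrich", END)
g.add_edge("dead_letter", END)
print(g.compile().invoke({"raw": "INV-1001", "fields": {}, "attempts": 0, "valid": False}))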

Pick LangGraph when. Skip LangGraph when.

LangGraph is opinionated. The opinion is that explicit control beats implicit reasoning for production work, and the cost is more boilerplate. That trade is right for some teams and wrong for others. Honest two-sided guidance follows.

Use LangGraph when

  • You need explicit control over agent flow — not "let the LLM decide"
  • Compliance or audit requirements demand a readable control-flow document
  • Human-in-the-loop interrupts are load-bearing (payments, content moderation, contracts)
  • Long-running flows that need to resume after crash or pause
  • Your team already uses LangChain — tool ecosystem inheritance is the win
  • Multi-vendor LLM strategy: Claude + GPT + Gemini + Ollama in one graph
  • You're fine paying for LangSmith / LangGraph Platform at scale
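
Skip LangGraph when

  • A fast prototype matters more than explicit control: "agents wire themselves up" frameworks get there sooner
  • Voice or realtime multimodal is the product: ADK's Gemini Live integration is native, here it's possible but not built in
  • Your estate is GCP-first and Gemini-first, and ADK's managed Vertex AI surface is less work to run
  • The flow is genuinely open-ended and the LLM should route; an explicit graph buys you little
  • You can't absorb LangSmith / LangGraph Platform spend at scale and don't want to build observability yourself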

Pinning advice

LangGraph is on a stable v1.x track. Pin with a compatible-release spec in production (langgraph~=1.0, which takes 1.x updates but never 2.0) but expect to update monthly — the API is stable but the ecosystem moves. LangChain itself has a faster cadence; if you depend on specific LangChain integrations, pin those more conservatively. The graph definition itself is the durable artefact — you can usually upgrade the runtime without rewriting the graph.

Why explicit graphs matter more in SA delivery.

LangGraph's vendor-agnostic shape and explicit control flow have practical leverage in SA delivery contexts. POPIA compliance is easier when the audit trail is a checkpointer table, not a vendor's proprietary log. Multi-cloud is easier when the framework runs anywhere. Cost predictability is easier when you can swap an expensive cloud LLM for a local Ollama call by changing one node.

Enterprise · SA banks, insurers, telcos

For SA banks, insurers, and telcos with mixed-cloud or AWS-first estates, LangGraph is the right framework choice if ADK's GCP-bias is a problem. Run on EKS in Cape Town (AWS af-south-1), checkpoint to RDS Postgres in the same region, observe via LangSmith or self-hosted Phoenix. The audit trail stays on-region; the framework doesn't care which LLM you call. POPIA Section 72 cross-border-transfer concerns become a per-node decision: which nodes call Claude (US), which call Vertex (JHB), which call local Ollama. The graph itself documents the policy.
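
What that per-node decision looks like in code, assuming the langchain-anthropic, langchain-google-vertexai, and langchain-ollama integration packages; the model names and node split are illustrative, not prescriptive:

from langchain_anthropic import ChatAnthropic
from langchain_google_vertexai import ChatVertexAI
from langchain_ollama import ChatOllama

summarise_llm = ChatAnthropic(model="claude-sonnet-4-6")        # US-hosted
classify_llm = ChatVertexAI(model="gemini-2.0-flash",
                            location="africa-south1")           # Vertex, JHB region
pii_llm = ChatOllama(model="gemma3")                            # local, nothing crosses the border

def redact_pii(state: State):
    # this node is the policy: PII-bearing text only ever hits the local model
    return {"messages": [pii_llm.invoke(state["messages"])]}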

Studio · mid-market builds

For mid-market builds, LangGraph + Cloud Run + a small Postgres instance for checkpointing is a productive stack. LangSmith free tier covers most pilots; you can defer the paid SKU until volume actually justifies it. The pragmatic path for SA studios: start in LangGraph dev mode on a developer laptop, ship to Cloud Run for staging, only move to LangGraph Platform if a client wants the managed runtime + Studio visual debugger.
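
A minimal serving sketch for that stack, assuming FastAPI in front of the compiled app from earlier; the route and payload shape are hypothetical:

from fastapi import FastAPI
from pydantic import BaseModel

api = FastAPI()

class ChatRequest(BaseModel):
    thread_id: str
    message: str

@api.post("/chat")
def chat(req: ChatRequest):
    # thread_id maps a caller onto a checkpointed LangGraph thread
    config = {"configurable": {"thread_id": req.thread_id}}
    result = app.invoke({"messages": [("user", req.message)]}, config)
    return {"reply": result["messages"][-1].content}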

Learning · explicit control flow as a teaching tool

LangGraph is one of the better frameworks to learn agent patterns on, precisely because the abstractions are explicit. You can see the nodes, the edges, the state. The official LangChain Academy course is free and SA-friendly (no Google Cloud account or paid model needed — pair with Ollama-hosted Gemma 3 for the model layer). For SA developers learning agentic patterns in 2026, LangGraph + Ollama is a credible zero-cost learning stack.

FX cost note. LangSmith is USD-billed and the paid tiers add up at production volume. If FX exposure is a constraint, plan for either: (a) self-hosting an OpenTelemetry-based observability stack (more engineering work upfront), (b) using LangSmith's free tier and accepting the 14-day retention limit, or (c) treating LangSmith as a strategic line item and locking in annual pricing.


Primary sources.

Authored from the canonical langchain-ai/langgraph repo, the LangGraph documentation, the LangChain Academy course materials, and LangChain Inc.'s public case studies and engineering blog posts. Last reviewed 2026-05-10.