agents · LangGraph · Skill Leaf

Pick the path. Don't ask the model.

LangGraph is the graph-based orchestration framework from LangChain Inc. Built around a StateGraph primitive with explicit nodes, edges, and shared state — you draw the agent flow, the framework runs it. MIT-licensed, Python and TypeScript SDKs, native checkpointing, time-travel debugging, human-in-the-loop interrupts, deep LangSmith observability. The de-facto choice in 2026 when you want full control over agent behaviour rather than letting an LLM decide what happens next. Used in production at Klarna, Replit, Elastic, AppFolio, and across the broader LangChain ecosystem.

v1.0+ · production-ready · MIT licensed · Python · TypeScript · 17,000+ stars · LangSmith integrated

An agent runtime where the graph is the program.

LangGraph is a low-level orchestration framework for building stateful, multi-step agent workflows. Released in early 2024 by LangChain Inc. and shipped as a sibling library to LangChain itself, it takes a deliberately different approach from the "chain" abstraction that made LangChain famous.

The thesis: most production agent failures are control-flow failures, not reasoning failures. The LLM picks the wrong tool, calls it twice, loops forever, or misses a step. LangGraph's answer is to lift the flow out of the prompt and into the framework. You define a StateGraph — nodes, edges, shared state — and the graph runs deterministically. The LLM still reasons inside nodes; the framework decides which node runs next.

That single architectural choice cascades into the things LangGraph does well: checkpointing (the state at every node is persisted, so you can resume, inspect, or fork from any step), time-travel debugging (rewind state to a previous node and re-run), human-in-the-loop interrupts (pause the graph, ask a human, resume), and fan-out / fan-in patterns (parallel branches that merge cleanly). All of which are awkward to bolt onto a "let the agent decide" framework.
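Here's the fan-out / fan-in shape compressed into a runnable sketch (the primitives get unpacked below; the node bodies are stubs, and the only non-obvious piece is the reducer that lets parallel writes merge instead of clobbering each other):

import operator
from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END

class FanState(TypedDict):
    # reducer: concurrent branch writes are appended, not overwritten
    findings: Annotated[list, operator.add]

def research_a(state: FanState):
    return {"findings": ["angle A"]}   # stub for a real sub-agent

def research_b(state: FanState):
    return {"findings": ["angle B"]}   # stub for a real sub-agent

def merge(state: FanState):
    return {"findings": [f"report over {len(state['findings'])} findings"]}

g = StateGraph(FanState)
g.add_node("research_a", research_a)
g.add_node("research_b", research_b)
g.add_node("merge", merge)
g.add_edge(START, "research_a")    # two edges out of START = parallel fan-out
g.add_edge(START, "research_b")
g.add_edge("research_a", "merge")  # merge runs once both branches finish
g.add_edge("research_b", "merge")
g.add_edge("merge", END)
print(g.compile().invoke({"findings": []}))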

LangGraph is MIT-licensed and lives across langchain-ai/langgraph (Python) and langchain-ai/langgraphjs (TypeScript). The Python repo crossed 17,000 GitHub stars in 2026 and ships ~weekly releases. The framework reached its 1.0 stable release in late 2025, signalling production-readiness; v1.x has been the recommended track since.

Why "graph" not "chain" not "agent"

LangChain's original abstraction was the chain — linear pipelines of LLM calls. The agent abstraction (ReAct loops) added autonomy but lost predictability. LangGraph's graph sits between: as flexible as an agent (cycles, branches, dynamic routing) but as predictable as a chain (you can read the graph and know what it does). For most production workloads, that middle position is structurally where the work lives.

Five primitives carry almost everything.

State, Nodes, Edges, Checkpointer, and Interrupts. Master those five and 90% of LangGraph clicks. The rest is integration: which LLM, which tool layer, which deployment surface.

State is the single shared object every node reads from and writes to. Defined as a TypedDict in Python (or a Zod schema in TS), it's the contract between nodes. Nodes are plain functions that take state and return a partial state update. Edges connect nodes — either statically (always go from A to B) or conditionally (a function inspects state and returns the next node name).

The minimal LangGraph in Python:

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_anthropic import ChatAnthropic

class State(TypedDict):
    # add_messages is a reducer: node returns are appended, not overwritten
    messages: Annotated[list, add_messages]

llm = ChatAnthropic(model="claude-sonnet-4-6")  # build once, reuse across turns

def chatbot(state: State):
    # a node is a plain function: state in, partial state update out
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

# Run it
result = app.invoke({"messages": [("user", "Hello")]})

Conditional edges are where LangGraph earns its keep. A function reads state and decides which node runs next. This is how you build routing, retries, and tool-use loops without the LLM picking from a freeform menu:

from langgraph.prebuilt import ToolNode

def route(state: State) -> str:
    last = state["messages"][-1]
    if last.tool_calls:  # the AIMessage asked for one or more tools
        return "tools"
    return END

graph.add_node("tool_node", ToolNode(tools))  # `tools` is your list of LangChain tools
graph.add_conditional_edges("chatbot", route, {
    "tools": "tool_node",
    END: END,
})
graph.add_edge("tool_node", "chatbot")  # back to the LLM with the tool results

Checkpointing. Every node executes inside a transaction. If you compile the graph with a Checkpointer (in-memory, SQLite, Postgres, or Redis backends), the state at every step is persisted. That gives you free wins:

  • Resume after crash — pick up where the graph left off
  • Time travel — rewind to any prior state and re-run from there
  • Multi-turn conversations — same thread_id keeps history
  • Audit trail — every step is recorded, with input and output

from langgraph.checkpoint.postgres import PostgresSaver

# from_conn_string is a context manager; call setup() once to create the tables
with PostgresSaver.from_conn_string("postgresql://...") as checkpointer:
    checkpointer.setup()
    app = graph.compile(checkpointer=checkpointer)

    # Same thread_id keeps state across calls
    config = {"configurable": {"thread_id": "user-42"}}
    app.invoke({"messages": [("user", "Hi")]}, config)
    app.invoke({"messages": [("user", "What did I just say?")]}, config)

Human-in-the-loop interrupts. Pause the graph mid-execution, let a human review or edit state, then resume. This is the load-bearing feature for any agent that touches money, sends external messages, or has compliance constraints:

from langgraph.types import interrupt, Command

def approve_payment(state: State):
    # pause the run here and surface this payload to the caller
    decision = interrupt({"amount": state["amount"]})
    if decision["approved"]:
        return {"status": "approved"}
    return {"status": "rejected"}

# Interrupts require a checkpointer. The first invoke runs until interrupt()
# fires, then pauses with state persisted. Resume with a Command:
app.invoke(Command(resume={"approved": True}), config)

The structural advantage over "agent decides" frameworks

In a freeform agent, "did the model approve a $50,000 transfer?" is a prompt-engineering question. In LangGraph, it's an architectural fact — the approval node either ran or it didn't, the interrupt either fired or it didn't, and you can read the graph to know which is true. Predictable control flow makes audit and compliance tractable. That's the productivity argument for the extra boilerplate.
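
That readability is literal: the compiled graph can render itself, so the control flow can sit in a design review or an audit pack.

# Mermaid source for the compiled graph; draw_ascii() and draw_mermaid_png()
# variants exist behind optional dependencies
print(app.get_graph().draw_mermaid())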

LangChain inheritance, LangSmith observability, LangGraph Platform.

LangGraph alone is a graph runtime. Plug it into the LangChain ecosystem and you get the largest tool integration library in agent-land. Plug LangSmith on top and you get the most mature observability story in the field. Wire in LangGraph Platform (the commercial managed runtime) and you get hosted deployment with checkpointing, scaling, and monitoring as managed services.

Library · runtime

LangGraph

The MIT-licensed graph runtime itself. Python and TypeScript SDKs. Self-host anywhere — Docker, Cloud Run, Lambda, Hetzner droplets, your laptop.

Library · tools

LangChain

The sibling library. 700+ tool integrations, retrievers, document loaders, output parsers, agents. LangGraph nodes typically use LangChain tools internally.

SaaS · observability

LangSmith

Commercial observability for LangChain & LangGraph runs. Traces, evaluations, prompt management, dataset curation. Free tier available; paid tiers for production volume.

SaaS · runtime

LangGraph Platform

Managed runtime for LangGraph apps. Hosted checkpointing, autoscaling, threads API, Studio (visual graph debugger). Self-host or cloud-host options.

The honest commercial picture. LangGraph the library is genuinely free, MIT-licensed software. The free tier of LangSmith covers small teams. At production scale — high throughput, long retention, multi-seat — LangSmith and LangGraph Platform are the revenue model. Most teams that adopt LangGraph end up paying LangChain Inc. for one or both. That's the trade compared with Google ADK + Vertex AI Agent Engine (different vendor, similar pattern) or self-hosting LangGraph + a roll-your-own observability layer (more work, no SaaS bill).

Where LangGraph wins, where it doesn't.

LangGraph and Google ADK are the two strongest "build production agents" frameworks in 2026. They make different bets. The right choice depends less on quality — both are excellent — and more on ecosystem alignment, deployment surface, and how much explicit control you want.

Dimension | LangGraph | Google ADK
Default control style | Explicit graph — you draw it | Hierarchical agents — LLM routes
Cloud alignment | Vendor-agnostic; runs anywhere | GCP-first; managed surface on Vertex AI
Tool ecosystem | LangChain — the largest in agent-land | MCP + LangchainTool / CrewaiTool adapters
Observability | LangSmith (paid) — the most mature | OpenTelemetry GenAI + Cloud Trace
Voice / multimodal | Possible, not native | Native via Gemini Live API
Checkpointing & time-travel | Native, with multiple backends | Sessions + Memory Bank (managed)
Boilerplate | Higher — you write the graph | Lower — agents wire themselves up
Languages | Python, TypeScript | Python, TypeScript, Go, Java
A2A interop | Community wrapper | First-party (to_a2a())
Best fit | Predictable flows, compliance-heavy work, multi-vendor stacks | GCP deployments, Gemini-first, voice agents, faster prototypes

The combine-them pattern most production teams use

LangGraph and ADK aren't mutually exclusive. The pattern that's emerging in 2026: LangGraph for the orchestration "control plane" (where you want explicit edges and audit), ADK or specialist frameworks for the "data plane" (Gemini-Live voice agents, Vertex-managed sub-agents). The two talk via A2A. LangGraph's strength as the orchestrator is the explicit graph; ADK's strength as a sub-agent is the managed runtime. Use both for what each is best at.
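
What the control-plane side of that split looks like: a hedged sketch in which one LangGraph node delegates to a remote sub-agent. The endpoint and payload shape are hypothetical (a real integration would go through an A2A client against the agent's published card), but the graph-side pattern is the point: the sub-agent is just another node.

import httpx

def voice_subagent(state: State):
    # hypothetical HTTP surface for an ADK-hosted sub-agent
    resp = httpx.post(
        "https://voice-agent.internal.example/run",
        json={"input": state["messages"][-1].content},
        timeout=60.0,
    )
    resp.raise_for_status()
    # the reply re-enters shared state like any other node's output
    return {"messages": [("assistant", resp.json()["output"])]}

graph.add_node("voice_subagent", voice_subagent)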

Where LangGraph earns its keep.

Six patterns where LangGraph has shipped at scale, drawn from the LangChain Inc. case studies, public reference architectures, and production engineering blogs.

  • Customer support routing — Klarna's LangGraph-based AI assistant handles ~2.3M conversations and saved an estimated $40M annually (LangChain case study). Graph encodes routing logic, escalation thresholds, and human handoff rules.
  • Code agent orchestration — Replit Agent uses LangGraph to manage multi-step coding flows: plan, execute, test, fix, retry. Checkpointing means the agent can resume after a build fails or a sandbox times out.
  • Document processing pipelines — multi-stage extract / classify / enrich / validate flows, where each step needs explicit retry rules and the final output triggers downstream business logic (see the retry sketch after this list).
  • Compliance-heavy automation — banking, insurance, healthcare. Human-in-the-loop interrupts at known decision points; the audit trail comes from the checkpointer; the graph itself is the policy document reviewers can read.
  • Research / report generation — multi-agent fan-out: parallel sub-agents research different angles, results merge into a single report. ParallelAgent-style patterns natively expressed as graph branches.
  • Internal developer agents — PR review, ticket triage, on-call assist. LangGraph's checkpointing means the agent can pause, ask the engineer for clarification, and resume without losing context.
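
The retry sketch promised above: a toy extract / validate pipeline where the conditional edge, not the prompt, enforces the retry budget. Node bodies are stubs and the field names are illustrative.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DocState(TypedDict):
    raw: str
    fields: dict
    attempts: int
    valid: bool
    status: str

def extract(state: DocState):
    # stub for a real LLM extraction call
    return {"fields": {"invoice_no": state["raw"].strip() or None},
            "attempts": state["attempts"] + 1}

def validate(state: DocState):
    return {"valid": state["fields"].get("invoice_no") is not None}

def route(state: DocState) -> str:
    if state["valid"]:
        return "enrich"
    if state["attempts"] >= 3:   # explicit retry budget, not a prompt hint
        return "dead_letter"
    return "extract"

def enrich(state: DocState):
    return {"status": "enriched"}      # stub for downstream business logic

def dead_letter(state: DocState):
    return {"status": "needs_human"}   # park for manual review

g = StateGraph(DocState)
g.add_node("extract", extract)
g.add_node("validate", validate)
g.add_node("enrich", enrich)
g.add_node("dead_letter", dead_letter)
g.add_edge(START, "extract")
g.add_edge("extract", "validate")
g.add_conditional_edges("validate", route)  # router returns node names directly
g.add_edge("enrich", END)
g.add_edge("dead_letter", END)
print(g.compile().invoke({"raw": "INV-1001", "fields": {}, "attempts": 0, "valid": False}))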

Pick LangGraph when. Skip LangGraph when.

LangGraph is opinionated. The opinion is that explicit control beats implicit reasoning for production work, and the cost is more boilerplate. That trade is right for some teams and wrong for others. Honest two-sided guidance follows.

Use LangGraph when

  • You need explicit control over agent flow — not "let the LLM decide"
  • Compliance or audit requirements demand a readable control-flow document
  • Human-in-the-loop interrupts are load-bearing (payments, content moderation, contracts)
  • Long-running flows that need to resume after crash or pause
  • Your team already uses LangChain — tool ecosystem inheritance is the win
  • Multi-vendor LLM strategy: Claude + GPT + Gemini + Ollama in one graph
  • You're fine paying for LangSmith / LangGraph Platform at scale
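
Skip LangGraph when

  • A fast prototype matters more than explicit control: "agents wire themselves up" frameworks get there sooner
  • Voice or realtime multimodal is the product: ADK's Gemini Live integration is native, here it's possible but not built in
  • Your estate is GCP-first and Gemini-first, and ADK's managed Vertex AI surface is less work to run
  • The flow is genuinely open-ended and the LLM should route; an explicit graph buys you little
  • You can't absorb LangSmith / LangGraph Platform spend at scale and don't want to build observability yourself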

Pinning advice

LangGraph is on a stable v1.x track. Pin with a compatible-release spec in production (langgraph~=1.0, which takes 1.x updates but never 2.0) but expect to update monthly — the API is stable but the ecosystem moves. LangChain itself has a faster cadence; if you depend on specific LangChain integrations, pin those more conservatively. The graph definition itself is the durable artefact — you can usually upgrade the runtime without rewriting the graph.

Why explicit graphs matter more in SA delivery.

LangGraph's vendor-agnostic shape and explicit control flow have practical leverage in SA delivery contexts. POPIA compliance is easier when the audit trail is a checkpointer table, not a vendor's proprietary log. Multi-cloud is easier when the framework runs anywhere. Cost predictability is easier when you can swap an expensive cloud LLM for a local Ollama call by changing one node.

Enterprise · SA banks, insurers, telcos

For SA banks, insurers, and telcos with mixed-cloud or AWS-first estates, LangGraph is the right framework choice if ADK's GCP-bias is a problem. Run on EKS in Cape Town (AWS af-south-1), checkpoint to RDS Postgres in the same region, observe via LangSmith or self-hosted Phoenix. The audit trail stays on-region; the framework doesn't care which LLM you call. POPIA Section 72 cross-border-transfer concerns become a per-node decision: which nodes call Claude (US), which call Vertex (JHB), which call local Ollama. The graph itself documents the policy.
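
What that per-node decision looks like in code, assuming the langchain-anthropic, langchain-google-vertexai, and langchain-ollama integration packages; the model names and node split are illustrative, not prescriptive:

from langchain_anthropic import ChatAnthropic
from langchain_google_vertexai import ChatVertexAI
from langchain_ollama import ChatOllama

summarise_llm = ChatAnthropic(model="claude-sonnet-4-6")        # US-hosted
classify_llm = ChatVertexAI(model="gemini-2.0-flash",
                            location="africa-south1")           # Vertex, JHB region
pii_llm = ChatOllama(model="gemma3")                            # local, nothing crosses the border

def redact_pii(state: State):
    # this node is the policy: PII-bearing text only ever hits the local model
    return {"messages": [pii_llm.invoke(state["messages"])]}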

Studio · mid-market builds

For mid-market builds, LangGraph + Cloud Run + a small Postgres instance for checkpointing is a productive stack. LangSmith free tier covers most pilots; you can defer the paid SKU until volume actually justifies it. The pragmatic path for SA studios: start in LangGraph dev mode on a developer laptop, ship to Cloud Run for staging, only move to LangGraph Platform if a client wants the managed runtime + Studio visual debugger.
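
A minimal serving sketch for that stack, assuming FastAPI in front of the compiled app from earlier; the route and payload shape are hypothetical:

from fastapi import FastAPI
from pydantic import BaseModel

api = FastAPI()

class ChatRequest(BaseModel):
    thread_id: str
    message: str

@api.post("/chat")
def chat(req: ChatRequest):
    # thread_id maps a caller onto a checkpointed LangGraph thread
    config = {"configurable": {"thread_id": req.thread_id}}
    result = app.invoke({"messages": [("user", req.message)]}, config)
    return {"reply": result["messages"][-1].content}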

Learning · explicit control flow as a teaching tool

LangGraph is one of the better frameworks to learn agent patterns on, precisely because the abstractions are explicit. You can see the nodes, the edges, the state. The official LangChain Academy course is free and SA-friendly (no Google Cloud account or paid model needed — pair with Ollama-hosted Gemma 3 for the model layer). For SA developers learning agentic patterns in 2026, LangGraph + Ollama is a credible zero-cost learning stack.

FX cost note. LangSmith is USD-billed and the paid tiers add up at production volume. If FX exposure is a constraint, plan for either: (a) self-hosting an OpenTelemetry-based observability stack (more engineering work upfront), (b) using LangSmith's free tier and accepting the 14-day retention limit, or (c) treating LangSmith as a strategic line item and locking in annual pricing.


Primary sources.

Authored from the canonical langchain-ai/langgraph repo, the LangGraph documentation, the LangChain Academy course materials, and LangChain Inc.'s public case studies and engineering blog posts. Last reviewed 2026-05-10.