know.2nth.ai › Tools

tools · landscape map

The agent tool landscape.

What categories of tooling an agent platform actually needs to ship into production, what credible options exist in each, and how to choose between them. Reference knowledge — no claims about anyone's deployment. The point of this leaf is orientation: most teams get one or two categories right and discover the others under deadline pressure. This page maps them so the gaps surface before the deadline, not after.

Last reviewed: 2026-05-17 · Cadence: hot (quarterly) · Worked example: tools/example

Eleven categories, plus one cross-cutting.

An agent is software that calls external systems with permissions and leaves a trail. To do that in production — not in a demo — you need eleven categories of tooling, plus one that cross-cuts the rest. They are not all the same category of decision: some picks are sticky (the tool protocol your whole stack speaks), some are reversible (the audit backend, if you stuck to a standard), some are cheap-to-buy, some are cheap-to-build. A team that maps all ten at the start avoids the worst mistake: discovering at week six that there is no plan for “how a human approves before this sends.”

The categories below are the ones that show up in every credible agent stack as of mid-2026. The twelfth (cross-cutting), pre-built vendor MCP servers, applies across all of them — it is the universe of integrations agents can call without writing custom code.

02 · The ten categories at a glance

Each card: what the category is, and the question that picks the layer apart.

Read this section to orient. The next section breaks each category into its credible options.

Runtime / framework

What hosts the agent loop — the model call, the tool-call parsing, the result feedback, the next call. Manages context window and (often) MCP plumbing.

Multi-provider or first-party?

Model gateway / routing

Where calls to the model actually go. Routes between providers (Claude, GPT, Gemini, Ollama, vLLM), handles fallback + caching + rate limits + cost tracking. The layer that buys you model portability.

Edge gateway, unified API, or self-hosted proxy?

Tool protocol

How the agent talks to external systems. Defines tool schemas, invocation, error handling, streaming. The choice composes (or doesn't) with everything downstream.

Standardised or per-vendor?

Tool registries / discovery

Where agents and humans find tools to call. Catalogues of pre-built integrations with auth handled. Often the only way to avoid writing custom adapter code.

Free + community, or paid + curated?

Permissions / policy

What each agent is allowed to do, expressed as policy a non-engineer can read. The load-bearing layer for any audit / compliance / regulated-environment story.

Centralised policy engine or framework allowlists?

Audit / observability

How a human (or another agent) reconstructs what happened. Replayable traces, prompt + result + token + cost capture, by-agent / by-tool / by-time queries.

OSS self-hosted or SaaS managed?

Sandboxed code execution

Where agent-written code runs without taking down the host. Tradeoff is between cold start and isolation strength; hardware vs container vs language-level.

Hardware isolation or container isolation?

Dual-use / workflow

The same operation triggered by humans (form) or agents (tool call). Durable functions with sleep, retry, fan-out, and "pause until event" as first-class primitives.

First-class workflow primitive or DIY?

Identity / auth

Who the agent is when it acts. Short-TTL scoped tokens, per-agent service accounts, sender-constraint proofs. Without this, "permissions" is just a name on a policy file.

Edge service tokens or enterprise IdP?

Approval / human-in-the-loop

What blocks until a human says yes. Mutating actions, financial transfers, message sends, irreversible state changes — all candidates. Cheap to add, expensive to retrofit.

Built into the workflow engine or external UI?

Memory / state

What the agent remembers between calls and across sessions. Per-conversation, per-user, per-organisation; managed memory vs explicit state machines vs DIY.

Per-session state or persistent memory?

12 · cross-cut

Pre-built vendor MCP servers

The universe of packaged integrations agents can call without writing custom code. 500+ servers in the official MCP registry as of mid-2026, first-party + community mixed.

First-party only, or accept community?

03 · Options in each category

Credible picks, who maintains them, the one-line trade-off.

Each table has three columns: pick, maintainer, and trade-off / when to choose. Not exhaustive — just the picks that show up in production agent stacks in mid-2026. Skim the picks; the rationale for any single choice belongs in its own decision leaf.

01Runtime / framework

The agent's host environment. Manages the model call ↔ tool call ↔ result feedback loop, often the MCP plumbing too. Pick the one closest to the model you are using; switch costs are real but bounded (~1 day per agent).

Pick	Maintainer	Trade-off / when to choose
Claude Agent SDK	Anthropic (first-party)	Best fit if Claude is the model. Native `allowed_tools` + `PreToolUse` hooks; MCP-aware out of the box.
LangGraph	LangChain	Best for stateful multi-agent orchestration. Multi-provider. v1.0 in late 2025; 90k+ stars.
Vercel AI SDK v6	Vercel	TypeScript-first, lovely DX, ToolLoopAgent for production. Narrower scope than LangGraph.
CrewAI	community + commercial	Role-based, fast adoption (60%+ Fortune 500 by Jan 2026), coarser permission model.
OpenAI Agents SDK	OpenAI (first-party)	Best fit if OpenAI is the model. Locks the runtime to OpenAI.
Microsoft Agent Framework	Microsoft (unified AutoGen + SK)	Enterprise .NET / Azure shops. GA Q1 2026.
Pydantic AI	Pydantic	Type-first Python; small but credible.
Raw model API + DIY loop	—	Maximum control; you rebuild the loop, context compaction, MCP plumbing yourself.

02Model gateway / routing

Where calls to the model actually go. The layer that decouples agent code from “which model does this call?” Without it, swapping from Claude to Gemini means refactoring; with it, it's a config change. Cheap to add early, expensive to retrofit once dozens of code paths call anthropic.messages.create (or equivalent) directly. The category that buys you model portability — open weights through frontier — without rewrites.

Pick	Maintainer	Trade-off / when to choose
Cloudflare AI Gateway	Cloudflare	Edge-native, caching + rate limits + analytics built in. Cheapest if you're already on Cloudflare; fits the edge-first story.
OpenRouter	OpenRouter	Unified API across 100+ models including open-weights (Llama, Qwen, DeepSeek) and frontiers (Claude, GPT, Gemini). Pay-per-call.
LiteLLM	BerriAI (OSS)	Self-hostable OpenAI-compatible proxy that routes anywhere (Anthropic, OpenAI, Bedrock, Vertex, Ollama). Strong control-plane choice.
Vertex AI Model Garden	Google	Google's hosted catalogue. Strong for Gemini + Claude + Llama in one place; GCP-native auth.
Amazon Bedrock	AWS	AWS-native model gateway. Right if you're already on AWS; heavy otherwise.
Ollama / vLLM (direct)	OSS	For self-hosting open-weights models. Pair with LiteLLM upstream or call directly.

03Tool protocol

How the agent talks to external systems. This pick is sticky — everything downstream depends on it. Bias toward standards.

Pick	Maintainer	Trade-off / when to choose
MCP (Model Context Protocol)	Anthropic-origin, open spec	The vendor-neutral standard. 500+ servers in the official registry. Cross-runtime by design.
OpenAI function-calling	OpenAI	Works inside OpenAI tooling. Tool definitions tied to one provider.
Direct HTTP / vendor SDKs	per-vendor	No discovery, no composition; fine for one-off integrations, awful as the count grows.
GraphQL endpoints	per-team	Niche — useful only if the backend is already GraphQL-native.

04Tool registries / discovery

Where the agent finds tools to call. The alternative to a registry is writing your own adapters — viable for 5 tools, unmanageable at 50.

Pick	Maintainer	Trade-off / when to choose
Official MCP Registry	Anthropic-maintained	The canonical discovery surface. Free. Listings for first-party and community servers.
Composio	Composio (SaaS)	500+ pre-built actions with OAuth handled. Paid.
Arcade.dev	Arcade (commercial)	MCP-first runtime with per-user scoped tokens. Smaller catalog than Composio.
Pipedream	Pipedream (SaaS)	2000+ app integrations. Agent-callable via API. Workflow-flavoured rather than MCP-native.

05Permissions / policy

What each agent is allowed to do. Externalised from code so non-engineers (and auditors) can read it. Without a policy engine, you are doing RBAC in code — fine at small scale, awful past it.

Pick	Maintainer	Trade-off / when to choose
Cerbos	Cerbos (OSS, Apache-2)	YAML policies, sub-ms decisions, self-hostable. Partner-readable.
Permit.io	Permit (SaaS)	Managed RBAC/ABAC with a UI. Policies live in their database.
OPA / Rego	CNCF (OSS)	Generalist policy engine. Rego is powerful but opaque; higher learning curve.
Framework allowlists	per-framework	Built into the SDK (e.g. Claude SDK's `allowed_tools`). Sufficient at small scale.
Aembit	Aembit (commercial)	Machine-identity-first. Newer entrant, narrower focus.

06Audit / observability

How you reconstruct what happened. Three things matter: that traces are replayable, that args and results are captured (not just metadata), and that the schema follows the OpenTelemetry GenAI conventions so the backend is swappable.

Pick	Maintainer	Trade-off / when to choose
Langfuse	Langfuse (OSS, MIT; ClickHouse-acquired Jan 2026)	Self-hostable, replayable traces, free at scale. Most popular OSS pick.
LangSmith	LangChain (SaaS)	Deepest LangGraph integration; per-seat priced.
Helicone	Helicone (OSS + SaaS)	Proxy-based, simplest to deploy. Narrower than dedicated trace stores.
Braintrust	Braintrust (SaaS)	Eval-first; tracing is secondary. Strong for evaluation workflows.
Arize Phoenix	Arize (OSS)	ML-grade rigor; embedding analysis. Mixed LLM + traditional ML.
Pydantic Logfire	Pydantic (SaaS)	AI-specific UI; emerging.
OpenTelemetry GenAI	OTel community (standard)	The vendor-neutral semantic conventions underneath every backend. Not a backend itself.

07Sandboxed code execution

Where agent-written code runs without taking down the host. Hardware isolation (Firecracker) is the credible pick for untrusted output; container isolation is faster but weaker.

Pick	Maintainer	Trade-off / when to choose
E2B	E2B (commercial)	Firecracker microVMs — hardware boundary. Sub-second cold starts.
Daytona	Daytona (commercial)	Docker-based, sub-90ms cold starts. Container-level isolation.
Modal Sandboxes	Modal (commercial)	GPU support, gVisor isolation. Higher cost; niche unless you need GPU.
Cloudflare Sandboxes	Cloudflare	Edge-native, emerging. Worth tracking, not yet the production pick.

08Dual-use / workflow

The same operation, callable by a human (HTML form) or by an agent (MCP tool). Durable functions with "pause until event" are the cleanest way to model this so the audit trail is identical for both callers.

Pick	Maintainer	Trade-off / when to choose
Inngest	Inngest (OSS core + cloud)	Durable functions; "pause until event" primitive. Free tier covers pilot scale.
Trigger.dev v3	Trigger.dev (OSS)	Similar to Inngest. TypeScript-first.
Temporal	Temporal (OSS)	Heavyweight enterprise option. Powerful, more infrastructure.
n8n	n8n (OSS + cloud)	Low-code workflows. Bidirectional MCP integration since April 2025.
Hookdeck	Hookdeck (SaaS)	Webhook infrastructure with an MCP server. Good when webhooks dominate.
DIY: one function, multiple entry points	—	Works at small scale; you re-implement durability when you grow.

09Identity / auth

Who the agent is when it acts. Short-TTL scoped tokens per (agent, tool) pair, signed by the runtime. Without this layer, "permissions" is just a wishful name on a policy file.

Pick	Maintainer	Trade-off / when to choose
Cloudflare Access service tokens	Cloudflare	Zero-trust, edge-native, no separate IdP. Cheapest path if you are already on Cloudflare.
Auth0 / Okta Agent IAM	Okta (SaaS)	Enterprise identity, broader scope. Heavier; right for regulated industries.
SPIFFE / SPIRE	CNCF (OSS)	Workload identity standard for K8s. Right if you are running on Kubernetes.
DPoP-bound JWTs	open spec (RFC 9449)	Sender-constrained tokens. Most secure; most implementation work.
Asgardeo / WSO2	WSO2 (commercial)	Identity-as-a-service with agent-specific flows. Newer, niche adoption.

10Approval / human-in-the-loop

What blocks until a human says yes. Cheap to add as a first-class workflow primitive on day one; expensive to retrofit after the audit log shows agents shipping money out the door without anyone approving it.

Pick	Maintainer	Trade-off / when to choose
Inngest pause-and-resume	Inngest	First-class workflow primitive. "Wait for approval keyed by run_id" in one line.
LangGraph interrupts	LangChain	Built into the agent loop. Right if LangGraph is the runtime.
AutoGen human handoff	Microsoft	Pattern within Microsoft's framework.
n8n approval nodes	n8n	Visual workflow with explicit approval steps. Lower code.
Slack-bot approval	per-team	Decisions happen where the team already talks. Lightweight, ergonomic.
Custom approvals UI + queue	—	DIY when frameworks don't fit. Most work; most control.

11Memory / state

What the agent remembers between calls and across sessions. Three flavours: per-session state (the conversation), explicit state machines (the workflow), persistent memory (the agent's long-term knowledge of a user).

Pick	Maintainer	Trade-off / when to choose
Anthropic Memory primitives	Anthropic (first-party)	Built into Claude Agent SDK. Lowest friction if Claude is the runtime.
Letta (MemGPT-derived)	Letta (OSS)	Persistent memory, conversation-centric. 15k stars.
mem0	mem0 (commercial)	Managed memory service. Saves you running a vector DB.
LangGraph state machines	LangChain	Explicit per-workflow state. Right when your "memory" is really workflow state.
Vector DB + retrieval (DIY)	—	Maximum flexibility, you own the orchestration. Pick a DB (Postgres + pgvector, Pinecone, Qdrant) and build.

12 · cross-cutPre-built vendor MCP servers

The universe of packaged integrations. Cuts across all the other categories. The constraint that matters: first-party (vendor-maintained) servers are zero-effort to maintain; community servers can be richer but you adopt them at your own pace.

Server	Maintainer	What it brings
GitHub MCP	GitHub (first-party)	Repos, PRs, issues, code search. The legal-review and platform-engineering substrate.
HubSpot MCP	HubSpot (first-party, GA Apr 2026)	Contacts, deals, marketing email, campaign analytics.
Salesforce MCP	Salesforce (first-party)	Pipeline objects, flows, Apex actions. Enterprise CRM.
Google Workspace MCP	Google (first-party)	Gmail, Drive, Sheets, Calendar, Docs. The default office substrate.
BigQuery Remote MCP	Google (first-party)	Analytics queries. Read-only by default; write needs explicit scoping.
Slack MCP	Slack (first-party)	Read channels, post messages, list users. Post is destructive; gate it.
Notion MCP	Notion (first-party)	Pages, databases, comments.
Atlassian MCP	Atlassian (first-party)	Jira issues, Confluence pages. Engineering and compliance.
+ ~500 more in the registry	mixed (first-party + community)	Check the official MCP registry. Production-pick rule: prefer first-party.

04 · How to choose

Five heuristics, weighted for production-readiness.

Most picks pass any single heuristic. Run a candidate through all five and the picks narrow fast. These are biased toward shipping into production, not toward the cleverest architecture for a demo.

1 · OSS vs SaaS

What does lock-in cost in 24 months? SaaS is faster to start; costs more long-term; the audit log lives on someone else's infrastructure. OSS is more setup, full data sovereignty. For agents handling sensitive data (financial, HR, legal, regulated), audit-log ownership often makes the OSS choice non-negotiable.

Rule of thumb: SaaS for the layers that are commodity (model gateway, registry); OSS for the layers that hold the audit (Langfuse over LangSmith), the policy (Cerbos over Permit.io), and any data plane.

2 · Vendor-maintained vs community

What does the maintenance bill look like in a year? A vendor-maintained MCP server (GitHub's, HubSpot's, Salesforce's) is zero effort to maintain — the vendor ships fixes alongside their API changes. A community server can be richer but is someone's side project; you may inherit the patch responsibility.

Rule of thumb: For production, vendor-maintained wins unless you can afford to fork it and own the patches.

3 · Single-tenant vs multi-tenant

Who owns the audit log? In a multi-tenant SaaS, the log is in the provider's database. You can read it; you don't own it. For regulated environments (POPIA, GDPR, sector compliance) this is often disqualifying — the regulator wants the log under your control, not on someone else's terms of service.

Rule of thumb: Treat audit and policy as single-tenant by default. Treat catalogues and model traffic as multi-tenant if the vendor's terms hold.

4 · Depth vs breadth

At your scale, when does "good enough" beat "best in class"? At 5 agents and 12 tools, breadth (one credible pick per category, all working together) matters more than depth (the best possible audit tool). At 50 agents and 200 tools, depth in your weakest category becomes the constraint.

Rule of thumb: First production deploy — breadth. Second — deepen the layer that hurt most. Don't optimise before the pain shows up.

5 · Reversibility

What does swapping cost if the pick was wrong? The tool protocol (MCP vs not) is sticky — the whole stack speaks it. The audit backend can swap in a weekend if you stuck to OpenTelemetry GenAI conventions. The runtime is one-day-per-agent. Some layers are cheap to change later; some aren't.

Rule of thumb: Spend more time on the sticky picks (protocol, identity, dual-use shape). Spend less on the reversible ones (audit backend, sandbox provider). Bias toward standards when sticky.

05 · From landscape to stack

How the categories wire together, and one worked example.

The categories above are not independent. The dual-use workflow primitive expects the audit layer to capture both callers identically; the permissions layer expects scoped identity tokens to bind decisions to agents; the runtime drives most of it. Two leaves go deeper.

Three places to dig in

Reference architecture

How the categories wire together

A single Worker entry point + Cerbos + Inngest + Langfuse + vendor MCP, end to end. Described as designed, not as running state.

Worked example

A five-agent fractional tool stack

An illustrative roster (Grant CFO, Penny CMO, Leo CLO, Grace CHRO, Katharine CRO) with specific picks across the categories. One stack among many.

Decision leaf Member

The runtime layer choice

Why pick the Claude Agent SDK over LangGraph, CrewAI, Vercel AI SDK, etc. First in a series of per-layer decision leaves.

06 · Connections

Where else this touches the tree.

→ Architecture → Worked example → Runtime decision · member → Agents → Agent Skills → MCP → Claude Agent SDK → LangGraph → CrewAI → Cloudflare → Google (Workspace + BigQuery) → CRM (HubSpot, Salesforce, Frappe)

07 · Resources

Primary sources.

Linked tersely. The landscape moves fast — verify the version against the date on this leaf.

ProtocolModel Context Protocolmodelcontextprotocol.io DiscoveryOfficial MCP Registryregistry.modelcontextprotocol.io RuntimeClaude Agent SDKgithub.com/anthropics/claude-agent-sdk-python RuntimeLangGraphlangchain.com/langgraph RuntimeCrewAIdocs.crewai.com RuntimeVercel AI SDKai-sdk.dev PermissionsCerboscerbos.dev PermissionsPermit.iopermit.io AuditLangfuselangfuse.com AuditLangSmithlangchain.com/langsmith Trace contractOpenTelemetry GenAI semconvopentelemetry.io/docs/specs/semconv/gen-ai SandboxE2Be2b.dev SandboxDaytonadaytona.io Dual-useInngestinngest.com Dual-useTrigger.devtrigger.dev Dual-useTemporaltemporal.io RegistryComposiocomposio.dev RegistryArcade.devarcade.dev MemoryLettagithub.com/letta-ai/letta Memorymem0mem0.ai