know.2nth.ai Tools
tools · landscape map

The agent tool landscape.

What categories of tooling an agent platform actually needs to ship into production, what credible options exist in each, and how to choose between them. Reference knowledge — no claims about anyone's deployment. The point of this leaf is orientation: most teams get one or two categories right and discover the others under deadline pressure. This page maps them so the gaps surface before the deadline, not after.

Last reviewed: 2026-05-17 · Cadence: hot (quarterly) · Worked example: tools/example

11
Categories
~50
Picks listed
5
Choosing heuristics
1
Worked example
01 · What an agent platform actually needs

Eleven categories, plus one cross-cutting.

An agent is software that calls external systems with permissions and leaves a trail. To do that in production — not in a demo — you need eleven categories of tooling, plus one that cross-cuts the rest. They are not all the same category of decision: some picks are sticky (the tool protocol your whole stack speaks), some are reversible (the audit backend, if you stuck to a standard), some are cheap-to-buy, some are cheap-to-build. A team that maps all ten at the start avoids the worst mistake: discovering at week six that there is no plan for “how a human approves before this sends.”

The categories below are the ones that show up in every credible agent stack as of mid-2026. The twelfth (cross-cutting), pre-built vendor MCP servers, applies across all of them — it is the universe of integrations agents can call without writing custom code.

02 · The ten categories at a glance

Each card: what the category is, and the question that picks the layer apart.

Read this section to orient. The next section breaks each category into its credible options.

01

Runtime / framework

What hosts the agent loop — the model call, the tool-call parsing, the result feedback, the next call. Manages context window and (often) MCP plumbing.

Multi-provider or first-party?
02

Model gateway / routing

Where calls to the model actually go. Routes between providers (Claude, GPT, Gemini, Ollama, vLLM), handles fallback + caching + rate limits + cost tracking. The layer that buys you model portability.

Edge gateway, unified API, or self-hosted proxy?
03

Tool protocol

How the agent talks to external systems. Defines tool schemas, invocation, error handling, streaming. The choice composes (or doesn't) with everything downstream.

Standardised or per-vendor?
04

Tool registries / discovery

Where agents and humans find tools to call. Catalogues of pre-built integrations with auth handled. Often the only way to avoid writing custom adapter code.

Free + community, or paid + curated?
05

Permissions / policy

What each agent is allowed to do, expressed as policy a non-engineer can read. The load-bearing layer for any audit / compliance / regulated-environment story.

Centralised policy engine or framework allowlists?
06

Audit / observability

How a human (or another agent) reconstructs what happened. Replayable traces, prompt + result + token + cost capture, by-agent / by-tool / by-time queries.

OSS self-hosted or SaaS managed?
07

Sandboxed code execution

Where agent-written code runs without taking down the host. Tradeoff is between cold start and isolation strength; hardware vs container vs language-level.

Hardware isolation or container isolation?
08

Dual-use / workflow

The same operation triggered by humans (form) or agents (tool call). Durable functions with sleep, retry, fan-out, and "pause until event" as first-class primitives.

First-class workflow primitive or DIY?
09

Identity / auth

Who the agent is when it acts. Short-TTL scoped tokens, per-agent service accounts, sender-constraint proofs. Without this, "permissions" is just a name on a policy file.

Edge service tokens or enterprise IdP?
10

Approval / human-in-the-loop

What blocks until a human says yes. Mutating actions, financial transfers, message sends, irreversible state changes — all candidates. Cheap to add, expensive to retrofit.

Built into the workflow engine or external UI?
11

Memory / state

What the agent remembers between calls and across sessions. Per-conversation, per-user, per-organisation; managed memory vs explicit state machines vs DIY.

Per-session state or persistent memory?
12 · cross-cut

Pre-built vendor MCP servers

The universe of packaged integrations agents can call without writing custom code. 500+ servers in the official MCP registry as of mid-2026, first-party + community mixed.

First-party only, or accept community?
03 · Options in each category

Credible picks, who maintains them, the one-line trade-off.

Each table has three columns: pick, maintainer, and trade-off / when to choose. Not exhaustive — just the picks that show up in production agent stacks in mid-2026. Skim the picks; the rationale for any single choice belongs in its own decision leaf.

01Runtime / framework

The agent's host environment. Manages the model call ↔ tool call ↔ result feedback loop, often the MCP plumbing too. Pick the one closest to the model you are using; switch costs are real but bounded (~1 day per agent).

PickMaintainerTrade-off / when to choose
Claude Agent SDKAnthropic (first-party)Best fit if Claude is the model. Native allowed_tools + PreToolUse hooks; MCP-aware out of the box.
LangGraphLangChainBest for stateful multi-agent orchestration. Multi-provider. v1.0 in late 2025; 90k+ stars.
Vercel AI SDK v6VercelTypeScript-first, lovely DX, ToolLoopAgent for production. Narrower scope than LangGraph.
CrewAIcommunity + commercialRole-based, fast adoption (60%+ Fortune 500 by Jan 2026), coarser permission model.
OpenAI Agents SDKOpenAI (first-party)Best fit if OpenAI is the model. Locks the runtime to OpenAI.
Microsoft Agent FrameworkMicrosoft (unified AutoGen + SK)Enterprise .NET / Azure shops. GA Q1 2026.
Pydantic AIPydanticType-first Python; small but credible.
Raw model API + DIY loopMaximum control; you rebuild the loop, context compaction, MCP plumbing yourself.
02Model gateway / routing

Where calls to the model actually go. The layer that decouples agent code from “which model does this call?” Without it, swapping from Claude to Gemini means refactoring; with it, it's a config change. Cheap to add early, expensive to retrofit once dozens of code paths call anthropic.messages.create (or equivalent) directly. The category that buys you model portability — open weights through frontier — without rewrites.

PickMaintainerTrade-off / when to choose
Cloudflare AI GatewayCloudflareEdge-native, caching + rate limits + analytics built in. Cheapest if you're already on Cloudflare; fits the edge-first story.
OpenRouterOpenRouterUnified API across 100+ models including open-weights (Llama, Qwen, DeepSeek) and frontiers (Claude, GPT, Gemini). Pay-per-call.
LiteLLMBerriAI (OSS)Self-hostable OpenAI-compatible proxy that routes anywhere (Anthropic, OpenAI, Bedrock, Vertex, Ollama). Strong control-plane choice.
Vertex AI Model GardenGoogleGoogle's hosted catalogue. Strong for Gemini + Claude + Llama in one place; GCP-native auth.
Amazon BedrockAWSAWS-native model gateway. Right if you're already on AWS; heavy otherwise.
Ollama / vLLM (direct)OSSFor self-hosting open-weights models. Pair with LiteLLM upstream or call directly.
03Tool protocol

How the agent talks to external systems. This pick is sticky — everything downstream depends on it. Bias toward standards.

PickMaintainerTrade-off / when to choose
MCP (Model Context Protocol)Anthropic-origin, open specThe vendor-neutral standard. 500+ servers in the official registry. Cross-runtime by design.
OpenAI function-callingOpenAIWorks inside OpenAI tooling. Tool definitions tied to one provider.
Direct HTTP / vendor SDKsper-vendorNo discovery, no composition; fine for one-off integrations, awful as the count grows.
GraphQL endpointsper-teamNiche — useful only if the backend is already GraphQL-native.
04Tool registries / discovery

Where the agent finds tools to call. The alternative to a registry is writing your own adapters — viable for 5 tools, unmanageable at 50.

PickMaintainerTrade-off / when to choose
Official MCP RegistryAnthropic-maintainedThe canonical discovery surface. Free. Listings for first-party and community servers.
ComposioComposio (SaaS)500+ pre-built actions with OAuth handled. Paid.
Arcade.devArcade (commercial)MCP-first runtime with per-user scoped tokens. Smaller catalog than Composio.
PipedreamPipedream (SaaS)2000+ app integrations. Agent-callable via API. Workflow-flavoured rather than MCP-native.
05Permissions / policy

What each agent is allowed to do. Externalised from code so non-engineers (and auditors) can read it. Without a policy engine, you are doing RBAC in code — fine at small scale, awful past it.

PickMaintainerTrade-off / when to choose
CerbosCerbos (OSS, Apache-2)YAML policies, sub-ms decisions, self-hostable. Partner-readable.
Permit.ioPermit (SaaS)Managed RBAC/ABAC with a UI. Policies live in their database.
OPA / RegoCNCF (OSS)Generalist policy engine. Rego is powerful but opaque; higher learning curve.
Framework allowlistsper-frameworkBuilt into the SDK (e.g. Claude SDK's allowed_tools). Sufficient at small scale.
AembitAembit (commercial)Machine-identity-first. Newer entrant, narrower focus.
06Audit / observability

How you reconstruct what happened. Three things matter: that traces are replayable, that args and results are captured (not just metadata), and that the schema follows the OpenTelemetry GenAI conventions so the backend is swappable.

PickMaintainerTrade-off / when to choose
LangfuseLangfuse (OSS, MIT; ClickHouse-acquired Jan 2026)Self-hostable, replayable traces, free at scale. Most popular OSS pick.
LangSmithLangChain (SaaS)Deepest LangGraph integration; per-seat priced.
HeliconeHelicone (OSS + SaaS)Proxy-based, simplest to deploy. Narrower than dedicated trace stores.
BraintrustBraintrust (SaaS)Eval-first; tracing is secondary. Strong for evaluation workflows.
Arize PhoenixArize (OSS)ML-grade rigor; embedding analysis. Mixed LLM + traditional ML.
Pydantic LogfirePydantic (SaaS)AI-specific UI; emerging.
OpenTelemetry GenAIOTel community (standard)The vendor-neutral semantic conventions underneath every backend. Not a backend itself.
07Sandboxed code execution

Where agent-written code runs without taking down the host. Hardware isolation (Firecracker) is the credible pick for untrusted output; container isolation is faster but weaker.

PickMaintainerTrade-off / when to choose
E2BE2B (commercial)Firecracker microVMs — hardware boundary. Sub-second cold starts.
DaytonaDaytona (commercial)Docker-based, sub-90ms cold starts. Container-level isolation.
Modal SandboxesModal (commercial)GPU support, gVisor isolation. Higher cost; niche unless you need GPU.
Cloudflare SandboxesCloudflareEdge-native, emerging. Worth tracking, not yet the production pick.
08Dual-use / workflow

The same operation, callable by a human (HTML form) or by an agent (MCP tool). Durable functions with "pause until event" are the cleanest way to model this so the audit trail is identical for both callers.

PickMaintainerTrade-off / when to choose
InngestInngest (OSS core + cloud)Durable functions; "pause until event" primitive. Free tier covers pilot scale.
Trigger.dev v3Trigger.dev (OSS)Similar to Inngest. TypeScript-first.
TemporalTemporal (OSS)Heavyweight enterprise option. Powerful, more infrastructure.
n8nn8n (OSS + cloud)Low-code workflows. Bidirectional MCP integration since April 2025.
HookdeckHookdeck (SaaS)Webhook infrastructure with an MCP server. Good when webhooks dominate.
DIY: one function, multiple entry pointsWorks at small scale; you re-implement durability when you grow.
09Identity / auth

Who the agent is when it acts. Short-TTL scoped tokens per (agent, tool) pair, signed by the runtime. Without this layer, "permissions" is just a wishful name on a policy file.

PickMaintainerTrade-off / when to choose
Cloudflare Access service tokensCloudflareZero-trust, edge-native, no separate IdP. Cheapest path if you are already on Cloudflare.
Auth0 / Okta Agent IAMOkta (SaaS)Enterprise identity, broader scope. Heavier; right for regulated industries.
SPIFFE / SPIRECNCF (OSS)Workload identity standard for K8s. Right if you are running on Kubernetes.
DPoP-bound JWTsopen spec (RFC 9449)Sender-constrained tokens. Most secure; most implementation work.
Asgardeo / WSO2WSO2 (commercial)Identity-as-a-service with agent-specific flows. Newer, niche adoption.
10Approval / human-in-the-loop

What blocks until a human says yes. Cheap to add as a first-class workflow primitive on day one; expensive to retrofit after the audit log shows agents shipping money out the door without anyone approving it.

PickMaintainerTrade-off / when to choose
Inngest pause-and-resumeInngestFirst-class workflow primitive. "Wait for approval keyed by run_id" in one line.
LangGraph interruptsLangChainBuilt into the agent loop. Right if LangGraph is the runtime.
AutoGen human handoffMicrosoftPattern within Microsoft's framework.
n8n approval nodesn8nVisual workflow with explicit approval steps. Lower code.
Slack-bot approvalper-teamDecisions happen where the team already talks. Lightweight, ergonomic.
Custom approvals UI + queueDIY when frameworks don't fit. Most work; most control.
11Memory / state

What the agent remembers between calls and across sessions. Three flavours: per-session state (the conversation), explicit state machines (the workflow), persistent memory (the agent's long-term knowledge of a user).

PickMaintainerTrade-off / when to choose
Anthropic Memory primitivesAnthropic (first-party)Built into Claude Agent SDK. Lowest friction if Claude is the runtime.
Letta (MemGPT-derived)Letta (OSS)Persistent memory, conversation-centric. 15k stars.
mem0mem0 (commercial)Managed memory service. Saves you running a vector DB.
LangGraph state machinesLangChainExplicit per-workflow state. Right when your "memory" is really workflow state.
Vector DB + retrieval (DIY)Maximum flexibility, you own the orchestration. Pick a DB (Postgres + pgvector, Pinecone, Qdrant) and build.
12 · cross-cutPre-built vendor MCP servers

The universe of packaged integrations. Cuts across all the other categories. The constraint that matters: first-party (vendor-maintained) servers are zero-effort to maintain; community servers can be richer but you adopt them at your own pace.

ServerMaintainerWhat it brings
GitHub MCPGitHub (first-party)Repos, PRs, issues, code search. The legal-review and platform-engineering substrate.
HubSpot MCPHubSpot (first-party, GA Apr 2026)Contacts, deals, marketing email, campaign analytics.
Salesforce MCPSalesforce (first-party)Pipeline objects, flows, Apex actions. Enterprise CRM.
Google Workspace MCPGoogle (first-party)Gmail, Drive, Sheets, Calendar, Docs. The default office substrate.
BigQuery Remote MCPGoogle (first-party)Analytics queries. Read-only by default; write needs explicit scoping.
Slack MCPSlack (first-party)Read channels, post messages, list users. Post is destructive; gate it.
Notion MCPNotion (first-party)Pages, databases, comments.
Atlassian MCPAtlassian (first-party)Jira issues, Confluence pages. Engineering and compliance.
+ ~500 more in the registrymixed (first-party + community)Check the official MCP registry. Production-pick rule: prefer first-party.
04 · How to choose

Five heuristics, weighted for production-readiness.

Most picks pass any single heuristic. Run a candidate through all five and the picks narrow fast. These are biased toward shipping into production, not toward the cleverest architecture for a demo.

1 · OSS vs SaaS

What does lock-in cost in 24 months? SaaS is faster to start; costs more long-term; the audit log lives on someone else's infrastructure. OSS is more setup, full data sovereignty. For agents handling sensitive data (financial, HR, legal, regulated), audit-log ownership often makes the OSS choice non-negotiable.

Rule of thumb: SaaS for the layers that are commodity (model gateway, registry); OSS for the layers that hold the audit (Langfuse over LangSmith), the policy (Cerbos over Permit.io), and any data plane.

2 · Vendor-maintained vs community

What does the maintenance bill look like in a year? A vendor-maintained MCP server (GitHub's, HubSpot's, Salesforce's) is zero effort to maintain — the vendor ships fixes alongside their API changes. A community server can be richer but is someone's side project; you may inherit the patch responsibility.

Rule of thumb: For production, vendor-maintained wins unless you can afford to fork it and own the patches.

3 · Single-tenant vs multi-tenant

Who owns the audit log? In a multi-tenant SaaS, the log is in the provider's database. You can read it; you don't own it. For regulated environments (POPIA, GDPR, sector compliance) this is often disqualifying — the regulator wants the log under your control, not on someone else's terms of service.

Rule of thumb: Treat audit and policy as single-tenant by default. Treat catalogues and model traffic as multi-tenant if the vendor's terms hold.

4 · Depth vs breadth

At your scale, when does "good enough" beat "best in class"? At 5 agents and 12 tools, breadth (one credible pick per category, all working together) matters more than depth (the best possible audit tool). At 50 agents and 200 tools, depth in your weakest category becomes the constraint.

Rule of thumb: First production deploy — breadth. Second — deepen the layer that hurt most. Don't optimise before the pain shows up.

5 · Reversibility

What does swapping cost if the pick was wrong? The tool protocol (MCP vs not) is sticky — the whole stack speaks it. The audit backend can swap in a weekend if you stuck to OpenTelemetry GenAI conventions. The runtime is one-day-per-agent. Some layers are cheap to change later; some aren't.

Rule of thumb: Spend more time on the sticky picks (protocol, identity, dual-use shape). Spend less on the reversible ones (audit backend, sandbox provider). Bias toward standards when sticky.

05 · From landscape to stack

How the categories wire together, and one worked example.

The categories above are not independent. The dual-use workflow primitive expects the audit layer to capture both callers identically; the permissions layer expects scoped identity tokens to bind decisions to agents; the runtime drives most of it. Two leaves go deeper.

06 · Connections

Where else this touches the tree.

07 · Resources

Primary sources.

Linked tersely. The landscape moves fast — verify the version against the date on this leaf.