
A reference architecture for agent tool calls.

A control plane shape for agent tool calls — described as designed, not as a running system. In this architecture, agents run on a multi-provider runtime that calls models through a model gateway (so the model itself stays a config choice — open weights through frontier, never tied to one provider). Each agent carries a tight allowed_tools list, and every call routes through one Cloudflare Worker that asks Cerbos for a decision, optionally pauses for human approval via Inngest, forwards to a vendor MCP server, and emits an OpenTelemetry trace to a self-hosted Langfuse. Mutating tools are modelled as Inngest functions so the same code path can serve a human form and an agent. Reference knowledge — this page documents the design so it can be adopted, adapted, or argued with.

Cloudflare-first edge | OSS substrate | Per-agent Cerbos policies | Hot · quarterly review

A reference architecture, not a hosted service.

A fractional-agent set-up — for illustration, agents named after C-suite roles like Grant (CFO), Penny (CMO), Leo (CLO), Grace (CHRO), Katharine (CRO) — needs real tools wired the right way: agents that do work, leave proof of work, and respect permissions. This page documents one such architecture, end to end, so it can be adapted or copied. The agent names below are illustrative; the architecture is independent of any specific roster.

About the descriptions below. Sections 02 to 08 describe the architecture as designed. They use the architectural present (“the gateway forwards…”, “the function pauses…”) for clarity, but read them as advisory — this is what the system would do in the design, not a claim about a currently-running deployment.

Three guarantees the architecture is designed to enforce:

  • Proof of work. Every tool call would emit four artifacts tied to one run_id — an OTel trace in Langfuse, a row in D1, an R2 object if any file was produced, and an approval record if it was a mutation. By construction, off-record action would not be possible.
  • Per-agent permissions. One Cerbos principalPolicy file per agent enumerates the exact (resource, action) pairs that agent is allowed to call. No wildcards. The files are partner-readable.
  • Dual-use surfaces. Every mutating action would live once, as an Inngest function callable by an HTML form (humans) or an MCP tool (agents). Same code path, same audit shape.

One Worker. One PDP. One audit store. One function-per-mutation.

By design there is exactly one entry point for every tool call. Everything downstream — policy, approval, vendor MCP, audit, storage — sits behind that single Worker, which is what makes the audit trail complete by construction.

# One Worker entry. Five downstreams. Same shape for every call.

  agent (multi-provider runtime      human (HTML form)
    + model gateway)
       |                                  |
       |  MCP tool call                    |  POST form
       v                                  v
  +-----------------------------------------------+
  |  mcp-gateway  (Cloudflare Worker)             |
  +-----------------------------------------------+
       |
       +---> Cerbos PDP              # allow | deny | approval_required
       |          (cerbos.2nth.io, Fly.io)
       |
       +---> Inngest function        # if mutating: pause for approval
       |          (Inngest Cloud)               then call the vendor
       |
       +---> vendor MCP server       # GitHub / HubSpot / Slack / GWS / …
       |          (vendor-hosted)
       |
       +---> Langfuse                # OTel GenAI span, replayable
       |          (langfuse.2nth.io, Fly.io)
       |
       +---> D1  tool_calls row      # durable summary, indexed
       |          (Cloudflare D1)
       |
       +---> R2  runs/{run_id}/…   # artifact (file output, if any)
                  (Cloudflare R2)

The Worker is intentionally thin. By design it doesn't invent a new protocol — it would forward MCP calls to vendor MCP servers with a per-agent scoped token. It doesn't invent an audit format — it would emit standard OpenTelemetry GenAI spans. It doesn't implement its own policy engine — it would ask Cerbos. Each downstream service is one a partner can independently swap or self-host. That is the point.
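The per-call sequence described above can be sketched in a few lines. This is a minimal in-memory illustration in Python, not Worker code: `ToolCall`, `handle_tool_call`, and the injected `check_policy`, `forward`, `request_approval`, and `record` callables are hypothetical names standing in for Cerbos, the vendor MCP server, Inngest, and the audit stores.

```python
from dataclasses import dataclass, field
from typing import Any, Callable
import uuid


@dataclass
class ToolCall:
    agent: str
    tool: str                      # e.g. "bigquery.query"
    args: dict[str, Any]
    source: str = "agent"          # "agent" or "human"
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def handle_tool_call(
    call: ToolCall,
    check_policy: Callable[[ToolCall], str],        # stands in for the Cerbos PDP
    forward: Callable[[ToolCall], Any],             # stands in for the vendor MCP server
    request_approval: Callable[[ToolCall], None],   # stands in for the Inngest event
    record: Callable[[dict[str, Any]], None],       # stands in for Langfuse + D1
) -> dict[str, Any]:
    """Decide, act, and always record: the single entry point for a tool call."""
    decision = check_policy(call)  # "allow" | "deny" | "approval_required"
    entry: dict[str, Any] = {
        "run_id": call.run_id,
        "agent": call.agent,
        "tool": call.tool,
        "source": call.source,
        "decision": decision,
    }
    if decision == "allow":
        entry["result"] = forward(call)
    elif decision == "approval_required":
        request_approval(call)
        entry["status"] = "pending"
    # A deny falls through: nothing is forwarded, but the call is still recorded.
    record(entry)
    return entry
```

Because `record` runs on every branch, a denied call leaves the same kind of entry as an allowed one, which is the property the single entry point is meant to provide.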

Nine layers. Why each pick beats the obvious alternative.

The agents are designed to share this substrate. None of these are 2nth-built; all are off-the-shelf. The "why" for each pick matters more than the pick itself — if a partner reads the rationale they can substitute a different OSS option without breaking the architecture.

| Layer | Pick | Why this, not the alternative |
|---|---|---|
| Agent runtime | Multi-provider runtime member | LangGraph (Python), Vercel AI SDK (TS), or Pydantic AI: pick the one that switches between Claude, GPT, Gemini, Ollama, vLLM by config. See the runtime decision leaf for the trade-offs. |
| Model gateway | Cloudflare AI Gateway + LiteLLM | Edge caching + rate limits + analytics in front of a routing proxy that targets any provider. Keeps the model itself a config choice rather than a code commitment. |
| Tool protocol | MCP (Model Context Protocol) | 500+ servers in the official registry; vendor-maintained ones are the safest picks for a partner-facing showcase. |
| Permission policy | Cerbos PDP | YAML policies, <1 ms decisions, Apache-2 OSS, self-hostable. Permit.io rejected: SaaS. |
| Proof of work | Langfuse self-hosted | MIT, ClickHouse-backed, free at scale, replayable traces. LangSmith rejected: SaaS lock-in. |
| Trace contract | OpenTelemetry GenAI | Vendor-neutral semantic conventions. Future-proofs the audit log if we swap backends. |
| Dual-use | Inngest | One function = one webhook entry + one MCP tool entry; same code path. Trigger.dev viable but Inngest's pattern is cleaner for human/agent parity. |
| Sandboxed code | E2B | Firecracker microVMs (hardware boundary). Daytona is faster but Docker-based; for partner credibility, hardware isolation wins. |
| Identity per agent | Cloudflare Access service tokens + short-TTL capability JWTs | Cloudflare-native. No extra IdP. Adding SPIFFE/Auth0 would be over-engineering at this scale. |

Eight vendor-maintained servers, GA as of May 2026.

All eight are first-party (the vendor maintains the MCP server itself). That is the load-bearing constraint: a third-party MCP server can be richer or cheaper, but it may also be someone's hobby project, and a partner showcase cannot run on hobby projects.

| Server | Maintainer | Primary consumers |
|---|---|---|
| GitHub MCP | GitHub | Leo (legal review of repo content), all (code) |
| HubSpot MCP (mcp.hubspot.com, GA Apr 2026) | HubSpot | Penny, Katharine |
| Salesforce Hosted MCP | Salesforce | Katharine (enterprise-CRM lane) |
| Google Workspace MCP (Gmail / Drive / Sheets / Calendar / Docs) | Google | Grace, Penny, Katharine, all |
| BigQuery Remote MCP | Google | Grant; Katharine read-only |
| Slack MCP | Slack | Penny, Katharine, all (post-via-approval) |
| Notion MCP | Notion | All |
| Atlassian MCP (Jira / Confluence) | Atlassian | Leo (compliance issues), engineering issue tracking |

The per-agent allocation

Each agent in this kind of set-up would get an allowlist of eight tools or fewer, drawn from the universal pool above plus a few domain-specific picks (e.g. Xero or SARS eFiling for a CFO agent, BambooHR-style HR tools for a CHRO agent). The enforceable version is one Cerbos principalPolicy YAML per agent, sitting at a stable path like cerbos/policies/<agent>.yaml. The allocation for one illustrative roster lives in the tools catalog.
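The allowlist constraints (explicit pairs, no wildcards, eight tools or fewer) can be checked mechanically before a policy file is loaded. The sketch below is an illustration in Python, not Cerbos's policy format: `AgentAllowlist`, `validate_allowlist`, and `is_allowed` are hypothetical names, the example actions are made up, and it assumes one tool corresponds to one (resource, action) pair.

```python
from dataclasses import dataclass

MAX_TOOLS = 8  # "eight tools or fewer" per agent


@dataclass(frozen=True)
class AgentAllowlist:
    agent: str
    pairs: frozenset[tuple[str, str]]  # explicit (resource, action) pairs


def validate_allowlist(allowlist: AgentAllowlist) -> list[str]:
    """Return a list of problems; an empty list means the allowlist is acceptable."""
    problems = []
    if len(allowlist.pairs) > MAX_TOOLS:
        problems.append(
            f"{allowlist.agent}: {len(allowlist.pairs)} tools exceeds {MAX_TOOLS}"
        )
    for resource, action in sorted(allowlist.pairs):
        if "*" in resource or "*" in action:
            problems.append(f"{allowlist.agent}: wildcard in ({resource}, {action})")
    return problems


def is_allowed(allowlist: AgentAllowlist, resource: str, action: str) -> bool:
    # Exact match only: anything not enumerated is denied.
    return (resource, action) in allowlist.pairs
```

A check like this could run in CI against each policy file, so a wildcard or an oversized allowlist is caught in review rather than at decision time.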

Four artifacts. One join key. Replayable end-to-end.

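In the design, every tool call produces up to four artifacts: the trace and the D1 row always, plus an R2 object when a file is produced and an approval record when the call is a mutation. The run_id would be the universal join key: a partner pivoting from a single Langfuse trace could find the D1 row, the R2 file, and the approval record without any other identifier.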

  • Artifact 1: OTel GenAI span (Langfuse, self-hosted). Replayable trace in Langfuse. Agent prompt, tool args, decisions, latencies, costs: everything an auditor needs to reconstruct what happened.
  • Artifact 2: D1 row (Cloudflare D1). Durable summary in tool_calls. Cheap to query, indexed by agent / tool / decision / time. Powers dashboards without paging the trace store.
  • Artifact 3: R2 artifact (Cloudflare R2). When a tool produces a file (PDF, CSV, image, audio), it would land at runs/{run_id}/{filename}. The trace and the row both link to it.
  • Artifact 4: Approval record (Cloudflare D1). Destructive actions block until a human decides. The decision is its own row in approvals with approver id, time, and note. A "yes" is as audited as a "no".

The schema — three tables, append-only by convention: agent_runs (one row per session), tool_calls (one row per call within a run), approvals (one row per approval decision). Indexes cover the three real query shapes: by-agent-by-time, by-decision-by-time, and pending-approvals.
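One way to picture the three tables and their indexes is the sketch below. D1 is SQLite-based, so it uses Python's built-in sqlite3 against an in-memory database. All column and index names are assumptions for illustration, as is the choice to derive pending approvals from the absence of an approvals row (which keeps the tables append-only); `open_db` and `pending_approvals` are hypothetical helpers.

```python
import sqlite3

SCHEMA = """
CREATE TABLE agent_runs (
    run_id     TEXT PRIMARY KEY,
    agent      TEXT NOT NULL,
    source     TEXT NOT NULL,           -- 'agent' or 'human'
    started_at TEXT NOT NULL
);
CREATE TABLE tool_calls (
    id         INTEGER PRIMARY KEY,
    run_id     TEXT NOT NULL REFERENCES agent_runs(run_id),
    agent      TEXT NOT NULL,
    tool       TEXT NOT NULL,
    decision   TEXT NOT NULL,           -- allow | deny | approval_required
    created_at TEXT NOT NULL
);
CREATE TABLE approvals (
    id          INTEGER PRIMARY KEY,
    run_id      TEXT NOT NULL REFERENCES agent_runs(run_id),
    approver_id TEXT NOT NULL,
    decision    TEXT NOT NULL,          -- approved | rejected
    note        TEXT,
    decided_at  TEXT NOT NULL
);
-- One index per query shape: by-agent-by-time, by-decision-by-time, pending-approvals.
CREATE INDEX idx_calls_agent_time    ON tool_calls (agent, created_at);
CREATE INDEX idx_calls_decision_time ON tool_calls (decision, created_at);
CREATE INDEX idx_approvals_run       ON approvals (run_id);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the three audit tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def pending_approvals(conn: sqlite3.Connection) -> list[str]:
    """run_ids whose call needs approval and has no decision row yet."""
    rows = conn.execute(
        """
        SELECT c.run_id
        FROM tool_calls AS c
        LEFT JOIN approvals AS a ON a.run_id = c.run_id
        WHERE c.decision = 'approval_required' AND a.id IS NULL
        ORDER BY c.created_at
        """
    ).fetchall()
    return [row[0] for row in rows]
```

Every table carries run_id, so the join from a tool call to its approval needs no other identifier.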

One function. Two doors. Identical audit shape.

In the design, a mutating action lives once, as an Inngest function. The function would be triggered by either an HTML form on know.2nth.ai/tools/… (humans) or an MCP tool call routed through the gateway (agents). Same code path. Same Langfuse trace shape. Only the source attribute differs.

Worked example · send-marketing-email

How a call would flow through the system — using a marketing-email-send action as the illustration:

Agent path. An agent (e.g. a CMO role) calls send_marketing_email via MCP. The gateway sees a destructive action, logs decision=approval_required, and fires an Inngest event with source: "agent". The function pauses until an approval.decided event arrives for the same run_id.

Approval step. A human approver opens the audit UI (Langfuse, behind a corporate access gateway), sees a pending card with the proposed copy + audience, clicks approve. The decision becomes its own D1 row and fires the matching event.

Send step. The function resumes, calls the relevant vendor API (e.g. HubSpot Marketing Email) via the per-agent scoped token, writes the result back to the gateway, and closes the tool_calls row.

Human path. A marketing operator opens an HTML form for the same action and submits it. The form POSTs to a Worker which fires the same Inngest event with source: "human". No pause — the form submission is the approval. Send step runs identically.

Audit parity. The Langfuse trace shape is identical across both paths. The single attribute source on the root span distinguishes them — everything else is the same.
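The two doors in the worked example can be sketched as a single function. This is plain Python rather than the Inngest SDK: `wait_for_approval` and `send` are hypothetical callables standing in for the pause on approval.decided and the vendor API call, and recording an approval step on the human path too is an assumption made here so that both traces keep the same shape.

```python
from typing import Any, Callable


def send_marketing_email(
    event: dict[str, Any],
    wait_for_approval: Callable[[str], bool],   # stands in for the approval pause
    send: Callable[[dict[str, Any]], str],      # stands in for the vendor API call
) -> dict[str, Any]:
    """One code path for both doors; only `source` changes the approval step."""
    trace: dict[str, Any] = {
        "run_id": event["run_id"],
        "source": event["source"],
        "steps": [],
    }
    if event["source"] == "agent":
        # Agent calls block until a human decides.
        approved = wait_for_approval(event["run_id"])
    else:
        # A human submitting the form is treated as the approval.
        approved = True
    trace["steps"].append({"name": "approval", "approved": approved})
    if approved:
        trace["steps"].append({"name": "send", "result": send(event["payload"])})
    return trace
```

With the same payload and an approving human, the agent trace and the human trace differ only in the `source` field.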

Seven PRs from scaffold to live audit UI.

If a team adopts this architecture, the natural sequence is seven PRs. Each is one logical change; each builds on the last. Order matters — the audit and policy layer comes before any gateway code, so the first real tool call lands in a system that can see it and decide on it.

01 · Scaffold the platform repo

Create an agent-platform repo with the canonical folder layout (workers, functions, cerbos/policies, schema, fly). Set up CI with ASCII-only commit messages (Cloudflare Pages rejects non-ASCII with error 8000111). Land a README pointing at the architecture.

02 · Stand up audit + policy

Deploy Langfuse and Cerbos on Fly.io (or equivalent). Write one example trace by hand. Load the per-agent Cerbos policies. Confirm a denied call lands in the audit log.

03 · MCP gateway Worker

Implement workers/mcp-gateway: receive MCP call, check Cerbos, forward to a vendor MCP server, emit OTel span. End-to-end smoke test against the GitHub MCP server (read-only is safest).

04 · First agent end-to-end

Build one mutating action end-to-end (Inngest function + form Worker + MCP tool registration). Verify dual-use parity: agent and human produce identical traces, only the source attribute differs.

05 · Roll out the remaining agents

Replicate the pattern for each remaining agent in scope. Each gets at least one Inngest function for a mutating action.

06 · Per-tool documentation

Author one leaf per tool in the catalog (what it is, scopes, audit surface, sample agent + sample human call). Flip the matching "Soon" cards on the catalog hub to "Live".

07 · Audit UI

Stand up the auditor surface — typically Langfuse embedded behind a corporate access gateway — so human auditors and partners can replay any run by run_id.

Five paths to verify, once deployed.

Acceptance criteria for any deployment of this architecture. Each verification should produce a specific, observable outcome once the system is wired and running. If any fail, the design has been broken in implementation.

Agent path

An agent (e.g. a CFO role) issues a BigQuery query via the gateway. The gateway logs a span with source=agent, the agent id, tool=bigquery.query, decision=allow. A row appears in D1. The trace replays in Langfuse with the exact prompt + response.

Human path

A user opens the matching HTML form and submits the same operation. The Inngest function fires. The Langfuse trace appears with source=human, identical schema otherwise.

Denied path

An agent attempts an action not in its allowlist (e.g. a CFO agent trying slack.delete_channel). Cerbos returns DENY. The denial is logged with decision=deny, reason=policy_no_match. No mutation occurs.

Approval path

An agent attempts a destructive action (e.g. hubspot.update_deal_stage). Inngest pauses. An approver opens the approvals surface, sees the pending action, approves. The action completes. The Langfuse trace has a child approval span with approver_id and latency.

Portability

The same code clones onto another team's Cloudflare account, with their own Cerbos policies and their own vendor MCP tokens. Agents work the same way. No origin-team credentials involved.

One practical warning for anyone wiring this.

Cloudflare's ASCII commit gotcha

The Cloudflare Pages deployments API rejects non-ASCII commit messages (em-dashes, middle-dots, curly quotes, arrows) with error code 8000111. Any wrangler deploy step in CI should pass --commit-message="<ASCII string>" explicitly, with a SHA-based value. Without it, wrangler reads the raw commit message from HEAD, and any Unicode in it fails the deploy. Load-bearing on day one.
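A CI step could build the SHA-based value with a small helper like the one below. This is a sketch in Python: `ascii_commit_message` and `wrangler_flag` are hypothetical helper names, and the "deploy" prefix and 12-character SHA truncation are arbitrary choices. Only the --commit-message flag comes from the text above.

```python
def ascii_commit_message(sha: str, prefix: str = "deploy") -> str:
    """Build a SHA-based commit message and refuse anything non-ASCII."""
    message = f"{prefix} {sha[:12]}"
    if not message.isascii():
        raise ValueError(f"non-ASCII characters in commit message: {message!r}")
    return message


def wrangler_flag(sha: str) -> str:
    """Format the flag to append to the wrangler deploy command in CI."""
    return f'--commit-message="{ascii_commit_message(sha)}"'
```

Deriving the message from the SHA rather than from the human-written commit text means an em-dash or curly quote in a commit never reaches the deployments API.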

Primary sources.

Linked tersely. Verify the version against the date you read this leaf — the substrate moves fast, which is why the catalog cadence is hot (quarterly).