Tools · worked example

01 · The substrate

Eight things every agent shares.

Before a single vendor MCP server lights up, an agent needs a runtime, a protocol, a permission gate, an audit trail, a sandbox for code, an identity, a dual-use bridge, and a trace contract. These are the eight load-bearing picks. Each has been verified in production use as of 2026 and is self-hostable, so a partner can clone the whole pattern.

Runtime · Member

Multi-provider runtime

LangGraph or Vercel AI SDK — switches between Claude, GPT, Gemini, Ollama, vLLM by config.

oss · multi-provider

Model gateway

Cloudflare AI Gateway + LiteLLM

Edge caching + rate limits + routing proxy that targets any provider. Keeps model choice a config decision.

cloudflare + oss
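
To make "model choice is a config decision" concrete, here is a minimal sketch of alias-based routing. It is not LiteLLM or Cloudflare AI Gateway configuration syntax; the `Route` type, the alias names, the model names, and the gateway URL are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Route:
    provider: str  # e.g. "anthropic" or "ollama"
    model: str     # provider-side model name (placeholder values below)
    base_url: str  # where the gateway forwards the request


# Hypothetical routing table; in practice this would be loaded from a config file.
ROUTES: Mapping[str, Route] = {
    "default": Route("anthropic", "claude-placeholder", "https://gateway.example.com/anthropic"),
    "local": Route("ollama", "llama-placeholder", "http://localhost:11434"),
}


def resolve(alias: str, routes: Mapping[str, Route] = ROUTES) -> Route:
    """Map a logical model alias to a concrete provider route."""
    try:
        return routes[alias]
    except KeyError:
        raise ValueError(f"no route configured for alias {alias!r}") from None
```

Agent code asks for `resolve("default")` and never names a provider, so swapping providers means editing the table rather than the agent.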

Protocol

MCP — Model Context Protocol

500+ servers in the official registry. Tool discovery + invocation, vendor-neutral.

anthropic-origin · oss

Permissions

Cerbos PDP

YAML policies, one principalPolicy file per agent. <1 ms decision.

cerbos · oss (Apache-2)

Audit

Langfuse (self-hosted)

OpenTelemetry-compatible. Replayable traces. ClickHouse-backed. No license fee when self-hosted; infrastructure costs still apply.

langfuse · oss (MIT)

Sandbox

E2B

Firecracker microVMs. Hardware boundary for code an agent writes itself.

e2b · oss core

Identity

Cloudflare Access service tokens

Per-agent service token + short-TTL capability JWTs. No extra IdP.

cloudflare · saas
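
As an illustration of what a short-TTL capability JWT could look like, here is a standard-library sketch that mints and verifies an HS256 token scoped to one tool. It does not call the Cloudflare Access API; the shared secret, the `tool` claim, and the function names are assumptions made for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mint_capability(secret: bytes, agent_id: str, tool: str,
                    ttl_s: int = 60, now: Optional[float] = None) -> str:
    """Sign a token that lets one agent call one tool for ttl_s seconds."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    # "tool" is a hypothetical custom claim, not a registered JWT claim.
    claims = {"sub": agent_id, "tool": tool, "iat": issued, "exp": issued + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_capability(secret: bytes, token: str, tool: str,
                      now: Optional[float] = None) -> dict:
    """Return the claims if the signature, expiry, and tool scope all check out."""
    try:
        head_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token") from None
    expected = hmac.new(secret, f"{head_b64}.{claims_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    if claims["tool"] != tool:
        raise PermissionError("token not valid for this tool")
    return claims
```

The verifier always recomputes HS256 and ignores the header's `alg` field, which sidesteps algorithm-confusion issues in a sketch this small.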

Dual-use

Inngest

One function fires from a form OR an MCP tool. Single code path, single audit row.

inngest · oss core + cloud

Trace contract

OpenTelemetry GenAI

Semantic conventions for gen_ai.agent.id, gen_ai.tool.call.id, etc.

otel · oss
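
A small sketch of the attribute set a tool-call span might carry. The `gen_ai.*` keys follow the OpenTelemetry GenAI semantic conventions, which are still evolving and may be renamed; the `app.run_id` and `app.source` keys are hypothetical application attributes, not part of the conventions.

```python
from typing import Dict


def tool_span_attributes(agent_id: str, tool_name: str, tool_call_id: str,
                         run_id: str, source: str) -> Dict[str, str]:
    """Build the attribute dict for one tool-execution span."""
    if source not in ("agent", "human"):
        raise ValueError(f"unknown source {source!r}")
    return {
        # Keys from the OTel GenAI semantic conventions.
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.agent.id": agent_id,
        "gen_ai.tool.name": tool_name,
        "gen_ai.tool.call.id": tool_call_id,
        # Hypothetical application-level keys (assumed for this sketch).
        "app.run_id": run_id,
        "app.source": source,
    }
```

Any OpenTelemetry SDK could attach a dict like this to a span; building it in one function keeps agent and human calls on the same key set.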

02 · Vendor MCP servers

The eight servers everyone draws from.

All eight are vendor-maintained, GA, and as of May 2026 the safest picks for a partner-facing showcase. A third-party MCP server might be richer, but every dependency is a future maintenance bill — vendor-hosted servers are zero-effort to keep alive.

GitHub MCP · Soon

tools/github

Repos, PRs, issues, code search. The legal-review and platform-engineering substrate.

HubSpot MCP · Soon

tools/hubspot

Contacts, deals, marketing-email send, campaign analytics. Penny + Katharine's CRM.

Salesforce MCP · Soon

tools/salesforce

Pipeline objects + flows. Katharine's enterprise-CRM lane for teams running on Salesforce.

Google Workspace MCP · Soon

tools/google-workspace

Gmail, Drive, Calendar, Sheets, Docs. The default office substrate for many SME setups.

BigQuery Remote MCP · Soon

tools/bigquery

Read-only analytics queries. Grant's CFO data backend; Katharine read-only too.

Slack MCP · Soon

tools/slack

Post messages, read channels, list users. Posting is destructive: gated by approval.

Notion MCP · Soon

tools/notion

Pages, databases, comments. Where agent drafts and playbooks accrue.

Atlassian MCP · Soon

tools/atlassian

Jira issues + Confluence pages. Leo's compliance lane + general engineering tracker.

03 · Tools by agent

Who can do what.

Each agent has a tight allowlist — eight tools or fewer. Read scopes are wide by default; mutations are narrow and gated. Each agent's Cerbos principalPolicy lives at a stable path like cerbos/policies/<agent>.yaml — readable rules, not implementation details to take on trust.
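
The allowlist rule above (wide reads, narrow gated mutations) can be sketched as a three-way decision. This is not Cerbos's policy format or API; the `AgentPolicy` type, the action names, and Penny's entries are illustrative assumptions drawn loosely from her tool list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AgentPolicy:
    reads: FrozenSet[str]                                # tools the agent may read from
    gated: FrozenSet[Tuple[str, str]]                    # (tool, action) pairs needing approval
    writes: FrozenSet[Tuple[str, str]] = frozenset()     # (tool, action) pairs allowed outright


def decide(policy: AgentPolicy, tool: str, action: str) -> str:
    """Return "allow", "approval_required", or "deny" for one tool call."""
    if action == "read":
        return "allow" if tool in policy.reads else "deny"
    if (tool, action) in policy.writes:
        return "allow"
    if (tool, action) in policy.gated:
        return "approval_required"
    return "deny"  # anything not listed is refused


# Hypothetical policy for Penny.
PENNY = AgentPolicy(
    reads=frozenset({"tools/hubspot", "tools/slack", "tools/notion", "tools/r2"}),
    gated=frozenset({("tools/slack", "post"), ("tools/hubspot", "send_email")}),
    writes=frozenset({("tools/r2", "write"), ("tools/notion", "update")}),
)
```

Default-deny is the design choice that keeps the allowlist tight: a tool absent from every set cannot be called at all.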

Grant · CFO Finance · reads broadly · mutates with approval

BigQuery · Soon

tools/bigquery

Read finance datasets, run analytics SQL. The CFO's primary data tool.

Xero · Soon

tools/xero

Read the books; create invoices and bills with human approval.

Postgres / Hyperdrive · Soon

tools/postgres

The warehouse layer for non-BigQuery finance data.

Sheets · Soon

tools/google-workspace

Working files. Grant can write to spreadsheets he owns.

Slack (finance) · Soon

tools/slack

Post to #finance, #cfo-updates, #leadership. Approval required.

SARS eFiling · Soon

tools/sars

Submit returns. Always approval-gated. Typically a thin wrapper over SARS eFiling APIs.

E2B (Python) · Soon

tools/e2b

Run finance models in a sandbox — recon, forecasts, journal generation.

Inngest banking-export · Soon

tools/inngest

The dual-use export. Same fn fires from a CFO form or from Grant directly.

Penny · CMO Marketing · drafts by default · sends with approval

HubSpot · Soon

tools/hubspot

Contacts, lists, deals, campaigns. Sends gated by approval.

Gmail (draft) · Soon

tools/google-workspace

Drafts only by default. Send goes through the Inngest gate.

Drive (marketing/brand) · Soon

tools/google-workspace

Write scoped to marketing and brand folders.

Notion · Soon

tools/notion

Briefs, playbooks, campaign trackers. Read + create + update.

R2 brand-assets · Soon

tools/r2

Logo + image bucket. Penny can read and write here; no other agent can.

Slack · Soon

tools/slack

Read freely; post with approval.

E2B (image gen) · Soon

tools/e2b

Generate images via Vertex Imagen / Flux inside the sandbox; outputs land in R2.

Inngest send-email · Soon

tools/inngest

The flagship dual-use fn. Human form or agent tool, same code path.

Leo · CLO Legal · flags risk · drafts redlines · never signs

GitHub · Soon

tools/github

Read repos and PRs for IP / open-source compliance. Comment, never merge.

Drive (legal) · Soon

tools/google-workspace

Write scoped to legal, contracts, compliance folders.

Gmail (draft) · Soon

tools/google-workspace

Read incoming + draft replies. Send is human-only.

Notion (playbooks) · Soon

tools/notion

Legal playbooks, POPIA checklists, contract templates.

Web search · Soon

tools/web-search

Case-law and regulatory lookups. Read-only.

E2B (redline) · Soon

tools/e2b

Run diff tools across contract drafts; output the redlined PDF to R2.

Slack (legal) · Soon

tools/slack

Post to #legal, #compliance, #leadership. Approval gated.

Inngest contract-sign · Soon

tools/inngest

Routes a contract for signature to the human signatory.

Grace · CHRO People · sensitive defaults · offers via approval

Gmail (HR) · Soon

tools/google-workspace

Read + draft. Sends via Inngest with approval.

Drive (people/hr) · Soon

tools/google-workspace

Strictly scoped to people / hr / playbooks folders.

Calendar · Soon

tools/google-workspace

Read, create, update events. The interview / 1-1 / review surface.

Sheets (HR) · Soon

tools/google-workspace

Write only to spreadsheets Grace owns. Read broadly.

Notion (playbooks) · Soon

tools/notion

HR playbooks, onboarding tracks, leadership memos.

Slack (DMs + people) · Soon

tools/slack

DMs free; #people + #leadership posts approval-gated.

Inngest offer-letter · Soon

tools/inngest

The offer + onboarding-pack dual-use fns. Always human-approved.

Katharine · CRO Revenue · pipeline · deal moves are explicit

HubSpot (deals) · Soon

tools/hubspot

Read everything; update contacts + deals (deal-stage changes are gated).

Salesforce · Soon

tools/salesforce

For teams on SFDC. Same patterns as HubSpot.

Slack (sales) · Soon

tools/slack

Scoped to #sales, #revenue, #deals, #leadership.

Gmail (drafts) · Soon

tools/google-workspace

Draft outreach. Send via Inngest send-proposal fn.

Calendar · Soon

tools/google-workspace

Book discovery and demo slots.

Notion (deals) · Soon

tools/notion

Account plans, deal rooms, proposals.

BigQuery (read-only) · Soon

tools/bigquery

Scoped to sales_marts + revenue_reporting datasets.

Inngest deal-stage-change · Soon

tools/inngest

Moving a deal to Closed-Won is never silent. Always logged, always approved.

04 · Proof of work

Every call leaves a trail. Four artifacts. One join key.

Every tool call — agent or human — emits the same four artifacts, all tied by a single run_id. Anyone reviewing the catalog — an auditor, another agent — can pivot from any one of them to the others, replay the run in the Langfuse UI, and verify nothing happened off-record.

OTel GenAI span → Langfuse

The replayable trace. Agent prompts, tool args, decisions, latencies, costs — everything an auditor needs to reconstruct what happened, including the agent's reasoning.

D1 row in `tool_calls`

The durable summary. Cheap to query, indexed by agent, tool, decision, time. Powers dashboards and approval queues without paging the trace store.

R2 artifact

When the tool produced a file (image, PDF, CSV, audio), it lands at runs/{run_id}/{filename}. The trace and the row both link to it.

Approval record

Destructive actions block until a human decides. The decision is its own row in approvals with approver id, time, and note. A "yes" is as audited as a "no".
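
To show how one run_id can tie the durable artifacts together, here is a sketch using Python's built-in sqlite3 as a stand-in for D1 (which is SQLite-based). The table names `tool_calls` and `approvals` and the `runs/{run_id}/{filename}` key come from the text above; the exact column names and the helper functions are assumptions.

```python
import sqlite3
from typing import List, Optional, Tuple

SCHEMA = """
CREATE TABLE tool_calls (
    run_id TEXT NOT NULL, agent TEXT NOT NULL, tool TEXT NOT NULL,
    decision TEXT NOT NULL, source TEXT NOT NULL, created_at TEXT NOT NULL,
    artifact_key TEXT
);
CREATE INDEX idx_tool_calls_lookup ON tool_calls (agent, tool, decision, created_at);
CREATE INDEX idx_tool_calls_run ON tool_calls (run_id);
CREATE TABLE approvals (
    run_id TEXT NOT NULL, approver_id TEXT NOT NULL, approved INTEGER NOT NULL,
    decided_at TEXT NOT NULL, note TEXT
);
"""


def artifact_key(run_id: str, filename: str) -> str:
    """Object-store key for a file produced by a run."""
    return f"runs/{run_id}/{filename}"


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record_call(db: sqlite3.Connection, run_id: str, agent: str, tool: str,
                decision: str, source: str, created_at: str,
                filename: Optional[str] = None) -> None:
    key = artifact_key(run_id, filename) if filename else None
    db.execute("INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?, ?, ?)",
               (run_id, agent, tool, decision, source, created_at, key))


def record_approval(db: sqlite3.Connection, run_id: str, approver_id: str,
                    approved: bool, decided_at: str, note: str = "") -> None:
    db.execute("INSERT INTO approvals VALUES (?, ?, ?, ?, ?)",
               (run_id, approver_id, int(approved), decided_at, note))


def audit_view(db: sqlite3.Connection, run_id: str) -> List[Tuple]:
    """Pivot from a run_id to the call summary, artifact key, and approval."""
    return db.execute(
        """SELECT t.agent, t.tool, t.decision, t.artifact_key, a.approver_id, a.approved
           FROM tool_calls t LEFT JOIN approvals a ON a.run_id = t.run_id
           WHERE t.run_id = ?""",
        (run_id,),
    ).fetchall()
```

The LEFT JOIN means calls that never needed approval still appear in the audit view, with empty approval columns.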

05 · Dual-use pattern

One function. Human form or agent tool.

A tool that mutates state lives once, as an Inngest function. The function is callable from a plain HTML form (humans) or from the MCP gateway (agents). The Langfuse trace is identical except for one attribute: source. That's how the audit log treats humans and agents the same way — because the code path is the same.

Worked example · `penny.send-email`

Agent path: Penny calls penny.send_marketing_email via MCP. The gateway logs decision=approval_required and fires Inngest event penny.send_email.requested with source: "agent".
The Inngest function pauses on approval.decided. A human approver sees a card in the audit UI, reviews the proposed copy + audience, clicks approve.
The function unpauses, calls the HubSpot Marketing Email API, writes the result back to the gateway, closes the tool_calls row.
Human path: a marketing operator opens the matching HTML form for the same action and submits it. The form POSTs to a Worker which fires the same Inngest event with source: "human". No pause — the form submission is the approval.
The Langfuse trace shape is identical in both paths. Only the source attribute differs.
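
The five steps above can be sketched as one handler with a single code path. This is a plain-Python simulation, not Inngest's SDK: the approval pause is modelled as a callback, and `SendEmailRequest`, `handle_send_email`, and the trace step names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SendEmailRequest:
    run_id: str
    source: str    # "agent" or "human"
    audience: str
    body: str


def handle_send_email(req: SendEmailRequest,
                      wait_for_approval: Callable[[SendEmailRequest], bool],
                      send: Callable[[str, str], None]) -> List[Dict[str, object]]:
    """Run the send for either caller and return the trace events it emitted."""
    if req.source not in ("agent", "human"):
        raise ValueError(f"unknown source {req.source!r}")
    trace: List[Dict[str, object]] = [{"step": "requested", "source": req.source}]
    # A human form submission counts as the approval; an agent call must wait for one.
    approved = True if req.source == "human" else wait_for_approval(req)
    trace.append({"step": "approval.decided", "source": req.source, "approved": approved})
    if approved:
        send(req.audience, req.body)
        trace.append({"step": "sent", "source": req.source})
    return trace
```

Because both callers pass through the same three steps, an approved run produces the same trace shape either way, and only the `source` value differs.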

06 · What this is not

The honest disclaimers.

Self-host, not a hosted product

The catalog describes a self-hostable pattern. There is no managed multi-tenant service behind it — teams clone the architecture and run it on their own Cloudflare + Fly accounts.

Tool calls, not model traffic

The catalog gates tool calls. Model traffic (Anthropic, Vertex, Bedrock) goes direct from the agent. Use Cloudflare AI Gateway or equivalent for model-side routing.

One of several stacks

Multiple stacks land at the same architecture. The picks in this catalog favour OSS + self-hostable + readable policies. Substitute confidently when a different OSS option suits the situation; the architecture survives the swap.

07 · Connections

Where else this catalog touches the tree.

→ Architecture (the deep dive)
→ Runtime decision · member
→ Agents
→ Skills (the agent-side spec)
→ MCP
→ Claude Agent SDK
→ Cloudflare (Workers, D1, R2)
→ Google Workspace + BigQuery
→ CRM (HubSpot, Salesforce, Frappe)
→ Frappe

One worked example: a five-agent fractional tool stack.
