agents · Claude · Skill Leaf

Reasoning quality, priced and tiered.

Claude is Anthropic's flagship model family. Three production tiers in 2026: Opus 4.7 for the hardest reasoning, Sonnet 4.6 for everyday agent work, Haiku 4.5 for high-volume routing and classification. 200k standard context window, 1M extended on premium tiers. Native extended thinking, MCP-first tool layer, vision and computer use built in. The default for production agents that need long-context reasoning, reliable function-calling, and instruction-following beyond what current open-weights models reach.

Claude 4.x family · current: Opus 4.7 · Sonnet 4.6 · Haiku 4.5 · 200k → 1M context · Extended thinking · MCP-native

Anthropic's flagship family.

Claude is the model family from Anthropic, the AI safety company founded in 2021 by former OpenAI researchers. The first Claude model shipped in early 2023; the family has been on a steady release cadence since, with Claude 4.x being the current generation as of mid-2026.

Where GPT and Gemini compete on raw capability and ecosystem reach, Claude has historically led on three things: instruction-following reliability (does what you ask, even when you ask weird things), long-context reasoning quality (uses 200k+ context windows well, doesn't degrade like cheaper models), and safety-focused behaviour (refuses cleanly, hallucinates less in adversarial domains, avoids the "confidently wrong" failure mode that plagues mid-tier models). Those three properties make it the default choice for production agents that have to be right and have to be auditable.

Distribution: Claude is available via the Anthropic API directly, via AWS Bedrock (including in af-south-1 Cape Town for SA-resident calls), via Google Cloud Vertex AI, via the Claude.ai web app, the Claude Desktop app, the Claude Code CLI, and through major model aggregators such as OpenRouter. Enterprise plans include Claude Enterprise with SSO, audit logging, and admin controls.

The family naming convention

Claude versions follow Claude {Tier} {Major.Minor}. Tier is the size class (Opus / Sonnet / Haiku — large / medium / small). Major is the model generation; minor is the within-generation iteration: Claude Opus 4.7 is the .7 release of the Opus 4 generation. The full model ID for API calls drops the "Claude" prefix and uses dashes: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5.
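The convention above is mechanical enough to express in code. This is a hypothetical helper, not part of any Anthropic SDK: it derives the dash-style API ID from the display name using the naming rules in this section. Always check current model IDs against Anthropic's model documentation.

```python
def to_model_id(display_name: str) -> str:
    """'Claude Opus 4.7' -> 'claude-opus-4-7', per the naming convention above."""
    parts = display_name.strip().split()
    if len(parts) != 3 or parts[0] != "Claude":
        raise ValueError(f"unexpected display name: {display_name!r}")
    _, tier, version = parts
    major, minor = version.split(".")
    # Drop the "Claude" prefix, lowercase the tier, dash-separate the version.
    return f"claude-{tier.lower()}-{major}-{minor}"

print(to_model_id("Claude Opus 4.7"))    # claude-opus-4-7
print(to_model_id("Claude Sonnet 4.6"))  # claude-sonnet-4-6
print(to_model_id("Claude Haiku 4.5"))   # claude-haiku-4-5
```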

Three tiers, one quality curve.

The Claude family is deliberately tiered. Each tier is meaningfully cheaper than the one above and meaningfully less capable on the hardest tasks. The right production pattern is multi-tier routing: Haiku for the volume work, Sonnet for the default, Opus for the hard cases. The Anthropic Agent SDK supports this routing trivially — sub-agents can run on different tiers.

Tier · frontier

Claude Opus 4.7

The hardest-reasoning tier. Best for: deep research, complex code review, multi-step planning, agentic problem-solving. Frontier-level on reasoning benchmarks. Most expensive of the three. 200k standard / 1M extended context.

Tier · default

Claude Sonnet 4.6

The "good for almost everything" tier. ~80% of Opus quality at meaningfully lower cost. The right default for most production agents. 200k context. Most teams' day-to-day workhorse.

Tier · volume

Claude Haiku 4.5

The fastest, cheapest tier. Best for: classification, routing, summarisation, and structured extraction at high throughput. Lower reasoning depth but very fast and cheap. Good fit for "first-pass" agents that escalate to Sonnet/Opus when needed.

Capability · cross-tier

All tiers ship

Tool use / function calling, vision (image input), MCP-native integration, JSON mode for structured outputs, extended thinking on Opus and Sonnet, computer use on Sonnet (vision + UI control), 200k+ context, streaming.
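To make the tool-use capability concrete, here is a sketch of a tool definition and request payload in the JSON-schema style Claude's Messages API uses. It builds the payload without sending it; the field names (`input_schema`, `tools`, `messages`) follow Anthropic's documented convention, but verify them, and the model ID, against the current API reference.

```python
# Hypothetical example tool: a weather lookup the model can choose to call.
weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {               # JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Request payload only — no network call is made here.
request = {
    "model": "claude-sonnet-4-6",   # tool use ships on every tier per this page
    "max_tokens": 1024,
    "tools": [weather_tool],
    "messages": [{"role": "user", "content": "What's the weather in Cape Town?"}],
}
print(sorted(request.keys()))  # ['max_tokens', 'messages', 'model', 'tools']
```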

Extended thinking — the Claude differentiator

Claude Opus and Sonnet 4.x support extended thinking: when the prompt indicates complex reasoning is required, the model can allocate a separately-billed thinking-token budget to "think before answering" — producing internal reasoning traces that improve final answer quality on hard tasks. The Anthropic Agent SDK exposes this as thinking={"budget_tokens": 16000}. Most other model families either don't have this or expose it only via specialised "reasoning" SKUs (OpenAI's o-series, Gemini Thinking).

Where Claude is the right pick — and where it isn't.

Honest cross-family positioning. Each model family has a real strength and a real constraint. Most production stacks in 2026 use multiple families for different work; few teams pick one and use it for everything.

| Family | Strengths | Watch out for |
| --- | --- | --- |
| Claude (Anthropic) | Best instruction-following, strong long-context reasoning, native extended thinking, tight MCP integration | Closed model; USD billing; rate limits hit hard at high volume on free tier |
| GPT (OpenAI) | Largest API ecosystem, broadest tooling, strong structured outputs, leading on multimodal extremes | Voice / video / DALL-E billed separately; more "confidently wrong" failure mode than Claude |
| Gemini (Google) | Largest context window (1M+ on Pro/Ultra), native search grounding, voice via Live API, GCP integration | Less consistent instruction-following than Claude; Gemini-only via ADK/Vertex unless accessed via OpenRouter |
| Llama (Meta, open-weights) | Open weights, runs locally via Ollama, largest community fine-tune ecosystem | Gap to the frontier; instruction-following lags Claude/GPT; Meta licence restricts commercial use over 700M MAU |
| Gemma (Google, open-weights) | Open weights, multimodal, frontier-lab safety tuning, runs locally | Smaller community fine-tune ecosystem than Llama; not as code-strong as Qwen-coder |

The hybrid that's actually shipping in 2026

Most production agent teams use two or three model families together. Common patterns: Claude for the orchestrator + reasoning steps, GPT for tool-calling-heavy sub-agents, local Ollama-Gemma for PII-bearing pre-processing, Gemini for very-long-context tasks (1M+). The Agents domain hub frames this explicitly: "the question isn't which framework or which model, it's which framework where and which model when." Multi-model is the production reality.

Frontier-tier pricing, with real production trade-offs.

Claude is not cheap at the frontier tier. Sonnet 4.6 is meaningfully more affordable than Opus 4.7 and very close in quality — for most production work, Sonnet is the right default. Haiku 4.5 is cheap enough for high-volume tasks. Pricing is USD per million tokens, with separate input and output rates, and extended thinking tokens billed at output rate. Always check anthropic.com/pricing for current numbers — the rates here are accurate to mid-2026 but Anthropic adjusts pricing as model generations evolve.

The shape of the pricing curve (illustrative; check the official page for exact numbers):

  • Opus 4.7 — frontier-tier rates. Output tokens billed roughly 5× input. Extended thinking tokens billed at output rate. The most expensive tier.
  • Sonnet 4.6 — meaningfully cheaper than Opus, similar 5× output multiplier. The right default for production volume.
  • Haiku 4.5 — cheapest of the three by an order of magnitude. The right tier for routing, classification, and structured extraction at scale.
  • Prompt caching — 90% discount on cached input tokens for repeated context. Significant cost saver for RAG agents that share common prompts across calls.
  • Batch API — 50% discount on input + output for non-urgent batch workloads. Useful for nightly processing.
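The prompt-caching and batch discounts above compound, so it is worth doing the arithmetic explicitly. This sketch applies the stated discounts (90% off cached input, 50% off batch) to placeholder rates — $3 in / $15 out per million tokens are illustrative numbers, not official pricing; check anthropic.com/pricing for real rates.

```python
def call_cost(input_tok: int, output_tok: int,
              in_rate: float, out_rate: float,
              cached_frac: float = 0.0, batch: bool = False) -> float:
    """USD cost of one call; rates are USD per million tokens."""
    cached = input_tok * cached_frac
    fresh = input_tok - cached
    cost = (fresh * in_rate
            + cached * in_rate * 0.10      # cached input billed at 10% (90% discount)
            + output_tok * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost   # Batch API halves the total

# 100k-token prompt, 2k-token answer, placeholder $3/$15 rates.
full = call_cost(100_000, 2_000, 3.0, 15.0)
cached = call_cost(100_000, 2_000, 3.0, 15.0, cached_frac=0.8)
print(round(full, 4), round(cached, 4))  # 0.33 0.114
```

With 80% of the prompt cached (typical for a RAG agent with a large shared system prompt), the per-call cost drops by roughly two thirds before any batch discount is applied.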

The cost-saving routing pattern

Build a Haiku-driven router as the first agent in the chain. It classifies incoming requests by type and difficulty, then dispatches to: a Haiku agent for simple cases (60%+ of traffic typically), a Sonnet agent for the default case, and an Opus agent only for the genuinely hard cases (5-15% typically). This pattern can cut Claude costs by 50-80% versus running everything on a single tier. The Anthropic Agent SDK's sub-agent pattern fits this routing model directly.
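The routing pattern above can be sketched in a few lines. The keyword classifier here is a stand-in — in production, the classification step would itself be a cheap Haiku call — and the model IDs are the ones used elsewhere on this page, which may lag current releases.

```python
ROUTES = {
    "simple":  "claude-haiku-4-5",   # volume tier, typically 60%+ of traffic
    "default": "claude-sonnet-4-6",  # everyday workhorse
    "hard":    "claude-opus-4-7",    # frontier tier, typically 5-15% of traffic
}

def classify(request: str) -> str:
    """Stand-in difficulty heuristic; replace with a Haiku classification call."""
    text = request.lower()
    if any(k in text for k in ("prove", "architect", "multi-step", "review this codebase")):
        return "hard"
    if any(k in text for k in ("classify", "extract", "summarise", "route")):
        return "simple"
    return "default"

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("Summarise this support ticket"))        # claude-haiku-4-5
print(route("Review this codebase for data races"))  # claude-opus-4-7
```

The point of the pattern is that the router itself runs on the cheapest tier, so misclassification costs one cheap call plus one escalation, while correct classification saves the frontier-rate delta on the majority of traffic.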

When Claude is the right model. When it isn't.

Use Claude when

  • Instruction-following reliability is load-bearing — the model has to do exactly what you ask
  • Long-context reasoning matters (200k+ tokens, real use of the context)
  • You want extended thinking for hard reasoning steps
  • You're building production agents that need auditable behaviour
  • MCP-native tool integration is a feature you want
  • You're running computer-use or vision-heavy workflows, where Claude leads
  • You're already standardised on Claude (most SA enterprise that's picked one frontier model)

Where Claude lands in SA delivery work.

Enterprise · AWS Bedrock af-south-1 as the residency-clean path

For SA enterprise (banks, insurers, telcos) with POPIA cross-border concerns, AWS Bedrock in Cape Town hosts Claude Sonnet and Haiku with full SA data residency. That's the structurally clean answer for PII-bearing workloads — data stays on-region, audit logs flow into CloudWatch, IAM controls integrate with existing AWS posture. Opus 4.7 availability on Bedrock af-south-1 can lag the Anthropic API by weeks; if you need bleeding-edge Opus, plan for direct API access with documented Section 72 cross-border consent flows.

Studio · the daily-driver tier

Many SA studios have made Claude the daily driver via Claude Code (the CLI), Claude Desktop, or Cursor's Claude integration. The honest reality: this site was substantially written with Claude running through Claude Code. For studio dev work where productivity-per-engineer matters more than per-token cost, Claude is hard to beat — and the SDK leaf in this domain explains how to extend that productivity into your own production agents.

FX exposure note

Claude is USD-billed. At pilot volume (single-digit thousands of requests / month), the cost is invisible. At production volume, it's a real line item. Mitigations that work in practice: (a) the Haiku router pattern in the previous section, (b) prompt caching for RAG agents, (c) the Batch API for nightly workloads, (d) hybrid local+cloud where Ollama-Gemma 3 handles the 60-80% of work that doesn't need frontier reasoning. Most SA studios that run Claude in production end up with all four mitigations in place.

Where Claude links in the tree.

Primary sources only.