agents · Claude · Skill Leaf

Reasoning quality, priced and tiered.

Claude is Anthropic's flagship model family. Three production tiers in 2026: Opus 4.7 for the hardest reasoning, Sonnet 4.6 for everyday agent work, Haiku 4.5 for high-volume routing and classification. 200k standard context window, 1M extended on premium tiers. Native extended thinking, MCP-first tool layer, vision and computer use built in. The default for production agents that need long-context reasoning, reliable function-calling, and instruction-following beyond what current open-weights models reach.

Claude 4.x family · current: Opus 4.7 · Sonnet 4.6 · Haiku 4.5 · 200k → 1M context · Extended thinking · MCP-native

Anthropic's flagship family.

Claude is the model family from Anthropic, the AI safety company founded in 2021 by former OpenAI researchers. The first Claude model shipped in early 2023; the family has been on a steady release cadence since, with Claude 4.x being the current generation as of mid-2026.

Where GPT and Gemini compete on raw capability and ecosystem reach, Claude has historically led on three things: instruction-following reliability (does what you ask, even when you ask weird things), long-context reasoning quality (uses 200k+ context windows well, doesn't degrade like cheaper models), and safety-focused behaviour (refuses cleanly, hallucinates less in adversarial domains, avoids the "confidently wrong" failure mode that plagues mid-tier models). Those three properties make it the default choice for production agents that have to be right and have to be auditable.

Distribution: Claude is available via the Anthropic API directly, via AWS Bedrock (including in af-south-1 Cape Town for SA-resident calls), via Google Cloud Vertex AI, via the Claude.ai web app, the Claude Desktop app, the Claude Code CLI, and through major model aggregators such as OpenRouter. Enterprise plans include Claude Enterprise with SSO, audit logging, and admin controls.

The family naming convention

Claude versions follow Claude {Tier} {Major.Minor}. Tier is the size class (Opus / Sonnet / Haiku — large / medium / small). Major is the model generation; minor is the within-generation iteration: Claude Opus 4.7 is the .7 release of the Opus 4 generation. The full model ID for API calls drops the "Claude" prefix and uses dashes: claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5.
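The convention above is mechanical enough to express in code. This is a hypothetical helper, not part of any Anthropic SDK: it derives the dash-style API ID from the display name using the naming rules in this section. Always check current model IDs against Anthropic's model documentation.

```python
def to_model_id(display_name: str) -> str:
    """'Claude Opus 4.7' -> 'claude-opus-4-7', per the naming convention above."""
    parts = display_name.strip().split()
    if len(parts) != 3 or parts[0] != "Claude":
        raise ValueError(f"unexpected display name: {display_name!r}")
    _, tier, version = parts
    major, minor = version.split(".")
    # Drop the "Claude" prefix, lowercase the tier, dash-separate the version.
    return f"claude-{tier.lower()}-{major}-{minor}"

print(to_model_id("Claude Opus 4.7"))    # claude-opus-4-7
print(to_model_id("Claude Sonnet 4.6"))  # claude-sonnet-4-6
print(to_model_id("Claude Haiku 4.5"))   # claude-haiku-4-5
```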

Three tiers, one quality curve.

The Claude family is deliberately tiered. Each tier is meaningfully cheaper than the one above and meaningfully less capable on the hardest tasks. The right production pattern is multi-tier routing: Haiku for the volume work, Sonnet for the default, Opus for the hard cases. The Anthropic Agent SDK supports this routing trivially — sub-agents can run on different tiers.

Tier · frontier

Claude Opus 4.7

The hardest-reasoning tier. Best for: deep research, complex code review, multi-step planning, agentic problem-solving. Frontier-level on reasoning benchmarks. Most expensive of the three. 200k standard / 1M extended context.

Tier · default

Claude Sonnet 4.6

The "good for almost everything" tier. ~80% of Opus quality at meaningfully lower cost. The right default for most production agents. 200k context. Most teams' day-to-day workhorse.

Tier · volume

Claude Haiku 4.5

The fastest, cheapest tier. Best for: classification, routing, summarisation, and structured extraction at high throughput. Lower reasoning depth but very fast and cheap. Good fit for "first-pass" agents that escalate to Sonnet/Opus when needed.

Capability · cross-tier

All tiers ship

Tool use / function calling, vision (image input), MCP-native integration, JSON mode for structured outputs, extended thinking on Opus and Sonnet, computer use on Sonnet (vision + UI control), 200k+ context, streaming.
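To make the tool-use capability concrete, here is a sketch of a tool definition and request payload in the JSON-schema style Claude's Messages API uses. It builds the payload without sending it; the field names (`input_schema`, `tools`, `messages`) follow Anthropic's documented convention, but verify them, and the model ID, against the current API reference.

```python
# Hypothetical example tool: a weather lookup the model can choose to call.
weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "input_schema": {               # JSON Schema describing the tool's arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Request payload only — no network call is made here.
request = {
    "model": "claude-sonnet-4-6",   # tool use ships on every tier per this page
    "max_tokens": 1024,
    "tools": [weather_tool],
    "messages": [{"role": "user", "content": "What's the weather in Cape Town?"}],
}
print(sorted(request.keys()))  # ['max_tokens', 'messages', 'model', 'tools']
```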

Extended thinking — the Claude differentiator

Claude Opus and Sonnet 4.x support extended thinking: when the prompt indicates complex reasoning is required, the model can allocate a separately-billed thinking-token budget to "think before answering" — producing internal reasoning traces that improve final answer quality on hard tasks. The Anthropic Agent SDK exposes this as thinking={"budget_tokens": 16000}. Most other model families either don't have this or expose it only via specialised "reasoning" SKUs (OpenAI's o-series, Gemini Thinking).

Where Claude is the right pick — and where it isn't.

Honest cross-family positioning. Each model family has a real strength and a real constraint. Most production stacks in 2026 use multiple families for different work; few teams pick one and use it for everything.

| Family | Strengths | Watch out for |
| --- | --- | --- |
| Claude (Anthropic) | Best instruction-following, strong long-context reasoning, native extended thinking, tight MCP integration | Closed model; USD billing; rate limits hit hard at high volume on free tier |
| GPT (OpenAI) | Largest API ecosystem, broadest tooling, strong structured outputs, leading on multimodal extremes | Voice / video / DALL-E billed separately; more "confidently wrong" failure mode than Claude |
| Gemini (Google) | Largest context window (1M+ on Pro/Ultra), native search grounding, voice via Live API, GCP integration | Less consistent instruction-following than Claude; Gemini-only via ADK/Vertex unless accessed via OpenRouter |
| Llama (Meta, open-weights) | Open weights, runs locally via Ollama, largest community fine-tune ecosystem | Gap to the frontier; instruction-following lags Claude/GPT; Meta licence restricts commercial use over 700M MAU |
| Gemma (Google, open-weights) | Open weights, multimodal, frontier-lab safety tuning, runs locally | Smaller community fine-tune ecosystem than Llama; not as code-strong as Qwen-coder |

The hybrid that's actually shipping in 2026

Most production agent teams use two or three model families together. Common patterns: Claude for the orchestrator + reasoning steps, GPT for tool-calling-heavy sub-agents, local Ollama-Gemma for PII-bearing pre-processing, Gemini for very-long-context tasks (1M+). The Agents domain hub frames this explicitly: "the question isn't which framework or which model, it's which framework where and which model when." Multi-model is the production reality.

Frontier-tier pricing, with real production trade-offs.

Claude is not cheap at the frontier tier. Sonnet 4.6 is meaningfully more affordable than Opus 4.7 and very close in quality — for most production work, Sonnet is the right default. Haiku 4.5 is cheap enough for high-volume tasks. Pricing is USD per million tokens, with separate input and output rates, and extended thinking tokens billed at output rate. Always check anthropic.com/pricing for current numbers — the rates here are accurate to mid-2026 but Anthropic adjusts pricing as model generations evolve.

The shape of the pricing curve (illustrative; check the official page for exact numbers):

  • Opus 4.7 — frontier-tier rates. Output tokens billed roughly 5× input. Extended thinking tokens billed at output rate. The most expensive tier.
  • Sonnet 4.6 — meaningfully cheaper than Opus, similar 5× output multiplier. The right default for production volume.
  • Haiku 4.5 — cheapest of the three by an order of magnitude. The right tier for routing, classification, and structured extraction at scale.
  • Prompt caching — 90% discount on cached input tokens for repeated context. Significant cost saver for RAG agents that share common prompts across calls.
  • Batch API — 50% discount on input + output for non-urgent batch workloads. Useful for nightly processing.
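The prompt-caching and batch discounts above compound, so it is worth doing the arithmetic explicitly. This sketch applies the stated discounts (90% off cached input, 50% off batch) to placeholder rates — $3 in / $15 out per million tokens are illustrative numbers, not official pricing; check anthropic.com/pricing for real rates.

```python
def call_cost(input_tok: int, output_tok: int,
              in_rate: float, out_rate: float,
              cached_frac: float = 0.0, batch: bool = False) -> float:
    """USD cost of one call; rates are USD per million tokens."""
    cached = input_tok * cached_frac
    fresh = input_tok - cached
    cost = (fresh * in_rate
            + cached * in_rate * 0.10      # cached input billed at 10% (90% discount)
            + output_tok * out_rate) / 1_000_000
    return cost * 0.5 if batch else cost   # Batch API halves the total

# 100k-token prompt, 2k-token answer, placeholder $3/$15 rates.
full = call_cost(100_000, 2_000, 3.0, 15.0)
cached = call_cost(100_000, 2_000, 3.0, 15.0, cached_frac=0.8)
print(round(full, 4), round(cached, 4))  # 0.33 0.114
```

With 80% of the prompt cached (typical for a RAG agent with a large shared system prompt), the per-call cost drops by roughly two thirds before any batch discount is applied.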

The cost-saving routing pattern

Build a Haiku-driven router as the first agent in the chain. It classifies incoming requests by type and difficulty, then dispatches to: a Haiku agent for simple cases (60%+ of traffic typically), a Sonnet agent for the default case, and an Opus agent only for the genuinely hard cases (5-15% typically). This pattern can cut Claude costs by 50-80% versus running everything on a single tier. The Anthropic Agent SDK's sub-agent pattern fits this routing model directly.
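The routing pattern above can be sketched in a few lines. The keyword classifier here is a stand-in — in production, the classification step would itself be a cheap Haiku call — and the model IDs are the ones used elsewhere on this page, which may lag current releases.

```python
ROUTES = {
    "simple":  "claude-haiku-4-5",   # volume tier, typically 60%+ of traffic
    "default": "claude-sonnet-4-6",  # everyday workhorse
    "hard":    "claude-opus-4-7",    # frontier tier, typically 5-15% of traffic
}

def classify(request: str) -> str:
    """Stand-in difficulty heuristic; replace with a Haiku classification call."""
    text = request.lower()
    if any(k in text for k in ("prove", "architect", "multi-step", "review this codebase")):
        return "hard"
    if any(k in text for k in ("classify", "extract", "summarise", "route")):
        return "simple"
    return "default"

def route(request: str) -> str:
    return ROUTES[classify(request)]

print(route("Summarise this support ticket"))        # claude-haiku-4-5
print(route("Review this codebase for data races"))  # claude-opus-4-7
```

The point of the pattern is that the router itself runs on the cheapest tier, so misclassification costs one cheap call plus one escalation, while correct classification saves the frontier-rate delta on the majority of traffic.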

When Claude is the right model. When it isn't.

Use Claude when

  • Instruction-following reliability is load-bearing — the model has to do exactly what you ask
  • Long-context reasoning matters (200k+ tokens, real use of the context)
  • You want extended thinking for hard reasoning steps
  • You're building production agents that need auditable behaviour
  • MCP-native tool integration is a feature you want
  • You're running computer-use or vision-heavy workflows, where Claude leads
  • You're already standardised on Claude (most SA enterprise that's picked one frontier model)

Where Claude lands in SA delivery work.

Enterprise · AWS Bedrock af-south-1 as the residency-clean path

For SA enterprise (banks, insurers, telcos) with POPIA cross-border concerns, AWS Bedrock in Cape Town hosts Claude Sonnet and Haiku with full SA data residency. That's the structurally clean answer for PII-bearing workloads — data stays on-region, audit logs flow into CloudWatch, IAM controls integrate with existing AWS posture. Opus 4.7 availability on Bedrock af-south-1 can lag the Anthropic API by weeks; if you need bleeding-edge Opus, plan for direct API access with documented Section 72 cross-border consent flows.

Studio · the daily-driver tier

Many SA studios have made Claude the daily driver via Claude Code (the CLI), Claude Desktop, or Cursor's Claude integration. The honest reality: this site was substantially written with Claude running through Claude Code. For studio dev work where productivity-per-engineer matters more than per-token cost, Claude is hard to beat — and the SDK leaf in this domain explains how to extend that productivity into your own production agents.

FX exposure note

Claude is USD-billed. At pilot volume (single-digit thousands of requests / month), the cost is invisible. At production volume, it's a real line item. Mitigations that work in practice: (a) the Haiku router pattern in the previous section, (b) prompt caching for RAG agents, (c) the Batch API for nightly workloads, (d) hybrid local+cloud where Ollama-Gemma 3 handles the 60-80% of work that doesn't need frontier reasoning. Most SA studios that run Claude in production end up with all four mitigations in place.

Where Claude links in the tree.

Primary sources only.