know.2nth.ai › Agents › Claude

agents · Claude · Skill Leaf

Reasoning quality, priced and tiered.

Claude is Anthropic's flagship model family. Four tiers in 2026: Fable 5 — a Mythos-class model and the new top tier above Opus — for the hardest, longest-horizon work, Opus 4.8 for hard reasoning, Sonnet 4.6 for everyday agent work, Haiku 4.5 for high-volume routing and classification. 1M context on Fable, Opus, and Sonnet; 200k on Haiku. Adaptive thinking, MCP-first tool layer, vision and computer use built in. The default for production agents that need long-context reasoning, reliable function-calling, and instruction-following beyond what current open-weights models reach.

Fable 5 · Claude 4.x · current Fable 5 · Opus 4.8 · Sonnet 4.6 · Haiku 4.5 up to 1M context Adaptive thinking MCP-native

01 · What it is

Anthropic's flagship family.

Claude is the model family from Anthropic, the AI safety company founded in 2021 by former OpenAI researchers. The first Claude model shipped in early 2023; the family has been on a steady release cadence since, with Claude 4.x being the current generation as of mid-2026.

Where GPT and Gemini compete on raw capability and ecosystem reach, Claude has historically led on three things: instruction-following reliability (does what you ask, even when you ask weird things), long-context reasoning quality (uses 200k+ context windows well, doesn't degrade like cheaper models), and safety-focused behaviour (refuses cleanly, hallucinates less in adversarial domains, avoids the "confidently wrong" failure mode that plagues mid-tier models). Those three properties make it the default choice for production agents that have to be right and have to be auditable.

Distribution: Claude is available via the Anthropic API directly, via AWS Bedrock (including in af-south-1 Cape Town for SA-resident calls), via Google Cloud Vertex AI, via the Claude.ai web app, the Claude Desktop app, the Claude Code CLI, and through every major aggregator (OpenRouter, LangSmith, etc.). Enterprise plans include Claude Enterprise with SSO, audit logging, and admin controls.

The family naming convention

Most Claude versions follow Claude {Tier} {Major.Minor}. Tier is the size class (Opus / Sonnet / Haiku — large / medium / small). Major is the model generation; minor is the within-generation iteration. So Claude Opus 4.8 is the eighth Opus 4.x release. The API model ID drops the "Claude" prefix and uses dashes: claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5. Fable 5 breaks the pattern: it's a Mythos-class model — a distinct top tier above Opus rather than a size within the 4.x line — with ID claude-fable-5. It shares the same request surface as Opus 4.7/4.8 (adaptive thinking only; no sampling params), with one wrinkle: an explicit thinking: {type: "disabled"} is rejected — omit the field to run without thinking.

02 · The Claude lineup

Four tiers, one quality curve.

The Claude family is deliberately tiered. Each tier is meaningfully cheaper than the one above and meaningfully less capable on the hardest tasks. The right production pattern is multi-tier routing: Haiku for the volume work, Sonnet for the default, Opus for hard cases, Fable 5 for the genuinely hardest and longest-horizon ones. The Anthropic Agent SDK supports this routing trivially — sub-agents can run on different tiers.

Tier · Mythos-class

Claude Fable 5

The new top tier, above Opus. State-of-the-art on nearly all tested benchmarks, with its lead widening as tasks get longer and more complex — long-horizon agentic execution, deep software engineering, scientific research, and vision. The most capable (and most expensive) Claude model. 1M context, 128k max output. Reach for it when correctness on a hard, multi-step task outweighs cost.

Tier · frontier

Claude Opus 4.8

The hardest-reasoning Opus tier. Best for: deep research, complex code review, multi-step planning, agentic problem-solving. Highly autonomous on long-horizon work. Half the price of Fable 5 and still frontier-class — the right pick when Fable would be overkill. 1M context, 128k max output.

Tier · default

Claude Sonnet 4.6

The "good for almost everything" tier. Strong fraction of Opus quality at meaningfully lower cost. The right default for most production agents. 1M context. Most teams' day-to-day workhorse.

Tier · volume

Claude Haiku 4.5

The fastest, cheapest tier. Best for: classification, routing, summarisation, structured-extraction at high throughput. Lower reasoning depth but very fast and cheap. Good fit for "first-pass" agents that escalate to Sonnet/Opus/Fable when needed. 200k context.

Capability · cross-tier

All tiers ship

Tool use / function calling, vision (image input), MCP-native integration, structured outputs (JSON-schema-constrained), adaptive thinking on Fable/Opus/Sonnet, computer use (vision + UI control), prompt caching, streaming.

Adaptive thinking — the Claude differentiator

Claude Fable 5, Opus, and Sonnet support adaptive thinking: the model decides on its own when and how much to "think before answering," producing internal reasoning that lifts answer quality on hard tasks. You set thinking={"type": "adaptive"} and steer depth with the effort parameter (low…max) rather than a fixed token budget. The older fixed budget_tokens knob is removed on Fable 5 and Opus 4.7+ (it 400s) and deprecated elsewhere. Most other model families either don't have this or expose it only via specialised "reasoning" SKUs (OpenAI's o-series, Gemini Thinking).

03 · vs GPT, Gemini, Llama

Where Claude is the right pick — and where it isn't.

Honest cross-family positioning. Each model family has a real strength and a real constraint. Most production stacks in 2026 use multiple families for different work; few teams pick one and use it for everything.

Family	Strengths	Watch out for
Claude (Anthropic)	Best instruction-following, strong long-context reasoning, native extended thinking, tight MCP integration	Closed model; USD billing; rate limits hit hard at high volume on free tier
GPT (OpenAI)	Largest API ecosystem, broadest tooling, strong structured outputs, leading on multimodal extremes	Voice / video / DALL-E billed separately; more "confidently wrong" failure mode than Claude
Gemini (Google)	Largest context window (1M+ on Pro/Ultra), native search grounding, voice via Live API, GCP integration	Less consistent on instruction-following than Claude; Gemini-only via ADK/Vertex unless via OpenRouter
Llama (Meta, open-weights)	Open weights, runs locally via Ollama, largest community fine-tune ecosystem	Smaller frontier gap; instruction-following lags Claude/GPT; Meta licence has commercial restrictions over 700M MAU
Gemma (Google, open-weights)	Open weights, multimodal, frontier-lab safety tuning, runs locally	Smaller community fine-tune ecosystem than Llama; not as code-strong as Qwen-coder

The hybrid that's actually shipping in 2026

Most production agent teams use two or three model families together. Common patterns: Claude for the orchestrator + reasoning steps, GPT for tool-calling-heavy sub-agents, local Ollama-Gemma for PII-bearing pre-processing, Gemini for very-long-context tasks (1M+). The Agents domain hub frames this explicitly: "the question isn't which framework or which model, it's which framework where and which model when." Multi-model is the production reality.

04 · Pricing reality

Frontier-tier pricing, with real production trade-offs.

Claude is not cheap at the top tier. Sonnet 4.6 is meaningfully more affordable than Opus and very close in quality — for most production work, Sonnet is the right default. Haiku 4.5 is cheap enough for high-volume tasks; Fable 5 is the premium you pay only when a hard, long-horizon task has to be right. Pricing is USD per million tokens, with separate input and output rates, and thinking tokens billed at the output rate. Always check anthropic.com/pricing for current numbers — the rates here are accurate to mid-2026 but Anthropic adjusts pricing as model generations evolve.

List rates, per million tokens (input / output):

Fable 5 — $10 / $50. The top tier; ~2× Opus on both input and output. Worth it when a hard, multi-step task's correctness outweighs cost — not for volume.
Opus 4.8 — $5 / $25. Frontier-class at half Fable's rate. The hard-reasoning default before you reach for Fable.
Sonnet 4.6 — $3 / $15. The right default for production volume — most of Opus's quality at a fraction of the cost.
Haiku 4.5 — $1 / $5. Cheapest by far. The tier for routing, classification, and structured extraction at scale.
Prompt caching — up to 90% discount on cached input tokens for repeated context. Significant cost saver for RAG agents that share common prompts across calls.
Batch API — 50% discount on input + output for non-urgent batch workloads. Useful for nightly processing.

Output is billed ~5× input across all four tiers. The 1M context window on Fable, Opus, and Sonnet is at standard rates — no long-context premium.

The cost-saving routing pattern

Build a Haiku-driven router as the first agent in the chain. It classifies incoming requests by type and difficulty, then dispatches to: a Haiku agent for simple cases (60%+ of traffic typically), a Sonnet agent for the default case, and an Opus agent only for the genuinely hard cases (5-15% typically). This pattern can cut Claude costs by 50-80% versus running everything on a single tier. The Anthropic Agent SDK's sub-agent pattern fits this routing model directly.

05 · Decision guide

When Claude is the right model. When it isn't.

Use Claude when

Instruction-following reliability is load-bearing — the model has to do exactly what you ask
Long-context reasoning matters (200k+ tokens, real use of the context)
You want extended thinking for hard reasoning steps
You're building production agents that need auditable behaviour
MCP-native tool integration is a feature you want
Computer-use or vision-heavy workflows where Claude leads
You're already standardised on Claude (most SA enterprise that's picked one frontier model)

Skip Claude when

POPIA / data-residency forbids cross-border to US (mitigate via AWS Bedrock af-south-1)
Cost-sensitive at extreme scale — open-weights via Ollama is cheaper
You need context beyond 1M tokens — Claude now ships 1M on Fable/Opus/Sonnet, but Gemini Pro / Ultra still leads past that
Voice / video native — Gemini Live or specialised voice APIs
DALL-E or specialised image-gen integration — OpenAI ecosystem fits
You want a fully open / self-hostable model — Claude is closed

06 · South African context

Where Claude lands in SA delivery work.

Enterprise · AWS Bedrock `af-south-1` as the residency-clean path

For SA enterprise (banks, insurers, telcos) with POPIA cross-border concerns, AWS Bedrock in Cape Town hosts Claude Sonnet and Haiku with full SA data residency. That's the structurally clean answer for PII-bearing workloads — data stays on-region, audit logs flow into CloudWatch, IAM controls integrate with existing AWS posture. Top-tier availability on Bedrock af-south-1 can lag the Anthropic API by weeks — the newest Opus and Fable 5 land on the direct API first; if you need the bleeding edge, plan for direct API access with documented Section 72 cross-border consent flows.

Studio · the daily-driver tier

Many SA studios have made Claude the daily driver via Claude Code (the CLI), Claude Desktop, or Cursor's Claude integration. The honest reality: this site was substantially written with Claude running through Claude Code. For studio dev work where productivity-per-engineer matters more than per-token cost, Claude is hard to beat — and the SDK leaf in this domain explains how to extend that productivity into your own production agents.

FX exposure note

Claude is USD-billed. At pilot volume (single-digit thousands of requests / month), the cost is invisible. At production volume, it's a real line item. Mitigations that work in practice: (a) the Haiku router pattern in the previous section, (b) prompt caching for RAG agents, (c) the Batch API for nightly workloads, (d) hybrid local+cloud where Ollama-Gemma 3 handles the 60-80% of work that doesn't need frontier reasoning. Most SA studios that run Claude in production end up with all four mitigations in place.

07 · Connections

Where Claude links in the tree.

agents

Agents hub

The sub-tree landing. Claude sits in the Models band; the Anthropic Agent SDK in the Frameworks band is its natural framework pairing.

agents/anthropic-agent-sdk

Anthropic Agent SDK

The Claude-first agent framework. MCP-native, Task-tool sub-agents, native extended thinking. The default SDK if Claude is your model.

agents/mcp

Model Context Protocol

Anthropic-originated. Claude is MCP's reference consumer; Claude Desktop's MCP integration is the canonical pattern.

agents/langgraph

LangGraph

Most-common LangGraph pairing — LangGraph + Claude is one of the production stacks. Use langchain-anthropic for the integration.

agents/google-adk

Google ADK

ADK supports Claude via LiteLLM. The combination is unusual (ADK is Gemini-first) but works for teams on GCP infrastructure who want Claude as the model.

agents/ollama

Ollama

The local-inference complement. Common hybrid: Claude for hard reasoning + Ollama-Gemma for high-volume PII pre-processing. Cost and POPIA story improves significantly.

agents/gemma

Gemma

The open-weights complement. When Claude is overkill or POPIA-residency requires local inference, Gemma 3 27B is the pragmatic step down.

agents/gpt

GPT · OpenAI

The major frontier alternative. Different strengths (broader API ecosystem, more multimodal extremes); often used together with Claude in multi-model production stacks.

08 · Resources

Primary sources only.

Site anthropic.com/claude anthropic.com/claude Docs Anthropic API documentation docs.anthropic.com Pricing anthropic.com/pricing anthropic.com/pricing Models Model card & capability matrix docs.anthropic.com/.../models Feature Extended thinking docs docs.anthropic.com/.../extended-thinking Hosting Claude on AWS Bedrock (af-south-1) aws.amazon.com/bedrock/claude Cookbook anthropics/claude-cookbooks github.com/anthropics/claude-cookbooks Course anthropics/courses · prompt engineering & evals github.com/anthropics/courses