Claude is Anthropic's flagship model family. Four tiers in 2026: Fable 5 — a Mythos-class model and the new top tier above Opus — for the hardest, longest-horizon work, Opus 4.8 for hard reasoning, Sonnet 4.6 for everyday agent work, Haiku 4.5 for high-volume routing and classification. 1M context on Fable, Opus, and Sonnet; 200k on Haiku. Adaptive thinking, MCP-first tool layer, vision and computer use built in. The default for production agents that need long-context reasoning, reliable function-calling, and instruction-following beyond what current open-weights models reach.
Claude is the model family from Anthropic, the AI safety company founded in 2021 by former OpenAI researchers. The first Claude model shipped in early 2023; the family has been on a steady release cadence since, with Claude 4.x being the current generation as of mid-2026.
Where GPT and Gemini compete on raw capability and ecosystem reach, Claude has historically led on three things: instruction-following reliability (does what you ask, even when you ask weird things), long-context reasoning quality (uses 200k+ context windows well, doesn't degrade like cheaper models), and safety-focused behaviour (refuses cleanly, hallucinates less in adversarial domains, avoids the "confidently wrong" failure mode that plagues mid-tier models). Those three properties make it the default choice for production agents that have to be right and have to be auditable.
Distribution: Claude is available via the Anthropic API directly, via AWS Bedrock (including in af-south-1 Cape Town for SA-resident calls), via Google Cloud Vertex AI, via the Claude.ai web app, the Claude Desktop app, the Claude Code CLI, and through every major aggregator (OpenRouter, LangSmith, etc.). Enterprise plans include Claude Enterprise with SSO, audit logging, and admin controls.
Most Claude versions follow Claude {Tier} {Major.Minor}. Tier is the size class (Opus / Sonnet / Haiku — large / medium / small). Major is the model generation; minor is the within-generation iteration. So Claude Opus 4.8 is the eighth Opus 4.x release. The API model ID drops the "Claude" prefix and uses dashes: claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5. Fable 5 breaks the pattern: it's a Mythos-class model — a distinct top tier above Opus rather than a size within the 4.x line — with ID claude-fable-5. It shares the same request surface as Opus 4.7/4.8 (adaptive thinking only; no sampling params), with one wrinkle: an explicit thinking: {type: "disabled"} is rejected — omit the field to run without thinking.
The Claude family is deliberately tiered. Each tier is meaningfully cheaper than the one above and meaningfully less capable on the hardest tasks. The right production pattern is multi-tier routing: Haiku for the volume work, Sonnet for the default, Opus for hard cases, Fable 5 for the genuinely hardest and longest-horizon ones. The Anthropic Agent SDK supports this routing trivially — sub-agents can run on different tiers.
The new top tier, above Opus. State-of-the-art on nearly all tested benchmarks, with its lead widening as tasks get longer and more complex — long-horizon agentic execution, deep software engineering, scientific research, and vision. The most capable (and most expensive) Claude model. 1M context, 128k max output. Reach for it when correctness on a hard, multi-step task outweighs cost.
The hardest-reasoning Opus tier. Best for: deep research, complex code review, multi-step planning, agentic problem-solving. Highly autonomous on long-horizon work. Half the price of Fable 5 and still frontier-class — the right pick when Fable would be overkill. 1M context, 128k max output.
The "good for almost everything" tier. Strong fraction of Opus quality at meaningfully lower cost. The right default for most production agents. 1M context. Most teams' day-to-day workhorse.
The fastest, cheapest tier. Best for: classification, routing, summarisation, structured-extraction at high throughput. Lower reasoning depth but very fast and cheap. Good fit for "first-pass" agents that escalate to Sonnet/Opus/Fable when needed. 200k context.
Tool use / function calling, vision (image input), MCP-native integration, structured outputs (JSON-schema-constrained), adaptive thinking on Fable/Opus/Sonnet, computer use (vision + UI control), prompt caching, streaming.
Claude Fable 5, Opus, and Sonnet support adaptive thinking: the model decides on its own when and how much to "think before answering," producing internal reasoning that lifts answer quality on hard tasks. You set thinking={"type": "adaptive"} and steer depth with the effort parameter (low…max) rather than a fixed token budget. The older fixed budget_tokens knob is removed on Fable 5 and Opus 4.7+ (it 400s) and deprecated elsewhere. Most other model families either don't have this or expose it only via specialised "reasoning" SKUs (OpenAI's o-series, Gemini Thinking).
Honest cross-family positioning. Each model family has a real strength and a real constraint. Most production stacks in 2026 use multiple families for different work; few teams pick one and use it for everything.
| Family | Strengths | Watch out for |
|---|---|---|
| Claude (Anthropic) | Best instruction-following, strong long-context reasoning, native extended thinking, tight MCP integration | Closed model; USD billing; rate limits hit hard at high volume on free tier |
| GPT (OpenAI) | Largest API ecosystem, broadest tooling, strong structured outputs, leading on multimodal extremes | Voice / video / DALL-E billed separately; more "confidently wrong" failure mode than Claude |
| Gemini (Google) | Largest context window (1M+ on Pro/Ultra), native search grounding, voice via Live API, GCP integration | Less consistent on instruction-following than Claude; Gemini-only via ADK/Vertex unless via OpenRouter |
| Llama (Meta, open-weights) | Open weights, runs locally via Ollama, largest community fine-tune ecosystem | Smaller frontier gap; instruction-following lags Claude/GPT; Meta licence has commercial restrictions over 700M MAU |
| Gemma (Google, open-weights) | Open weights, multimodal, frontier-lab safety tuning, runs locally | Smaller community fine-tune ecosystem than Llama; not as code-strong as Qwen-coder |
Most production agent teams use two or three model families together. Common patterns: Claude for the orchestrator + reasoning steps, GPT for tool-calling-heavy sub-agents, local Ollama-Gemma for PII-bearing pre-processing, Gemini for very-long-context tasks (1M+). The Agents domain hub frames this explicitly: "the question isn't which framework or which model, it's which framework where and which model when." Multi-model is the production reality.
Claude is not cheap at the top tier. Sonnet 4.6 is meaningfully more affordable than Opus and very close in quality — for most production work, Sonnet is the right default. Haiku 4.5 is cheap enough for high-volume tasks; Fable 5 is the premium you pay only when a hard, long-horizon task has to be right. Pricing is USD per million tokens, with separate input and output rates, and thinking tokens billed at the output rate. Always check anthropic.com/pricing for current numbers — the rates here are accurate to mid-2026 but Anthropic adjusts pricing as model generations evolve.
List rates, per million tokens (input / output):
$10 / $50. The top tier; ~2× Opus on both input and output. Worth it when a hard, multi-step task's correctness outweighs cost — not for volume.$5 / $25. Frontier-class at half Fable's rate. The hard-reasoning default before you reach for Fable.$3 / $15. The right default for production volume — most of Opus's quality at a fraction of the cost.$1 / $5. Cheapest by far. The tier for routing, classification, and structured extraction at scale.Output is billed ~5× input across all four tiers. The 1M context window on Fable, Opus, and Sonnet is at standard rates — no long-context premium.
Build a Haiku-driven router as the first agent in the chain. It classifies incoming requests by type and difficulty, then dispatches to: a Haiku agent for simple cases (60%+ of traffic typically), a Sonnet agent for the default case, and an Opus agent only for the genuinely hard cases (5-15% typically). This pattern can cut Claude costs by 50-80% versus running everything on a single tier. The Anthropic Agent SDK's sub-agent pattern fits this routing model directly.
af-south-1)af-south-1 as the residency-clean pathFor SA enterprise (banks, insurers, telcos) with POPIA cross-border concerns, AWS Bedrock in Cape Town hosts Claude Sonnet and Haiku with full SA data residency. That's the structurally clean answer for PII-bearing workloads — data stays on-region, audit logs flow into CloudWatch, IAM controls integrate with existing AWS posture. Top-tier availability on Bedrock af-south-1 can lag the Anthropic API by weeks — the newest Opus and Fable 5 land on the direct API first; if you need the bleeding edge, plan for direct API access with documented Section 72 cross-border consent flows.
Many SA studios have made Claude the daily driver via Claude Code (the CLI), Claude Desktop, or Cursor's Claude integration. The honest reality: this site was substantially written with Claude running through Claude Code. For studio dev work where productivity-per-engineer matters more than per-token cost, Claude is hard to beat — and the SDK leaf in this domain explains how to extend that productivity into your own production agents.
Claude is USD-billed. At pilot volume (single-digit thousands of requests / month), the cost is invisible. At production volume, it's a real line item. Mitigations that work in practice: (a) the Haiku router pattern in the previous section, (b) prompt caching for RAG agents, (c) the Batch API for nightly workloads, (d) hybrid local+cloud where Ollama-Gemma 3 handles the 60-80% of work that doesn't need frontier reasoning. Most SA studios that run Claude in production end up with all four mitigations in place.
langchain-anthropic for the integration.