Gemini is Google DeepMind's flagship model family. The current line in mid-2026 is Gemini 3.x: Gemini 3.1 Pro for the hardest reasoning and agentic coding, Gemini 3.5 Flash for frontier-class agentic and coding work, Gemini 3.1 Flash-Lite for high-volume routing (the 2.5 line is now legacy). The differentiators that matter: 1M+ token context (the largest in the field, reportedly up to ~2M on the Pro tier), native multimodal across text, image, video, and audio, native Google Search grounding, native bidirectional voice via the Live API, and now native multimodal generation via Gemini Omni — conversational video and image creation from any input. Hosted via Google AI Studio, Vertex AI (including africa-south1 in Johannesburg), and OpenRouter. The default closed-frontier model when extreme context, search grounding, voice, or media generation is the load-bearing capability.
Gemini is the model family from Google DeepMind, the merged Google research lab that combines DeepMind's frontier-AI work with the production model engineering that powers Google's products. The first Gemini family shipped in late 2023 (Gemini 1.0 Pro, Ultra, Nano); subsequent generations followed roughly annually, with Gemini 1.5 expanding context to 1M tokens, Gemini 2.0 adding native multimodal generation, Gemini 2.5 bringing production-grade reasoning and the Live API, and the current Gemini 3.x line pushing frontier agentic and coding performance.
Where Claude leads on instruction-following and GPT on function-calling reach, Gemini has historically led on three things: extreme context length (1M+ tokens reliably used, not just nominally supported), native multimodality (text, image, video, audio in a single model rather than bolted-together components), and native search grounding (the model can search Google and ground responses in current information without external RAG infrastructure). Those three properties make it the default choice for use cases where any of them is structurally needed.
Distribution: Gemini is available via Google AI Studio (the developer-friendly direct path, also free up to a quota), Vertex AI (the enterprise GCP path, including the JHB africa-south1 region for SA-resident calls), Gemini.google.com (the consumer product), and through every major aggregator (OpenRouter, LiteLLM). Inside Google's own products — Search AI Overviews, Workspace AI features, Pixel devices — Gemini is the model behind much of what users actually interact with daily, making it one of the most-deployed model families by raw query count.
Gemini's naming pattern: Gemini {version} {tier}. Version is the generation (1.0 → 1.5 → 2.0 → 2.5 → the current 3.x). Tier is the size class — Pro (the flagship reasoning SKU at the top), Flash (frontier-class, faster), Flash-Lite (cheapest / fastest), and Nano (on-device, mobile-targeted). The 3.x line dropped the separate "Ultra" SKU; Pro is now the top. API model IDs use dashes: gemini-3.1-pro, gemini-3.5-flash, gemini-3.1-flash-lite. The whole Gemini 2.5 line (gemini-2.5-pro, -flash, -flash-lite) is still served but now legacy — check the model availability matrix before pinning anything older.
The Gemini family is tiered like Claude and GPT: flagship at the top for hard reasoning, a frontier-class Flash for the everyday default, a Flash-Lite for high-volume routine work. The difference that's held across generations: the whole family ships a 1M+ context window. The 3.x line also collapsed the old "Ultra" tier — Pro is now the single top SKU.
The top reasoning SKU. Best for: complex problem-solving, deep research, code generation, agentic and "vibe" coding, long-context analysis. 1M+ token window (reportedly up to ~2M on Pro). Input pricing roughly doubles for prompts past 200k tokens.
Google's pitch is "most intelligent model for sustained frontier performance on agentic and coding tasks" — frontier-class quality at Flash pricing. Launched at I/O 2026, it's positioned as outperforming Gemini 3.1 Pro on almost all benchmarks while running ~4× faster — the speed engine for real-world agentic loops, and the model behind Antigravity's Managed Agents. The right default for production agent volume. (Gemini 3 Flash is the preview sibling at an even lower price point.)
The cheapest Gemini 3 tier. Best for: routing, classification, summarisation, structured extraction at scale. Still carries the large context window — ideal for "scan a long document, answer one question" patterns where a heavier tier would be overkill.
1M+ context, native multimodal (text/image/video/audio in, text/image out), function-calling, structured outputs, native Google Search grounding (paid feature), the Live API for bidirectional streaming voice and video.
Many models nominally support large context but degrade meaningfully past 32k or 64k. Gemini Pro is the rare model that genuinely uses its full window — you can pass an entire codebase, a 200-page document, or several hours of audio transcription and get coherent reasoning back. For "ask the company" agents over large internal corpora, Gemini's context advantage often beats the RAG-engineering complexity Claude or GPT would otherwise require. The trade-offs: latency grows with context (multiple seconds at the top of the window), and on Gemini 3 Pro input pricing roughly doubles once a prompt crosses 200k tokens.
The reasoning tiers above read media; Gemini Omni creates it. Google's pitch is "create anything from any input — starting with video": conversational video and image generation and editing, with consistency held across iterative edits, physics-aware motion (gravity, kinetic energy, fluid dynamics), and world knowledge baked in. It takes video, image, text, and audio as input. This is Gemini moving from understanding multimodal input to generating multimodal output in the same family. Honest caveat for builders: at launch it's surfaced through consumer and creative products — the Gemini app, Google Flow, and YouTube Shorts — not a confirmed Vertex/AI Studio API SKU. Treat it as a capability signal for now; check the model availability matrix before designing an agent around a programmatic Omni endpoint.
Full Gemini Omni leaf — capabilities, demos, prompt guide, benchmarks ↗
Honest cross-family positioning. Gemini's strengths sit on context length, multimodal range, search grounding, and voice; its weaknesses are around instruction-following consistency (relative to Claude) and ecosystem reach (relative to OpenAI). For specific use cases, Gemini is the only credible answer in 2026.
| Family | Strengths | Watch out for |
|---|---|---|
| Gemini (Google) | 1M+ context (the largest), native multimodal across all media, native Google Search grounding, voice via Live API, JHB Vertex region for SA residency | Less consistent on instruction-following than Claude; smaller community / ecosystem than OpenAI; Live API still maturing on TS/JS support |
| Claude (Anthropic) | Best instruction-following, adaptive thinking, MCP-native, top tier (Fable 5) leads on long-horizon agentic work, now 1M context, Bedrock af-south-1 for SA residency | Closed; weaker multimodal range than Gemini; no native voice/video |
| GPT (OpenAI) | Largest API ecosystem, best function-calling reliability, broadest multimodal (vision/audio/image-gen), built-in reasoning, GPT-5.5 now ~1M context | "Confidently wrong" failure mode more than Claude; no SA-resident region as clean as Vertex JHB |
| Llama (Meta) | Open weights, runs locally, largest community fine-tune ecosystem | Frontier gap; smaller context than Gemini Pro; Meta licence has commercial restrictions |
| Gemma (Google) | Open weights, multimodal, frontier-lab safety tuning, runs locally via Ollama | Smaller fine-tune ecosystem than Llama; not as code-strong as Qwen-coder |
Three use-case shapes where the other frontier families simply can't compete: (1) genuine 1M+ context reasoning — codebase analysis, multi-document synthesis, video understanding; (2) bidirectional voice agents with vision — the Live API is the cleanest implementation of streaming multimodal interaction available; (3) agents that need fresh information without external RAG — Google Search grounding gives you up-to-date answers without standing up your own search infrastructure. For any of these three shapes, Gemini wins by default in 2026.
Gemini stays the cheapest frontier family at the Flash tiers, and uniquely carries a context-length pricing step: on Gemini 3 Pro, input pricing roughly doubles once a prompt crosses 200k tokens. Always check ai.google.dev/pricing for current numbers — rates below are mid-2026.
List rates, per million tokens (input / output):
$2 / $12 for prompts up to 200k tokens; input roughly doubles above that. The flagship reasoning SKU — and notably cheaper than Claude Opus or GPT-5.5 at the top.$1.50 / $9. Frontier-class agentic/coding quality at Flash pricing; the right production default.$0.25 / $1.50. The most cost-effective Gemini 3 tier for routing and extraction at scale. (Legacy 2.5 Flash-Lite goes lower still at $0.10 / $0.40.)Same logic as the Claude and GPT leaves: don't run everything on Pro. Build a Flash-Lite-driven router that classifies incoming requests, dispatches the routine 60-80% to 3.5 Flash, and escalates the genuinely hard 5-15% to 3.1 Pro. Combined with Google's context caching for any repeated-context workload (common with 1M-token RAG), this pattern cuts Gemini costs 60-80% versus running everything on Pro. Across closed-frontier models in 2026, Gemini Flash plus context caching is often the most cost-effective path for long-context-heavy agents.
africa-south1africa-south1Vertex AI's Johannesburg region (africa-south1) hosts the Gemini Flash tiers and (in most cases) Pro for SA-resident inference, with full POPIA compliance, IAM controls, and Cloud Logging audit trails. For SA banks, insurers, and telcos already on GCP — or evaluating it — this is the structurally cleanest path among closed frontier models. The honest constraint: not every Gemini variant lands in africa-south1 at launch. The newest flagship and preview models (e.g. the latest 3.x Pro) sometimes lag the US regions by weeks or months. Plan for either a US-East fallback or accept the lag if residency is non-negotiable.
For SA studios without enterprise residency requirements, Google AI Studio is the simplest path. Free tier covers prototypes; usage-based billing scales to production. The Studio UI is genuinely good for prompt iteration — better than OpenAI Platform for context-heavy workflows. Pragmatic SA studio path: prototype in AI Studio direct, ship pilots from there, only move to Vertex AI if a client requires it.
The Live API is the most-credible answer for "build a voice agent in SA" in 2026. Bidirectional streaming voice + video, native multilingual (English + Afrikaans + isiZulu work meaningfully well), low-latency from africa-south1. For SA studios building voice agents for telcos, banks, or government, Gemini Live + Vertex JHB is structurally easier than the OpenAI Realtime API or Anthropic's separate audio endpoints — both of which lack a regional SA hosting story.
langchain-google-genai.africa-south1 is the SA-residency-clean path for Gemini workloads. The full GCP integration (IAM, audit logging, Cloud Run, Memory Bank) lives there.