media · voice & speech · vendor

ElevenLabs.

The broadest voice AI platform as of mid-2026. Three product lines: ElevenCreative (text-to-speech across 70+ languages, voice cloning, music, sound effects), ElevenAgents (omnichannel conversational voice agents), and ElevenAPI for developers. It is the default pick for creators and developers, though not always the right one. This leaf covers what's in the box, the models, the pricing reality, and when to choose something else.

Voice AI platform · Reference explainer · Hot · quarterly review

The voice AI “kitchen sink.”

Where most voice vendors specialise (Cartesia on latency, Hume on emotion, Deepgram on transcription), ElevenLabs covers the whole field competently. That breadth is the product.

ElevenLabs built its reputation on text-to-speech that crossed the uncanny-valley line earlier than the field, then expanded outward: voice cloning, dubbing, conversational agents, sound effects, and (more recently) music and image/video creation. By mid-2026 it positions itself as “the most realistic voice AI platform,” with research-backed foundational models and a safety story — moderation, accountability, audio provenance disclosure — built in rather than bolted on.

The enterprise client list is real: Disney, Cisco, Meta, Nvidia, and government entities. But the centre of gravity remains creators and developers — people generating audiobooks, podcast voices, game dialogue, video narration, and increasingly the voice layer of conversational products.

The honest framing: ElevenLabs is the default, not the optimum. If you need the absolute lowest latency, Cartesia beats it. If you need emotion as a first-class signal, Hume beats it. If you need voice inside your LLM reasoning loop, OpenAI's gpt-realtime beats it. ElevenLabs wins when you want one platform that does everything well and a developer experience that does not fight you.

Three product lines.

ElevenLabs organises around three surfaces. Most teams touch one or two; the API underpins all of them.

ElevenCreative

The all-in-one content creation platform. Ultra-realistic text-to-speech across 70+ languages, voice cloning, studio-grade music generation from natural-language prompts, sound effects, and audio design. Increasingly also image and video creation/editing.

ElevenAgents

Conversational AI — natural, human-sounding voice agents in 70+ languages, omnichannel across phone, chat, email, and WhatsApp. Ships analytics, testing, guardrails, and workflow management for agent deployments.

ElevenAPI

Developer access to the core models — TTS, STT, voice cloning, dubbing — behind a clean API. The layer that puts ElevenLabs voices inside other products. Separate API pricing tiers from the consumer plans.
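
To make "behind a clean API" concrete, here is a minimal sketch of what a text-to-speech request might look like. The endpoint path, the `xi-api-key` header, the `model_id` field, and the model identifier string are assumptions based on the public API at the time of writing; verify them against the current API reference before relying on them. The function only describes the request and sends nothing, and `build_tts_request` is a hypothetical helper name.

```python
import json

# Assumed base URL and path shape; confirm against the current API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_flash_v2_5") -> dict:
    """Describe a text-to-speech HTTP request without sending it."""
    if not text:
        raise ValueError("text must not be empty")
    return {
        "method": "POST",
        "url": f"{API_BASE}/text-to-speech/{voice_id}",
        "headers": {
            "xi-api-key": api_key,  # assumed auth header name
            "Content-Type": "application/json",
        },
        # Assumed body fields: the text to speak and the model to use.
        "body": json.dumps({"text": text, "model_id": model_id}),
    }


request = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello from the docs.")
print(request["method"], request["url"])
```

The resulting dictionary can be handed to whichever HTTP client the project already uses. Keeping request construction separate from sending makes the call easy to unit-test without network access or spending credits.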

Flash, Multilingual, v3, Scribe, Music.

ElevenLabs ships several models tuned for different points on the latency / expressiveness trade-off. Pick by use case — real-time agent vs long-form narration vs maximum emotion.

Model | Category | Best for
Eleven Flash (v2.5) | TTS | Real-time agents. ~75ms latency, the fast tier. The pick when conversation pacing matters.
Eleven Turbo (v2.5) | TTS | Balanced: ~250–300ms, lower cost than the expressive models, good quality. The middle option.
Eleven Multilingual v2 | TTS | Long-form: audiobooks, narration. Consistency and lifelike quality over raw speed.
Eleven v3 | TTS | Maximum expressiveness: the most emotionally nuanced model. Use when delivery matters more than latency.
Scribe v2 | STT | Speech-to-text. ElevenLabs' transcription model, the speech-in side of the platform.
Eleven Music | Music gen | Studio-grade music from natural-language prompts. Trained on licensed data, so commercial use is clean.

The trade-off in one line: Flash for agents, Multilingual v2 for long-form, v3 when the emotion is the point. Most teams end up using two — one fast model for interactive paths, one expressive model for produced content.
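
The rule of thumb above can be written down as a small lookup, which is handy when one codebase serves both interactive and produced content. This is a sketch only: the `UseCase` enum and `pick_model` function are hypothetical names, and the strings are the display names from the table above, not API identifiers.

```python
from enum import Enum


class UseCase(Enum):
    REALTIME_AGENT = "realtime_agent"   # conversation pacing matters
    BALANCED = "balanced"               # cost-sensitive, decent quality
    LONG_FORM = "long_form"             # audiobooks, narration
    MAX_EXPRESSION = "max_expression"   # delivery matters more than latency


# Display names as listed in the model table; the mapping is illustrative.
MODEL_FOR_USE_CASE = {
    UseCase.REALTIME_AGENT: "Eleven Flash (v2.5)",
    UseCase.BALANCED: "Eleven Turbo (v2.5)",
    UseCase.LONG_FORM: "Eleven Multilingual v2",
    UseCase.MAX_EXPRESSION: "Eleven v3",
}


def pick_model(use_case: UseCase) -> str:
    """Return the model the leaf recommends for a given use case."""
    return MODEL_FOR_USE_CASE[use_case]


print(pick_model(UseCase.REALTIME_AGENT))
print(pick_model(UseCase.LONG_FORM))
```

Centralising the choice in one table means a model rename or a new release changes one line rather than every call site.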

Credit-based, per character.

ElevenLabs prices in credits, consumed per character of generated speech. Faster models cost fewer credits per character; expressive long-form models cost more. Plans tier on monthly credit allotment.

The shape of it

Consumer / creator plans run from a free tier (a small monthly credit allotment, non-commercial) up through Starter, Creator, Pro, Scale, and Business — each step buying a larger monthly credit pool and unlocking features (commercial licensing, professional voice cloning, higher concurrency).

API plans are tiered separately from the consumer plans, priced for production volume.

Credit cost varies by model — the fast Flash / Turbo models consume roughly half the credits per character of the expressive long-form models. Budgeting means estimating characters-per-month and the model mix, not just picking a plan tier.

Exact figures move — check elevenlabs.io/pricing against the date on this leaf. The structure (credit-based, per-character, model-dependent rates, separate API tiers) is stable; the numbers are not.
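
A rough budgeting pass might look like the sketch below. The credit rates are placeholders that only encode the "fast models cost roughly half" ratio from the text, and the plan names and allotments are invented for illustration; substitute the real figures from the pricing page.

```python
from typing import Dict, Optional

# Placeholder rates: only the 1:2 ratio comes from the text above.
CREDITS_PER_CHAR = {"fast": 0.5, "expressive": 1.0}


def monthly_credits(chars_by_tier: Dict[str, int]) -> float:
    """Estimate monthly credits from characters generated per model tier."""
    return sum(CREDITS_PER_CHAR[tier] * chars
               for tier, chars in chars_by_tier.items())


def smallest_plan(credits_needed: float, plans: Dict[str, int]) -> Optional[str]:
    """Return the plan with the smallest allotment that still covers the need."""
    fitting = [(allotment, name) for name, allotment in plans.items()
               if allotment >= credits_needed]
    return min(fitting)[1] if fitting else None


# Hypothetical usage: 400k characters on a fast model, 150k on an expressive one.
usage = {"fast": 400_000, "expressive": 150_000}
# Hypothetical plan allotments in credits per month.
plans = {"small": 100_000, "medium": 500_000, "large": 2_000_000}

needed = monthly_credits(usage)
print(needed, smallest_plan(needed, plans))
```

With these made-up inputs the estimate is 350,000 credits, which fits the hypothetical "medium" plan. The point is the method: the model mix changes the answer as much as the raw character count does.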

The lock-in to watch

Credit systems encourage commitment: you buy a monthly pool and there is mild pressure to use it. That is fine if voice is core to the product. If voice is a secondary feature with spiky usage, a per-character or per-minute pay-as-you-go vendor (or a self-hosted open model like Kokoro) may pencil out better. Model the real usage curve before committing to a large tier.
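
One way to model the usage curve is to price the same months of usage under both schemes. Every number below is invented for illustration (the plan fee, the pay-as-you-go rate, the usage figures), and the sketch assumes the subscription pool is sized to cover the peak month. Amounts are in integer cents to keep the arithmetic exact.

```python
def subscription_cost(monthly_chars, plan_fee_cents):
    """Flat fee each month, assuming the plan's pool covers the peak month."""
    return plan_fee_cents * len(monthly_chars)


def payg_cost(monthly_chars, cents_per_1k_chars):
    """Pay only for the characters actually generated."""
    return sum(monthly_chars) * cents_per_1k_chars // 1000


def cheaper_option(monthly_chars, plan_fee_cents, cents_per_1k_chars):
    """Return the cheaper scheme and its total cost in cents."""
    sub = subscription_cost(monthly_chars, plan_fee_cents)
    payg = payg_cost(monthly_chars, cents_per_1k_chars)
    return ("subscription", sub) if sub <= payg else ("pay-as-you-go", payg)


# Hypothetical six-month usage curves, in characters per month.
spiky = [20_000, 15_000, 300_000, 25_000, 10_000, 280_000]
steady = [290_000] * 6

PLAN_FEE_CENTS = 9_900    # made-up monthly fee for a pool covering 300k chars
PAYG_CENTS_PER_1K = 40    # made-up pay-as-you-go rate per 1,000 chars

print(cheaper_option(spiky, PLAN_FEE_CENTS, PAYG_CENTS_PER_1K))
print(cheaper_option(steady, PLAN_FEE_CENTS, PAYG_CENTS_PER_1K))
```

Under these assumptions the spiky curve favours pay-as-you-go and the steady curve favours the subscription, even though both peak near the same volume. The result depends on the shape of the curve, not only its maximum.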

When to use ElevenLabs — and when to skip.

Breadth is the strength and the tell. ElevenLabs wins on “does everything well”; it loses to specialists on any single axis.

Use ElevenLabs if…

  • You want one platform for TTS, cloning, dubbing, agents, and music
  • Voice quality and naturalness are the priority
  • You're a creator or developer — audiobooks, video narration, game dialogue, product voice
  • You need voice cloning with commercial-licensing clarity and consent enforcement
  • You want a developer experience that doesn't fight you
  • 70+ language coverage matters

What lands in SA.

SA English. ElevenLabs handles South African English, though — like every major vendor — it has not tuned a model specifically for the SA-English accent to the bar it hits for US or UK English. Accent prompting and careful voice selection get most of the way there; budget time to audition voices.

Indigenous languages. ElevenLabs' 70+ language coverage does not meaningfully include isiZulu or isiXhosa at production quality. For those, the landscape leaf's guidance holds — expect Qfrency or a custom solution, not ElevenLabs. Don't promise a client isiZulu TTS on the strength of the “70+ languages” headline.

Pricing in ZAR. Credit plans are USD-denominated; at mid-2026 exchange rates, the Creator-tier plans land in the low-to-mid hundreds of rand per month, the Pro and Scale tiers materially higher. The free tier is genuinely useful for prototyping but is non-commercial — do not ship client work on it.
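
Because the plans are billed in USD, a rand budget is better quoted as a band than as a single figure. The sketch below shows the arithmetic; the plan price and both exchange rates are placeholders, not quoted figures, and `monthly_cost_zar_range` is a hypothetical helper.

```python
def monthly_cost_zar_range(plan_usd, rate_low, rate_high):
    """Convert a USD plan price to a (low, high) ZAR band for two exchange rates."""
    return (round(plan_usd * rate_low, 2), round(plan_usd * rate_high, 2))


PLAN_USD = 22.0       # placeholder plan price; check the pricing page
RATE_LOW = 17.0       # placeholder ZAR per USD, optimistic
RATE_HIGH = 19.0      # placeholder ZAR per USD, pessimistic

low, high = monthly_cost_zar_range(PLAN_USD, RATE_LOW, RATE_HIGH)
print(f"R{low:.2f} to R{high:.2f} per month")
```

Quoting the upper end of the band to a client leaves room for currency movement between the quote and the invoice.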

Consent and POPIA. Voice cloning of real people — staff, clients, public figures — needs documented consent. ElevenLabs enforces consent on its professional voice cloning, which helps, but the legal responsibility sits with whoever commissions the clone. Treat a cloned voice as personal information under POPIA.

Primary sources.

ElevenLabs ships fast — verify model names and prices against the date on this leaf.