ElevenLabs Development History: From Voice AI Beta to Multilingual v2, Turbo, and Beyond
ElevenLabs was founded in 2022 with a mission to make content universally accessible in any language and voice. After opening a public beta in early 2023, it quickly became known for natural, expressive AI speech that avoided the robotic tone of earlier TTS. The company then shipped Eleven Multilingual v2, Turbo models for low-latency streaming, voice cloning, sound effects, speech-to-text, and later music and conversational agents. This article traces ElevenLabs' development from beta to a full-stack voice and audio platform.
Timeline Overview
| Date | Milestone | Details |
|---|---|---|
| 2022 | Founding | ElevenLabs co-founded by Piotr Dąbkowski and Mati Staniszewski; focus on deep learning speech synthesis |
| Jan 2023 | Public beta | Public beta launched; attention for natural inflection, emotion, and expressiveness; $2M pre-seed |
| Jun 2023 | Series A | $19M Series A; platform growth and model improvements |
| 2023–2024 | Out of beta / Multilingual v2 | Official exit from beta; release of Eleven Multilingual v2 as foundational model for ~29 languages with consistent voice and accents |
| Jan 2024 | Series B | $80M Series B; scaling infrastructure and product breadth |
| 2024–2025 | Turbo v2 / Turbo v2.5 | Low-latency Turbo v2, followed by Turbo v2.5 with ~250–300ms latency and 32 languages (adding Vietnamese, Hungarian, and Norwegian); up to ~3× faster than the earlier Turbo |
| 2025 | Agents, Music, Scribe | Eleven v3 API; Agents platform (widgets, Twilio, knowledge base); Music Generation API; Scribe (speech-to-text); Global TTS preview |
| Later | Funding and scale | Series D $500M at ~$11B valuation; reported $330M+ ARR; expansion into ElevenAgents, ElevenCreative, ElevenAPI |
Core Models and Products
Text-to-Speech
- Eleven Multilingual v2: Foundational model for ~29 languages; automatic language detection; consistent voice and accent across languages; used for high-quality narration and dubbing
- Eleven Turbo v2.5: Low-latency TTS in 32 languages (~250–300ms); 3× faster than earlier Turbo; added Vietnamese, Hungarian, Norwegian
- Eleven Flash v2.5: Ultra-low latency (~75ms), 32 languages, lower cost per character for real-time use cases
- Eleven v3: Newer API model with improved quality and Text-to-Voice Design (custom voices from text descriptions)
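The trade-off between these models can be sketched as a small lookup. This is only an illustration built from the approximate figures quoted in the list above: the `TtsModel` type and the `models_within_budget` helper are hypothetical names, not part of any ElevenLabs SDK, and a latency the article does not state is recorded as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TtsModel:
    name: str
    languages: Optional[int]   # approximate count from the article; None if not stated
    latency_ms: Optional[int]  # upper end of the quoted latency; None if not stated


# Figures copied from the list above; they are approximate, not benchmarks.
MODELS = [
    TtsModel("Eleven Multilingual v2", 29, None),
    TtsModel("Eleven Turbo v2.5", 32, 300),
    TtsModel("Eleven Flash v2.5", 32, 75),
    TtsModel("Eleven v3", None, None),
]


def models_within_budget(budget_ms: int) -> List[str]:
    """Return models whose quoted latency fits the budget, fastest first."""
    fits = [m for m in MODELS if m.latency_ms is not None and m.latency_ms <= budget_ms]
    return [m.name for m in sorted(fits, key=lambda m: m.latency_ms)]


print(models_within_budget(100))
print(models_within_budget(300))
```

With a 100ms budget only Flash v2.5 qualifies; at 300ms Turbo v2.5 also fits. Models with no quoted latency are left out rather than assumed to be slow, so narration or dubbing work would still pick Multilingual v2 or v3 on quality grounds.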
Voice Cloning and Design
ElevenLabs offers instant and professional voice cloning so users can create digital voices that speak in nearly 30 languages. Voice design and a library of 3,000+ community voices support both custom and pre-made options for ads, audiobooks, and video.
Beyond TTS
- Speech-to-text (Scribe): Transcription with diarization and background noise reduction
- Sound effects: AI-generated sound effects for video and games
- Music generation: AI music composition and streaming API for paid users
- ElevenAgents: Conversational AI with customizable widgets, Twilio outbound calling, knowledge bases
Ecosystem and Access
ElevenLabs is available via the web app, the API, and integrations (e.g. Twilio and various no-code tools). Developers use the API for TTS and voice cloning and, on paid plans, for music and agents. Platforms like FuseAI Tools bundle ElevenLabs models for text-to-speech, speech-to-text, sound effects, and audio isolation so users can try them without managing API keys.
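For developers going through the API directly, a text-to-speech call might look roughly like the sketch below. It only builds the HTTP request and does not send it. The endpoint path, the `xi-api-key` header, and the `eleven_multilingual_v2` model ID reflect the public API as commonly documented, but treat them as assumptions and check the current API reference. `build_tts_request`, `YOUR_API_KEY`, and `YOUR_VOICE_ID` are placeholders introduced here for illustration.

```python
import json
import urllib.request

# Assumed base URL; verify against the current API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(api_key, voice_id, text, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a POST request for text-to-speech."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


req = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello from the API.")
print(req.get_method(), req.full_url)

# To actually send it (requires a real key and voice ID):
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

Swapping `model_id` for a lower-latency model is how an application would move from narration-quality output toward real-time use.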
Summary
ElevenLabs grew from a 2022 founding and 2023 beta into one of the leading voice AI companies. Eleven Multilingual v2 and the Turbo line established high-quality, low-latency TTS in dozens of languages; voice cloning and sound effects extended the stack; Scribe, Music, and Agents turned it into a broad audio and conversation platform. With significant funding and reported nine-figure ARR, ElevenLabs continues to push voice and audio AI for global, multilingual content.
Key Takeaways
- Founded 2022; public beta January 2023; exited beta with Eleven Multilingual v2 (≈29 languages).
- Turbo v2.5 delivers low-latency TTS (~250–300ms) in 32 languages; Flash v2.5 cuts latency further, to ~75ms.
- Voice cloning, sound effects, Scribe (STT), Music, and Agents expand beyond core TTS.
- Backed by a16z, Sequoia, and others; Series D $500M at ~$11B valuation; $330M+ ARR reported.
Try ElevenLabs on FuseAI Tools for text-to-speech, speech-to-text, sound effects, and audio isolation in one place.