Multilingual v2: 29 languages and regional variants (e.g. US/UK English, Mainland/Taiwan Mandarin); handles domain terms and cultural nuance. Natural rhythm, with control over emotion (joy, sadness, excitement, seriousness), speaker age (child to elderly), and style (news, audiobook, business). For film dubbing, audiobooks, education, and corporate training.
Turbo 2.5: Sub-400 ms end-to-end latency; high concurrency; real-time control of pace, tone, and emotion; suitable for edge deployment. Use for virtual assistants, live translation, game NPCs, and live-stream interaction.
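A latency figure like the one above is easiest to check by timing how long a streaming response takes to deliver its first audio chunk. The sketch below is only an illustration under stated assumptions: `first_chunk_latency_ms` and `fake_stream` are hypothetical names, the stream is simulated rather than a real API call, and time to first chunk is just one proxy for end-to-end latency.

```python
import time
from typing import Callable, Iterable


def first_chunk_latency_ms(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Return milliseconds from opening the stream to the first non-empty chunk.

    `open_stream` is a hypothetical factory: any callable that starts a
    request and returns an iterable of audio byte chunks.
    """
    start = time.perf_counter()  # start the clock before the request begins
    for chunk in open_stream():
        if chunk:  # skip empty keep-alive chunks
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream ended without producing audio")


def fake_stream() -> Iterable[bytes]:
    """Simulated stream (an assumption): 50 ms before the first chunk."""
    time.sleep(0.05)
    yield b"\x00" * 320
    time.sleep(0.02)
    yield b"\x00" * 320


if __name__ == "__main__":
    latency = first_chunk_latency_ms(fake_stream)
    budget_ms = 400.0  # the latency target quoted for Turbo 2.5
    print(f"first chunk after {latency:.0f} ms (budget {budget_ms:.0f} ms)")
```

To measure a real service, replace `fake_stream` with a function that issues the request and yields the response chunks; the timing logic stays the same.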
Speech to Text: 100+ languages, dialect and accent support, domain terms (medical, legal, tech), noise robustness. Automatic punctuation and paragraphing, speaker diarization, word-level timestamps, and export to SRT, VTT, or TXT. For subtitles, meeting notes, content indexing, and accessibility.
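To show how word-level timestamps relate to subtitle output, here is a minimal sketch that groups timed words into SRT cues. The `Word` type, the function names, and the grouping limits (`max_chars`, `max_gap`) are assumptions made for illustration; they do not describe the service's actual response schema or its own subtitle logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word-level timestamp record (times in seconds)."""
    text: str
    start: float
    end: float


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_chars: int = 42, max_gap: float = 0.8) -> str:
    """Group words into cues, breaking on long pauses or overlong lines."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            text_len = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            gap = w.start - current[-1].end
            if text_len > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        line = " ".join(x.text for x in cue)
        blocks.append(f"{index}\n{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}\n{line}")
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    sample = [Word("Hello", 0.0, 0.4), Word("world.", 0.5, 0.9), Word("Next", 2.5, 2.8)]
    print(words_to_srt(sample))
```

VTT output would differ mainly in its `WEBVTT` header and the use of a period instead of a comma before the milliseconds.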
Sound Effect v2: Generates sounds from natural-language descriptions; physics-based and abstract sounds; mood and atmosphere; layered effects. Covers environment, object, abstract (sci-fi, fantasy), and transition/UI sounds. Controls for pitch, duration, intensity, and spatialization; envelopes, filters, and multi-layer mixing; multiple output formats.
AI Audio Isolation: Clean vocal and instrument separation (drums, bass, guitar, keys); background and multi-source separation. High quality, phase preservation, noise suppression; real-time capable. For music production, karaoke, post-production, and archival restoration.