Gemini 3.1 Flash TTS Preview is Google's next-generation text-to-speech model, delivering a major upgrade over Gemini 2.5 Flash TTS. It converts text into natural audio across 70+ languages, with significantly expanded language coverage and improved quality. The model introduces 200+ inline audio control tags (e.g., [whispers], [laughs], [excited]) for fine-grained control over emotion, tone, and pacing, along with support for two speakers with independent voice and style settings. It outputs 24 kHz / 16-bit PCM audio, includes SynthID watermarking, and supports a 32K token context window. Designed for expressive and controllable voice generation, it is well suited for dialogue systems, storytelling, character-driven content, and advanced audio production workflows.
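Because the output is described as raw 24 kHz / 16-bit PCM, most audio players will need a container around it before they can play it. The following is a minimal sketch of wrapping such bytes in a WAV header using Python's standard `wave` module. The sample rate and bit depth come from the description above; mono output and the helper names `pcm_to_wav_bytes` and `duration_seconds` are assumptions made for illustration.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # stated in the model description
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
CHANNELS = 1              # assumption: mono output (not stated in the description)


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    frame_size = SAMPLE_WIDTH_BYTES * CHANNELS
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM data length is not a whole number of frames")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    # wave does not close file objects it did not open, so the buffer is still readable.
    return buffer.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback duration of a raw PCM buffer under the settings above."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

If the audio turns out to be stereo, or is delivered in another encoding, the constants at the top would need to change accordingly.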
Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gemini-3.1-flash-tts-preview",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="gemini-3.1-flash-tts-preview",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Common parameters: model, input, voice, response_format, speed
Extended parameters: instructions, stream_format
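The parameter names listed here (model, input, voice, response_format, speed, instructions) match the request body of an OpenAI-style speech endpoint. The sketch below shows how such a request might be assembled with only the standard library. It is illustrative rather than confirmed: the `/audio/speech` path, the `"pcm"` response format, the accepted voice names, and the helper name `build_speech_request` are all assumptions, so check the Apertis API reference before relying on any of them.

```python
import json
import urllib.request

API_BASE_URL = "https://api.apertis.ai/v1"  # base URL from the SDK example
SPEECH_PATH = "/audio/speech"               # assumption: OpenAI-compatible speech route


def build_speech_request(api_key, text, voice, response_format="pcm",
                         speed=1.0, instructions=None):
    """Build (but do not send) a POST request carrying the listed parameters."""
    payload = {
        "model": "gemini-3.1-flash-tts-preview",
        "input": text,                       # may contain inline tags such as [whispers]
        "voice": voice,                      # valid voice names are not listed in this page
        "response_format": response_format,  # assumption: "pcm" is accepted
        "speed": speed,
    }
    if instructions is not None:
        payload["instructions"] = instructions  # extended parameter, optional

    return urllib.request.Request(
        API_BASE_URL + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and reading the response body as audio bytes, assuming the endpoint behaves as guessed here. The `stream_format` parameter is left out because its accepted values are not documented in this page.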
Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.
See how this model compares to others from the same provider.
Veo 3.1 is a state-of-the-art generative AI video model developed by Google DeepMind (part of the broader Gemini/Flow ecosystem). It builds on the earlier Veo models to make AI-generated video creation more realistic, expressive, and controllable.
Gemini 2.5 Flash Preview (May 2025) is Google's high-performance general model built for advanced reasoning, coding, math, and science. It includes built-in “thinking” features to deliver more accurate, context-aware answers.
Gemini Embedding 2 is Google's advanced text embedding model designed for high-accuracy semantic representation across large-scale retrieval and understanding tasks. It converts text into dense vector embeddings optimized for semantic search, retrieval-augmented generation (RAG), clustering, classification, and recommendation systems. Built for production use, it offers strong multilingual support, improved semantic similarity accuracy, and efficient embedding generation, making it well suited for large knowledge indexing pipelines and enterprise-scale retrieval applications.