
Gemini 3.1 Flash TTS Preview

Gemini 3.1 Flash TTS Preview is Google's next-generation text-to-speech model and a major upgrade over Gemini 2.5 Flash TTS. It converts text into natural-sounding audio in 70+ languages, with broader language coverage and higher quality than its predecessor. The model introduces 200+ inline audio control tags (e.g., [whispers], [laughs], [excited]) for fine-grained control over emotion, tone, and pacing, and supports two speakers with independent voice and style settings. It outputs 24 kHz / 16-bit PCM audio, includes SynthID watermarking, and supports a 32K-token context window. It is well suited to dialogue systems, storytelling, character-driven content, and advanced audio production workflows.
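Raw PCM carries no header, so most audio players will not open it directly. The sketch below shows one way to wrap 24 kHz / 16-bit PCM bytes in a WAV container using Python's standard `wave` module. It assumes mono audio, which the description does not state, and the helper names `pcm_to_wav` and `pcm_duration_seconds` are hypothetical.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, from the model description
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples, from the model description
CHANNELS = 1              # ASSUMPTION: mono; the description does not say


def pcm_to_wav(pcm_bytes: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the WAV file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


def pcm_duration_seconds(pcm_bytes: bytes) -> float:
    """Return the playback length of raw PCM bytes under the settings above."""
    bytes_per_second = SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS
    return len(pcm_bytes) / bytes_per_second
```

If the service turns out to return stereo audio or an already-wrapped format, `CHANNELS` would need adjusting or the wrapping step could be skipped entirely.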


Pricing

Input: $27.50 / 1M

Output: $0 / 1M

Cache Write: $0 / 1M

Cache Read: $0 / 1M

Web Search: $0 / 1M
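As a rough illustration of the input rate listed above, the following sketch estimates the cost of a request. The pricing list does not say what "1M" counts, so the code assumes one million input units (most likely tokens); the function name `estimate_input_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Input rate from the pricing list: $27.50 per 1M input units.
# ASSUMPTION: "1M" means one million input units (likely tokens).
INPUT_PRICE_PER_MILLION_USD = Decimal("27.50")


def estimate_input_cost_usd(input_units: int) -> Decimal:
    """Estimate the input-side cost in USD for a given number of input units."""
    if input_units < 0:
        raise ValueError("input_units must be non-negative")
    return INPUT_PRICE_PER_MILLION_USD * input_units / Decimal(1_000_000)
```

Under that assumption, a 2,000-unit prompt would cost about $0.055. `Decimal` is used instead of floats so small amounts do not pick up rounding noise.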

Quick Start

Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.

Endpoint:

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gemini-3.1-flash-tts-preview",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="gemini-3.1-flash-tts-preview",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Supported Parameters

Common parameters: model, input, voice, response_format, speed

Extended parameters: instructions, stream_format
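The parameter names listed above match the fields of an OpenAI-style speech request. The sketch below shows how such a request body might be assembled and sent. It is an assumption that the service accepts these fields as a JSON body with Bearer authentication; the endpoint URL is not stated on this page, so it is left as a caller-supplied argument. The helper names and the voice value used in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Optional

MODEL_ID = "gemini-3.1-flash-tts-preview"


def build_speech_payload(
    text: str,
    voice: str,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
    instructions: Optional[str] = None,
) -> dict:
    """Assemble a request body from the parameters listed on this page.

    Optional fields are left out when not set, so the service defaults apply.
    """
    payload = {
        "model": MODEL_ID,
        "input": text,  # may contain inline tags such as [whispers]
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "instructions": instructions,
    }
    return {key: value for key, value in payload.items() if value is not None}


def send_speech_request(url: str, api_key: str, payload: dict) -> bytes:
    """POST the payload and return the raw response bytes.

    ASSUMPTION: JSON body with Bearer authentication, as in OpenAI-compatible
    APIs. The URL is supplied by the caller because this page does not give it.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `build_speech_payload("[whispers] Hello!", voice="example-voice", speed=1.0)` yields a body with only `model`, `input`, `voice`, and `speed`. The valid voice names and accepted `response_format` values should be taken from the full API documentation.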


Cursor IDE Model IDs

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.

gemini-3.1-flash-tts-preview

Compare with Other Models

See how this model compares to others from the same provider.

Veo 3.1 (4K)

Veo 3.1 is a state-of-the-art generative AI video model developed by Google DeepMind (part of the broader Gemini/Flow ecosystem). It builds on the earlier Veo models to make AI-generated video creation more realistic, expressive, and controllable.

Veo 3.1 (Fast)

Gemini 2.5 Flash Preview 05-20

Gemini 2.5 Flash Preview (May 2025) is Google's high-performance general model built for advanced reasoning, coding, math, and science. It includes built-in “thinking” features to deliver more accurate, context-aware answers.

Gemini Embedding 2

Gemini Embedding 2 is Google's advanced text embedding model designed for high-accuracy semantic representation across large-scale retrieval and understanding tasks. It converts text into dense vector embeddings optimized for semantic search, retrieval-augmented generation (RAG), clustering, classification, and recommendation systems. Built for production use, it offers strong multilingual support, improved semantic similarity accuracy, and efficient embedding generation, making it well suited for large knowledge indexing pipelines and enterprise-scale retrieval applications.