OpenAI | voice

Whisper Large V3 Turbo

Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for high-speed, cost-efficient transcription. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With a word error rate of about 12% and real-time speed factors of up to 216×, it suits latency-sensitive and high-throughput transcription workloads such as real-time and large-scale speech processing.
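To make the two headline figures concrete, here is a small back-of-the-envelope sketch. It assumes the speed factor means audio duration divided by processing time, and that the word error rate is the usual (substitutions + deletions + insertions) / reference words; the function names are hypothetical, and real results will vary with audio quality, language, and load.

```python
def min_processing_seconds(audio_seconds: float, speed_factor: float = 216.0) -> float:
    """Best-case processing time, assuming speed factor = audio time / processing time."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


def expected_word_errors(word_count: int, wer: float = 0.12) -> float:
    """Rough error count implied by a WER, i.e. word_count * wer."""
    return word_count * wer


# One hour of audio at the quoted peak factor of 216x.
print(round(min_processing_seconds(3600), 1))  # 16.7

# A 1,000-word transcript at roughly 12% WER.
print(round(expected_word_errors(1000), 1))  # 120.0
```

Since 216× is quoted as an upper bound, the first number is a floor on processing time rather than a typical value.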


Pricing

Input: $3.33 / 1M

Output: $0 / 1M

Cache Write: $0 / 1M

Cache Read: $0 / 1M

Web Search: $0 / 1M

Quick Start

Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.

Endpoint:

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1",
)

response = client.chat.completions.create(
    model="whisper-large-v3-turbo",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7,
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="whisper-large-v3-turbo",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Supported Parameters

Common parameters: model, file, language, prompt, response_format

Extended parameters: temperature, timestamp_granularities
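The parameters listed above match the arguments of the OpenAI SDK's audio.transcriptions.create call, so a transcription request might look like the sketch below. It assumes the Apertis endpoint accepts that call at the same base URL, which this page does not confirm. build_transcription_kwargs and transcribe are hypothetical helpers written for illustration, and the "json" default for response_format is likewise an assumption.

```python
from pathlib import Path

# Audio formats named in the model description on this page.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "webm", "flac", "ogg"}


def build_transcription_kwargs(path, language=None, prompt=None,
                               response_format="json", temperature=None,
                               timestamp_granularities=None):
    """Hypothetical helper: gather the listed parameters into call kwargs."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {suffix!r}")
    kwargs = {"model": "whisper-large-v3-turbo",
              "response_format": response_format}
    optional = {
        "language": language,
        "prompt": prompt,
        "temperature": temperature,
        "timestamp_granularities": timestamp_granularities,
    }
    # Send only the optional parameters the caller actually set.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def transcribe(path, api_key, **options):
    """Assumed call shape: OpenAI SDK transcription via the Apertis base URL."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url="https://api.apertis.ai/v1")
    with open(path, "rb") as audio:
        return client.audio.transcriptions.create(
            file=audio, **build_transcription_kwargs(path, **options)
        )
```

A call such as transcribe("talk.mp3", "YOUR_API_KEY", language="en") would then send the file with only the options that were set. On OpenAI's own API, timestamp_granularities takes effect only with response_format="verbose_json"; whether the same holds here is not stated on this page.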


Cursor IDE Model IDs

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.

whisper-large-v3-turbo

Compare with Other Models

See how this model compares to others from the same provider.

text-embedding-3-large

text-embedding-3-large is OpenAI's most powerful embedding model, producing numeric representations of text to measure similarity. It works well for both English and non-English content and is widely used for tasks like search, clustering, recommendations, anomaly detection, and classification.

GPT-4o Audio Preview (2024-12-17)

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

GPT-5.1 Codex

GPT-5.1 Codex is a coding-focused version of GPT-5.1 designed for both interactive development and long autonomous engineering tasks. It can build projects, add features, debug, refactor, and review code with higher steerability and cleaner outputs than GPT-5.1. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles images/screenshots for UI work, and uses tools for search and environment setup — making it purpose-built for agentic coding workflows.

GPT-5 Codex Low

GPT-5 Codex (Low) is a coding-focused version of GPT-5 built for both interactive development and long autonomous engineering tasks. It can create projects, add features, debug, refactor, and review code, producing cleaner and more controllable outputs than GPT-5. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles multimodal inputs, and uses tools for search and environment setup — making it purpose-built for agentic coding workflows.