OpenAI · voice

Whisper Large V3

Whisper Large V3 is OpenAI's advanced open-source automatic speech recognition (ASR) model, supporting both audio transcription and translation across 99+ languages. It accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg, and delivers strong performance in noisy, real-world conditions. With 1.55B parameters and a low 10.3% word error rate, it provides accurate, multilingual transcription with support for word- and segment-level timestamps, making it well suited for high-quality, noise-robust speech processing applications.
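For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition. The page does not state which benchmark or text normalization the 10.3% figure uses, so the function name `word_error_rate` and the simple whitespace tokenization are assumptions for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenizes on whitespace, with no casing or
    punctuation normalization (real evaluations usually normalize text).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a WER of 10.3% corresponds to roughly one word-level error per ten reference words, averaged over whatever evaluation set was used.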


Pricing

Input: $9.25 / 1M

Output: $0 / 1M

Cache Write: $0 / 1M

Cache Read: $0 / 1M

Web Search: $0 / 1M

Quick Start

Use the Apertis AI SDK or the OpenAI SDK, or make direct HTTP requests to our API.

Endpoint:

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="whisper-large-v3",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="whisper-large-v3",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Supported Parameters

Common parameters: model, file, language, prompt, response_format

Extended parameters: temperature, timestamp_granularities
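The parameters listed above match the shape of the OpenAI SDK's audio transcription call (`client.audio.transcriptions.create`). The following is a minimal sketch of how they might be combined, assuming the Apertis endpoint accepts OpenAI-style transcription requests for this model; this page does not confirm that. The helper names `build_transcription_kwargs` and `transcribe` are hypothetical, and the rule that `timestamp_granularities` requires `response_format="verbose_json"` is carried over from OpenAI's own API as an assumption.

```python
from typing import Any, Dict, List, Optional


def build_transcription_kwargs(
    file: Any,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "verbose_json",
    temperature: float = 0.0,
    timestamp_granularities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments from the parameters listed on this page.

    Hypothetical helper: optional values are included only when set.
    """
    kwargs: Dict[str, Any] = {
        "model": "whisper-large-v3",
        "file": file,
        "response_format": response_format,
        "temperature": temperature,
    }
    if language is not None:
        kwargs["language"] = language  # e.g. "en"
    if prompt is not None:
        kwargs["prompt"] = prompt  # optional text to guide the transcript
    if timestamp_granularities is not None:
        # Assumption borrowed from OpenAI's API: timestamps are only
        # returned with the verbose JSON response format.
        if response_format != "verbose_json":
            raise ValueError(
                "timestamp_granularities requires response_format='verbose_json'"
            )
        kwargs["timestamp_granularities"] = timestamp_granularities
    return kwargs


def transcribe(client: Any, audio_path: str, **options: Any) -> Any:
    """Send one audio file through an OpenAI-SDK-compatible client.

    `client` is expected to be an `openai.OpenAI` instance created with
    base_url="https://api.apertis.ai/v1" as in the Quick Start.
    """
    with open(audio_path, "rb") as audio_file:
        kwargs = build_transcription_kwargs(file=audio_file, **options)
        return client.audio.transcriptions.create(**kwargs)
```

With a client constructed as in the Quick Start, a call such as `transcribe(client, "meeting.mp3", language="en", timestamp_granularities=["word", "segment"])` would request both word- and segment-level timestamps, provided the endpoint behaves as assumed here.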


Cursor IDE Model IDs

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.

whisper-large-v3

Compare with Other Models

See how this model compares to others from the same provider.

GPT-5 Chat

GPT-5 Chat is built for advanced, natural, and context-aware multimodal conversations, tailored for enterprise-grade applications.

GPT-4o Audio Preview (2025-06-03)

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

GPT-5 Codex High

GPT-5 Codex (High) is a coding-focused version of GPT-5 built for both interactive development and long autonomous engineering tasks. It can create projects, add features, debug, refactor, and review code, producing cleaner and more controllable outputs than GPT-5. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles multimodal inputs, and uses tools for search and environment setup, making it purpose-built for agentic coding workflows.

GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale, long-context tasks. Designed for long-horizon problem solving, agentic coding, and precise multi-step execution, GPT-5.5 Pro delivers strong reliability and performance across advanced engineering, research, and complex workflow scenarios.