Whisper Large V3 is OpenAI's open-source automatic speech recognition (ASR) model, supporting both audio transcription and translation across 99+ languages. It accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg, and performs well in noisy, real-world conditions. With 1.55B parameters and a 10.3% word error rate, it provides multilingual transcription with word- and segment-level timestamps.
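For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below assumes the quoted figure follows that standard definition; the function name is illustrative, and real evaluations usually normalize casing and punctuation first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)
```

Under this definition, a 10.3% rate corresponds to roughly one word-level error per ten reference words.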
Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="whisper-large-v3",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="whisper-large-v3",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Common parameters: model, file, language, prompt, response_format
Extended parameters: temperature, timestamp_granularities
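The listed parameters (file, language, prompt, response_format, timestamp_granularities) match those of an OpenAI-style audio transcription request. The following is a minimal sketch of such a call through the OpenAI SDK, assuming the Apertis endpoint exposes the same audio transcription route as OpenAI's API; that assumption is not confirmed by this page. The helper names build_transcription_kwargs and transcribe are hypothetical.

```python
from pathlib import Path

# Audio extensions listed in the model description.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".webm", ".flac", ".ogg"}


def build_transcription_kwargs(audio_path, language=None, prompt=None,
                               temperature=0.0, word_timestamps=False):
    """Assemble keyword arguments for an OpenAI-style transcription request."""
    suffix = Path(audio_path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {suffix or '(none)'}")

    kwargs = {"model": "whisper-large-v3", "temperature": temperature}
    if language:
        kwargs["language"] = language  # language code such as "en"
    if prompt:
        kwargs["prompt"] = prompt      # optional text to guide the transcription
    if word_timestamps:
        # In OpenAI's API, timestamp_granularities requires verbose_json output.
        kwargs["response_format"] = "verbose_json"
        kwargs["timestamp_granularities"] = ["word"]
    else:
        kwargs["response_format"] = "json"
    return kwargs


def transcribe(audio_path, api_key, **options):
    """Send the file for transcription (assumes an OpenAI-compatible audio route)."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key, base_url="https://api.apertis.ai/v1")
    with open(audio_path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file,
            **build_transcription_kwargs(audio_path, **options),
        )
```

A call such as transcribe("meeting.mp3", "YOUR_API_KEY", language="en", word_timestamps=True) would then return a response whose text attribute holds the transcript, provided the endpoint behaves as assumed. Keeping argument assembly separate from the network call makes it possible to check the request shape without an API key.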
Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.
See how this model compares to others from the same provider.
GPT-5 Chat is built for advanced, natural, and context-aware multimodal conversations, tailored for enterprise-grade applications.
gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.
GPT-5 Codex (High) is a coding-focused version of GPT-5 built for both interactive development and long autonomous engineering tasks. It can create projects, add features, debug, refactor, and review code, producing cleaner and more controllable outputs than GPT-5. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles multimodal inputs, and uses tools for search and environment setup — making it purpose-built for agentic coding workflows.
GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale, long-context tasks. Designed for long-horizon problem solving, agentic coding, and precise multi-step execution, GPT-5.5 Pro delivers strong reliability and performance across advanced engineering, research, and complex workflow scenarios.