Whisper (whisper-1) is OpenAI's open-source automatic speech recognition (ASR) model, designed for audio transcription and translation. It supports 50+ languages and accepts audio files up to 25 MB in formats such as mp3, mp4, wav, and webm. Pricing is per minute of audio, billed to the nearest second. Whisper is optimized for reliable speech-to-text conversion across diverse audio inputs, which makes it well suited for transcription, localization, and voice-driven applications.
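The limits and pricing rule described above can be checked locally before uploading. The following is a minimal sketch, not an official client: the helper names `check_audio_file` and `estimate_cost` are hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, the extension set contains only the formats named above (the real list may be longer), and the price passed in is whatever per-minute rate applies to your account.

```python
from pathlib import Path

# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# Only the formats named in the description; other formats may also be accepted.
KNOWN_EXTENSIONS = {".mp3", ".mp4", ".wav", ".webm"}


def check_audio_file(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        problems.append(f"unrecognized extension: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    return problems


def estimate_cost(duration_seconds: float, price_per_minute: float) -> float:
    """Apply a per-minute price to the duration rounded to the nearest second."""
    billed_seconds = int(duration_seconds + 0.5)  # round half up
    return billed_seconds * price_per_minute / 60
```

For example, `check_audio_file("talk.mp3", 1024)` returns an empty list, and `estimate_cost(90.4, 0.006)` bills 90 seconds at a hypothetical rate of 0.006 per minute, which comes to roughly 0.009.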
Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.
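For the direct HTTP route, a transcription upload is typically a multipart form POST. The sketch below builds such a request with only the Python standard library. It assumes the API exposes the OpenAI-compatible path `/audio/transcriptions` under `https://api.apertis.ai/v1` and uses bearer-token authentication; both are assumptions to confirm against the API reference. The function name `build_transcription_request` is hypothetical.

```python
import urllib.request
import uuid

# Assumption: OpenAI-compatible transcription route under the v1 base URL.
TRANSCRIPTION_URL = "https://api.apertis.ai/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, filename: str, audio_bytes: bytes, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Form field: file (raw audio bytes)
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        TRANSCRIPTION_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

# To send it:
#   with open("audio.mp3", "rb") as f:
#       req = build_transcription_request("YOUR_API_KEY", "audio.mp3", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Building the request separately from sending it keeps the sketch easy to inspect and test without network access.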
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1",
)

# Whisper transcribes audio, so the request goes to the audio
# transcription endpoint and takes a file instead of chat messages.
with open("audio.mp3", "rb") as audio_file:  # path to your audio file
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(response.text)
```

Common parameters: model, file, language, prompt, response_format
Extended parameters: temperature, timestamp_granularities
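The parameters listed above map onto keyword arguments of the OpenAI SDK's `client.audio.transcriptions.create`. The sketch below assembles them in one place; the helper name `transcription_kwargs` and the example values are hypothetical. In OpenAI's own API, `timestamp_granularities` requires `response_format="verbose_json"`, and the sketch assumes this endpoint behaves the same way.

```python
def transcription_kwargs(
    language=None,
    prompt=None,
    response_format="json",
    temperature=None,
    timestamp_granularities=None,
):
    """Collect optional transcription parameters, skipping any left unset."""
    kwargs = {"model": "whisper-1", "response_format": response_format}
    if language is not None:
        kwargs["language"] = language  # ISO-639-1 code, e.g. "en"
    if prompt is not None:
        kwargs["prompt"] = prompt  # text hint for style or spelling
    if temperature is not None:
        kwargs["temperature"] = temperature  # sampling temperature
    if timestamp_granularities:
        # Assumption: same rule as OpenAI's API, where timestamps are only
        # returned with the verbose_json response format.
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities needs response_format='verbose_json'")
        kwargs["timestamp_granularities"] = list(timestamp_granularities)
    return kwargs

# Usage with an OpenAI SDK client configured for this API:
#   with open("audio.mp3", "rb") as f:
#       result = client.audio.transcriptions.create(
#           file=f,
#           **transcription_kwargs(
#               language="en",
#               response_format="verbose_json",
#               timestamp_granularities=["word"],
#           ),
#       )
```

Leaving unset parameters out of the request, rather than sending `None`, lets the server apply its own defaults.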
See how this model compares to others from the same provider.
GPT-5.1 Codex is a coding-focused version of GPT-5.1 designed for both interactive development and long autonomous engineering tasks. It can build projects, add features, debug, refactor, and review code with higher steerability and cleaner outputs than GPT-5.1. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles images/screenshots for UI work, and uses tools for search and environment setup — making it purpose-built for agentic coding workflows.
o4-mini-deep-research is a faster, lower-cost version of OpenAI's deep-research model, designed for complex, multi-step investigations. It automatically relies on web_search for information gathering, which always adds extra usage cost.
Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for high-speed and cost-efficient transcription. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With a ~12% word error rate and real-time speed factors up to 216×, it delivers fast, scalable performance for latency-sensitive and high-throughput transcription workloads, making it ideal for real-time and large-scale speech processing applications.
GPT-5.2-Codex is OpenAI's most advanced agentic coding model yet, built for complex, real-world software engineering and defensive cybersecurity. It’s a version of GPT-5.2 further optimized for Codex, with improvements in long-horizon coding tasks (like refactors and migrations), better handling of long contexts, stronger performance on large code changes, enhanced Windows support, and significantly stronger cybersecurity capabilities.