OpenAI · voice

Whisper 1

Whisper (whisper-1) is OpenAI's open-source automatic speech recognition (ASR) model for audio transcription and translation. It supports 50+ languages and accepts audio files up to 25 MB in formats such as mp3, mp4, wav, and webm. Usage is priced per minute of audio and billed to the nearest second. The model suits transcription, localization, and voice-driven applications.
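The size and format limits above can be checked locally before uploading. The following is a minimal sketch, not part of any SDK: `check_audio_file`, `MAX_BYTES`, and `KNOWN_FORMATS` are hypothetical names, the 25 MB limit is assumed to mean 25 × 1024 × 1024 bytes, and the format set contains only the four extensions named on this page (the service may accept others).

```python
from pathlib import Path

# Assumption: "25 MB" is interpreted as 25 MiB; adjust if the service counts differently.
MAX_BYTES = 25 * 1024 * 1024

# Only the formats named on this page; the service may accept more.
KNOWN_FORMATS = {".mp3", ".mp4", ".wav", ".webm"}


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks acceptable."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in KNOWN_FORMATS:
        problems.append(f"unlisted format: {p.suffix or '(none)'}")
    if not p.is_file():
        problems.append("file not found")
    elif p.stat().st_size > MAX_BYTES:
        problems.append(f"file too large: {p.stat().st_size} bytes")
    return problems
```

Calling `check_audio_file("speech.mp3")` before a request gives a quick local failure instead of a rejected upload; files over the limit would need to be split or compressed first.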


Pricing

Input: $75.00 / 1M

Output: $75.00 / 1M

Cache Write: $0 / 1M

Cache Read: $0 / 1M

Web Search: $0 / 1M

Quick Start

Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.

Endpoint:

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1",
)

# whisper-1 is a speech-to-text model, so this calls the audio transcription
# endpoint rather than chat completions. This assumes the OpenAI-compatible
# transcription route is available at the base URL above.
# "speech.mp3" is a placeholder path to a local audio file.
with open("speech.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
    )

# The transcribed text is returned in the `text` field.
print(response.text)

Supported Parameters

Common parameters: model, file, language, prompt, response_format

Extended parameters: temperature, timestamp_granularities
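As an illustration of how these parameters fit together, the sketch below assembles a keyword-argument dictionary for a transcription request. `build_transcription_kwargs` is a hypothetical helper, not an SDK function. It assumes this endpoint mirrors the OpenAI transcription API, where `timestamp_granularities` only takes effect when `response_format` is `verbose_json`.

```python
def build_transcription_kwargs(file, language=None, prompt=None,
                               response_format="json", temperature=None,
                               timestamp_granularities=None):
    """Collect the supported parameters, omitting any that were not set."""
    kwargs = {
        "model": "whisper-1",
        "file": file,
        "response_format": response_format,
    }
    if language is not None:
        kwargs["language"] = language        # e.g. "en" (ISO 639-1 code)
    if prompt is not None:
        kwargs["prompt"] = prompt            # optional text to guide the transcription
    if temperature is not None:
        kwargs["temperature"] = temperature
    if timestamp_granularities:
        # Assumption: as in the OpenAI API, timestamps need verbose_json output.
        if response_format != "verbose_json":
            raise ValueError(
                "timestamp_granularities requires response_format='verbose_json'"
            )
        kwargs["timestamp_granularities"] = list(timestamp_granularities)
    return kwargs
```

With the OpenAI SDK, the result could be passed as `client.audio.transcriptions.create(**kwargs)`, where `file` is an audio file opened in binary mode.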


Cursor IDE Model IDs

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.

whisper-1

Compare with Other Models

See how this model compares to others from the same provider.

GPT-5.1 Codex (Mini)

GPT-5.1 Codex is a coding-focused version of GPT-5.1 designed for both interactive development and long autonomous engineering tasks. It can build projects, add features, debug, refactor, and review code with higher steerability and cleaner outputs than GPT-5.1. It integrates with developer tools (CLI, IDEs, GitHub, cloud), supports adjustable reasoning effort, handles images/screenshots for UI work, and uses tools for search and environment setup — making it purpose-built for agentic coding workflows.

o4 Mini Deep Research

o4-mini-deep-research is a faster, lower-cost version of OpenAI's deep-research model, designed for complex, multi-step investigations. It automatically relies on web_search for information gathering, which always adds extra usage cost.

Whisper Large V3 Turbo

Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for high-speed and cost-efficient transcription. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With a ~12% word error rate and real-time speed factors up to 216×, it delivers fast, scalable performance for latency-sensitive and high-throughput transcription workloads, making it ideal for real-time and large-scale speech processing applications.

GPT-5.2 Codex

GPT-5.2-Codex is OpenAI's most advanced agentic coding model yet, built for complex, real-world software engineering and defensive cybersecurity. It’s a version of GPT-5.2 further optimized for Codex, with improvements in long-horizon coding tasks (like refactors and migrations), better handling of long contexts, stronger performance on large code changes, enhanced Windows support, and significantly stronger cybersecurity capabilities.