OpenAI · voice

GPT-4o Transcribe

GPT-4o Transcribe is OpenAI's speech-to-text model built on GPT-4o's audio capabilities. It delivers accurate transcription with strong language understanding across a wide range of audio processing tasks. It is priced per token (input and output), which gives fine-grained billing and suits workflows that need scalable transcription, integration with LLM pipelines, and cost-aware processing.


Pricing

Input: $1.25 / 1M

Output: $0 / 1M

Cache Write: $0 / 1M

Cache Read: $0 / 1M

Web Search: $0 / 1M
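As a rough illustration of the rates above, the sketch below estimates the cost of a job from its input token count. It assumes the "/ 1M" unit means one million tokens, which matches the per-token pricing described earlier; the function name and the sample token counts are hypothetical and only serve the example.

```python
# Input rate copied from the Pricing list; every other rate is listed as $0.
INPUT_USD_PER_MILLION = 1.25


def estimate_input_cost(input_tokens: int) -> float:
    """Return the estimated USD cost for a number of input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens * INPUT_USD_PER_MILLION / 1_000_000


if __name__ == "__main__":
    # Made-up token counts, for illustration only.
    for tokens in (10_000, 400_000, 2_000_000):
        cost = estimate_input_cost(tokens)
        print(f"{tokens:>9,} input tokens -> ${cost:.4f}")
```

How many tokens a given audio file produces is not stated on this page, so the actual usage figures should come from the API response or the billing dashboard.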

Quick Start

Use the Apertis AI SDK, the OpenAI SDK, or make direct HTTP requests to our API.

Endpoint:

python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.apertis.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-4o-transcribe",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=1024,
    temperature=0.7
)

print(response.choices[0].message.content)

# Optional: Enable context compression to reduce token usage
# response = client.chat.completions.create(
#     model="gpt-4o-transcribe",
#     messages=[{"role": "user", "content": "Hello!"}],
#     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
# )

Supported Parameters

Common parameters: model, file, language, prompt, response_format

Extended parameters: temperature, timestamp_granularities
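The parameters listed here (file, language, prompt, response_format) match the OpenAI SDK's audio transcription call, not the chat-style request in the Quick Start snippet. The sketch below shows how such a request might look. It assumes, without confirmation from this page, that the Apertis base URL exposes an OpenAI-compatible transcription route; the helper names `build_transcription_kwargs` and `transcribe` are hypothetical.

```python
from typing import Any, BinaryIO, Dict, Optional

BASE_URL = "https://api.apertis.ai/v1"  # base URL from the Quick Start snippet
MODEL_ID = "gpt-4o-transcribe"


def build_transcription_kwargs(
    audio_file: BinaryIO,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
) -> Dict[str, Any]:
    """Collect the common parameters, omitting optional ones left unset."""
    kwargs: Dict[str, Any] = {
        "model": MODEL_ID,
        "file": audio_file,
        "response_format": response_format,
    }
    if language is not None:
        kwargs["language"] = language  # e.g. "en"
    if prompt is not None:
        kwargs["prompt"] = prompt  # optional text to guide the transcription
    return kwargs


def transcribe(path: str, api_key: str, **options: Any) -> str:
    """Send one audio file for transcription and return the text.

    Assumption: the route behind BASE_URL accepts OpenAI-style
    transcription requests. This is not confirmed by the page.
    """
    from openai import OpenAI  # imported lazily; requires the openai package

    client = OpenAI(api_key=api_key, base_url=BASE_URL)
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            **build_transcription_kwargs(audio_file, **options)
        )
    # "text" format returns a plain string; "json" returns an object with .text
    return result if isinstance(result, str) else result.text
```

A call might then look like `transcribe("meeting.mp3", "YOUR_API_KEY", language="en")`, where the file name is a placeholder. The extended parameters are left out here because the page does not say which values this model accepts for them.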


Cursor IDE Model IDs

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.

gpt-4o-transcribe

Compare with Other Models

See how this model compares to others from the same provider.

GPT-Image-1

gpt-image-1 is OpenAI's image generation model designed to create, edit, and enhance images from natural language prompts. It supports tasks like producing detailed visuals, modifying existing images, generating variations, and upscaling, which makes it useful for design, illustration, marketing assets, and creative exploration.

GPT-5.3-Codex

GPT-5.3-Codex is OpenAI's most advanced agentic coding model, designed for software engineering workflows that extend beyond single prompts into long-running, tool-driven execution. It combines the frontier coding performance of earlier Codex models with stronger reasoning and professional knowledge capabilities, enabling reliable handling of complex refactors, multi-step debugging, research-driven development, and autonomous task execution. Optimized for developer productivity, GPT-5.3-Codex supports interactive collaboration during execution, allowing users to steer tasks in real time without losing context. With improved agentic reliability, faster inference, and stronger performance on long-horizon engineering tasks, it is well suited for coding agents, IDE and CLI workflows, and end-to-end software development pipelines where persistence, tool use, and execution continuity are critical.

o3 Deep Research

o3-deep-research is OpenAI's advanced research model, built for complex, multi-step investigation and analysis. It automatically performs web searches to gather and synthesize information. Because web_search is used by default, this always incurs additional cost.

Sora 2 Landscape

Sora 2 is the higher-quality version of OpenAI's Sora 2 text-to-video and audio generation model, designed to produce more realistic, controllable, and detailed AI-generated videos with synchronized audio and advanced world simulation capabilities. It builds on Sora 2's breakthrough in video realism and physics-aware generation, offering enhanced visual fidelity and extended features for creative and professional use. Early access has been available to ChatGPT Pro subscribers via sora.com, with wider availability expected after the invite/beta rollout.