15 AI Models — Browse, Compare & Use

Explore pricing, context windows, and capabilities across every major AI provider, all accessible through a single API.

Gemini 3.1 Flash TTS Preview is Google's next-generation text-to-speech model, delivering a major upgrade over Gemini 2.5 Flash TTS. It converts text into natural audio across 70+ languages, with significantly expanded language coverage and improved quality. The model introduces 200+ inline audio control tags (e.g., [whispers], [laughs], [excited]) for fine-grained control over emotion, tone, and pacing, along with support for two speakers with independent voice and style settings. It outputs 24 kHz / 16-bit PCM audio, includes SynthID watermarking, and supports a 32K token context window. Designed for expressive and controllable voice generation, it is well suited for dialogue systems, storytelling, character-driven content, and advanced audio production workflows.

IN: $0.0275 | OUT: $0 per 1K tokens | Context: 8K
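
Raw PCM carries no header, so most players need it wrapped in a container before they can open it. A minimal sketch of that step for the 24 kHz / 16-bit output described above, using only Python's standard `wave` module. It assumes mono, little-endian samples, which the listing does not state, and `pcm_to_wav_bytes` is a hypothetical helper name, not part of any provider SDK:

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # from the model description
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples, from the model description
CHANNELS = 1              # assumption: mono; the listing gives no channel count


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw PCM samples (assumed little-endian) in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback length implied by the byte count at the assumed format."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


if __name__ == "__main__":
    silence = b"\x00\x00" * SAMPLE_RATE_HZ  # one second of silent samples
    wav_data = pcm_to_wav_bytes(silence)
    print(len(wav_data), duration_seconds(silence))
```

If the audio turns out to be stereo, only `CHANNELS` needs to change; the byte-count arithmetic follows from the three constants.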

GPT-4o Mini TTS is OpenAI's cost-efficient text-to-speech model, designed to convert text into natural-sounding audio output. It supports a variety of voices and tones, enabling flexible and expressive speech generation. Optimized for scalability and low cost, it is well suited for real-time voice applications, content narration, and high-volume audio generation workflows.

IN: $0.0003 | OUT: $0 per 1K tokens | Context: 4K

GPT-4o Mini Transcribe is a smaller, cost-efficient speech-to-text model built on GPT-4o Mini's audio capabilities. It is designed for high-volume transcription workloads, delivering reliable performance with lower cost and latency. Priced per token (input and output), it provides transparent, fine-grained billing, making it well suited for scalable transcription pipelines, real-time applications, and cost-sensitive deployments.

IN: $0.000625 | OUT: $0.000625 per 1K tokens | Context: 128K
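
Per-token pricing makes the cost of a job simple arithmetic once the token counts are known. A rough sketch, assuming the listed rates are US dollars per 1,000 tokens (reading "1K Tks" that way) and that the input and output token counts come from the API response; `estimate_cost` and the example counts are hypothetical:

```python
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_1k: str, out_rate_per_1k: str) -> Decimal:
    """Estimate a request's cost from token counts and per-1K-token rates.

    Rates are passed as strings so Decimal keeps them exact instead of
    inheriting binary floating-point rounding.
    """
    input_cost = Decimal(input_tokens) * Decimal(in_rate_per_1k)
    output_cost = Decimal(output_tokens) * Decimal(out_rate_per_1k)
    return (input_cost + output_cost) / 1000


if __name__ == "__main__":
    # Hypothetical job: 10,000 input tokens and 2,000 output tokens at the
    # GPT-4o Mini Transcribe rates listed above.
    cost = estimate_cost(10_000, 2_000, "0.000625", "0.000625")
    print(f"Estimated cost: ${cost}")
```

The same function applies to any entry in this listing by substituting that entry's IN and OUT rates.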

GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o's audio capabilities. It delivers accurate transcription with strong language understanding, making it suitable for a wide range of audio processing tasks. Priced per token (input and output), it offers transparent, fine-grained billing, making it well suited for workflows that require scalable transcription, integration with LLM pipelines, and cost-aware processing.

IN: $0.00125 | OUT: $0 per 1K tokens | Context: 128K

Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for high-speed and cost-efficient transcription. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With a ~12% word error rate and real-time speed factors up to 216×, it is well suited for latency-sensitive, high-throughput, and large-scale transcription workloads.

IN: $0.00333 | OUT: $0 per 1K tokens | Context: -

Whisper Large V3 is OpenAI's advanced open-source automatic speech recognition (ASR) model, supporting both audio transcription and translation across 99+ languages. It accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg, and delivers strong performance in noisy, real-world conditions. With 1.55B parameters and a low 10.3% word error rate, it provides accurate, multilingual transcription with support for word- and segment-level timestamps, making it well suited for high-quality, noise-robust speech processing applications.

IN: $0.00925 | OUT: $0 per 1K tokens | Context: -

Whisper (whisper-1) is OpenAI's open-source automatic speech recognition (ASR) model, designed for audio transcription and translation. It supports 50+ languages and processes audio files up to 25 MB, accepting formats such as mp3, mp4, wav, and webm. Optimized for reliable speech-to-text conversion across diverse audio inputs, Whisper is priced per minute of audio, billed to the nearest second, making it well suited for transcription, localization, and voice-driven applications.

IN: $0.075 | OUT: $0.075 per 1K tokens | Context: -
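
The size and format limits in the Whisper (whisper-1) description can be checked locally before uploading. A small sketch, assuming "25 MB" means 25 × 1024 × 1024 bytes and treating the four formats named in the description as the allowed set (the description says "such as", so the real list may be longer); `check_upload` is a hypothetical helper:

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB
LISTED_FORMATS = {".mp3", ".mp4", ".wav", ".webm"}  # only the formats named in the listing


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in LISTED_FORMATS:
        problems.append(f"format {suffix or '(none)'} is not among the listed formats")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    for name, size in [("interview.mp3", 4_000_000), ("lecture.wav", 60_000_000)]:
        issues = check_upload(name, size)
        print(name, "ok" if not issues else issues)
```

Files that fail the size check would need to be split or re-encoded before submission; how to do that is outside what this listing describes.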

Voxtral Small is an upgraded version of Mistral Small 3 that adds advanced audio understanding while preserving strong text performance. It handles speech transcription, translation, and audio comprehension, with audio input billed per million seconds.

IN: $0.000125 | OUT: $0.000375 per 1K tokens | Context: 32K

gpt-4o-mini-audio-preview is a smaller preview version of OpenAI's audio-capable GPT-4o mini model that supports both audio and text inputs and outputs via the API. It enables the model to understand nuances in audio recordings and incorporate them into responses, making it useful for audio-enabled applications like transcription, speech understanding, and voice-driven interactions.

IN: $0.0000525 | OUT: $0.00021 per 1K tokens | Context: 128K

gpt-4o-mini-audio-preview is a smaller preview version of OpenAI’s audio-capable GPT-4o mini model that supports both audio and text inputs and outputs via the API. It enables the model to understand nuances in audio recordings and incorporate them into responses, making it useful for audio-enabled applications like transcription, speech understanding, and voice-driven interactions.

IN: $0.000875 | OUT: $0.00175 per 1K tokens | Context: 128K

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

IN: $0.000875 | OUT: $0.0035 per 1K tokens | Context: 128K

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

IN: $0.00375 | OUT: $0.015 per 1K tokens | Context: 128K

gpt-4o-mini-audio-preview is a smaller preview version of OpenAI’s audio-capable GPT-4o mini model that supports both audio and text inputs and outputs via the API. It enables the model to understand nuances in audio recordings and incorporate them into responses, making it useful for audio-enabled applications like transcription, speech understanding, and voice-driven interactions.

IN: $0.0000525 | OUT: $0.00021 per 1K tokens | Context: 128K

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

IN: $0.000875 | OUT: $0.0035 per 1K tokens | Context: 128K

gpt-4o-audio-preview adds support for audio inputs, allowing the model to understand nuances in audio recordings and enrich responses. It currently does not generate audio outputs, and audio input is billed per million audio tokens.

IN: $0.000875 | OUT: $0.0035 per 1K tokens | Context: 128K