507 AI Models
Browse pricing, context windows, and capabilities across every major AI provider through one API.
| Model | Price | Context | |
|---|---|---|---|
GLM-5.2 is Z.AI's flagship model for long-horizon task execution, designed to handle complex, project-scale workflows with high reliability. Featuring a 1M-token context window, it can maintain and reason over extensive engineering context, enabling consistent execution across large, multi-stage tasks. Optimized for end-to-end software development, GLM-5.2 follows engineering standards reliably and can manage the full workflow from requirements analysis and implementation to testing and multi-platform deployment, making it well suited for advanced coding agents and large-scale autonomous engineering projects. | IN:$0.0014OUT:$0.0044/1K tokens | Context:1M | |
Kimi K2.7 Code is a coding-focused model in Moonshot AI's Kimi K2 family, designed for long-horizon software engineering and agentic development workflows. Built on a native multimodal Mixture-of-Experts (MoE) architecture, it supports text, image, and video inputs and operates exclusively in thinking mode, preserving reasoning across multi-turn interactions. With approximately 1T total parameters and 32B activated per token, plus a 256K-token context window, K2.7 Code excels at end-to-end programming tasks, agentic task decomposition, repository-scale reasoning, and extended coding conversations, making it well suited for advanced coding agents and long-context development workflows. | IN:$0.00095OUT:$0.004/1K tokens | Context:262K | |
NVIDIA Nemotron 3 Ultra is an open frontier reasoning and orchestration model featuring a 550B-parameter Mixture-of-Experts (MoE) architecture with 55B active parameters per token. Built on a hybrid Transformer–Mamba design, it supports text input and output with a 1M-token context window, enabling large-scale reasoning and long-horizon task execution. Optimized for agent orchestration, coding agents, deep research, and complex enterprise workflows, the model excels at multi-step reasoning, planning, and sustained execution. With high-throughput inference designed for large-scale agent pipelines, Nemotron 3 Ultra serves as a powerful foundation for advanced agentic AI systems. | Free | Context:1M | |
NVIDIA Nemotron 3 Ultra is an open frontier reasoning and orchestration model featuring a 550B-parameter Mixture-of-Experts (MoE) architecture with 55B active parameters per token. Built on a hybrid Transformer–Mamba design, it supports text input and output with a 1M-token context window, enabling large-scale reasoning and long-horizon task execution. Optimized for agent orchestration, coding agents, deep research, and complex enterprise workflows, the model excels at multi-step reasoning, planning, and sustained execution. With high-throughput inference designed for large-scale agent pipelines, Nemotron 3 Ultra serves as a powerful foundation for advanced agentic AI systems. | IN:$0.0005OUT:$0.0025/1K tokens | Context:1M | |
NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, designed for content moderation, safety classification, and AI policy enforcement. Supporting text and image inputs with text output, it evaluates both user prompts and model responses, providing safe/unsafe classifications, safety category labels, and optional reasoning traces. Fine-tuned from Gemma-3-4B and supporting 12 languages with a 128K-token context window, the model is well suited for prompt moderation, response filtering, content classification, and enterprise safety pipelines. As part of the NVIDIA Nemotron family, it offers a configurable reasoning mode and integrates easily into agentic AI systems requiring robust guardrails and compliance controls. | Free | Context:128K | |
Qwen3.7-Plus is a cost-effective multimodal model in Alibaba's Qwen3.7 series, supporting text and image inputs with text output. It combines the series' strong language capabilities with significantly enhanced vision-language understanding, while retaining full-stack agent-level intelligence for coding, tool use, and productivity workflows. Its standout capability is multimodal interactive agency—the ability to perceive real-world scenes, understand screens and graphical interfaces, generate code from visual references, and perform end-to-end navigation within applications. This makes Qwen3.7-Plus well suited for GUI automation, visual coding, productivity agents, and multimodal task execution. | IN:$0.0004OUT:$0.0016/1K tokens | Context:1M | |
MiniMax-M3 is a multimodal foundation model from MiniMax, supporting text, image, and video inputs with text output and a 1M-token context window. It is designed for long-horizon agentic workflows, coding, and tool-driven task execution, enabling sustained reasoning across complex tasks. Built on MiniMax Sparse Attention (MSA), the model dramatically improves long-context efficiency by replacing full attention with KV-block selection, reducing compute costs at 1M-token contexts while maintaining strong performance. Trained as a native multimodal model and optimized for multi-turn, production-style collaboration, MiniMax-M3 excels at extended, multi-step workflows rather than single-turn interactions. | IN:$0.0003OUT:$0.0012/1K tokens | Context:1M | |
This is the fast version of Opus 4.8 | IN:$0.01OUT:$0.05/1K tokens | Context:1M | |
Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family, designed for highly autonomous agents, long-horizon workflows, and advanced knowledge work. It supports text, image, and file inputs with text output, includes reasoning capabilities, and features a 1M-token context window for maintaining coherence across extended tasks and sessions. The model excels at multi-step reasoning, complex coding, and end-to-end project orchestration, including large codebases, multi-stage debugging, and long-running asynchronous agent pipelines. Beyond software engineering, it is highly effective for document drafting, presentation creation, data analysis, and memory-driven workflows, delivering consistent quality across very long outputs and complex projects. | IN:$0.004OUT:$0.02/1K tokens | Context:1M | |
Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series, designed for agent-centric workloads with strong performance in coding, productivity, and long-horizon autonomous execution. It supports text input and output and delivers notable improvements in coding and agentic capabilities over previous Qwen generations. Optimized for real-world workflows, the model also supports explicit prompt caching for efficient reuse of repeated context, making it well suited for scalable development, office automation, and advanced agent systems. | IN:$0.0025OUT:$0.0075/1K tokens | Context:1M | |
Grok Build 0.1 is xAI's fast coding model designed specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding agents, tool use, and multi-step development tasks. Powering the Grok Build CLI, the model features a 256K token context window with effectively no text output limit, making it well suited for long-horizon coding, automation, and continuous development workflows. Currently available in early access. | IN:$0.001OUT:$0.002/1K tokens | Context:256K | |
Gemini 3.5 Flash is Google's high-efficiency multimodal model, delivering near-Pro level performance in coding and reasoning at Flash-tier speed and cost. It supports text, image, video, audio, and PDF inputs, making it well suited for diverse multimodal workflows. Optimized for coding proficiency and parallel agentic execution, the model defaults to medium thinking effort for faster, cost-efficient responses while supporting configurable thinking levels (minimal, low, medium, high) for fine-grained cost–performance control. | IN:$0.0015OUT:$0.009/1K tokens | Context:1M | |
This is the fast version of Opus 4.7 | IN:$0.03OUT:$0.15/1K tokens | Context:1M | |
Gemini 3.1 Flash TTS Preview is Google's next-generation text-to-speech model, delivering a major upgrade over Gemini 2.5 Flash TTS. It converts text into natural audio across 70+ languages, with significantly expanded language coverage and improved quality. The model introduces 200+ inline audio control tags (e.g., [whispers], [laughs], [excited]) for fine-grained control over emotion, tone, and pacing, along with support for two speakers with independent voice and style settings. It outputs 24 kHz / 16-bit PCM audio, includes SynthID watermarking, and supports a 32K token context window. Designed for expressive and controllable voice generation, it is well suited for dialogue systems, storytelling, character-driven content, and advanced audio production workflows. | IN:$0.0275OUT:$0/1K tokens | Context:8K | |
GPT-4o Mini TTS is OpenAI's cost-efficient text-to-speech model, designed to convert text into natural-sounding audio output. It supports a variety of voices and tones, enabling flexible and expressive speech generation. Optimized for scalability and low cost, it is well suited for real-time voice applications, content narration, and high-volume audio generation workflows. | IN:$0.0003OUT:$0/1K tokens | Context:4K | |
GPT-4o Mini Transcribe is a smaller, cost-efficient speech-to-text model built on GPT-4o Mini's audio capabilities. It is designed for high-volume transcription workloads, delivering reliable performance with lower cost and latency. Priced per token (input and output), it provides transparent, fine-grained billing, making it well suited for scalable transcription pipelines, real-time applications, and cost-sensitive deployments. | IN:$0.000625OUT:$0.000625/1K tokens | Context:128K | |
GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o's audio capabilities. It delivers accurate transcription with strong language understanding, making it suitable for a wide range of audio processing tasks. Priced per token (input and output), it offers transparent, fine-grained billing, making it well suited for workflows that require scalable transcription, integration with LLM pipelines, and cost-aware processing. | IN:$0.00125OUT:$0/1K tokens | Context:128K | |
Whisper Large V3 Turbo is an optimized version of OpenAI's Whisper Large V3 speech recognition model, designed for high-speed and cost-efficient transcription. It supports 99+ languages and accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg. With a ~12% word error rate and real-time speed factors up to 216×, it delivers fast, scalable performance for latency-sensitive and high-throughput transcription workloads, making it ideal for real-time and large-scale speech processing applications. | IN:$0.00333OUT:$0/1K tokens | - | |
Whisper Large V3 is OpenAI's advanced open-source automatic speech recognition (ASR) model, supporting both audio transcription and translation across 99+ languages. It accepts common audio formats including mp3, mp4, wav, webm, flac, and ogg, and delivers strong performance in noisy, real-world conditions. With 1.55B parameters and a low 10.3% word error rate, it provides accurate, multilingual transcription with support for word- and segment-level timestamps, making it well suited for high-quality, noise-robust speech processing applications. | IN:$0.00925OUT:$0/1K tokens | - | |
Whisper (whisper-1) is OpenAI's open-source automatic speech recognition (ASR) model, designed for audio transcription and translation. It supports 50+ languages and processes audio files up to 25 MB, accepting formats such as mp3, mp4, wav, and webm. Optimized for reliable speech-to-text conversion across diverse audio inputs, Whisper is priced per minute of audio, billed to the nearest second, making it well suited for transcription, localization, and voice-driven applications. | IN:$0.075OUT:$0.075/1K tokens | - | |
Mistral Medium 3.5 is a 128B dense instruction-following model from Mistral AI, supporting text and image inputs with text output. It is designed for agentic workflows, coding, and complex multi-step reasoning, with strong reliability in multi-tool orchestration and long-horizon tasks. The model features a 256K token context window, configurable reasoning effort per request, and a custom vision encoder that handles variable image sizes and aspect ratios. With support for self-hosting on as few as four GPUs and availability under open weights, it is well suited for scalable, production-grade deployments. | IN:$0.0015OUT:$0.0075/1K tokens | Context:262K | |
Grok 4.3 is a reasoning-focused model from xAI designed for agentic workflows, instruction following, and high factual accuracy tasks. It supports text and image inputs with text output, with reasoning always active and not configurable by effort level. The model features a 1M-token context window with effectively no output token limit, making it well suited for long-document analysis, deep research, and multi-step agentic workflows. It uses tiered pricing, with higher rates applied to requests exceeding 200K total tokens. | IN:$0.00125OUT:$0.0025/1K tokens | Context:1M | |
NVIDIA Nemotron 3 Nano Omni is an open 30B-A3B multimodal model designed as a perception and context sub-agent for enterprise agent systems. It supports text, image, video, and audio inputs with text output, enabling unified multimodal reasoning within a single inference loop. Built on a hybrid MoE Transformer–Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers significantly improved efficiency for video reasoning—achieving ~2× higher throughput and 2.5× lower compute compared to separate pipelines. With up to 300K context length and extended thinking support, it is well suited for scalable, multimodal agent workflows. | Free | Context:256K | |
Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse Mixture-of-Experts (MoE) architecture with approximately 1 trillion parameters. It is optimized for agentic coding, tool use, and long-context reasoning, supporting a 262K token context window. The model includes an integrated thinking mode that preserves reasoning across multi-turn interactions, along with support for structured outputs and function calling. Available exclusively via Alibaba Cloud Model Studio and Qwen Studio APIs, it is designed for high-performance, production-grade agent workflows. | IN:$0.0013OUT:$0.0078/1K tokens | Context:262K | |
Qwen3.6 Flash is a fast and efficient model from Alibaba's Qwen 3.6 series, supporting text, image, and video inputs with a 1M-token context window for high-context multimodal workflows. Optimized for performance and cost efficiency, it features tiered pricing beyond 256K tokens and supports prompt caching with both cache creation and read pricing, making it well suited for large-scale, high-throughput applications. | IN:$0.00025OUT:$0.0015/1K tokens | Context:1M | |
Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba, supporting text, image, and video inputs with text output. It features a 1M-token context window, enabling large-scale reasoning and multimodal workflows within a single interaction. This updated version of Qwen3.5 Plus introduces tiered pricing beyond 256K tokens, making it suitable for high-context applications while maintaining flexibility for cost optimization in long-input scenarios. | IN:$0.0004OUT:$0.0024/1K tokens | Context:1M | |
GPT-5.5 is OpenAI's frontier model for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on challenging tasks. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for large-scale, high-context workflows. Designed for advanced applications, GPT-5.5 excels in reasoning, coding, and multimodal workflows, enabling efficient execution of complex, multi-step tasks within a single system. | IN:$0.004OUT:$0.024/1K tokens | Context:1.1M | |
GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale, long-context tasks. Designed for long-horizon problem solving, agentic coding, and precise multi-step execution, GPT-5.5 Pro delivers strong reliability and performance across advanced engineering, research, and complex workflow scenarios. | IN:$0.03OUT:$0.18/1K tokens | Context:1.1M | |
DeepSeek V4 Pro is a large-scale Mixture-of-Experts (MoE) model with 1.6T total parameters and 49B activated per token, supporting a 1M-token context window for advanced reasoning and long-horizon workflows. It delivers strong performance across knowledge, mathematics, and software engineering tasks, making it suitable for complex, real-world applications. Built on a hybrid attention architecture for efficient long-context processing, the model supports configurable reasoning modes to balance speed and depth. It is well suited for full codebase analysis, multi-step automation, and large-scale information synthesis, where both capability and efficiency are essential. https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro | IN:$0.000435OUT:$0.00087/1K tokens | Context:1.0M | |
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts (MoE) model with 284B total parameters and 13B activated per token, designed for fast inference and high-throughput workloads. It supports a 1M-token context window, enabling large-scale reasoning and long-context processing. Built with hybrid attention for efficiency, the model maintains strong performance in reasoning and coding while offering configurable reasoning modes. It is well suited for coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are critical. | IN:$0.00014OUT:$0.00028/1K tokens | Context:1.0M | |
Qwen3.6-35B-A3B is an open-weight Mixture-of-Experts (MoE) multimodal model designed for agentic coding and long-horizon workflows. It features ~35–36B total parameters with ~3B activated per token, enabling strong performance with high inference efficiency. The model supports text and image inputs with a ~260K token context window, and is optimized for repository-level reasoning, multi-step development, and tool-driven workflows. With strong benchmark performance and improved coherence across extended tasks, Qwen3.6-35B-A3B is well suited for developer tools, coding agents, and real-world engineering applications that require both reasoning depth and efficiency. | IN:$0.0054OUT:$0.0324/1K tokens | Context:262K | |
Qwen3.6-27B is an open-weight 27B-parameter dense multimodal model from the Qwen3.6 series, designed to deliver flagship-level coding and agentic performance at a practical deployment scale. It supports both text and image inputs and introduces improvements in agentic coding, repository-level reasoning, and iterative development workflows. Despite its relatively compact size, it achieves state-of-the-art results on coding benchmarks, outperforming much larger models in tasks such as SWE-bench and terminal-based workflows. It also provides strong reasoning and multimodal capabilities, along with features like thinking preservation to maintain context across interactions, making it well suited for developer tools, coding agents, and real-world engineering tasks. | IN:$0.000195OUT:$0.00156/1K tokens | Context:262K | |
MiMo-V2.5-Pro is Xiaomi's flagship model, delivering top-tier performance in agentic capabilities, complex software engineering, and long-horizon tasks. It ranks highly on benchmarks such as ClawEval, GDPVal, and SWE-bench Pro, demonstrating strong real-world reliability. The model can autonomously complete professional tasks that would take human experts days or weeks, executing thousands of tool calls within a single workflow. With a 1M-token context window, it is well suited for integration into advanced agent frameworks and large-scale task orchestration systems. | IN:$0.001OUT:$0.003/1K tokens | Context:1.0M | |
MiMo-V2.5 is Xiaomi's native omnimodal model, delivering pro-level agentic performance at roughly half the inference cost. It surpasses MiMo-V2-Omni in multimodal perception, particularly in image and video understanding. With a 1M-token context window, it can handle complete documents, extended conversations, and complex task contexts in a single pass. Combining strong reasoning, rich perception, and cost efficiency, MiMo-V2.5 is well suited for integration into advanced agent frameworks and real-world multimodal applications. | IN:$0.0004OUT:$0.002/1K tokens | Context:1.0M | |
GPT Image 2 pairs OpenAI's GPT-5.4 with advanced image generation capabilities, enabling fully integrated multimodal workflows. It allows users to seamlessly transition between reasoning, coding, and visual generation within a single interaction, making it well suited for creative, development, and agent-driven applications that require both intelligence and visual output. | IN:$0OUT:$0/1K tokens | Context:272K | |
Kimi K2.6 is Moonshot AI's next-generation multimodal model, designed for long-horizon coding, UI/UX generation, and multi-agent orchestration. It handles complex end-to-end development tasks across languages such as Python, Rust, and Go, and can transform prompts and visual inputs into production-ready interfaces. Powered by a scalable agent swarm architecture, K2.6 can coordinate hundreds of parallel sub-agents for autonomous task decomposition, enabling the generation of documents, websites, and spreadsheets in a single run without human intervention. | IN:$0.0006OUT:$0.0028/1K tokens | Context:262K | |
Opus 4.7 is the next generation of Anthropic's Opus family, designed for long-running, asynchronous agent workflows. Building on Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable execution across extended pipelines such as large codebases, multi-stage debugging, and end-to-end project orchestration. Beyond coding, Opus 4.7 enhances knowledge work capabilities, including document drafting, presentation creation, and data analysis. With strong coherence over long outputs and sessions, it is well suited for tasks requiring persistence, judgment, and sustained execution. | IN:$0.004OUT:$0.02/1K tokens | Context:1M | |
This is the faster version of Anthropic's Opus 4.6, a model for coding and long-running professional workflows, designed for agents that operate across entire workflows rather than single prompts. It demonstrates strong performance on large codebases, complex refactoring, and multi-step debugging, with improved contextual understanding, deeper problem decomposition, and higher reliability on challenging engineering tasks compared to earlier generations. Beyond software development, Opus 4.6 excels at sustained knowledge work, producing near production-ready documents, technical plans, and analyses in a single pass while maintaining coherence across long outputs and extended sessions. Its strength in persistence, judgment, and structured execution makes it well suited for technical design, migration planning, and end-to-end project execution. | IN:$0.03OUT:$0.15/1K tokens | Context:1M | |
Qwen 3.6 Plus Preview is the next-generation evolution of the Qwen Plus series, built on an advanced hybrid architecture that enhances efficiency and scalability. It delivers improved reasoning capabilities and more reliable agentic behavior compared to the 3.5 series, with benchmark performance at or above leading state-of-the-art models. Designed as a flagship preview model, it excels in agentic coding, front-end development, and complex problem solving, making it well suited for advanced development workflows and high-performance applications. | IN:$0.000325OUT:$0.00195/1K tokens | Context:1M | |
GLM-5.1 delivers a major advancement in coding capability, with significant improvements in handling long-horizon tasks. It is designed to operate beyond short interactions, enabling continuous, autonomous execution over extended periods. The model can work independently on a single task for 8+ hours, performing planning, execution, and iterative self-improvement to produce complete, engineering-grade results, making it well suited for complex development workflows and autonomous agent systems. | IN:$0.00126OUT:$0.00396/1K tokens | Context:203K | |
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind, featuring 25.2B total parameters with only 3.8B activated per token—delivering near 31B-class quality at a fraction of the compute cost. It supports multimodal inputs including text, images, and video (up to 60s at 1fps). The model includes a 256K token context window, native function calling, configurable thinking/reasoning modes, and structured output support. Released under the Apache 2.0 license, it is well suited for efficient, production-ready multimodal and agentic applications. | Free | Context:262K | |
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind, featuring 25.2B total parameters with only 3.8B activated per token—delivering near 31B-class quality at a fraction of the compute cost. It supports multimodal inputs including text, images, and video (up to 60s at 1fps). The model includes a 256K token context window, native function calling, configurable thinking/reasoning modes, and structured output support. Released under the Apache 2.0 license, it is well suited for efficient, production-ready multimodal and agentic applications. | IN:$0.00013OUT:$0.0004/1K tokens | Context:262K | |
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model, supporting text and image inputs with text outputs. It features a 256K token context window, configurable thinking/reasoning modes, native function calling, and broad multilingual support across 140+ languages. The model delivers strong performance in coding, reasoning, and document understanding, making it well suited for developer workflows, multilingual applications, and structured knowledge tasks. | Free | Context:262K | |
Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model, supporting text and image inputs with text outputs. It features a 256K token context window, configurable thinking/reasoning modes, native function calling, and broad multilingual support across 140+ languages. The model delivers strong performance in coding, reasoning, and document understanding, making it well suited for developer workflows, multilingual applications, and structured knowledge tasks. | IN:$0.00014OUT:$0.0004/1K tokens | Context:262K | |
GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, designed for vision-based coding and agent-driven workflows. It natively supports image, video, and text inputs, enabling integrated multimodal reasoning and execution. The model excels at long-horizon planning, complex coding, and multi-step task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”, making it well suited for advanced multimodal automation and real-world agent systems. | IN:$0.0012OUT:$0.004/1K tokens | Context:203K | |
Grok 4.20 Multi-Agent is a specialized variant of xAI's Grok 4.20 designed for collaborative, agent-based workflows. It enables multiple agents to operate in parallel, coordinating tool use and synthesizing information to handle complex, multi-step tasks. Optimized for deep research and large-scale problem solving, the model supports configurable reasoning effort: 4 agents for low/medium settings and up to 16 agents for high/xhigh settings, enabling scalable parallel reasoning and execution. | IN:$0.002OUT:$0.006/1K tokens | Context:2M | |
Grok 4.20 is xAI's newest flagship model, designed for high-performance reasoning with industry-leading speed and strong agentic tool-calling capabilities. It emphasizes strict prompt adherence and low hallucination rates, delivering highly precise and reliable responses. Optimized for agent workflows and real-time applications, Grok 4.20 provides consistent, truthful outputs while maintaining fast inference and robust task execution. | IN:$0.002OUT:$0.006/1K tokens | Context:2M | |
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with advanced agentic capabilities, including visual grounding, multi-step planning, tool use, and code execution. With a 256K context window, MiMo-V2-Omni is well suited for complex real-world tasks that span multiple modalities, enabling integrated reasoning and execution across diverse input types. | IN:$0.0004OUT:$0.002/1K tokens | Context:262K | |
MiMo-V2-Pro is Xiaomi's flagship foundation model with over 1T parameters and a 1M-token context window, optimized for advanced agentic workflows. It is highly adaptable to general agent frameworks such as OpenClaw, delivering strong performance in complex, real-world task execution. Ranking among the top tier on benchmarks like PinchBench and ClawBench, with performance approaching models like Opus 4.6, MiMo-V2-Pro is designed to act as the core intelligence of agent systems, orchestrating workflows, driving production engineering tasks, and delivering reliable results at scale. | IN:$0.001OUT:$0.003/1K tokens | Context:1.0M | |
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. It incorporates advanced multi-agent collaboration, enabling the model to plan, execute, and iteratively refine complex tasks across dynamic environments. Built for production-grade workflows, M2.7 supports tasks such as live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint. With strong benchmark performance—including 56.2% on SWE-Pro, 57.0% on Terminal Bench 2, and 1495 ELO on GDPval-AA—it sets a new standard for multi-agent systems in real-world digital workflows. | IN:$0.0003OUT:$0.0012/1K tokens | Context:205K | |
GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume workloads. It supports text and image inputs and is designed for low-latency tasks such as classification, data extraction, ranking, and sub-agent execution. Prioritizing responsiveness and efficiency over deep reasoning, GPT-5.4 nano is ideal for real-time systems, background processing, and distributed agent pipelines where minimizing cost and latency is essential. | IN:$0.0002OUT:$0.00125/1K tokens | Context:400K | |
GPT-5.4 mini brings the core capabilities of GPT-5.4 into a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs and delivers strong performance across reasoning, coding, and tool use, while reducing latency and cost for large-scale deployments. Designed for production environments, GPT-5.4 mini balances capability and efficiency, making it well suited for chat applications, coding assistants, and scalable agent workflows. It provides reliable instruction following, solid multi-step reasoning, and consistent performance across diverse tasks with improved cost efficiency. | IN:$0.00075OUT:$0.0045/1K tokens | Context:400K | |
Mistral Small 4 is the latest release in the Mistral Small family, unifying capabilities from multiple flagship models into a single system. It integrates strong reasoning (Magistral), multimodal understanding (Pixtral), and agentic coding capabilities (Devstral), enabling a versatile, all-in-one model. Designed to handle complex analysis, software development, and visual tasks within the same workflow, Mistral Small 4 is well suited for integrated agentic applications and end-to-end problem solving across domains. | IN:$0.00015OUT:$0.0006/1K tokens | Context:262K | |
GLM-5 Turbo is a high-performance model from Z.ai optimized for fast inference and agent-driven workflows. Designed for real-world environments such as OpenClaw scenarios, it delivers strong performance across long execution chains and complex task pipelines. The model features improved instruction decomposition, tool integration, scheduled and persistent execution, and enhanced stability for extended multi-step tasks, making it well suited for autonomous agents and production automation workflows. | IN:$0.00096OUT:$0.0032/1K tokens | Context:203K | |
Gemini Embedding 2 is Google's advanced text embedding model designed for high-accuracy semantic representation across large-scale retrieval and understanding tasks. It converts text into dense vector embeddings optimized for semantic search, retrieval-augmented generation (RAG), clustering, classification, and recommendation systems. Built for production use, it offers strong multilingual support, improved semantic similarity accuracy, and efficient embedding generation, making it well suited for large knowledge indexing pipelines and enterprise-scale retrieval applications. | IN:$0.0006OUT:$0.0024/1K tokens | Context:8K | |
Grok 4.20 Multi-Agent Beta is a specialized variant of xAI's Grok 4.20 designed for collaborative, agent-based workflows. It enables multiple agents to operate in parallel, coordinating tool use and information synthesis to handle complex tasks. Optimized for deep research and multi-step problem solving, the model supports parallel reasoning, coordinated execution, and structured knowledge synthesis across large and complex workflows. | IN:$0.002OUT:$0.006/1K tokens | Context:2M | |
Grok 4.20 Beta is xAI's newest flagship model, designed for high-performance reasoning with industry-leading speed and strong agentic tool-calling capabilities. It emphasizes strict prompt adherence and low hallucination rates, enabling highly reliable and precise responses across complex tasks. Optimized for agent workflows and real-time applications, Grok 4.20 Beta delivers consistent, truthful outputs while maintaining fast inference and strong task execution reliability. | IN:$0.002OUT:$0.006/1K tokens | Context:2M | |
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid Mixture-of-Experts model designed for complex multi-agent and long-horizon reasoning workflows. It activates only 12B parameters per token, enabling high compute efficiency while maintaining strong accuracy on advanced tasks. Built on a hybrid Mamba–Transformer MoE architecture with multi-token prediction (MTP), the model delivers significantly higher token generation throughput than leading open models. It supports a 1M-token context window for long-context reasoning, cross-document analysis, and multi-step task planning. Trained with multi-environment reinforcement learning across diverse benchmarks—including AIME 2025, TerminalBench, and SWE-Bench Verified—Nemotron 3 Super achieves strong performance across reasoning and coding tasks. Released fully open with weights, datasets, and training recipes, it supports flexible customization and secure deployment from local workstations to cloud environments. | Free | Context:262K | |
Seed-2.0-Lite is a balanced model designed for high-frequency enterprise workloads, optimizing for both capability and cost efficiency. It surpasses the previous-generation Seed-1.8 in overall performance while maintaining stable, production-ready quality. The model supports long-context processing, multi-source information fusion, multi-step instruction execution, and high-fidelity structured outputs. It is well suited for enterprise scenarios such as unstructured data processing, content generation, search and recommendation, and data analysis, delivering reliable results while significantly reducing operational cost. | IN:$0.0002OUT:$0.0016/1K tokens | Context:262K | |
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, built to deliver strong reasoning, coding, and visual understanding within an efficient 9B-parameter architecture. It adopts a unified vision-language design with early fusion of multimodal tokens, enabling the model to process and reason across text and images within the same context. With balanced multimodal capability and efficient deployment requirements, Qwen3.5-9B is well suited for applications that combine visual analysis, coding assistance, and general reasoning. | IN:$0.0001OUT:$0.00015/1K tokens | Context:262K | |
GPT-5.4 Pro is OpenAI's most advanced model, built on the unified GPT-5.4 architecture with enhanced reasoning capabilities for complex and high-stakes tasks. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale workflows and long-context analysis. Optimized for step-by-step reasoning, instruction following, and accuracy, GPT-5.4 Pro excels in agentic coding, long-context problem solving, and complex multi-step workflows, making it well suited for advanced engineering, research, and high-reliability applications. | IN:$0.024OUT:$0.144/1K tokens | Context:1.1M | |
GPT-5.4 is OpenAI's latest frontier model, unifying the GPT and Codex lines into a single system designed for both general intelligence and advanced software engineering workflows. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output), enabling high-context reasoning, coding, and multimodal analysis within a single workflow. The model delivers improved performance in coding, document understanding, tool use, and instruction following, and is designed as a strong default for complex tasks. It can generate production-quality code, synthesize information across large datasets, and execute multi-step workflows with fewer iterations and greater token efficiency. | IN:$0.002OUT:$0.012/1K tokens | Context:1.1M | |
Gemini 3.1 Flash Lite Preview is Google's high-efficiency model designed for high-volume and cost-sensitive workloads. It improves overall quality compared to Gemini 2.5 Flash Lite while approaching the performance of Gemini 2.5 Flash across key capabilities. The model delivers enhancements in audio input/ASR, RAG snippet ranking, translation, data extraction, and code completion, and supports configurable thinking levels (minimal, low, medium, high) for flexible cost–performance optimization. With pricing at roughly half the cost of Gemini 3 Flash, it is well suited for large-scale production deployments. | IN:$0.00025OUT:$0.0015/1K tokens | Context:1.0M | |
Grok 4.2 Image refers to the extended image understanding and generation capabilities associated with the Grok 4.2 model family from xAI. While Grok's primary focus remains advanced reasoning, multimodal support in the Grok ecosystem enables image interpretation and generation, allowing the model to process and respond to visual inputs alongside text. This includes understanding images such as screenshots and photos, and generating visual outputs when combined with associated image-generation tools in the Grok platform. | IN:$0OUT:$0/1K tokens | Context:128K | |
Grok 4.1 Image refers to the multimodal visual capabilities associated with xAI's Grok 4.1 model, enabling the system to process and generate content involving images alongside text. Grok 4.1 supports image understanding and image-based interaction through its API, allowing users to submit images (e.g., screenshots or photos) for analysis or multimodal question-answering. While the core Grok 4.1 LLM focuses on advanced reasoning and dialogue, its infrastructure includes visual processing features that extend the model's utility in multimodal tasks. | IN:$0OUT:$0/1K tokens | Context:128K | |
GPT-5.3 Chat is an updated version of ChatGPT's most widely used conversational model, designed to make everyday interactions smoother, more accurate, and more helpful. It improves contextual understanding and response quality while reducing unnecessary refusals, excessive caveats, and overly cautious phrasing that can disrupt conversational flow. Optimized for general-purpose dialogue, GPT-5.3 Chat delivers more natural, reliable responses across a wide range of everyday tasks and discussions. | IN:$0.000875OUT:$0.007/1K tokens | Context:128K | |
Grok 4.2 is the next major iteration of xAI's Grok series, advancing the model's reasoning, coding, and multimodal capabilities with architectural improvements over Grok 4 and 4.1. It is positioned as a more powerful and general-purpose frontier AI model in the Grok family with stronger deep reasoning and real-world task performance. | IN:$0.0021OUT:$0.0105/1K tokens | Context:1M | |
Gemini 3.1 Flash Image Preview (also known as "Nano Banana 2") is Google's latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash-level speed. It combines strong contextual understanding with fast, cost-efficient inference, enabling high-quality image generation and seamless iterative editing. Optimized for both performance and accessibility, it makes advanced visual creation workflows faster and more scalable. | IN:$0.00025OUT:$0.0015/1K tokens | Context:66K | |
Seed-2.0-mini is designed for latency-sensitive, high-concurrency, and cost-sensitive deployments, prioritizing fast response times and flexible inference scalability. It delivers performance comparable to ByteDance-Seed-1.6 while offering improved efficiency for lightweight, production workloads. The model supports a 256K context window, four adjustable reasoning effort modes (minimal, low, medium, high), and multimodal understanding. Optimized for scenarios where speed, scalability, and cost efficiency are critical, Seed-2.0-mini is well suited for real-time applications and lightweight agent workflows. | IN:$0.0001OUT:$0.0004/1K tokens | Context:262K | |
Qwen3.5 Vision-Language Flash models are built on a hybrid architecture that combines linear attention mechanisms with a sparse Mixture-of-Experts (MoE) design to achieve higher inference efficiency. Compared with the Qwen3 generation, the 3.5 Flash models deliver significant improvements in both pure-text reasoning and multimodal understanding. Optimized for fast response times, they strike a strong balance between inference speed and overall performance, making them well suited for real-time multimodal and agent-based applications. | IN:$0.0001OUT:$0.0004/1K tokens | Context:1M | |
Qwen3.5-122B-A10B is a native vision-language model built on a hybrid architecture that combines linear attention mechanisms with a sparse Mixture-of-Experts (MoE) design for improved inference efficiency. In overall performance, it ranks just below Qwen3.5-397B-A17B, delivering substantial gains over previous generations. Its text capabilities significantly exceed Qwen3-235B-2507, while its visual performance surpasses Qwen3-VL-235B, making it a strong high-end option for advanced multimodal and agent-driven applications. | IN:$0.0004OUT:$0.0032/1K tokens | Context:262K | |
Qwen3.5-35B-A3B is a native vision-language model built on a hybrid architecture that combines linear attention mechanisms with a sparse Mixture-of-Experts (MoE) design to enhance inference efficiency. It delivers balanced multimodal performance with overall capabilities comparable to Qwen3.5-27B, making it a practical option for efficient vision-language and agent-based applications. | IN:$0.00025OUT:$0.002/1K tokens | Context:262K | |
Qwen3.5-27B is a native vision-language dense model that incorporates a linear attention mechanism to deliver fast response times while maintaining a strong balance between inference speed and overall performance. Despite its smaller scale, its overall capabilities are comparable to Qwen3.5-122B-A10B, making it an efficient and practical choice for multimodal applications that require both responsiveness and high-quality reasoning. | IN:$0.0003OUT:$0.0024/1K tokens | Context:262K | |
Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Built on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window for large-scale tasks. The 3.1 update introduces measurable gains on SWE benchmarks and real-world coding environments, along with stronger autonomous execution in structured domains such as finance and spreadsheet-based workflows. Designed for advanced development and agentic systems, it improves long-horizon stability and tool orchestration while adding a new medium thinking level to better balance cost, speed, and performance. Gemini 3.1 Pro Preview is well suited for agentic coding, structured planning, multimodal analysis, financial modeling, spreadsheet automation, and high-context enterprise applications. | IN:$0.001OUT:$0.006/1K tokens | Context:1.0M | |
Sonnet 4.6 is Anthropic's most capable Sonnet-class model, delivering frontier-level performance across coding, agent workflows, and professional knowledge tasks. It excels at iterative development, complex codebase navigation, and end-to-end project execution, supported by strong contextual understanding and persistent task handling. Beyond engineering tasks, Sonnet 4.6 produces polished documents and analyses while demonstrating reliable computer-use capabilities for web QA, workflow automation, and structured productivity workflows, making it well suited for both development and professional applications. | IN:$0.0024OUT:$0.012/1K tokens | Context:1M | |
Qwen3.5-397B-A17B is a native vision-language model built on a hybrid architecture that combines linear attention mechanisms with a sparse Mixture-of-Experts (MoE) design to achieve higher inference efficiency at large scale. It delivers state-of-the-art performance across a broad range of tasks, including language understanding, logical reasoning, code generation, agent-based workflows, image and video understanding, and GUI interaction. With strong coding and agent capabilities, Qwen3.5-397B-A17B demonstrates robust generalization across diverse multimodal and agentic scenarios, making it well suited for advanced applications that require integrated reasoning across text, vision, and interactive environments. | IN:$0.0006OUT:$0.0036/1K tokens | Context:256K | |
Qwen3.5 Vision-Language Plus models are part of the native multimodal Qwen3.5 series, built on a hybrid architecture that combines linear attention mechanisms with sparse Mixture-of-Experts (MoE) designs to improve inference efficiency at scale. Across a wide range of evaluations, the series demonstrates performance comparable to leading state-of-the-art models. Compared with the Qwen3 generation, the 3.5 Plus models deliver significant improvements in both pure-text reasoning and multimodal understanding, making them well suited for applications that require strong performance across language, vision, and agent-based tasks. | IN:$0.0004OUT:$0.0024/1K tokens | Context:1M | |
MiniMax-M2.5-Lightning is the high-speed variant of the M2.5 series, optimized for low latency, real-time responsiveness, and high-frequency workloads. It retains the core planning and execution strengths of M2.5 while further improving inference efficiency and response speed, making it ideal for interactive applications, rapid coding assistance, and workflow automation. With enhanced cost efficiency and reduced latency, M2.5-Lightning is particularly well suited for high-throughput, always-on deployments and production environments where speed and scalability are critical. | IN:$0.0003OUT:$0.0022/1K tokens | Context:128K | |
MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity and digital work environments. Building on the coding strengths of M2.1, it expands into general office workflows, demonstrating strong capability in generating and operating Word, Excel, and PowerPoint files, switching context across software environments, and collaborating effectively with both human users and agent systems. Trained on diverse real-world working scenarios, M2.5 combines strong planning ability with improved token efficiency, enabling more effective task execution. With strong benchmark performance—including 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp—it is well suited for productivity automation, coding workflows, and agent-driven knowledge work. | IN:$0.0003OUT:$0.0012/1K tokens | Context:205K | |
GLM-5 is Z.AI's flagship open-source foundation model, engineered for complex systems design and long-horizon agent workflows. Built with expert developers in mind, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With strong agentic planning, deep backend reasoning, and iterative self-correction capabilities, GLM-5 extends beyond traditional code generation to support full-system construction and autonomous execution, making it well suited for advanced engineering and agent-driven development environments. | IN:$0.0006OUT:$0.0022/1K tokens | Context:203K | |
Gemini-Embedding-001 is Google's high-quality text embedding model designed for semantic understanding and retrieval tasks. It converts text into dense vector representations optimized for semantic search, retrieval-augmented generation (RAG), clustering, classification, and recommendation systems. The model emphasizes strong multilingual performance, high semantic accuracy, and efficient embedding generation, making it well suited for large-scale knowledge indexing and production retrieval pipelines. | IN:$0.000075OUT:$0.0003/1K tokens | Context:128K | |
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it achieves substantial improvements in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior. Optimized for advanced problem solving and long-horizon tasks, Qwen3-Max-Thinking is well suited for research, complex analysis, and agentic applications where reliability and structured reasoning are critical. | IN:$0.0012OUT:$0.006/1K tokens | Context:262K | |
Opus 4.6 is Anthropic's most capable model for coding and long-running professional workflows, designed for agents that operate across entire workflows rather than single prompts. It demonstrates strong performance on large codebases, complex refactoring, and multi-step debugging, with improved contextual understanding, deeper problem decomposition, and higher reliability on challenging engineering tasks compared to earlier generations. Beyond software development, Opus 4.6 excels at sustained knowledge work, producing near production-ready documents, technical plans, and analyses in a single pass while maintaining coherence across long outputs and extended sessions. Its strength in persistence, judgment, and structured execution makes it well suited for technical design, migration planning, and end-to-end project execution. | IN:$0.004OUT:$0.02/1K tokens | Context:1M | |
GPT-Codex-5.3 is OpenAI's most advanced agentic coding model, designed for software engineering workflows that extend beyond single prompts into long-running, tool-driven execution. It combines the frontier coding performance of earlier Codex models with stronger reasoning and professional knowledge capabilities, enabling reliable handling of complex refactors, multi-step debugging, research-driven development, and autonomous task execution. Optimized for developer productivity, GPT-Codex-5.3 supports interactive collaboration during execution, allowing users to steer tasks in real time without losing context. With improved agentic reliability, faster inference, and stronger performance on long-horizon engineering tasks, it is well suited for coding agents, IDE and CLI workflows, and end-to-end software development pipelines where persistence, tool use, and execution continuity are critical. | IN:$0.00175OUT:$0.014/1K tokens | Context:400K | |
Qwen3-Coder-Next is an open-weight causal language model purpose-built for coding agents and local development workflows. It employs a sparse Mixture-of-Experts (MoE) architecture with 80B total parameters and only 3B activated per token, achieving performance comparable to models with 10–20× higher active compute. This efficiency makes it especially well suited for cost-sensitive, always-on agent deployments. Trained with a strong agentic focus, Qwen3-Coder-Next performs reliably on long-horizon coding tasks, complex tool interactions, and robust recovery from execution failures. With a native 256K context window, it integrates smoothly into real-world CLI and IDE environments and aligns well with common agent scaffolding used by modern coding tools. The model operates exclusively in non-thinking mode and does not emit <think> blocks, simplifying production integration for coding agents. | IN:$0.000175OUT:$0.0014/1K tokens | Context:262K | |
Doubao-Seedream 5.0 Lite is ByteDance's optimized text-to-image generation model designed for fast, cost-efficient visual creation while retaining strong visual quality. It offers improved prompt understanding and rendering performance over previous "Lite" variants, making it suitable for real-time applications and interactive creative workflows. With a focus on speed, responsiveness, and lightweight deployment, Seedream 5.0 Lite enables rapid generation of visually appealing images across a wide range of styles and scenarios, making it ideal for user-facing creative tools and large-scale content pipelines. | IN:$0OUT:$0/1K tokens | Context:128K | |
Kimi K2.5 is Moonshot AI's native multimodal model, designed to deliver state-of-the-art visual coding capabilities and support a self-directed agent swarm paradigm. Built upon Kimi K2 and further enhanced through continued pretraining on approximately 15 trillion mixed visual and text tokens, it achieves strong, well-balanced performance across general reasoning, visual understanding and coding, and agentic tool-calling workflows. With its robust multimodal foundations and agent-oriented design, Kimi K2.5 is well suited for advanced applications that combine vision, code, and autonomous agent collaboration. | IN:$0.0004OUT:$0.00224/1K tokens | Context:262K | |
Qwen3-Max-Thinking is Alibaba's latest flagship reasoning-enhanced large language model, evolving the Qwen3-Max architecture to emphasize deep, multi-step analytical reasoning and tool collaboration. It scales the model's capacity significantly—reportedly to over 1 trillion parameters—and integrates a “Thinking Mode” where the model can expose and leverage step-by-step reasoning traces before producing final answers, enabling more reliable solutions to complex problems such as advanced mathematics, logic, and multi-stage tasks. | IN:$0.00125OUT:$0.005/1K tokens | Context:262K | |
GLM-4.7-Flash is a state-of-the-art 30B-class model designed to strike a strong balance between performance and efficiency. It is specifically optimized for agentic coding scenarios, with enhanced capabilities in code generation, long-horizon task planning, and tool-based collaboration. Among open-source models of comparable size, GLM-4.7-Flash has achieved leading results on multiple public benchmark leaderboards, establishing itself as a competitive and practical choice for advanced developer and agent workflows. | IN:$0.00006OUT:$0.0004/1K tokens | Context:200K | |
Veo 3.1 is a state-of-the-art generative AI video model developed by Google DeepMind (part of the broader Gemini/Flow ecosystem). It builds on the earlier Veo models to make AI-generated video creation more realistic, expressive, and controllable. | IN:$0OUT:$0/1K tokens | - | |
Molmo2-8B is an open vision-language model from AI2 that supports image, video, and multi-image understanding. Built on Qwen3-8B with a SigLIP 2 vision backbone, it excels at short-video tasks like counting and captioning while remaining competitive on long-video understanding, outperforming other open-weight, open-data models in its class. | Free | Context:128K | |
Olmo 3.1 32B Instruct is a 32B-parameter instruction-tuned model optimized for conversational AI and multi-turn dialogue. It focuses on strong instruction following and responsive chat behavior while maintaining solid reasoning and coding performance. Released by AI2 under Apache 2.0, it is fully open and transparent. | IN:$0.0002OUT:$0.0006/1K tokens | Context:66K | |
Seed 1.6 Flash is an ultra-fast multimodal deep-thinking model by ByteDance Seed, supporting both text and visual understanding, with a 256K context window and up to 16K tokens of output generation. | IN:$0.000075OUT:$0.0003/1K tokens | Context:262K | |
Seed 1.6 is a general-purpose multimodal model from the ByteDance Seed team, featuring adaptive deep thinking and a 256K context window. | IN:$0.00025OUT:$0.002/1K tokens | Context:262K | |
MiniMax-M2.1 is a lightweight, state-of-the-art model optimized for coding and agentic workflows, using just 10B activated parameters to deliver strong real-world performance with low latency and high cost efficiency. It improves on M2 with cleaner outputs and faster responses, leads in multilingual coding benchmarks (49.4% Multi-SWE-Bench, 72.5% SWE-Bench Multilingual), and serves as a versatile agent core for IDEs, coding tools, and general applications. | IN:$0.000225OUT:$0.0009/1K tokens | Context:205K | |
GLM-4.7 is Z.AI's newest flagship model, upgraded for stronger programming performance and more reliable multi-step reasoning. It handles complex agent tasks better while offering smoother conversations and improved UI/experience quality. | IN:$0.0003OUT:$0.0005/1K tokens | Context:203K |