Changelog

May 2026

Feature

System Update

Audio APIs Now Live

Full Audio API Support

Apertis now supports the OpenAI-compatible Audio API. Use a single API key to access leading TTS (text-to-speech) and STT (speech-to-text) models across providers.

Supported Models

Text-to-Speech (TTS)

gemini-3.1-flash-tts-preview — Google's latest Flash TTS preview
gpt-4o-mini-tts — OpenAI's lightweight real-time speech synthesis

Speech-to-Text (STT)

gpt-4o-transcribe — Flagship high-accuracy transcription
gpt-4o-mini-transcribe — Cost-efficient real-time transcription
whisper-large-v3-turbo — Accelerated Whisper v3
whisper-large-v3 — Full-precision Whisper
whisper-1 — The classic, battle-tested baseline

Endpoints

Drop-in compatible with the OpenAI SDK. Point the client at the Apertis base URL and API key; no other code changes are required:

POST /v1/audio/speech — text → audio
POST /v1/audio/transcriptions — audio → text
POST /v1/audio/translations — audio → translated text
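
The Example section further down covers speech and transcriptions but not the third endpoint. As a minimal sketch, the helper below sends a file to /v1/audio/translations through the OpenAI SDK's client.audio.translations.create call. The helper name translate_audio is hypothetical, and the default model whisper-1 is an assumption: this changelog does not say which of the listed STT models accept translation requests.

```python
from pathlib import Path


def translate_audio(client, audio_path, model="whisper-1"):
    """Send an audio file to /v1/audio/translations and return the text.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at Apertis. The default model is an assumption, not a
    documented choice.
    """
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```

With the client built in the Example section, translate_audio(client, "audio.mp3") should return the translated text as a plain string.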

Billing

PAYG (pay-as-you-go): shares the same quota balance as chat/completions
Per-dimension billing: priced separately on input tokens / output tokens / audio seconds, with admin-tunable AudioRatio

File limit: 25 MB per multipart upload
Subscriptions: audio models are PAYG-only for now (not included in subscription plans)
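
Because of the 25 MB file limit, it can help to check a file's size before uploading it. The sketch below is one way to do that. The name fits_upload_limit is hypothetical, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption, since the changelog does not say whether the limit is binary or decimal, or whether multipart overhead counts toward it.

```python
import os

# Assumption: "25 MB" means 25 * 1024 * 1024 bytes. The changelog does not
# specify binary vs. decimal megabytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, limit_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than the limit."""
    return os.path.getsize(path) <= limit_bytes
```

A file that fails this check would need to be compressed or split into shorter segments before being sent to the transcription or translation endpoints.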

Example

  from openai import OpenAI

  # Point the standard OpenAI client at Apertis
  client = OpenAI(
      api_key="sk-your-apertis-key",
      base_url="https://api.apertis.ai/v1"
  )

  # TTS: synthesize speech from text
  speech = client.audio.speech.create(
      model="gpt-4o-mini-tts",
      voice="alloy",
      input="Hello from Apertis."
  )
  # Save the returned audio bytes to disk
  speech.write_to_file("hello.mp3")

  # STT: transcribe a local audio file
  with open("audio.mp3", "rb") as f:
      transcript = client.audio.transcriptions.create(
          model="whisper-large-v3-turbo",
          file=f
      )
  print(transcript.text)

Model Detail Page Updates

Endpoint and code samples auto-switch based on the model's task
TTS models now emit ready-to-run OpenAI SDK Python snippets
Web Search pricing column hidden for voice models (:web is unsupported)

Feature

Grok 4.3 is a reasoning-focused model from xAI designed for agentic workflows, instruction following, and tasks that demand high factual accuracy. It supports text and image inputs with text output. Reasoning is always active and cannot be configured by effort level.

The model features a 1M-token context window with effectively no output token limit, making it well suited for long-document analysis, deep research, and multi-step agentic workflows. It uses tiered pricing, with higher rates applied to requests exceeding 200K total tokens.
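
To make the tiered pricing concrete, here is a rough cost estimator. It assumes that once a request's total tokens exceed the 200K threshold, the higher rates apply to the whole request, which is one reading of the sentence above and not a confirmed billing rule. The function name tiered_cost and the BASE and LONG rates are placeholders for illustration, not Grok 4.3's actual prices.

```python
def tiered_cost(input_tokens, output_tokens, base_rates, long_rates,
                threshold=200_000):
    """Estimate a request's cost in USD under two-tier pricing.

    Rates are (input, output) prices in USD per million tokens. When the
    total token count exceeds `threshold`, the long-context rates are
    applied to the entire request (an assumption).
    """
    total = input_tokens + output_tokens
    in_rate, out_rate = long_rates if total > threshold else base_rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Placeholder rates for illustration only, not real prices.
BASE = (1.0, 2.0)
LONG = (2.0, 4.0)
```

Under these placeholder rates, a request with 180,000 input and 30,000 output tokens crosses the threshold and is priced entirely at the LONG rates, while one with 100,000 input and 50,000 output tokens stays on the BASE rates.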

Enjoy it.

April 2026

Feature

Models Added

Add Nemotron 3 Nano Omni (Free)

Nemotron 3 Nano Omni (Free)

NVIDIA Nemotron 3 Nano Omni is an open 30B-A3B multimodal model designed as a perception and context sub-agent for enterprise agent systems. It supports text, image, video, and audio inputs with text output, enabling unified multimodal reasoning within a single inference loop. Built on a hybrid MoE Transformer–Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers significantly improved efficiency for video reasoning—achieving ~2× higher throughput and 2.5× lower compute compared to separate pipelines.

With up to 300K context length and extended thinking support, it is well suited for scalable, multimodal agent workflows.

Enjoy it.

Feature

Models Added

Add latest Qwen Models

Qwen3.5 Plus 2026-04-20, Qwen3.6 Flash, Qwen3.6 Max Preview

Qwen3.5 Plus 2026-04-20

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba, supporting text, image, and video inputs with text output. It features a 1M-token context window, enabling large-scale reasoning and multimodal workflows within a single interaction.

This updated version of Qwen3.5 Plus introduces tiered pricing beyond 256K tokens, making it suitable for high-context applications while maintaining flexibility for cost optimization in long-input scenarios.

Qwen3.6 Flash

Qwen3.6 Flash is a fast and efficient model from Alibaba's Qwen 3.6 series, supporting text, image, and video inputs with a 1M-token context window for high-context multimodal workflows.

Optimized for performance and cost efficiency, it features tiered pricing beyond 256K tokens and supports prompt caching with both cache creation and read pricing, making it well suited for large-scale, high-throughput applications.

Qwen3.6 Max Preview

Qwen3.6 Max Preview is a proprietary frontier model from Alibaba Cloud built on a sparse Mixture-of-Experts (MoE) architecture with approximately 1 trillion parameters. It is optimized for agentic coding, tool use, and long-context reasoning, and supports a 262K-token context window.

The model includes an integrated thinking mode that preserves reasoning across multi-turn interactions, along with support for structured outputs and function calling.

Enjoy them.

Feature

Models Added

Add GPT-5.5 & GPT-5.5 Pro

GPT-5.5, GPT-5.5 Pro

GPT-5.5

GPT-5.5 is OpenAI's frontier model for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on challenging tasks. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for large-scale, high-context workflows.

Designed for advanced applications, GPT-5.5 excels in reasoning, coding, and multimodal workflows, enabling efficient execution of complex, multi-step tasks within a single system.
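
Since the context window is split into separate input and output budgets, a request can be checked against both before it is sent. The sketch below does that. The constants round the approximate figures quoted above (about 922K input, 128K output) and are assumptions, not exact limits. The name check_budget is hypothetical, and token counts are expected to come from whatever tokenizer the caller already uses.

```python
# Approximate limits based on the "about 922K input, 128K output" figures
# in the changelog. The exact token counts are an assumption.
MAX_INPUT_TOKENS = 922_000
MAX_OUTPUT_TOKENS = 128_000


def check_budget(input_tokens, requested_output_tokens):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        over = input_tokens - MAX_INPUT_TOKENS
        problems.append(f"input exceeds limit by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds limit by {over} tokens")
    return problems
```

Returning a list of messages instead of a boolean lets a caller report both violations at once when a prompt is too long and the requested output is too large.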

GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale, long-context tasks.

Designed for long-horizon problem solving, agentic coding, and precise multi-step execution, GPT-5.5 Pro delivers strong reliability and performance across advanced engineering, research, and complex workflow scenarios.

Enjoy them.