Changelog

May 2026

Feature

System Update

Audio APIs Now Live

Full Audio API Support

Apertis now supports the OpenAI-compatible Audio API. Use a single API key to access leading TTS (text-to-speech) and STT (speech-to-text) models across providers.

Supported Models

Text-to-Speech (TTS)

  • gemini-3.1-flash-tts-preview — Google's latest Flash TTS preview
  • gpt-4o-mini-tts — OpenAI's lightweight real-time speech synthesis

Speech-to-Text (STT)

  • gpt-4o-transcribe — Flagship high-accuracy transcription
  • gpt-4o-mini-transcribe — Cost-efficient real-time transcription
  • whisper-large-v3-turbo — Accelerated Whisper v3
  • whisper-large-v3 — Full-precision Whisper
  • whisper-1 — The classic, battle-tested baseline

Endpoints

Drop-in compatible with the OpenAI SDK. Beyond pointing the API key and base URL at Apertis, no code changes are required:

  • POST /v1/audio/speech — text → audio
  • POST /v1/audio/transcriptions — audio → text
  • POST /v1/audio/translations — audio → translated text
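
For readers calling the API without the SDK, here is a minimal sketch of the request that the first endpoint expects. It assumes the Bearer-token header and JSON body shape used by OpenAI's Audio API, which this changelog says Apertis is compatible with; the helper name build_speech_request is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Base URL taken from this changelog's SDK example.
BASE_URL = "https://api.apertis.ai/v1"


def build_speech_request(api_key: str, text: str,
                         model: str = "gpt-4o-mini-tts",
                         voice: str = "alloy") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request.

    Assumption: auth header and body fields follow the OpenAI Audio API.
    """
    body = json.dumps({"model": model, "voice": voice, "input": text})
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The key below is a placeholder, as in the changelog example.
req = build_speech_request("sk-your-apertis-key", "Hello from Apertis.")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response body would then be the audio bytes if the endpoint behaves like OpenAI's.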

Billing

  • PAYG (pay-as-you-go): shares the same quota balance as chat/completions
  • Per-dimension billing: priced separately on input tokens / output tokens / audio seconds, with admin-tunable AudioRatio
  • File limit: 25 MB per multipart upload
  • Subscriptions: audio models are PAYG-only for now (not included in subscription plans)
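
Given the 25 MB upload limit above, a client-side size check before uploading can save a failed request. The sketch below is one way to do it. The helper names are hypothetical, and it assumes the conservative reading of "25 MB" as 25,000,000 bytes, since the changelog does not say whether the limit is decimal or binary.

```python
import os

# Assumption: decimal megabytes (the stricter reading of "25 MB").
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_upload_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a non-empty file of this size is within the limit."""
    return 0 < size_bytes <= limit


def check_audio_file(path: str) -> int:
    """Return the file size in bytes, or raise ValueError if it is too large."""
    size = os.path.getsize(path)
    if not fits_upload_limit(size):
        raise ValueError(
            f"{path} is {size} bytes; expected 1 to {MAX_UPLOAD_BYTES} bytes"
        )
    return size
```

Calling check_audio_file("audio.mp3") before client.audio.transcriptions.create(...) surfaces oversize files locally; larger recordings would need to be split or re-encoded first.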

Example

  from openai import OpenAI

  client = OpenAI(
      api_key="sk-your-apertis-key",
      base_url="https://api.apertis.ai/v1"
  )

  # TTS: stream the synthesized audio straight to a file
  with client.audio.speech.with_streaming_response.create(
      model="gpt-4o-mini-tts",
      voice="alloy",
      input="Hello from Apertis."
  ) as response:
      response.stream_to_file("hello.mp3")

  # STT
  with open("audio.mp3", "rb") as f:
      transcript = client.audio.transcriptions.create(
          model="whisper-large-v3-turbo",
          file=f
      )
  print(transcript.text)

Model Detail Page Updates

  • Endpoint and code samples auto-switch based on the model's task
  • TTS models now emit ready-to-run OpenAI SDK Python snippets
  • Web Search pricing column hidden for voice models (:web is unsupported)

Feature

Models Added

Add Grok 4.3

Grok 4.3

Grok 4.3 is a reasoning-focused model from xAI designed for agentic workflows, instruction following, and tasks that demand high factual accuracy. It accepts text and image inputs and produces text output. Reasoning is always active and cannot be configured by effort level.

The model features a 1M-token context window with effectively no output token limit, making it well suited for long-document analysis, deep research, and multi-step agentic workflows. It uses tiered pricing, with higher rates applied to requests exceeding 200K total tokens.
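
To make the tiered pricing concrete, here is a small cost estimator. Only the 200K-token threshold comes from this entry. The per-token rates are hypothetical placeholders, not Apertis or xAI prices, and the sketch assumes that a request over the threshold is billed entirely at the higher tier, which the changelog does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    input_per_mtok: float   # price per 1M input tokens
    output_per_mtok: float  # price per 1M output tokens


# HYPOTHETICAL rates for illustration only; check the model page for real ones.
BASE_TIER = Tier(input_per_mtok=1.0, output_per_mtok=2.0)
LONG_TIER = Tier(input_per_mtok=2.0, output_per_mtok=4.0)

# From the changelog: higher rates apply above 200K total tokens.
TIER_THRESHOLD_TOKENS = 200_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming the whole request moves to the
    higher tier once input + output exceeds the threshold."""
    total = input_tokens + output_tokens
    tier = LONG_TIER if total > TIER_THRESHOLD_TOKENS else BASE_TIER
    return (input_tokens * tier.input_per_mtok
            + output_tokens * tier.output_per_mtok) / 1_000_000


short = estimate_cost(100_000, 50_000)   # 150K total, base tier
long_ = estimate_cost(180_000, 30_000)   # 210K total, higher tier
print(f"{short:.2f} {long_:.2f}")
```

If billing is instead marginal (only the tokens past 200K are charged at the higher rate), the tier selection would need to split the token counts at the threshold.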

Enjoy it.

April 2026

Feature

Models Added

Add Nemotron 3 Nano Omni (Free)

Nemotron 3 Nano Omni (Free)

NVIDIA Nemotron 3 Nano Omni is an open 30B-A3B multimodal model designed as a perception and context sub-agent for enterprise agent systems. It supports text, image, video, and audio inputs with text output, enabling unified multimodal reasoning within a single inference loop. Built on a hybrid MoE Transformer–Mamba architecture with Conv3D video layers and Efficient Video Sampling (EVS), it delivers significantly improved efficiency for video reasoning—achieving ~2× higher throughput and 2.5× lower compute compared to separate pipelines.

With up to 300K context length and extended thinking support, it is well suited for scalable, multimodal agent workflows.

Enjoy it.

Feature

Models Added

Add latest Qwen Models

Qwen3.5 Plus 2026-04-20

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba, supporting text, image, and video inputs with text output. It features a 1M-token context window, enabling large-scale reasoning and multimodal workflows within a single interaction.

This updated version of Qwen3.5 Plus introduces tiered pricing beyond 256K tokens, so long-input requests are priced on a separate tier and costs can be managed by keeping prompts under that threshold where possible.

Qwen3.6 Flash

Qwen3.6 Flash is a fast and efficient model from Alibaba's Qwen 3.6 series, supporting text, image, and video inputs with a 1M-token context window for high-context multimodal workflows.

Optimized for performance and cost efficiency, it features tiered pricing beyond 256K tokens and supports prompt caching, with separate pricing for cache creation and cache reads. This makes it well suited for large-scale, high-throughput applications.

Qwen3.6 Max Preview

Qwen3.6 Max Preview is a proprietary frontier model from Alibaba Cloud built on a sparse Mixture-of-Experts (MoE) architecture with approximately 1 trillion parameters. It is optimized for agentic coding, tool use, and long-context reasoning, and supports a 262K-token context window.

The model includes an integrated thinking mode that preserves reasoning across multi-turn interactions, along with support for structured outputs and function calling.

Enjoy them.

Feature

Models Added

Add GPT-5.5 & GPT-5.5 Pro

GPT-5.5

GPT-5.5 is OpenAI's frontier model for complex professional workloads, building on GPT-5.4 with stronger reasoning, higher reliability, and improved token efficiency on challenging tasks. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for large-scale, high-context workflows.
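
The split between input and output budget can be checked before sending a request. The sketch below uses the approximate figures quoted above (about 922K input, 128K output) as rounded constants. The function name is hypothetical, and real limits are enforced server-side on exact token counts.

```python
# Approximate limits from the changelog entry; treat them as rough guides.
MAX_INPUT_TOKENS = 922_000
MAX_OUTPUT_TOKENS = 128_000


def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if an estimated prompt size and a requested output cap
    both fall within the approximate per-direction limits."""
    return (0 <= input_tokens <= MAX_INPUT_TOKENS
            and 0 < max_output_tokens <= MAX_OUTPUT_TOKENS)


print(fits_context(500_000, 64_000))   # within both limits
print(fits_context(950_000, 64_000))   # prompt too large
```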

Designed for advanced applications, GPT-5.5 excels in reasoning, coding, and multimodal workflows, enabling efficient execution of complex, multi-step tasks within a single system.

GPT-5.5 Pro

GPT-5.5 Pro is OpenAI's high-capability model optimized for deep reasoning and accuracy on complex, high-stakes workloads. It supports text and image inputs and features a 1M+ token context window (≈922K input, 128K output) for handling large-scale, long-context tasks.

Designed for long-horizon problem solving, agentic coding, and precise multi-step execution, GPT-5.5 Pro delivers strong reliability and performance across advanced engineering, research, and complex workflow scenarios.

Enjoy them.
