Changelog

Type

May 2026

Feature

Feature Added

Switch Advisor — see the cost before you switch models

Switch Advisor estimates what your recent usage would cost on a different model — projected from your own pay-as-you-go activity, not a generic price list.

New in Settings → Usage

Switch Advisor estimates what your recent usage would cost on a different model — projected from your own pay-as-you-go activity, not a generic price list.

How it works

  • Pick a From model — your recently used models, highest-spend preselected
  • Search any platform model as the Candidate
  • Get an instant projection: current cost, projected cost, and savings (absolute + %)

The projection replays your real input/output token volumes from the selected time range on the candidate model's pricing — so the figure reflects how you actually use the model, not a marketing average.

Example: a workload on claude-opus-4-6 costing $36.00 over 30 days projects to $2.61 on deepseek-v4-pro — a 92.75% reduction.

Available now in Settings → Usage, right below the per-model usage table. No setup required.

▎ Cost estimate only — it does not compare output quality or response speed.

Read more

Feature

Models Added

Add Gemini 3.5 Flash

Gemini 3.5 Flash

Gemini 3.5 Flash

Gemini 3.5 Flash is Google's high-efficiency multimodal model, delivering near-Pro level performance in coding and reasoning at Flash-tier speed and cost. It supports text, image, video, audio, and PDF inputs, making it well suited for diverse multimodal workflows.

Optimized for coding proficiency and parallel agentic execution, the model defaults to medium thinking effort for faster, cost-efficient responses while supporting configurable thinking levels (minimal, low, medium, high) for fine-grained cost–performance control.

Enjoy it.

Read more

Feature

System Update

Model Availability Heartbeat

Apertis model detail pages now include a recent availability heartbeat, showing the latest observed delivery signal for each model directly on the page.

What changed

  • Added a Recent Availability card to model detail pages.
  • Added heartbeat bars for the recent delivery window.
  • Added hover and keyboard-focus tooltips with date, time, availability percentage, and status.
  • Counted retried or fallback-routed requests as healthy when the request ultimately succeeds.

How to read it

The card reports observed delivery, not a synthetic uptime probe.

If Apertis has recent successful delivery for a model, the relevant heartbeat bucket is green. If an actual delivery failure is observed, the bucket can move to degraded or unavailable based on the observed success rate.

When there is no recent traffic for a bucket, Apertis treats that silence as no observed anomaly and displays it as green 100%. This keeps the signal aligned with the rule that a model should not look unhealthy just because no one called it during that interval.

Why this matters

You can now check model-level health from the same page where you review pricing, context, endpoints, and examples. Teams choosing between models can see recent delivery quality without waiting for a separate status page or paying for active probes.

The implementation stays cost-aware by using delivery results Apertis already sees during normal routing.

Enjoy it.

Read more

Feature

Models Added

Add Mistral Medium 3.5 & Baidu Cobuddy

Mistral Medium 3.5CoBuddy (Free)

Mistral Medium 3.5

Mistral Medium 3.5 is a 128B dense instruction-following model from Mistral AI, supporting text and image inputs with text output. It is designed for agentic workflows, coding, and complex multi-step reasoning, with strong reliability in multi-tool orchestration and long-horizon tasks.

The model features a 256K token context window, configurable reasoning effort per request, and a custom vision encoder that handles variable image sizes and aspect ratios. With support for self-hosting on as few as four GPUs and availability under open weights, it is well suited for scalable, production-grade deployments.

Cobuddy

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI agent workflows. It delivers high inference throughput and low end-to-end latency, making it well suited for responsive development and automation environments.

The model includes native support for tool calling and reasoning, runs with FP8 quantization for efficient deployment, and supports a 131K token context window with up to 65K output tokens, enabling long-context coding and multi-step agentic workflows.

Enjoy them.

Read more