Changelog

February 2026

Feature

Models Added

Add Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Gemini 3.1 Flash Image Preview (also known as "Nano Banana 2") is Google's latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash-level speed. It combines strong contextual understanding with fast, cost-efficient inference, enabling high-quality image generation and seamless iterative editing. Optimized for both performance and accessibility, it makes advanced visual creation workflows faster and more scalable.

Feature

System Update

Cached responses now support streaming (SSE) delivery, covering ~80% of API traffic that uses stream: true.

  • On cache hit, the system emits synthetic SSE chunks from the stored response — no upstream API call needed
  • Content is split on rune boundaries (50 runes/chunk, 10ms intervals) to preserve multi-byte characters
  • Streaming cache hits now set the X-Cache-Hit, X-Cached-Tokens, and X-Actual-Model headers
  • Non-streaming cache hits continue to work as before (direct JSON response)
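
As a rough illustration of the chunking described above, the sketch below splits a cached text response into pieces of 50 Unicode code points (the Python counterpart of Go runes) and wraps each piece in an SSE frame. The function names, the JSON payload shape, and the closing [DONE] frame are assumptions made for this example, not the service's actual wire format. The 10ms pause between frames is left out so the sketch stays deterministic.

```python
import json
from typing import Iterator

CHUNK_SIZE = 50  # code points per chunk, as stated in the changelog entry


def split_on_code_points(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split text into pieces of at most `size` Unicode code points."""
    # Python str is indexed by code point, so slicing never cuts a
    # multi-byte UTF-8 sequence in half (similar to slicing a Go []rune).
    return [text[i:i + size] for i in range(0, len(text), size)]


def synthetic_sse_events(cached_text: str) -> Iterator[str]:
    """Yield SSE frames for a cached response.

    The payload layout and the final [DONE] frame are hypothetical; a real
    emitter would also wait roughly 10ms between frames.
    """
    for piece in split_on_code_points(cached_text):
        payload = {"choices": [{"delta": {"content": piece}}]}
        yield f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"
    yield "data: [DONE]\n\n"
```

Splitting by code point rather than by byte is what keeps characters such as "é" or CJK text intact: each frame carries whole characters, so a client can decode every chunk on its own.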

Cache Correctness Hardening

  • Temperature guard: Only caches requests where temperature: 0 is explicitly present in the raw JSON body. Omitted temperature (Go zero value 0.0) is no longer falsely treated as cacheable; providers default to ~1.0 for omitted values

  • SSE error safety: If synthetic SSE emission fails mid-stream, the handler returns immediately instead of falling through to normal processing, preventing HTTP double-write corruption

  • Tool call exclusion: Responses containing tool_calls are excluded from cache storage since the SSE emitter only supports text content replay
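
The temperature guard and the tool-call exclusion can be pictured as two small checks, sketched below. This is not the service's code: the function names are hypothetical, and the response layout (choices → message → tool_calls) is an assumed OpenAI-style shape. The first check inspects the raw JSON body for the key itself, so a missing temperature is never confused with an explicit 0.

```python
import json


def has_explicit_zero_temperature(raw_body: bytes) -> bool:
    """True only if the raw JSON body contains "temperature" set to 0."""
    body = json.loads(raw_body)
    if not isinstance(body, dict) or "temperature" not in body:
        return False  # omitted: the provider default (~1.0) applies
    value = body["temperature"]
    # Reject null, strings, and booleans (bool is a subclass of int in Python).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value == 0


def response_is_storable(response: dict) -> bool:
    """False if any choice carries tool_calls (assumed response layout)."""
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        if message.get("tool_calls"):
            return False
    return True
```

Checking key presence before the value is the point of the guard: a typed decoder that fills in 0.0 for a missing field cannot tell the two cases apart, while the raw body can.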

Cache TTL & Infrastructure

  • Default prompt cache TTL extended from 10 → 30 minutes

Enjoy it.

Feature

Feature Added

✨ New Feature: Monthly Budget Controls

You can now set a monthly spending cap on your API usage. Once enabled, usage is tracked against your limit and automatically resets on your chosen billing cycle date.

  • Monthly spending limit — Set a dollar amount that caps total API usage across all your keys each month.
  • Custom reset day — Choose any day from the 1st to the 28th as your monthly cycle start date.
  • Per-key budgets — Optionally allocate portions of your monthly limit to individual API keys for finer-grained control.
  • Usage alerts — Get an email notification when your usage crosses a customizable threshold (default 80%).
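
To make the reset-day and alert rules concrete, here is a minimal sketch of how a billing cycle start and an alert check could be computed. The function names and the exact comparison rules are assumptions for illustration; the changelog does not describe the actual implementation. Capping the reset day at 28 means the chosen day exists in every month, including February.

```python
from datetime import date


def cycle_start(today: date, reset_day: int) -> date:
    """Return the first day of the billing cycle that contains `today`."""
    if not 1 <= reset_day <= 28:
        raise ValueError("reset_day must be between 1 and 28")
    if today.day >= reset_day:
        # This month's reset day has already passed (or is today).
        return today.replace(day=reset_day)
    # Otherwise the current cycle began on the reset day of the previous month.
    if today.month == 1:
        return date(today.year - 1, 12, reset_day)
    return date(today.year, today.month - 1, reset_day)


def should_alert(spent: float, limit: float, threshold: float = 0.80) -> bool:
    """True once spending reaches `threshold` (default 80%) of the limit."""
    return limit > 0 and spent >= threshold * limit
```

For example, with a reset day of 15, a request made on February 10 falls in the cycle that began on January 15, while one made on February 20 falls in the cycle that began on February 15.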

Feature

Models Added

Add MiniMax M2.5 (Lightning)

MiniMax M2.5 (Lightning)

MiniMax-M2.5-Lightning is the high-speed variant of the M2.5 series, optimized for low latency, real-time responsiveness, and high-frequency workloads. It retains the core planning and execution strengths of M2.5 while further improving inference efficiency and response speed, making it ideal for interactive applications, rapid coding assistance, and workflow automation. With enhanced cost efficiency and reduced latency, M2.5-Lightning is particularly well suited for high-throughput, always-on deployments and production environments where speed and scalability are critical.

https://apertis.ai/models/minimax-m2.5-lightning
