Changelog

Type

March 2026

Feature

Feature Added

✨ New Feature: Context Compression

Context Compression automatically summarizes conversation history using a smaller, cost-efficient model before sending requests to your primary model. This significantly reduces input token costs while preserving conversation context.

Highlights

  • Up to 78% token savings on long multi-turn conversations
  • Three compression strategies to balance quality vs. savings:
  • conservative — compresses after 8+ turns (minimal context loss)
  • on — compresses after 6+ turns (balanced)
  • aggressive — compresses after 3+ turns (maximum savings)
  • All endpoints supported:
  • POST /v1/chat/completions
  • POST /v1/messages
  • POST /v1/responses

How to Enable

Option 1: API Key Dashboard (Zero Code Changes)

Go to API Key Management → Edit your API key → Enable Context Compression and select your preferred strategy. All requests using that key will automatically apply compression.

Option 2: Per-Request via Request Body

  {
    "model": "gpt-4.1",
    "messages": [...],
    "compression": {
      "enabled": true,
      "strategy": "on",
      "model": "gpt-4.1-mini"
    }
  }

Option 3: Per-Request via HTTP Headers

X-Context-Compression: on X-Compression-Model: gpt-4.1-mini

SDK Support

Compression examples are now available for all supported SDKs:

  • Python SDK (OpenAI, Anthropic, Responses API)
  • TypeScript / Vercel AI SDK (@apertis/ai-sdk-provider)
  • LangChain (via default_headers)
  • LlamaIndex (via additional_kwargs)
  • LiteLLM (via extra_headers)

Priority

Request body params > HTTP headers > API key defaults. Per-request settings always override key-level defaults.

See more on **Documentation**

Read more

February 2026

Feature

Models Added

Add Grok 4.2

Grok 4.2

Grok 4.2 is the next major iteration of xAI's Grok series, advancing the model's reasoning, coding, and multimodal capabilities with architectural improvements over Grok 4 and 4.1. It is positioned as a more powerful and general-purpose frontier AI model in the Grok family with stronger deep reasoning and real-world task performance.

Read more

Feature

Models Added

Add Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Nano Banana 2 (Gemini 3.1 Flash Image Preview)

Gemini 3.1 Flash Image Preview (also known as "Nano Banana 2") is Google's latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash-level speed. It combines strong contextual understanding with fast, cost-efficient inference, enabling high-quality image generation and seamless iterative editing. Optimized for both performance and accessibility, it makes advanced visual creation workflows faster and more scalable.

Read more

Feature

System Update

Cached responses now support streaming (SSE) delivery, covering ~80% of API traffic that uses stream: true.

New feature: Cached responses now support streaming (SSE) delivery, covering ~80% of API traffic that uses stream: true.

  • On cache hit, the system emits synthetic SSE chunks from the stored response — no upstream API call needed
  • Content is split on rune boundaries (50 runes/chunk, 10ms intervals) to preserve multi-byte characters
  • Proper X-Cache-Hit, X-Cached-Tokens, and X-Actual-Model headers on streaming cache hits
  • Non-streaming cache hits continue to work as before (direct JSON response)

Cache Correctness Hardening

  • Temperature guard: Only caches requests where temperature: 0 is explicitly present in the raw JSON body. Omitted temperature (Go zero value 0.0) is no longer falsely treated as

cacheable — providers default to ~1.0 for omitted values

  • SSE error safety: If synthetic SSE emission fails mid-stream, the handler returns immediately instead of falling through to normal processing, preventing HTTP double-write

corruption

  • Tool call exclusion: Responses containing tool_calls are excluded from cache storage since the SSE emitter only supports text content replay

Cache TTL & Infrastructure

  • Default prompt cache TTL extended from 10 → 30 minutes

Enjoy it.

Read more