Feature

2.2.14 System Update

Cached responses now support streaming (SSE) delivery, covering the ~80% of API traffic that sets stream: true.

  • On a cache hit, the system emits synthetic SSE chunks from the stored response, so no upstream API call is needed
  • Content is split on rune boundaries (50 runes/chunk, 10ms intervals) to preserve multi-byte characters
  • Streaming cache hits now carry correct X-Cache-Hit, X-Cached-Tokens, and X-Actual-Model headers
  • Non-streaming cache hits continue to work as before (direct JSON response)
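
The rune-boundary chunking described above can be sketched roughly as follows. This is a minimal illustration and not the shipped implementation: the names splitRunes and emitSSE, the simplified {"content": ...} payload, and the trailing "data: [DONE]" sentinel are all assumptions made for the example. Only the 50-rune chunk size and 10ms interval come from these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// splitRunes splits s into chunks of at most n runes each. Working on a
// []rune slice guarantees a chunk never ends inside a multi-byte character.
func splitRunes(s string, n int) []string {
	if n <= 0 {
		return nil
	}
	runes := []rune(s)
	chunks := make([]string, 0, (len(runes)+n-1)/n)
	for start := 0; start < len(runes); start += n {
		end := start + n
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

// emitSSE (hypothetical) replays cached text as SSE "data:" events.
// The payload shape here is simplified for illustration; a real chunk
// format would follow whatever the upstream provider emits.
func emitSSE(w io.Writer, content string, chunkRunes int, interval time.Duration) error {
	for _, chunk := range splitRunes(content, chunkRunes) {
		payload, err := json.Marshal(map[string]string{"content": chunk})
		if err != nil {
			return err
		}
		if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
			return err // caller should stop here and not write a second response
		}
		// Flush each event if the writer supports it (http.ResponseWriter usually does).
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		time.Sleep(interval)
	}
	// Assumed end-of-stream sentinel, as used by OpenAI-style streaming APIs.
	_, err := io.WriteString(w, "data: [DONE]\n\n")
	return err
}
```

Under these assumptions, a handler would call emitSSE(w, cachedText, 50, 10*time.Millisecond) on a streaming cache hit and return as soon as it reports an error.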

Cache Correctness Hardening

  • Temperature guard: Requests are cached only when temperature: 0 is explicitly present in the raw JSON body. An omitted temperature (which decodes to the Go zero value 0.0) is no longer falsely treated as cacheable, since providers default to ~1.0 for omitted values

  • SSE error safety: If synthetic SSE emission fails mid-stream, the handler returns immediately instead of falling through to normal processing, preventing HTTP double-write corruption

  • Tool call exclusion: Responses containing tool_calls are excluded from cache storage since the SSE emitter only supports text content replay
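
The temperature guard and tool call exclusion above can be sketched as two small checks. This is an illustrative sketch only: the function names are hypothetical, and the response shape (choices[].message.tool_calls) assumes an OpenAI-style chat completion body, which these notes do not specify. The key idea is that a pointer field distinguishes an omitted temperature (nil) from an explicit 0, which a plain float64 cannot do.

```go
package main

import "encoding/json"

// hasExplicitZeroTemperature (hypothetical name) reports whether the raw
// request body contains a "temperature" key whose value is exactly 0.
// An omitted or null temperature leaves the pointer nil and returns false.
func hasExplicitZeroTemperature(rawBody []byte) bool {
	var req struct {
		Temperature *float64 `json:"temperature"`
	}
	if err := json.Unmarshal(rawBody, &req); err != nil {
		return false // unparseable bodies are never treated as cacheable
	}
	return req.Temperature != nil && *req.Temperature == 0
}

// hasToolCalls (hypothetical name) reports whether any choice in an
// assumed OpenAI-style response carries a non-empty tool_calls array.
func hasToolCalls(rawResp []byte) bool {
	var resp struct {
		Choices []struct {
			Message struct {
				ToolCalls []json.RawMessage `json:"tool_calls"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.Unmarshal(rawResp, &resp); err != nil {
		return true // fail closed: skip caching if the response cannot be inspected
	}
	for _, choice := range resp.Choices {
		if len(choice.Message.ToolCalls) > 0 {
			return true
		}
	}
	return false
}
```

With these helpers, a response would be stored only when hasExplicitZeroTemperature(requestBody) is true and hasToolCalls(responseBody) is false.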

Cache TTL & Infrastructure

  • Default prompt cache TTL extended from 10 to 30 minutes

Enjoy it.