System Update

Released: 2026-02-26

New feature: Cached responses now support streaming (SSE) delivery, covering ~80% of API traffic that uses stream: true.

- On cache hit, the system emits synthetic SSE chunks from the stored response, with no upstream API call needed
- Content is split on rune boundaries (50 runes/chunk, 10ms intervals) to preserve multi-byte characters
- Proper X-Cache-Hit, X-Cached-Tokens, and X-Actual-Model headers on streaming cache hits
- Non-streaming cache hits continue to work as before (direct JSON response)

Cache Correctness Hardening

- Temperature guard: Only caches requests where temperature: 0 is explicitly present in the raw JSON body. Omitted temperature (Go zero value 0.0) is no longer falsely treated as cacheable, since providers default to ~1.0 for omitted values
- SSE error safety: If synthetic SSE emission fails mid-stream, the handler returns immediately instead of falling through to normal processing, preventing HTTP double-write corruption
- Tool call exclusion: Responses containing tool_calls are excluded from cache storage since the SSE emitter only supports text content replay

Cache TTL & Infrastructure

- Default prompt cache TTL extended from 10 → 30 minutes

Enjoy it.
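For readers curious how the temperature guard and rune-boundary splitting might look in practice, here is a minimal Go sketch. It is an illustration under stated assumptions, not the actual implementation: the function names hasExplicitZeroTemperature and chunkRunes are hypothetical, the guard is approximated by decoding temperature into a pointer so that an omitted field stays distinguishable from an explicit 0, and the 10ms pacing and SSE framing are left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasExplicitZeroTemperature (hypothetical name) reports whether the raw JSON
// body contains a "temperature" key whose value is exactly 0. Decoding into a
// *float64 keeps "omitted" (nil) distinct from "present and zero", which a
// plain float64 field cannot do.
func hasExplicitZeroTemperature(body []byte) bool {
	var req struct {
		Temperature *float64 `json:"temperature"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return false // malformed body: assume not cacheable
	}
	return req.Temperature != nil && *req.Temperature == 0
}

// chunkRunes (hypothetical name) splits s into pieces of at most n runes, so
// a multi-byte UTF-8 character is never cut in the middle.
func chunkRunes(s string, n int) []string {
	if n <= 0 {
		return nil
	}
	runes := []rune(s)
	chunks := make([]string, 0, (len(runes)+n-1)/n)
	for start := 0; start < len(runes); start += n {
		end := start + n
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	// Explicit zero is cacheable; an omitted temperature is not.
	fmt.Println(hasExplicitZeroTemperature([]byte(`{"temperature": 0}`)))
	fmt.Println(hasExplicitZeroTemperature([]byte(`{"model": "m"}`)))

	// The changelog states 50 runes per chunk; 4 is used here for brevity.
	for _, c := range chunkRunes("héllo, 世界", 4) {
		fmt.Printf("%q\n", c)
	}
}
```

Splitting a []rune rather than slicing the string by byte offset is what keeps characters such as 世 intact. Slicing bytes at a fixed width could emit invalid UTF-8 in the middle of a chunk.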