Changelog

April 2026

Feature

Models Added

Add Claude Opus 4.6 (Fast)

Claude Opus 4.6 (Fast)

Claude Opus 4.6 (Fast) is Anthropic's faster version of the Opus 4.6 model for coding and long-running professional workflows, designed for agents that operate across entire workflows rather than single prompts. It demonstrates strong performance on large codebases, complex refactoring, and multi-step debugging, with improved contextual understanding, deeper problem decomposition, and higher reliability on challenging engineering tasks compared to earlier generations. Beyond software development, Opus 4.6 excels at sustained knowledge work, producing near production-ready documents, technical plans, and analyses in a single pass while maintaining coherence across long outputs and extended sessions. Its strength in persistence, judgment, and structured execution makes it well suited for technical design, migration planning, and end-to-end project execution.

Enjoy it.

Feature

Models Added

Add GLM 5.1

GLM 5.1

GLM-5.1 delivers a major advancement in coding capability, with significant improvements in handling long-horizon tasks. It is designed to operate beyond short interactions, enabling continuous, autonomous execution over extended periods. The model can work independently on a single task for 8+ hours, performing planning, execution, and iterative self-improvement to produce complete, engineering-grade results, making it well suited for complex development workflows and autonomous agent systems.

Enjoy it.

Feature

System Update

Add local token estimation for Claude Code CLI

The missing /v1/messages/count_tokens endpoint was causing Claude Code CLI to crash on startup.

We've just shipped a fix that implements this endpoint. It performs local token estimation, so it returns a valid {"input_tokens": N} response without any additional latency.

The endpoint supports:

String and structured content block messages
System prompts
Tool definitions

The fix is live. Claude Code CLI should now start normally against the Apertis API.
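
For readers curious what local token estimation can look like, here is a minimal sketch. It is not the code running behind the endpoint: the 4-characters-per-token ratio, the function names, and the way non-text blocks are counted are all assumptions made for illustration. It only shows how string content, structured content blocks, system prompts, and tool definitions can each contribute to a single {"input_tokens": N} result without calling an upstream service.

```python
import json
import math

# Assumed heuristic for illustration only; the real endpoint may use a different ratio.
CHARS_PER_TOKEN = 4


def _text_of(content):
    """Flatten a string, a list of content blocks, or None into plain text."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content or []:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        else:
            # Non-text blocks are counted by the length of their JSON form (an assumption).
            parts.append(json.dumps(block, ensure_ascii=False))
    return "".join(parts)


def estimate_input_tokens(body):
    """Hypothetical estimator: build a count_tokens-style response from a request body."""
    chars = len(_text_of(body.get("system")))
    for message in body.get("messages", []):
        chars += len(_text_of(message.get("content")))
    for tool in body.get("tools", []):
        chars += len(json.dumps(tool, ensure_ascii=False))
    return {"input_tokens": math.ceil(chars / CHARS_PER_TOKEN)}
```

Calling estimate_input_tokens with a body that contains only one user message "abcdefgh" yields {"input_tokens": 2} under this heuristic. A character-based estimate like this is approximate by design; it trades accuracy for speed and zero network calls.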

Enjoy it.

Feature

Models Added

Add Gemma 4 31B & Gemma 4 26B A4B

Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model, supporting text and image inputs with text outputs. It features a 256K token context window, configurable thinking/reasoning modes, native function calling, and broad multilingual support across 140+ languages. The model delivers strong performance in coding, reasoning, and document understanding, making it well suited for developer workflows, multilingual applications, and structured knowledge tasks.

Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind, featuring 25.2B total parameters with only 3.8B activated per token, delivering near 31B-class quality at a fraction of the compute cost. It supports multimodal inputs including text, images, and video (up to 60s at 1fps). The model includes a 256K token context window, native function calling, configurable thinking/reasoning modes, and structured output support. Released under the Apache 2.0 license, it is well suited for efficient, production-ready multimodal and agentic applications.

Enjoy it.

Feature

Models Added

Add GLM 5V Turbo

GLM 5V Turbo

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, designed for vision-based coding and agent-driven workflows. It natively supports image, video, and text inputs, enabling integrated multimodal reasoning and execution. The model excels at long-horizon planning, complex coding, and multi-step task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”, making it well suited for advanced multimodal automation and real-world agent systems.

Enjoy it.