Changelog

Type

April 2026

Feature

System Update

Add local token estimation for Claude Code CLI

the missing /v1/messages/count_tokens endpoint was causing Claude Code CLI to crash on startup.

We've just shipped a fix that implements this endpoint. It performs local token estimation, so it returns a valid {"input_tokens": N} response without any additional latency.

The endpoint supports:

String and structured content block messages
System prompts
Tool definitions

The fix is resolved and on Live. Claud Code CLI should start normally against the Apertis API.

Enjoy it.

Feature

Models Added

Add Gemma 4 31B & Gemma 4 26B A4B

Gemma 4 31BGemma 4 26B A4B

Gemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model, supporting text and image inputs with text outputs. It features a 256K token context window, configurable thinking/reasoning modes, native function calling, and broad multilingual support across 140+ languages. The model delivers strong performance in coding, reasoning, and document understanding, making it well suited for developer workflows, multilingual applications, and structured knowledge tasks.

Gemma 4 26B A4B

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind, featuring 25.2B total parameters with only 3.8B activated per token—delivering near 31B-class quality at a fraction of the compute cost. It supports multimodal inputs including text, images, and video (up to 60s at 1fps). The model includes a 256K token context window, native function calling, configurable thinking/reasoning modes, and structured output support. Released under the Apache 2.0 license, it is well suited for efficient, production-ready multimodal and agentic applications.

Enjoy it.

Feature

Models Added

Add GLM 5V Turbo

GLM 5V Turbo

GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, designed for vision-based coding and agent-driven workflows. It natively supports image, video, and text inputs, enabling integrated multimodal reasoning and execution. The model excels at long-horizon planning, complex coding, and multi-step task execution, and works seamlessly with agents to complete the full loop of “perceive → plan → execute”, making it well suited for advanced multimodal automation and real-world agent systems.

Enjoy it.

Feature

Models Added

Add Grok 4.20

Grok 4.20Grok 4.20 Multi-Agent

Grok 4.20

Grok 4.20 is xAI's newest flagship model, designed for high-performance reasoning with industry-leading speed and strong agentic tool-calling capabilities. It emphasizes strict prompt adherence and low hallucination rates, delivering highly precise and reliable responses. Optimized for agent workflows and real-time applications, Grok 4.20 provides consistent, truthful outputs while maintaining fast inference and robust task execution.

Grok 4.20 Multi-Agent

Grok 4.20 Multi-Agent is a specialized variant of xAI's Grok 4.20 designed for collaborative, agent-based workflows. It enables multiple agents to operate in parallel, coordinating tool use and synthesizing information to handle complex, multi-step tasks. Optimized for deep research and large-scale problem solving, the model supports configurable reasoning effort: 4 agents for low/medium settings and up to 16 agents for high/xhigh settings, enabling scalable parallel reasoning and execution.

Enjoy it.

March 2026

Feature

Models Added

Add Qwen3.6 Plus Preview

Qwen3.6 Plus Preview (Free)

Qwen 3.6 Plus Preview is the next-generation evolution of the Qwen Plus series, built on an advanced hybrid architecture that enhances efficiency and scalability. It delivers improved reasoning capabilities and more reliable agentic behavior compared to the 3.5 series, with benchmark performance at or above leading state-of-the-art models.

Designed as a flagship preview model, it excels in agentic coding, front-end development, and complex problem solving, making it well suited for advanced development workflows and high-performance applications.

Enjoy it.