gemini-3.5-flash
Gemini 3.5 Flash is Google's high-efficiency multimodal model, delivering near-Pro-level performance in coding and reasoning at Flash-tier speed and cost. It supports text, image, video, audio, and PDF inputs, making it well suited for diverse multimodal workflows. Optimized for coding proficiency and parallel agentic execution, the model defaults to medium thinking effort for faster, cost-efficient responses while supporting configurable thinking levels (minimal, low, medium, high) for fine-grained cost–performance control.
Select an endpoint and copy a working example for this model.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.apertis.ai/v1"
    )

    response = client.chat.completions.create(
        model="gemini-3.5-flash",
        messages=[
            {"role": "user", "content": "Hello!"}
        ],
        max_tokens=1024,
        temperature=0.7
    )

    print(response.choices[0].message.content)

    # Optional: Enable context compression to reduce token usage
    # response = client.chat.completions.create(
    #     model="gemini-3.5-flash",
    #     messages=[{"role": "user", "content": "Hello!"}],
    #     extra_body={"compression": {"enabled": True, "model": "gpt-4.1-mini"}}
    # )

Parameters: model, messages, max_tokens, temperature, top_p, stream, tools, reasoning_effort, stream_options, thinking, extra_body

Use these namespaced identifiers in Cursor IDE to avoid conflicts with built-in models.
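The description above lists four thinking levels (minimal, low, medium, high), and the parameter list includes reasoning_effort. The sketch below shows one way a request could select a level. It assumes, without confirmation from this page, that the endpoint accepts those four level names as reasoning_effort values; build_chat_request is a hypothetical helper, not part of any SDK. It only builds the keyword arguments and makes no network call.

```python
from typing import Any

# Thinking levels named in the model description.
# Assumption: the endpoint accepts these strings as `reasoning_effort` values.
THINKING_LEVELS = ("minimal", "low", "medium", "high")


def build_chat_request(
    prompt: str,
    effort: str = "medium",  # the description says medium is the default
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Return keyword arguments for client.chat.completions.create().

    Hypothetical helper for illustration; it validates the effort level
    locally and does not contact the API.
    """
    if effort not in THINKING_LEVELS:
        raise ValueError(
            f"effort must be one of {THINKING_LEVELS}, got {effort!r}"
        )
    return {
        "model": "gemini-3.5-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "reasoning_effort": effort,
    }
```

With a client configured as in the example above, the result could be passed as client.chat.completions.create(**build_chat_request("Summarize this diff.", effort="low")). Lower levels would presumably trade reasoning depth for latency and cost, in line with the description.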
See how this model compares to others from the same provider.
Nano Banana 2 Lite (Gemini 3.1 Flash Lite Image) is Google's fastest and most cost-efficient multimodal image generation model, designed for high-throughput visual workflows and real-time applications. It supports text-to-image generation, image editing, and multi-image composition through a unified API, while also producing text outputs alongside images. Delivering image generation in approximately 4 seconds, it combines fast inference with strong character consistency, precise editing, and real-world knowledge. The model generates 1K-resolution images across 14 aspect ratios and embeds an invisible SynthID watermark in all outputs. Optimized for the best balance of quality, speed, and cost, Nano Banana 2 Lite is ideal for prototyping, developer pipelines, and large-scale visual content generation.
Gemini 3.1 Flash TTS Preview is Google's next-generation text-to-speech model, delivering a major upgrade over Gemini 2.5 Flash TTS. It converts text into natural audio across 70+ languages, with significantly expanded language coverage and improved quality. The model introduces 200+ inline audio control tags (e.g., [whispers], [laughs], [excited]) for fine-grained control over emotion, tone, and pacing, along with support for two speakers with independent voice and style settings. It outputs 24 kHz / 16-bit PCM audio, includes SynthID watermarking, and supports a 32K token context window. Designed for expressive and controllable voice generation, it is well suited for dialogue systems, storytelling, character-driven content, and advanced audio production workflows.
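Raw PCM audio has no header, so most players need it wrapped in a container before playback. The following is a minimal sketch that wraps 24 kHz / 16-bit PCM bytes in a WAV container using Python's standard wave module. The sample rate and bit depth come from the description above; mono output and little-endian signed samples are assumptions, and pcm_to_wav_bytes is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000   # 24 kHz, from the model description
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples, from the model description
NUM_CHANNELS = 1          # assumption: mono output


def pcm_to_wav_bytes(pcm: bytes, channels: int = NUM_CHANNELS) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Assumes little-endian signed 16-bit samples, which is what the WAV
    format expects for 16-bit audio.
    """
    frame_size = SAMPLE_WIDTH_BYTES * channels
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written directly to a .wav file. If the service returns stereo audio instead, pass channels=2.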
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind, featuring 25.2B total parameters with only 3.8B activated per token, which delivers near 31B-class quality at a fraction of the compute cost. It supports multimodal inputs including text, images, and video (up to 60s at 1fps). The model includes a 256K token context window, native function calling, configurable thinking/reasoning modes, and structured output support. Released under the Apache 2.0 license, it is well suited for efficient, production-ready multimodal and agentic applications.
Gemini 2.5 Flash-Lite is a lightweight, low-latency model focused on speed and cost efficiency. It generates tokens quickly and outperforms earlier Flash models on common benchmarks. “Thinking” (multi-pass reasoning) is off by default for maximum speed, but can be turned on through the Reasoning API when deeper reasoning is needed.
Gemini 2.5 Flash is Google's main high-performance model for complex reasoning, coding, math, and scientific tasks. It has built-in “thinking” features that help it produce more accurate, context-aware answers.
Gemini 2.5 Flash Preview (May 2025) is Google's high-performance general model built for advanced reasoning, coding, math, and science. It includes built-in “thinking” features to deliver more accurate, context-aware answers.
Gemini 2.5 Flash Image Preview (“Nano Banana”) is a cutting-edge image generation model with strong contextual understanding. It can create and edit images and supports multi-turn conversational workflows around visuals.