Thinking mode

Gemini 2.5 has an explicit reasoning phase. Enable it for hard problems; disable it for latency-sensitive workloads.

GenerateRequest req = GenerateRequest.builder()
    .extension("gemini.thinkingBudget", 8192)  // tokens for reasoning
    .build();
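
One way to apply the enable/disable rule is a small helper that picks the budget per request. This is a minimal sketch, not part of any client library: the `ThinkingBudget` class and `budgetFor` method are hypothetical, and treating a budget of 0 as "reasoning off" is an assumption to verify against the client you actually use.

```java
final class ThinkingBudget {
    // Assumption: a budget of 0 switches the reasoning phase off.
    static final int DISABLED = 0;
    // Same value as the snippet above; tune it for your workload.
    static final int DEFAULT_BUDGET = 8192;

    // Returns the reasoning-token budget for a request.
    // Latency-sensitive requests never get a reasoning budget,
    // and easy requests do not need one.
    static int budgetFor(boolean hardProblem, boolean latencySensitive) {
        if (latencySensitive || !hardProblem) {
            return DISABLED;
        }
        return DEFAULT_BUDGET;
    }

    public static void main(String[] args) {
        System.out.println("hard, offline batch job: " + budgetFor(true, false));
        System.out.println("hard, interactive chat:  " + budgetFor(true, true));
        System.out.println("easy lookup:             " + budgetFor(false, false));
    }
}
```

The returned value would then be passed as the second argument to `.extension("gemini.thinkingBudget", ...)`.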

2M context

The context window is 2 million tokens, enough to fit a full codebase into a single prompt. Cost and latency both scale with prompt size, so don't do it casually.
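
Before sending a whole repository, it helps to estimate how large the prompt would be. The sketch below is a rough check rather than a real tokenizer: the `ContextBudget` class is hypothetical, the figure of about 4 characters per token is a common heuristic and an assumption here, and file size in bytes stands in for character count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class ContextBudget {
    // Assumption: roughly 4 characters per token. Real tokenizers vary.
    static final int CHARS_PER_TOKEN = 4;
    static final long CONTEXT_LIMIT_TOKENS = 2_000_000L;

    // Rounds up, so a partial token counts as a whole one.
    static long estimateTokens(long charCount) {
        return (charCount + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    static boolean fitsInContext(long charCount) {
        return estimateTokens(charCount) <= CONTEXT_LIMIT_TOKENS;
    }

    // Sums the sizes (in bytes) of all regular files under root.
    static long totalBytes(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        long bytes = totalBytes(root);
        System.out.printf("~%d tokens, fits in 2M context: %b%n",
                estimateTokens(bytes), fitsInContext(bytes));
    }
}
```

Even when the estimate fits, the prompt is billed and processed in full on every request, so filtering out build output and vendored dependencies first is usually worth the effort.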

Native multimodal

Audio input and audio output are now first-class, so voice agents no longer need a separate TTS pipeline.

Cached content

Context caching is explicit and comes with a TTL. It works well for system instructions and shared knowledge that many requests reuse.

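// Cache the system instruction once; the entry expires after the TTL.
String cacheId = geminiLLM.cache(systemInstruction, TTL_1_HOUR);
// Reference the cached content by id instead of resending it.
req = req.withCache(cacheId);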