Thinking mode
Gemini 2.5 has an explicit reasoning phase. Enable it for hard problems; disable it for latency-sensitive requests.
GenerateRequest req = GenerateRequest.builder()
.extension("gemini.thinkingBudget", 8192) // tokens for reasoning
.build();
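One way to apply the "hard problems vs. latency-sensitive" rule is to compute the budget per request instead of hard-coding it. The sketch below is illustrative only: `ThinkingBudget`, `choose`, and the `difficulty` score are hypothetical names, not part of any SDK, and treating a budget of 0 as "thinking disabled" is an assumption to verify against the model you target.

```java
// Hypothetical helper; none of these names come from a Gemini SDK.
final class ThinkingBudget {
    /** Upper bound, taken from the 8192-token example above. */
    static final int MAX_BUDGET = 8192;

    /**
     * Picks how many reasoning tokens to request.
     *
     * @param latencySensitive true when the caller needs a fast reply
     * @param difficulty       caller's own estimate, clamped to [0.0, 1.0]
     */
    static int choose(boolean latencySensitive, double difficulty) {
        if (latencySensitive) {
            return 0; // assumption: 0 means "skip the reasoning phase"
        }
        double clamped = Math.max(0.0, Math.min(1.0, difficulty));
        return (int) Math.round(clamped * MAX_BUDGET);
    }
}
```

The result could then be passed in place of the literal, for example `.extension("gemini.thinkingBudget", ThinkingBudget.choose(false, 0.5))`.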
2M context
The context window is 2 million tokens, so you can dump a full codebase into a prompt. Cost and latency scale with prompt size, so don't do it casually.
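Before sending a very large prompt, it can help to estimate its size first. The following is a rough sketch, not a real tokenizer: `PromptSizeGuard` is a hypothetical class, and the 4-bytes-per-token ratio is a crude assumption that varies by language and content. Use the provider's token-counting endpoint when an exact number matters.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical pre-flight check; the bytes-per-token ratio is a heuristic.
final class PromptSizeGuard {
    static final long CONTEXT_LIMIT_TOKENS = 2_000_000L;
    static final int ASSUMED_BYTES_PER_TOKEN = 4;

    /** Estimates the token count from the UTF-8 byte length, rounding up. */
    static long estimateTokens(List<String> parts) {
        long bytes = 0;
        for (String part : parts) {
            bytes += part.getBytes(StandardCharsets.UTF_8).length;
        }
        return (bytes + ASSUMED_BYTES_PER_TOKEN - 1) / ASSUMED_BYTES_PER_TOKEN;
    }

    /** True if the estimate plus the tokens reserved for the reply fits the window. */
    static boolean fits(List<String> parts, long reservedForOutput) {
        return estimateTokens(parts) + reservedForOutput <= CONTEXT_LIMIT_TOKENS;
    }
}
```

Because cost scales with the same number, the estimate also works as a cheap budget check before a request leaves the process.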
Native multimodal
Audio input and audio output are now first-class, so voice agents don't need a separate TTS pipeline.
Cached content
Explicit context caching with a TTL. Useful for system instructions and shared knowledge that many requests reuse.
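// Cache the shared system instruction for one hour and keep the returned ID.
String cacheId = geminiLLM.cache(systemInstruction, TTL_1_HOUR);
// Attach the cached content to the request instead of resending it.
req = req.withCache(cacheId);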
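Since the cache has a TTL, the caller has to notice when a cache ID is about to expire and create a new one. Here is a minimal client-side sketch of that bookkeeping. `CacheHandle` and `needsRefresh` are hypothetical names, and the sketch assumes the TTL starts counting at creation time, which may not match the service's actual expiry rules.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side record of a cache ID and its expected expiry.
final class CacheHandle {
    private final String cacheId;
    private final Instant expiresAt;

    CacheHandle(String cacheId, Instant createdAt, Duration ttl) {
        this.cacheId = cacheId;
        this.expiresAt = createdAt.plus(ttl); // assumption: TTL counts from creation
    }

    String cacheId() {
        return cacheId;
    }

    /** True once the entry has expired or will expire within safetyMargin. */
    boolean needsRefresh(Instant now, Duration safetyMargin) {
        return !now.plus(safetyMargin).isBefore(expiresAt);
    }
}
```

A caller would check `needsRefresh(Instant.now(), Duration.ofMinutes(5))` before each request and call the caching method again when it returns true.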