Google is introducing “implicit caching” in its Gemini API, which it says can cut the cost of repeated context sent to its Gemini 2.5 Pro and 2.5 Flash models by up to 75%. The feature is enabled by default: when a request shares a prefix with one you sent previously, the API automatically reuses the cached prefix and passes the savings back to you. To be eligible for a cache hit, a prompt must contain at least 2,048 tokens for 2.5 Pro or 1,024 tokens for 2.5 Flash (roughly 1,500 and 750 words, respectively). Because matching works on prefixes, keep repeated context at the start of each request and append any new or variable content at the end to maximize cache hits.
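
The prefix-ordering advice and the minimum sizes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys are illustrative labels for the two models, `estimate_tokens` is a hypothetical helper that uses the rough 750-words-per-1,000-tokens rule of thumb rather than a real tokenizer, and nothing here calls the Gemini API.

```python
# Minimum prompt sizes reported for implicit caching, in tokens.
# The keys are illustrative labels, not confirmed API identifiers.
MIN_CACHEABLE_TOKENS = {"gemini-2.5-pro": 2048, "gemini-2.5-flash": 1024}

# Rule of thumb implied by the figures above: 1,000 tokens is about 750 words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Crude word-count-based estimate; a real tokenizer will give different numbers."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def build_prompt(stable_context: str, variable_part: str) -> str:
    """Put the shared context first and the per-request content last,
    so that successive requests begin with an identical prefix."""
    return f"{stable_context}\n\n{variable_part}"


def may_qualify(model: str, prompt: str) -> bool:
    """True if the estimated prompt size meets the model's reported minimum."""
    return estimate_tokens(prompt) >= MIN_CACHEABLE_TOKENS[model]
```

Two requests built as `build_prompt(manual_text, question_a)` and `build_prompt(manual_text, question_b)` start with the same text, which is the situation the feature is described as rewarding. Reversing the order would put the changing question first and leave no shared prefix to reuse.
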
This move follows criticism of Google’s earlier explicit prompt-caching system, which required developers to manually specify their highest-frequency prompts and sometimes resulted in unexpectedly large bills. After those complaints prompted a public apology, Google overhauled its approach to make caching automatic. However, the company has not offered third-party verification of the claimed savings, so developers will need to measure the effect on their own workloads to confirm the benefits.
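
One way to run that check is to log, for each request, the total input tokens and how many of them were served from cache, then compare the resulting bill with what the same traffic would have cost uncached. The sketch below assumes the "up to 75%" figure applies as a flat discount on cached input tokens, which may not match actual pricing. The function names and the `(total, cached)` tuple format are hypothetical, and the token counts would have to come from whatever usage data the API reports.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_million: float, cached_discount: float = 0.75) -> float:
    """Input cost of one request, billing cached tokens at a discounted rate.

    cached_discount=0.75 is an assumption taken from the "up to 75%" claim.
    """
    uncached_tokens = total_tokens - cached_tokens
    cached_price = price_per_million * (1 - cached_discount)
    return (uncached_tokens * price_per_million
            + cached_tokens * cached_price) / 1_000_000


def savings_fraction(requests, price_per_million: float = 1.0,
                     cached_discount: float = 0.75) -> float:
    """Fraction of input cost saved across (total_tokens, cached_tokens) pairs."""
    baseline = sum(input_cost(total, 0, price_per_million, cached_discount)
                   for total, _ in requests)
    actual = sum(input_cost(total, cached, price_per_million, cached_discount)
                 for total, cached in requests)
    return 1 - actual / baseline
```

Under these assumptions the overall saving reaches 75% only if every input token is a cache hit. A workload with short shared prefixes or frequently changing context will land well below that figure.
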