New Gemini 2.5 feature: "implicit caching" saves up to 75% in costs

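Google recently launched a feature called implicit caching, which gives developers cheaper and more convenient access to its models with no extra configuration. The feature is available in the Gemini API and cuts the cost of passing repetitive context to the model by up to 75%. It supports Gemini 2.5 Pro and 2.5 Flash, giving developers who face cost pressure a lower-burden tool.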

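Unlike the earlier explicit caching, implicit caching does not require developers to manually define their frequently used prompts, which removes tedious setup and helps avoid unexpected API charges. Implicit caching is enabled automatically on Gemini 2.5 models, and the cost savings are applied whenever a request hits the cache.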
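To make the savings concrete, here is a rough sketch of how the input cost of one request might be estimated. It assumes that cached tokens are billed at 25% of the normal input price (one reading of the "75% savings" figure) and that the rest of the request is billed at the full rate. The function name and the price passed in are hypothetical, not taken from Google's pricing pages.

```python
# Assumption: a cache hit discounts the cached tokens by 75%,
# i.e. they are billed at 25% of the normal input price.
CACHE_DISCOUNT = 0.75


def estimate_input_cost(total_tokens: int, cached_tokens: int,
                        price_per_million: float) -> float:
    """Estimate the input cost of one request (hypothetical helper).

    total_tokens      -- all input tokens in the request
    cached_tokens     -- how many of those tokens were served from the cache
    price_per_million -- normal input price per one million tokens
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    discounted_price = price_per_million * (1 - CACHE_DISCOUNT)
    total = uncached_tokens * price_per_million + cached_tokens * discounted_price
    return total / 1_000_000
```

Under these assumptions, a request whose tokens are mostly a cached prefix costs much less than the same request without a hit, but the overall saving only approaches 75% as the cached share approaches the whole request.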

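According to Google's developer documentation, the threshold for triggering implicit caching is at least 1,024 tokens on 2.5 Flash and 2,048 tokens on 2.5 Pro, which is relatively low. Google recommends placing repetitive context at the start of a request and content that changes at the end, to improve the cache hit rate.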
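The ordering advice and the thresholds can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, the token count is assumed to come from whatever token counter the caller already uses, and the sketch assumes the minimum applies to the size of the request as described above. It makes no API calls.

```python
# Minimum token counts for implicit caching, as reported in the article.
MIN_CACHE_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def build_prompt(shared_context: str, question: str) -> str:
    """Put the repeated context first and the changing part last.

    Requests that share the same opening text are more likely to hit the cache.
    """
    return f"{shared_context}\n\n{question}"


def meets_cache_threshold(model: str, request_token_count: int) -> bool:
    """Return True if the request is large enough to be eligible for caching."""
    return request_token_count >= MIN_CACHE_TOKENS[model]
```

For example, two calls such as `build_prompt(manual_text, "How do I reset it?")` and `build_prompt(manual_text, "What is the warranty?")` begin with the same text and differ only at the end, which is the shape Google recommends.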

We just shipped implicit caching in the Gemini API, automatically enabling a 75% cost savings with the Gemini 2.5 models when your request hits a cache 🚢

We also lowered the min token required to hit caches to 1K on 2.5 Flash and 2K on 2.5 Pro!

— Logan Kilpatrick (@OfficialLoganK) May 8, 2025