Krux

April 4, 2026
Google Splits Gemini API Into Cheap and Fast Tiers
Published: April 4, 2026 at 12:37 AM
Updated: April 4, 2026 at 12:37 AM
What happened
Google has added two new Gemini API tiers so developers can route requests by urgency instead of rebuilding their entire setup. Flex Inference cuts costs by 50% for background jobs such as updating CRM records or running simulations. Priority Inference guarantees higher reliability for real-time tasks such as customer support bots, though it costs more. Switching takes a single parameter that selects the tier, which replaces the old split between synchronous and asynchronous batch endpoints. The catch: Flex trades speed and reliability for savings, and Priority can downgrade to standard service if you hit limits.
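To make the routing idea concrete, here is a minimal Python sketch of picking a tier per request. It does not call the Gemini API. The field name `service_tier`, the tier labels, and the helpers `choose_tier`, `build_request`, `effective_tier`, and `flex_cost` are all assumptions made for illustration, not Google's documented interface. The only figure taken from the article is the 50% Flex discount.

```python
from dataclasses import dataclass

# Hypothetical tier labels; the real parameter values may differ.
FLEX = "flex"
STANDARD = "standard"
PRIORITY = "priority"

# The article gives a 50% discount for Flex. It says Priority costs more
# but gives no figure, so no Priority price is modeled here.
FLEX_DISCOUNT = 0.5


@dataclass
class Job:
    name: str
    realtime: bool  # True when a user is waiting on the response


def choose_tier(job: Job) -> str:
    """Route by urgency: real-time work goes to Priority, the rest to Flex."""
    return PRIORITY if job.realtime else FLEX


def build_request(job: Job, prompt: str) -> dict:
    """Build a request payload; 'service_tier' is an assumed field name."""
    return {"prompt": prompt, "service_tier": choose_tier(job)}


def effective_tier(requested: str, over_limit: bool) -> str:
    """Model the stated catch: Priority may fall back to standard at limits."""
    if requested == PRIORITY and over_limit:
        return STANDARD
    return requested


def flex_cost(standard_cost: float) -> float:
    """Cost of a job on Flex, given what it would cost on the standard tier."""
    return standard_cost * (1 - FLEX_DISCOUNT)


crm_sync = Job(name="crm_sync", realtime=False)
support_bot = Job(name="support_bot", realtime=True)

print(build_request(crm_sync, "Update the account notes."))
print(build_request(support_bot, "Where is my order?"))
print(flex_cost(10.0))
```

The point of the sketch is that the calling code stays the same across tiers. Only one field in the payload changes, so a background job and a support bot can share the same request path.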
Why it matters
It's menu pricing for AI calls: developers pay less for work that can wait and more for work that can't.