Google Splits Gemini API Into Cheap and Fast Tiers

April 4, 2026

Published: April 4, 2026 at 12:37 AM

Updated: April 4, 2026 at 12:37 AM

100-word summary

Google just added two new Gemini API tiers so developers can route requests by urgency instead of rebuilding their entire setup. Flex Inference cuts costs by 50% for background jobs such as updating CRM records or running simulations. Priority Inference offers higher reliability for real-time tasks such as customer support bots, at a higher price. The switch? A single request parameter selects the tier, replacing the old split between synchronous endpoints and asynchronous batch endpoints. The catch: Flex trades speed and reliability for savings, and Priority requests can be downgraded to standard service if you exceed your limits. It's menu pricing for AI calls.

Why it matters

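It's menu pricing for AI calls: instead of paying one rate for everything, developers pay for the speed and reliability each request actually needs.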
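To make the menu concrete, here is a minimal Python sketch of routing jobs by urgency and estimating what each one costs. It does not use Google's SDK and assumes nothing about its real interface: the `service_tier` field name, the tier strings, and the 1.5x Priority multiplier are hypothetical placeholders. Only the 50% Flex discount and the fallback from Priority to standard service come from the announcement as described above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier labels; the real Gemini API values may differ.
    FLEX = "flex"
    STANDARD = "standard"
    PRIORITY = "priority"


# Price multipliers relative to the standard tier.
PRICE_MULTIPLIER = {
    Tier.FLEX: 0.5,      # the article's "cuts costs by 50%"
    Tier.STANDARD: 1.0,
    Tier.PRIORITY: 1.5,  # hypothetical: the article only says Priority "costs more"
}


@dataclass
class Job:
    name: str
    urgent: bool      # a user is waiting on the answer (e.g. a support bot)
    background: bool  # can tolerate slower, less reliable service (e.g. CRM updates)


def pick_tier(job: Job) -> Tier:
    """Route by urgency: real-time work to Priority, background work to Flex."""
    if job.urgent:
        return Tier.PRIORITY
    if job.background:
        return Tier.FLEX
    return Tier.STANDARD


def build_request(job: Job, prompt: str) -> dict:
    # Every tier uses the same request shape; only one field changes.
    # "service_tier" is a hypothetical name, not a confirmed Gemini API parameter.
    return {"prompt": prompt, "service_tier": pick_tier(job).value}


def estimated_cost(job: Job, standard_cost: float) -> float:
    """Scale what the call would cost on the standard tier by the tier's multiplier."""
    return standard_cost * PRICE_MULTIPLIER[pick_tier(job)]


def effective_tier(requested: Tier, over_priority_limit: bool) -> Tier:
    # Per the article, Priority traffic can fall back to standard service once
    # you exceed your limits; how those limits are measured is not modeled here.
    if requested is Tier.PRIORITY and over_priority_limit:
        return Tier.STANDARD
    return requested


if __name__ == "__main__":
    jobs = [
        Job("support-bot reply", urgent=True, background=False),
        Job("nightly CRM update", urgent=False, background=True),
        Job("internal report", urgent=False, background=False),
    ]
    for job in jobs:
        tier = build_request(job, "...")["service_tier"]
        print(job.name, tier, estimated_cost(job, 2.00))
```

The design point the sketch tries to capture is that the request keeps one shape across tiers. Moving a job from Priority to Flex means changing a single field, not switching to a separate batch endpoint and rewriting the pipeline around it.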
