Google's Flash-Lite Model Runs 2.5x Faster, Costs Less

Published: March 7, 2026 at 12:30 AM

Updated: March 7, 2026 at 12:30 AM

100-word summary

Google released Gemini 3.1 Flash-Lite, claiming it starts answering 2.5 times faster and outputs 45% quicker than its previous Flash model. The real shift is adaptive thinking levels: developers can dial reasoning up or down depending on whether they're moderating comments or generating dashboards. Pricing dropped to $0.25 per million input tokens. That means high-volume tasks like real-time translation or content filtering just got cheaper to run at the scale where costs actually matter. The model is in preview, so expect safety and reliability guardrails that might slow early adopters.

What happened

Why it matters

The model is in preview, so expect safety and reliability guardrails that might slow early adopters.

Sources

Google DeepMind