Krux

March 7, 2026
Google's Flash-Lite Model Runs 2.5x Faster, Costs Less
Published: March 7, 2026 at 12:30 AM
Updated: March 7, 2026 at 12:30 AM
100-word summary
Google released Gemini 3.1 Flash-Lite, claiming it starts answering 2.5 times faster and outputs 45% quicker than its previous Flash model. The real shift is adaptive thinking levels: developers can dial reasoning up or down depending on whether they're moderating comments or generating dashboards. Pricing dropped to $0.25 per million input tokens. That means high-volume tasks like real-time translation or content filtering just got cheaper to run at the scale where costs actually matter. The model is in preview, so expect safety and reliability guardrails that might slow early adopters.
What happened
Google released Gemini 3.1 Flash-Lite, claiming it starts answering 2.5 times faster and outputs 45% quicker than its previous Flash model. The real shift is adaptive thinking levels: developers can dial reasoning up or down depending on whether they're moderating comments or generating dashboards. Pricing dropped to $0.25 per million input tokens. That means high-volume tasks like real-time translation or content filtering just got cheaper to run at the scale where costs actually matter.
Why it matters
The model is in preview, so expect safety and reliability guardrails that might slow early adopters.