Krux

March 9, 2026
Google's Gemini 3.1 Flash-Lite Is 2.5x Faster, Half the Price
Published: March 9, 2026 at 12:29 AM
Updated: March 9, 2026 at 12:29 AM
100-word summary
Google just released Gemini 3.1 Flash-Lite, its speediest model yet. It starts answering 2.5x faster than the previous version and outputs responses 45% quicker, all while costing less: $0.25 per million input tokens versus higher rates for older models. The real trick? You can now dial how hard the model "thinks" on each task. Set it low for content moderation that needs to run thousands of times per second, crank it up when generating dashboards or simulations. The model is in preview, so real-world mileage may vary. Speed used to mean stupidity in AI. Not anymore.
What happened
Google just released Gemini 3.1 Flash-Lite, its speediest model yet. It starts answering 2.5x faster than the previous version and outputs responses 45% quicker, all while costing less: $0.25 per million input tokens versus higher rates for older models. The real trick? You can now dial how hard the model "thinks" on each task. Set it low for content moderation that needs to run thousands of times per second, crank it up when generating dashboards or simulations. The model is in preview, so real-world mileage may vary. Speed used to mean stupidity in AI.
Why it matters
Not anymore.