Krux

March 1, 2026
Google's Gemini 3.1 Pro Doubles Its Reasoning Score
Published: March 1, 2026 at 2:12 PM
Updated: March 1, 2026 at 2:12 PM
100-word summary
Google released Gemini 3.1 Pro on February 19, claiming it more than doubled its predecessor's reasoning benchmark (77.1% on ARC-AGI-2 versus the previous generation). The upgrade brings a 1-million-token context window and costs around $2 per million input tokens. Translation: the model can now digest an entire codebase or book in one go and chain together multi-step tasks with fewer hallucinations. It's rolling out across Google's developer tools, Vertex AI for enterprises, and consumer apps like NotebookLM. The catch? Google emphasizes it still needs human oversight, a reminder that smarter models don't mean foolproof ones.
What happened
Google released Gemini 3.1 Pro on February 19, claiming it more than doubled its predecessor's reasoning benchmark (77.1% on ARC-AGI-2 versus the previous generation). The upgrade brings a 1-million-token context window and costs around $2 per million input tokens. Translation: the model can now digest an entire codebase or book in one go and chain together multi-step tasks with fewer hallucinations. It's rolling out across Google's developer tools, Vertex AI for enterprises, and consumer apps like NotebookLM. The catch?
Why it matters
Google emphasizes it still needs human oversight, a reminder that smarter models don't mean foolproof ones.