Krux
Nvidia's New Open Model Fits an Entire Codebase in Memory
Published: March 12, 2026 at 12:28 AM
Updated: March 12, 2026 at 12:28 AM
What happened
Nvidia just released Nemotron 3 Super, a 120-billion-parameter open model that can hold 1 million tokens in context. That's roughly enough to load an entire mid-sized codebase or thousands of pages of financial reports without chopping them into pieces. The model uses just 12 billion active parameters during inference, making it 4x more efficient on Nvidia's Blackwell chips while delivering 5x higher throughput than its predecessor. Translation: you can run complex, multi-step AI tasks without constantly shuffling context in and out of memory. Available now with open weights via Perplexity, Hugging Face, and Nvidia's own API.
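As a rough sanity check on the "entire mid-sized codebase or thousands of pages" claim, here is a back-of-envelope estimate. The tokens-per-line and tokens-per-word figures are assumptions for illustration, not numbers from Nvidia:

```python
# Back-of-envelope: what fits in a 1M-token context window?
# Assumptions (not from Nvidia): code averages ~10 tokens per line,
# English prose averages ~1.3 tokens per word, ~500 words per page.
CONTEXT_TOKENS = 1_000_000

TOKENS_PER_LINE_OF_CODE = 10          # assumed average
lines_of_code = CONTEXT_TOKENS // TOKENS_PER_LINE_OF_CODE
print(f"~{lines_of_code:,} lines of code")        # ~100,000 lines

TOKENS_PER_WORD = 1.3                 # assumed average for prose
WORDS_PER_PAGE = 500                  # assumed dense report page
pages = CONTEXT_TOKENS / (TOKENS_PER_WORD * WORDS_PER_PAGE)
print(f"~{pages:,.0f} pages of reports")          # ~1,538 pages
```

Under those assumptions, 1 million tokens works out to roughly 100,000 lines of code or about 1,500 pages of reports, which is consistent with the article's "mid-sized codebase" framing.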
Why it matters
The era of AI agents that forget what they read three screens ago might actually be ending.