Nvidia's New Open Model Fits an Entire Codebase in Memory

Published: March 12, 2026 at 12:28 AM


What happened

Nvidia just released Nemotron 3 Super, a 120-billion-parameter open model that can hold 1 million tokens in context. That's roughly enough to load an entire mid-sized codebase or thousands of pages of financial reports without chopping them into pieces. The model uses just 12 billion active parameters during inference, making it 4x more efficient on Nvidia's Blackwell chips while delivering 5x higher throughput than its predecessor. Translation: you can run complex, multi-step AI tasks without constantly shuffling context in and out of memory. Available now with open weights via Perplexity, Hugging Face, and Nvidia's own API.
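To get a feel for what "an entire mid-sized codebase" means in practice, here is a minimal sketch that estimates whether a source tree fits in a 1-million-token window. It assumes the common rough heuristic of ~4 characters per token (actual tokenizer behavior varies by model, and Nemotron 3 Super's tokenizer may count differently):

```python
import os

# Rough heuristic: ~4 characters per token for typical source code.
# This is an assumption, not the model's actual tokenizer.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 1_000_000  # Nemotron 3 Super's advertised context length

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string via the chars-per-token heuristic."""
    return len(text) // CHARS_PER_TOKEN

def codebase_fits(root: str, extensions=(".py", ".js", ".ts", ".go", ".rs")) -> bool:
    """Walk a source tree, sum estimated tokens, and compare to the window."""
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total += estimate_tokens(f.read())
                except OSError:
                    continue  # skip unreadable files
    return total <= CONTEXT_WINDOW
```

Under this heuristic, a codebase of roughly 3-4 MB of source text lands around 750,000 to 1,000,000 tokens, which is why a mid-sized project can plausibly be loaded whole instead of chunked and retrieved piecemeal.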

Why it matters

The era of AI agents that forget what they read three screens ago might actually be ending.
