Nvidia's New Model Processes Hour-Long Videos and Audio

April 30, 2026


Published: April 30, 2026 at 12:12 AM

Updated: April 30, 2026 at 12:12 AM


What happened

Nvidia released Nemotron 3 Nano Omni, an open model that understands video, audio, images, and text in a single pass. Its main advantage is long, mixed-media input: it can work through documents of 100-plus pages that include charts and tables, or analyze full screen recordings and meetings without choking. Nvidia says the model runs 9x faster than competing open omni-modal models on multimodal tasks. It is available now on Hugging Face and more than 25 partner platforms, with open weights, so companies can run it locally or in the cloud.

Why it matters

Nvidia is betting that the next wave of AI agents will need to digest messy real-world inputs, not just clean text prompts.
