Nvidia's New Model Processes Hour-Long Videos and Audio

Published: April 30, 2026 at 12:12 AM

Updated: April 30, 2026 at 12:12 AM

100-word summary

Nvidia released Nemotron 3 Nano Omni, an open model that understands video, audio, images, and text in a single pass. The upgrade matters because the model can handle long, messy inputs: documents of 100-plus pages with charts and tables, full screen recordings, and entire meetings. Nvidia says the model runs 9x faster than competing open omni-modal models on multimodal tasks. It's available now on Hugging Face and 25-plus partner platforms, with open weights so companies can run it locally or in the cloud. Nvidia is betting the next wave of AI agents needs to digest messy real-world inputs, not just clean text prompts.

Why it matters

Nvidia is betting the next wave of AI agents needs to digest messy real-world inputs, not just clean text prompts.

Sources

NVIDIA Blog
Hugging Face Blog
NVIDIA Developer Blog