Krux

Microsoft's 15B Model Reads Screens and Reasons About Charts
Published: March 9, 2026 at 12:29 AM
Updated: March 9, 2026 at 12:29 AM
100-word summary
Microsoft just released Phi-4-reasoning-vision-15B, a mid-sized open model that can see images, parse documents, and reason about what's on your screen. The practical hook: you can now build AI that reads a dashboard, extracts the numbers, and explains what changed, or prototype a bot that navigates apps by understanding buttons and menus. It's designed to run without massive compute, combining vision encoding with reasoning in a single 15-billion-parameter package. Available now on Hugging Face and GitHub with full weights and training code. The surprise is that Microsoft is handing over the recipe for multimodal reasoning at a size most teams can actually run.
What happened
Microsoft has released Phi-4-reasoning-vision-15B, an open model that combines a vision encoder with reasoning in a single 15-billion-parameter package: it can interpret images, parse documents, and reason about what's on a screen. In practical terms, you can build an assistant that reads a dashboard, extracts the figures, and explains what changed, or prototype a bot that navigates apps by recognizing buttons and menus. The model is designed to run without massive compute, and the full weights and training code are available now on Hugging Face and GitHub.
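To make the "reads a dashboard, extracts the numbers, and explains what changed" workflow concrete, here is a minimal sketch of the downstream half of that pipeline. The model call itself is omitted; the two snapshot strings stand in for hypothetical text the vision model might emit after reading two screenshots, and the metric names, prompt format, and `name: value` output convention are all assumptions for illustration.

```python
import re

def extract_metrics(model_output: str) -> dict[str, float]:
    """Parse 'name: value' pairs from the model's textual read of a dashboard."""
    metrics = {}
    for name, value in re.findall(r"([A-Za-z ]+):\s*\$?([\d.]+)", model_output):
        metrics[name.strip()] = float(value)
    return metrics

def explain_changes(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Compare two dashboard snapshots and describe what moved."""
    notes = []
    for name, new in after.items():
        old = before.get(name)
        if old is None or old == new:
            continue
        pct = (new - old) / old * 100
        notes.append(f"{name} changed {pct:+.1f}% ({old} -> {new})")
    return notes

# Hypothetical model outputs for two screenshots of the same dashboard.
monday = extract_metrics("Revenue: $120.5\nSignups: 340\nChurn: 2.1")
tuesday = extract_metrics("Revenue: $133.0\nSignups: 310\nChurn: 2.1")
for line in explain_changes(monday, tuesday):
    print(line)
```

In a real deployment, the two input strings would come from prompting the model with each screenshot, so the parsing step depends entirely on constraining the model's output format in the prompt.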
Why it matters
The surprise is less the model than the openness: by shipping full weights and training code, Microsoft is handing over the recipe for multimodal reasoning at a size most teams can actually run.