Microsoft's 15B Model Reads Screens and Reasons About Charts



Published: March 9, 2026 at 12:29 AM

Updated: March 9, 2026 at 12:29 AM


What happened

Microsoft has released Phi-4-reasoning-vision-15B, a mid-sized open model that can see images, parse documents, and reason about what's on your screen. The practical hook: you can now build AI that reads a dashboard, extracts the numbers, and explains what changed, or prototype a bot that navigates apps by understanding buttons and menus. It combines vision encoding with reasoning in a single 15-billion-parameter package designed to run without massive compute. Full weights and training code are available now on Hugging Face and GitHub.
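Since the weights ship on Hugging Face, the natural way to experiment is through the `transformers` chat interface that most open vision-language models use. The sketch below is hedged: the repo id `microsoft/Phi-4-reasoning-vision-15B` is inferred from the model name, and the exact processor and model classes may differ from what the release actually uses. The helper just builds the image-plus-text chat payload; the actual model call is shown only as a commented outline, since it requires downloading the weights.

```python
# Sketch: asking a vision-language model about a dashboard screenshot.
# ASSUMPTION: the repo id and class names in the comments below are guesses
# based on the announced model name, not confirmed by the release.

def build_messages(image_url: str, question: str) -> list:
    """Build the chat-format payload most HF vision-language models expect:
    one user turn containing an image part followed by a text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


if __name__ == "__main__":
    messages = build_messages(
        "https://example.com/dashboard.png",  # hypothetical screenshot URL
        "Which metric changed the most since yesterday, and why?",
    )
    # With the real model (GPU + weights download required), roughly:
    #   from transformers import AutoProcessor, AutoModelForImageTextToText
    #   repo = "microsoft/Phi-4-reasoning-vision-15B"  # assumed repo id
    #   processor = AutoProcessor.from_pretrained(repo)
    #   model = AutoModelForImageTextToText.from_pretrained(repo)
    #   inputs = processor.apply_chat_template(
    #       messages, add_generation_prompt=True,
    #       tokenize=True, return_dict=True, return_tensors="pt")
    #   out = model.generate(**inputs, max_new_tokens=256)
    print(messages[0]["role"])
```

The payload structure, not the class names, is the portable part: image and text parts inside a single user turn is the convention `apply_chat_template` consumes across open multimodal models.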

Why it matters

The surprise is that Microsoft is handing over the recipe for multimodal reasoning at a size most teams can actually run.
