Krux

Microsoft's 15B Model Reads Screens and Reasons About Charts
Published: March 9, 2026 at 12:29 AM
Updated: March 9, 2026 at 12:29 AM
100-word summary
Microsoft just released Phi-4-reasoning-vision-15B, a mid-sized open model that can see images, parse documents, and reason about what's on your screen. The practical hook: you can now build AI that reads a dashboard, extracts the numbers, and explains what changed, or prototype a bot that navigates apps by understanding buttons and menus. It's designed to run without massive compute, combining vision encoding with reasoning in a single 15-billion-parameter package. Available now on Hugging Face and GitHub with full weights and training code. The surprise is that Microsoft is handing over the recipe for multimodal reasoning at a size most teams can actually run.
What happened
Microsoft has released Phi-4-reasoning-vision-15B, an open model that combines a vision encoder with reasoning in a single 15-billion-parameter package: it can interpret images, parse documents, and reason about what's on a screen. In practical terms, you can build an assistant that reads a dashboard, extracts the figures, and explains what changed, or prototype a bot that navigates apps by recognizing buttons and menus. The model is designed to run without massive compute, and the full weights and training code are available now on Hugging Face and GitHub.
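To make the "reads a dashboard, extracts the numbers, and explains what changed" workflow concrete, here is a minimal sketch of the downstream half of that pipeline. The model call itself is omitted; the two snapshot strings stand in for hypothetical text the vision model might emit after reading two screenshots, and the metric names, prompt format, and `name: value` output convention are all assumptions for illustration.

```python
import re

def extract_metrics(model_output: str) -> dict[str, float]:
    """Parse 'name: value' pairs from the model's textual read of a dashboard."""
    metrics = {}
    for name, value in re.findall(r"([A-Za-z ]+):\s*\$?([\d.]+)", model_output):
        metrics[name.strip()] = float(value)
    return metrics

def explain_changes(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Compare two dashboard snapshots and describe what moved."""
    notes = []
    for name, new in after.items():
        old = before.get(name)
        if old is None or old == new:
            continue
        pct = (new - old) / old * 100
        notes.append(f"{name} changed {pct:+.1f}% ({old} -> {new})")
    return notes

# Hypothetical model outputs for two screenshots of the same dashboard.
monday = extract_metrics("Revenue: $120.5\nSignups: 340\nChurn: 2.1")
tuesday = extract_metrics("Revenue: $133.0\nSignups: 310\nChurn: 2.1")
for line in explain_changes(monday, tuesday):
    print(line)
```

In a real deployment, the two input strings would come from prompting the model with each screenshot, so the parsing step depends entirely on constraining the model's output format in the prompt.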
Why it matters
The surprise is less the model than the openness: by shipping full weights and training code, Microsoft is handing over the recipe for multimodal reasoning at a size most teams can actually run.