Meta's AI Cuts Kernel Optimization From Weeks to Hours

April 4, 2026

Published: April 4, 2026 at 12:38 AM

Updated: April 4, 2026 at 12:38 AM

What happened

Meta's KernelEvolve agent automatically rewrites low-level GPU code, exploring hundreds of variants to find the fastest version in hours instead of the weeks human engineers typically need. The system delivered a 60% inference speedup on NVIDIA chips and a 25% training boost on Meta's custom MTIA silicon for ads ranking models. It targets hardware from three vendors (NVIDIA, AMD, and Meta's own accelerators) through a single interface, and it produced correct results on all 480 benchmark configurations. The approach treats optimization as a search problem, with improvements found in past runs feeding into future ones.
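To make the "optimization as search" idea concrete, here is a minimal sketch in Python. It is not KernelEvolve's actual implementation, which the article does not describe. As a stand-in for kernel variants, it tunes the tile size of a pure-Python tiled matrix multiply. Every name in it (`tiled_matmul`, `search`, `history`, the candidate tile sizes) is a hypothetical chosen for illustration. Each candidate is first checked against a reference result, then timed, and the winner is stored so the next run starts from it.

```python
import random
import time


def reference_matmul(a, b):
    """Straightforward matrix multiply used as the correctness oracle."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def tiled_matmul(a, b, tile):
    """Same computation, processed in tile x tile blocks (the tunable variant)."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            out[i][j] += a_ik * b[k][j]
    return out


def measure(tile, a, b, repeats=3):
    """Best-of-N wall-clock time for one candidate."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tiled_matmul(a, b, tile)
        best = min(best, time.perf_counter() - start)
    return best


def search(a, b, history, budget=6, seed=0):
    """Evaluate up to `budget` tile sizes, trying past winners first.

    `history` is a plain list of earlier winning tile sizes (an assumption
    standing in for whatever knowledge a real system carries between runs).
    """
    rng = random.Random(seed)
    expected = reference_matmul(a, b)
    candidates = list(dict.fromkeys(history))[:budget]  # past winners, deduplicated
    pool = [t for t in (1, 2, 4, 8, 16, 32, 64) if t not in candidates]
    rng.shuffle(pool)
    candidates += pool[:max(0, budget - len(candidates))]

    results = {}
    for tile in candidates:
        if tiled_matmul(a, b, tile) != expected:
            continue  # discard any variant that produces a wrong answer
        results[tile] = measure(tile, a, b)

    best = min(results, key=results.get)
    history.insert(0, best)  # feed this run's winner into future runs
    return best, results


if __name__ == "__main__":
    rng = random.Random(1)
    a = [[rng.randint(-5, 5) for _ in range(24)] for _ in range(24)]
    b = [[rng.randint(-5, 5) for _ in range(24)] for _ in range(24)]
    history = []
    best, results = search(a, b, history)
    print("fastest tile size this run:", best)
```

The matrices hold integers so that the correctness check can use exact equality. A real kernel search would compare floating-point outputs within a tolerance and would time candidates on the target hardware rather than in the Python interpreter.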

Why it matters

Writing code tuned directly to silicon has long been a rare specialist skill. A system like this makes that expertise far less of a bottleneck.
