Krux

April 4, 2026
Meta's AI Cuts Kernel Optimization From Weeks to Hours
Published: April 4, 2026 at 12:38 AM
Updated: April 4, 2026 at 12:38 AM
What happened
Meta's KernelEvolve agent automatically rewrites low-level GPU code, exploring hundreds of variants to find the fastest one in hours rather than the weeks human engineers typically need. The system delivered a 60% inference speedup on NVIDIA chips and a 25% training boost on Meta's custom MTIA silicon for ads ranking models. It targets hardware from three vendors (NVIDIA, AMD, and Meta's own accelerators) through a single interface, and it passed 480 benchmark configurations with 100% correctness. The trick: treating optimization as a search problem, with improvements from past runs feeding into future ones.
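To make the search framing concrete, here is a minimal toy sketch in Python. It is not KernelEvolve's code or interface. The "kernel" is a tiled sum of squares, the tile size is the only tunable, `toy_cost` is a made-up cost model standing in for real hardware timing, and the `knowledge` dictionary is a hypothetical stand-in for whatever store carries past results forward. The point is only the shape of the loop: propose a variant, reject it if it disagrees with a trusted reference, keep it if it is cheaper, and save the winner so the next run starts from it.

```python
import random

def reference(xs):
    """Trusted baseline that every candidate must match."""
    return sum(x * x for x in xs)

def make_variant(tile):
    """Candidate 'kernel': same math, processed in chunks of `tile` elements."""
    def kernel(xs):
        return sum(sum(x * x for x in xs[i:i + tile])
                   for i in range(0, len(xs), tile))
    return kernel

def toy_cost(tile, n):
    """Made-up cost model (assumption): overhead per tile plus a tile-size penalty."""
    tiles = -(-n // tile)  # ceiling division
    return tiles * 10 + tile

def search(n, knowledge, rounds=200, seed=0):
    """Mutate the tile size, keeping the cheapest variant that matches the reference."""
    rng = random.Random(seed)
    data = [rng.randint(-9, 9) for _ in range(n)]
    expected = reference(data)
    best = knowledge.get(n, 1)        # warm start from an earlier run, if any
    best_cost = toy_cost(best, n)
    for _ in range(rounds):
        cand = max(1, min(n, best + rng.randint(-16, 16)))
        if make_variant(cand)(data) != expected:
            continue                  # incorrect variants are rejected outright
        cost = toy_cost(cand, n)
        if cost < best_cost:
            best, best_cost = cand, cost
    knowledge[n] = best               # feed the result into future runs
    return best, best_cost

knowledge = {}
first = search(1024, knowledge)            # cold start from tile size 1
second = search(1024, knowledge, seed=1)   # warm start from the first run's winner
```

Because the second call starts from the stored winner and only accepts strict improvements, its cost can never be worse than the first run's. A real system would replace the cost model with measured runtime on the target chip and search over far richer code transformations than a single tile size.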
Why it matters
Writing code that talks directly to silicon just stopped being a rare specialist skill.