Krux

AI Fixes Security Bugs, But 85% of Patches Fail
Published: March 15, 2026 at 12:32 AM
Updated: March 15, 2026 at 12:32 AM
100-word summary
Meta's AutoPatchBench tested AI-generated security patches on 113 real Android vulnerabilities. The surprise? While AI produced fixes 60% of the time, fewer than one in six actually worked correctly. Gemini 1.5 Pro fared worse, generating patches 61% of the time but with under 15% accuracy. The gap reveals why security teams can't just point AI at their bug backlog. Meta now uses a three-stage filter: does it build, does it survive fuzzing, does it match the real fix? Even automated tests let half the wrong patches through; some flaws were caught only by human review. What changes: AI can triage your security queue, not clear it.
What happened
Meta's AutoPatchBench benchmark evaluated AI-generated security patches against 113 real Android vulnerabilities. The headline numbers: models produced a candidate fix about 60% of the time, but fewer than one in six of those patches was actually correct. Gemini 1.5 Pro fared worse than that average, generating patches for 61% of the vulnerabilities with under 15% accuracy. That gap is why security teams can't simply point AI at their bug backlog. Meta now runs every candidate patch through a three-stage filter: does it build, does it survive fuzzing, and does it match the real fix? Even that automated pipeline let roughly half of the incorrect patches through; the remaining flaws were caught only by human review.
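The three-stage filter described above can be sketched as a simple pipeline that runs each candidate patch through progressively stricter checks. This is an illustrative sketch only: the stage functions here are stand-in stubs, not Meta's actual AutoPatchBench tooling, and the names (`filter_patches`, `builds`, `survives_fuzzing`, `matches_reference`) are hypothetical.

```python
from typing import Callable, List

Patch = str
Stage = Callable[[Patch], bool]

def filter_patches(candidates: List[Patch], stages: List[Stage]) -> List[Patch]:
    """Keep only patches that pass every stage, cheapest check first."""
    survivors = candidates
    for stage in stages:
        survivors = [p for p in survivors if stage(p)]
    return survivors

# Stand-in checks. Real implementations would compile the project,
# re-run the fuzzer against the patched binary, and compare the
# patch's behavior to the known-good fix.
def builds(p: Patch) -> bool:
    return "syntax-error" not in p

def survives_fuzzing(p: Patch) -> bool:
    return "still-crashes" not in p

def matches_reference(p: Patch) -> bool:
    return p.endswith("ok")

patches = ["fix-a ok", "fix-b still-crashes", "fix-c syntax-error", "fix-d wrong-semantics"]
good = filter_patches(patches, [builds, survives_fuzzing, matches_reference])
# Only "fix-a ok" survives all three stages.
```

Ordering the stages from cheapest (compile) to most expensive (semantic comparison) matters in practice: most bad patches are rejected early, so the costly checks run on a small remainder. As the article notes, though, even a pipeline like this passed about half of the incorrect patches, which is why human review stays in the loop.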
Why it matters
What changes: AI can triage your security queue, not clear it.