Krux

AI Fixes Security Bugs, But 85% of Patches Fail
Published: March 15, 2026 at 12:32 AM
Updated: March 15, 2026 at 12:32 AM
100-word summary
Meta's AutoPatchBench tested AI-generated security patches on 113 real Android vulnerabilities. The surprise? While AI produced fixes 60% of the time, fewer than one in six actually worked correctly. Gemini 1.5 Pro fared worse, generating patches 61% of the time but with under 15% accuracy. The gap reveals why security teams can't just point AI at their bug backlog. Meta now uses a three-stage filter: does it build, does it survive fuzzing, does it match the real fix? Even automated tests let half the wrong patches through; some flaws were caught only by human review. What changes: AI can triage your security queue, not clear it.
What happened
Meta's AutoPatchBench benchmark evaluated AI-generated security patches against 113 real Android vulnerabilities. The headline numbers: models produced a candidate fix about 60% of the time, but fewer than one in six of those patches was actually correct. Gemini 1.5 Pro fared worse than that average, generating patches for 61% of the vulnerabilities with under 15% accuracy. That gap is why security teams can't simply point AI at their bug backlog. Meta now runs every candidate patch through a three-stage filter: does it build, does it survive fuzzing, and does it match the real fix? Even that automated pipeline let roughly half of the incorrect patches through; the remaining flaws were caught only by human review.
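The three-stage filter described above can be sketched as a simple pipeline that runs each candidate patch through progressively stricter checks. This is an illustrative sketch only: the stage functions here are stand-in stubs, not Meta's actual AutoPatchBench tooling, and the names (`filter_patches`, `builds`, `survives_fuzzing`, `matches_reference`) are hypothetical.

```python
from typing import Callable, List

Patch = str
Stage = Callable[[Patch], bool]

def filter_patches(candidates: List[Patch], stages: List[Stage]) -> List[Patch]:
    """Keep only patches that pass every stage, cheapest check first."""
    survivors = candidates
    for stage in stages:
        survivors = [p for p in survivors if stage(p)]
    return survivors

# Stand-in checks. Real implementations would compile the project,
# re-run the fuzzer against the patched binary, and compare the
# patch's behavior to the known-good fix.
def builds(p: Patch) -> bool:
    return "syntax-error" not in p

def survives_fuzzing(p: Patch) -> bool:
    return "still-crashes" not in p

def matches_reference(p: Patch) -> bool:
    return p.endswith("ok")

patches = ["fix-a ok", "fix-b still-crashes", "fix-c syntax-error", "fix-d wrong-semantics"]
good = filter_patches(patches, [builds, survives_fuzzing, matches_reference])
# Only "fix-a ok" survives all three stages.
```

Ordering the stages from cheapest (compile) to most expensive (semantic comparison) matters in practice: most bad patches are rejected early, so the costly checks run on a small remainder. As the article notes, though, even a pipeline like this passed about half of the incorrect patches, which is why human review stays in the loop.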
Why it matters
What changes: AI can triage your security queue, not clear it.