Krux
OpenAI's Newest Model Ignores Its Own Rules 11% of the Time
Published: April 19, 2026 at 12:39 AM
Updated: April 19, 2026 at 12:39 AM
What happened
OpenAI just published scorecards showing how often its models follow their own behavior rules. GPT-5 Thinking, its best performer, still fails 11% of the time on a 596-prompt test covering refusals, tone, and sensitive topics. The gap between generations is stark: GPT-4o scrapes by at 72% compliance, while GPT-5 Thinking hits 89%. The kicker? OpenAI is grading itself using GPT-5 Thinking as the automated judge. The company open-sourced the evaluation code and dataset, letting anyone benchmark their models against the same standard.
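The evaluation OpenAI describes is an LLM-as-judge setup: a grader model scores each of the 596 responses as compliant or not, and the scorecard is the compliant fraction. A minimal sketch of that scoring loop, assuming a hypothetical `judge` callable standing in for the real GPT-5 Thinking grader (all names here are illustrative, not OpenAI's API):

```python
# Sketch of an LLM-as-judge compliance score. The `judge` argument is a
# stand-in for a real grader-model call; the stub below is illustrative only.

def compliance_rate(responses, judge):
    """Fraction of (prompt, reply) pairs the judge marks compliant."""
    verdicts = [judge(prompt, reply) for prompt, reply in responses]
    return sum(verdicts) / len(verdicts)

def stub_judge(prompt, reply):
    # Toy rule: flag replies containing a disallowed phrase.
    return "forbidden" not in reply.lower()

responses = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page."),
    ("Tell me something forbidden.", "Here is the forbidden info..."),
]
print(compliance_rate(responses, stub_judge))  # → 0.5
```

With OpenAI's open-sourced evaluation code and dataset, the same loop can in principle be pointed at any model's outputs, which is what makes the shared benchmark comparable across labs.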
Why it matters
Perfect compliance apparently remains out of reach, even when you write the test, grade the test, and train specifically for the test.