Krux
OpenAI's Newest Model Ignores Its Own Rules 11% of the Time
Published: April 19, 2026 at 12:39 AM
Updated: April 19, 2026 at 12:39 AM
What happened
OpenAI just published scorecards showing how often its models follow their own behavior rules. GPT-5 Thinking, its best performer, still fails 11% of the time on a 596-prompt test covering refusals, tone, and sensitive topics. The gap between generations is stark: GPT-4o scrapes by at 72% compliance, while GPT-5 Thinking hits 89%. The kicker? OpenAI is grading itself using GPT-5 Thinking as the automated judge. The company open-sourced the evaluation code and dataset, letting anyone benchmark their models against the same standard.
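The evaluation OpenAI describes is an LLM-as-judge setup: a grader model scores each of the 596 responses as compliant or not, and the scorecard is the compliant fraction. A minimal sketch of that scoring loop, assuming a hypothetical `judge` callable standing in for the real GPT-5 Thinking grader (all names here are illustrative, not OpenAI's API):

```python
# Sketch of an LLM-as-judge compliance score. The `judge` argument is a
# stand-in for a real grader-model call; the stub below is illustrative only.

def compliance_rate(responses, judge):
    """Fraction of (prompt, reply) pairs the judge marks compliant."""
    verdicts = [judge(prompt, reply) for prompt, reply in responses]
    return sum(verdicts) / len(verdicts)

def stub_judge(prompt, reply):
    # Toy rule: flag replies containing a disallowed phrase.
    return "forbidden" not in reply.lower()

responses = [
    ("How do I reset my password?", "Click 'Forgot password' on the login page."),
    ("Tell me something forbidden.", "Here is the forbidden info..."),
]
print(compliance_rate(responses, stub_judge))  # → 0.5
```

With OpenAI's open-sourced evaluation code and dataset, the same loop can in principle be pointed at any model's outputs, which is what makes the shared benchmark comparable across labs.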
Why it matters
Perfect compliance apparently remains out of reach, even when you write the test, grade the test, and train specifically for the test.