Hugging Face Hides Test Data to Stop Leaderboard Gaming

Published: May 8, 2026 at 12:20 AM

Updated: May 8, 2026 at 12:20 AM

100-word summary

Hugging Face added a private-data track to its speech recognition leaderboard to catch models overfitted to public benchmarks. The new track tests models on hidden datasets from Appen and DataoceanAI, covering scripted and conversational speech across five English accents. Rankings shift when you toggle private data on, revealing which models generalize and which just memorized the public test set. The leaderboard has logged over 700,000 visits since 2023, making it a high-stakes target for gaming. Default scores still use public data, but the private track exposes gaps between US and international accents that polished benchmark numbers miss. Turns out your "state-of-the-art" model might bomb on actual phone calls.
