Krux

May 8, 2026
Hugging Face Hides Test Data to Stop Leaderboard Gaming
Published: May 8, 2026 at 12:20 AM
Updated: May 8, 2026 at 12:20 AM
What happened
Hugging Face added a private-data track to its speech recognition leaderboard to catch models that are overfit to public benchmarks. The new track evaluates models on hidden datasets from Appen and DataoceanAI, covering scripted and conversational speech across five English accents. Rankings shift when you toggle private data on, separating models that generalize from those that have just memorized the public test sets. The leaderboard has logged over 700,000 visits since 2023, which makes it a high-stakes target for gaming. Default scores still use public data, but the private track exposes gaps between US and international accents that polished benchmark numbers miss.
Why it matters
Turns out your "state-of-the-art" model might bomb on actual phone calls.