Krux

May 24, 2026
3B Specialized Model Beats Frontier APIs on OCR
Published: May 24, 2026 at 12:12 AM
Updated: May 24, 2026 at 12:12 AM
What happened
Hugging Face and Dharma-AI have released DharmaOCR, a 3-billion-parameter model that outperformed every major frontier API on a Brazilian Portuguese document OCR benchmark. The specialized model scored 0.911 on quality versus 0.833 for the best general-purpose competitor, an absolute gap of 0.078, while costing substantially less to run. The insight: on this benchmark, training a small model on the exact type of data it will see in production beat throwing parameters at the problem. DharmaOCR was tuned specifically for enterprise documents, not the entire internet.
Why it matters
The catch? This is one benchmark in one narrow domain. But if the pattern holds, companies may stop reflexively buying the biggest model on the menu and start asking what their AI actually needs to read.