Krux

Alibaba Ships 0.8B Model That Runs Locally With 262K Context
Published: March 5, 2026 at 12:42 AM
Updated: March 5, 2026 at 12:42 AM
100-word summary
Alibaba just released four tiny AI models (0.8B to 9B parameters) built to run on phones and edge devices, not cloud servers. The smallest one handles a 262,000-token context window while sipping power. That's enough to process a short novel locally without sending your data anywhere. The models handle text, images, and video, and can navigate user interfaces without constant cloud calls. Alibaba used sparse attention tricks to shrink compute costs while keeping performance competitive with bigger models on certain benchmarks. All four are freely available on Hugging Face. The bet: companies will pay to avoid latency and cloud bills, even if it means slightly weaker reasoning.
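To give a feel for why sparse attention matters at 262K tokens: full causal attention scores every token against all earlier tokens, so cost grows quadratically with context length. One common sparse pattern (a sliding window, used here purely as an illustration, not as Alibaba's actual mechanism) caps each token to a fixed-size lookback, making cost roughly linear. A minimal sketch counting attended token pairs:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask: each token attends only to
    itself and the previous `window - 1` tokens, not to all prior tokens."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (j > i - window)

def attended_pairs(mask: np.ndarray) -> int:
    """Number of (query, key) pairs the attention layer must score."""
    return int(mask.sum())

seq_len, window = 4096, 128
sparse = attended_pairs(sliding_window_mask(seq_len, window))
dense = attended_pairs(sliding_window_mask(seq_len, seq_len))  # full causal
print(f"sparse: {sparse:,}  dense: {dense:,}  savings: {dense / sparse:.1f}x")
```

At 4,096 tokens the window pattern already cuts scored pairs by roughly 16x; at a 262K context the quadratic term dominates even more, which is the kind of saving that makes long contexts feasible on phone-class hardware.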