Alibaba Ships 0.8B Model That Runs Locally With 262K Context

Published: March 5, 2026 at 12:42 AM

What happened

Alibaba just released four tiny AI models (0.8B to 9B parameters) built to run on phones and edge devices, not cloud servers. The smallest handles a 262,000-token context window while sipping power. That's enough to process a short novel locally without sending your data anywhere. The models handle text, images, and video, and can navigate user interfaces without constant cloud calls. Alibaba used sparse attention tricks to shrink compute costs while keeping performance competitive with bigger models on certain benchmarks. All four are freely available on Hugging Face.
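The article doesn't specify which sparse attention scheme Alibaba used. One common approach for long contexts is sliding-window attention, where each token attends only to a fixed number of recent tokens, so compute grows linearly with sequence length instead of quadratically. A minimal NumPy sketch (toy dimensions; all names here are illustrative, not Alibaba's implementation):

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Toy causal sliding-window attention: query position i attends only
    to key positions (i - window + 1) .. i, not to all previous keys.
    With a fixed window, the attended-key count per position is bounded,
    so cost is O(seq_len * window) rather than O(seq_len ** 2)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)          # (seq_len, seq_len) raw scores
    idx = np.arange(seq_len)
    # Causal mask AND window mask: key j is visible iff 0 <= i - j < window.
    mask = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(mask, scores, -np.inf)
    # Softmax over the visible keys only (masked keys get zero weight).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 16, 8
x = rng.standard_normal((seq_len, d))
out = sliding_window_attention(x, x, x, window=4)
print(out.shape)  # (16, 8): one output per position, as in full attention
```

Real long-context models typically mix patterns like this with a few global or full-attention layers; the sketch only shows why the per-token cost stops scaling with total context length.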

Why it matters

The bet: companies will pay to avoid latency and cloud bills, even if it means slightly weaker reasoning.
