Krux

Alibaba Ships 0.8B Model That Runs Locally With 262K Context
Published: March 5, 2026 at 12:42 AM
Updated: March 5, 2026 at 12:42 AM
100-word summary
Alibaba just released four tiny AI models (0.8B to 9B parameters) built to run on phones and edge devices, not cloud servers. The smallest one handles a 262,000-token context window while sipping power. That's enough to process a short novel locally without sending your data anywhere. The models handle text, images, and video, and can navigate user interfaces without constant cloud calls. Alibaba used sparse attention tricks to shrink compute costs while keeping performance competitive with bigger models on certain benchmarks. All four are freely available on Hugging Face. The bet: companies will pay to avoid latency and cloud bills, even if it means slightly weaker reasoning.
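To give a feel for why sparse attention matters at 262K tokens: full causal attention scores every token against all earlier tokens, so cost grows quadratically with context length. One common sparse pattern (a sliding window, used here purely as an illustration, not as Alibaba's actual mechanism) caps each token to a fixed-size lookback, making cost roughly linear. A minimal sketch counting attended token pairs:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal sliding-window attention mask: each token attends only to
    itself and the previous `window - 1` tokens, not to all prior tokens."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (j > i - window)

def attended_pairs(mask: np.ndarray) -> int:
    """Number of (query, key) pairs the attention layer must score."""
    return int(mask.sum())

seq_len, window = 4096, 128
sparse = attended_pairs(sliding_window_mask(seq_len, window))
dense = attended_pairs(sliding_window_mask(seq_len, seq_len))  # full causal
print(f"sparse: {sparse:,}  dense: {dense:,}  savings: {dense / sparse:.1f}x")
```

At 4,096 tokens the window pattern already cuts scored pairs by roughly 16x; at a 262K context the quadratic term dominates even more, which is the kind of saving that makes long contexts feasible on phone-class hardware.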