Cerebras Now on AWS: 2,500 Tokens Per Second

March 16, 2026

Published: March 16, 2026 at 12:56 AM

100-word summary

Cerebras just launched its wafer-scale AI chip on AWS Marketplace, promising inference speeds up to 70x faster than typical GPUs. The hardware spits out 2,500+ tokens per second and completes full reasoning tasks in under a second. The service runs open-source models like Llama and Qwen through an OpenAI-compatible API, so switching requires minimal code changes. AWS customers can now buy access on a pay-as-you-go basis directly through the marketplace. Translation: your AI chatbot could reply before users finish reading the last message. Cerebras plans to expand to Amazon Bedrock soon, signaling AWS is hedging its bets beyond Nvidia silicon.

What happened

Cerebras just launched its wafer-scale AI chip on AWS Marketplace, promising inference speeds up to 70x faster than typical GPUs. The hardware spits out 2,500+ tokens per second and completes full reasoning tasks in under a second. The service runs open-source models like Llama and Qwen through an OpenAI-compatible API, so switching requires minimal code changes. AWS customers can now buy access on a pay-as-you-go basis directly through the marketplace.
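Because the endpoint is OpenAI-compatible, "minimal code changes" mostly means swapping the base URL, API key, and model name while the request body stays the same. A minimal sketch of that idea, using only the standard library (the endpoint URL and model name below are placeholders, not confirmed values from the Cerebras listing):

```python
import json

# Placeholder values -- check the AWS Marketplace listing for the real
# endpoint, model names, and authentication details.
CEREBRAS_BASE_URL = "https://example-cerebras-endpoint/v1"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style /chat/completions request body.

    With an OpenAI-compatible API, this payload is identical to what an
    existing OpenAI integration already sends; only the base URL and
    credentials change, so client code needs almost no modification.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("llama-3.3-70b", "Summarize wafer-scale chips.")
url = f"{CEREBRAS_BASE_URL}/chat/completions"
body = json.dumps(payload)
```

In practice, an app using the official `openai` Python SDK would point at the new endpoint by passing a different `base_url` when constructing the client, leaving the rest of the call sites untouched.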

Why it matters

Translation: your AI chatbot could reply before users finish reading the last message. Cerebras plans to expand to Amazon Bedrock soon, signaling AWS is hedging its bets beyond Nvidia silicon.
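The "replies before users finish reading" claim holds up on a back-of-envelope check. Assuming the headline 2,500 tokens/second, plus rough conversion factors (about 0.75 English words per token, a reading speed of about 250 words per minute; both are generic assumptions, not figures from Cerebras):

```python
# Back-of-envelope latency math; all constants are assumptions, not benchmarks.
TOKENS_PER_SECOND = 2500   # headline Cerebras throughput
WORDS_PER_TOKEN = 0.75     # rough average for English text
READING_WPM = 250          # typical adult reading speed


def generation_seconds(tokens: int) -> float:
    """Time for the model to emit `tokens` tokens at the headline rate."""
    return tokens / TOKENS_PER_SECOND


def reading_seconds(tokens: int) -> float:
    """Time for a human to read the same number of tokens."""
    return tokens * WORDS_PER_TOKEN / (READING_WPM / 60)


reply_tokens = 500
gen = generation_seconds(reply_tokens)   # 0.2 s to generate ~500 tokens
read = reading_seconds(reply_tokens)     # 90 s for a human to read them
```

Under these assumptions, generating a 500-token reply takes about 0.2 seconds while reading it takes about a minute and a half, so the bottleneck shifts entirely to the human.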

Sources