Krux

AWS Adds Cerebras Chips to Hit 3,000 Tokens Per Second
Published: March 15, 2026 at 12:32 AM
Updated: March 15, 2026 at 12:32 AM
What happened
Amazon is bringing Cerebras wafer-scale chips into its data centers, giving Bedrock customers access to inference speeds of up to 3,000 tokens per second. That's fast enough to generate a full page of text while you're still reading the previous paragraph. The setup pairs AWS Trainium chips for the initial processing with Cerebras hardware for token generation, squeezing five times more capacity into the same physical space. Customers will be able to run models from Meta, OpenAI, and Amazon's own Nova family when the service rolls out in the coming months.
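To put that throughput in perspective, here's a rough back-of-the-envelope sketch. The token-per-word and words-per-page figures are assumptions for illustration, not numbers from the announcement:

```python
# Back-of-the-envelope: what does 3,000 tokens/second mean in human terms?
# Assumptions (not from the article): ~1.3 tokens per English word,
# ~500 words per printed page, ~250 words/minute adult reading speed.
TOKENS_PER_SECOND = 3_000
TOKENS_PER_WORD = 1.3
WORDS_PER_PAGE = 500
READING_WPM = 250

tokens_per_page = WORDS_PER_PAGE * TOKENS_PER_WORD          # 650 tokens
seconds_per_page = tokens_per_page / TOKENS_PER_SECOND      # ~0.22 s to generate
reading_seconds = WORDS_PER_PAGE / (READING_WPM / 60)       # ~120 s to read

print(f"~{seconds_per_page:.2f} s to generate a page")
print(f"~{reading_seconds:.0f} s to read it")
print(f"generation outpaces reading by ~{reading_seconds / seconds_per_page:.0f}x")
```

Under those assumptions, the chip finishes a page in roughly a fifth of a second, several hundred times faster than a typical reader gets through it, which is the gap the "faster than you can read" framing points at.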
Why it matters
The real test: whether anyone actually needs AI responses that arrive faster than they can read them.