Krux

AWS Adds Cerebras Chips to Hit 3,000 Tokens Per Second
Published: March 15, 2026 at 12:32 AM
Updated: March 15, 2026 at 12:32 AM
What happened
Amazon is bringing Cerebras wafer-scale chips into its data centers, giving Bedrock customers access to inference speeds of up to 3,000 tokens per second. That's fast enough to generate a full page of text while you're still reading the previous paragraph. The setup pairs AWS Trainium chips for the initial processing with Cerebras hardware for token generation, squeezing five times more capacity into the same physical space. Customers will be able to run models from Meta, OpenAI, and Amazon's own Nova family when the service rolls out in the coming months.
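To put that throughput in perspective, here's a rough back-of-the-envelope sketch. The token-per-word and words-per-page figures are assumptions for illustration, not numbers from the announcement:

```python
# Back-of-the-envelope: what does 3,000 tokens/second mean in human terms?
# Assumptions (not from the article): ~1.3 tokens per English word,
# ~500 words per printed page, ~250 words/minute adult reading speed.
TOKENS_PER_SECOND = 3_000
TOKENS_PER_WORD = 1.3
WORDS_PER_PAGE = 500
READING_WPM = 250

tokens_per_page = WORDS_PER_PAGE * TOKENS_PER_WORD          # 650 tokens
seconds_per_page = tokens_per_page / TOKENS_PER_SECOND      # ~0.22 s to generate
reading_seconds = WORDS_PER_PAGE / (READING_WPM / 60)       # ~120 s to read

print(f"~{seconds_per_page:.2f} s to generate a page")
print(f"~{reading_seconds:.0f} s to read it")
print(f"generation outpaces reading by ~{reading_seconds / seconds_per_page:.0f}x")
```

Under those assumptions, the chip finishes a page in roughly a fifth of a second, several hundred times faster than a typical reader gets through it, which is the gap the "faster than you can read" framing points at.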
Why it matters
The real test: whether anyone actually needs AI responses that arrive faster than they can read them.