
Google Splits AI Chips for Training vs. Agents

Published: April 25, 2026 at 12:29 AM

Updated: April 25, 2026 at 12:29 AM

100-word summary

Google just released two specialized chips: TPU 8t trains models, TPU 8i runs AI agents and real-time inference. The split matters because most companies still use the same hardware for both, wasting money and time. A single TPU 8t cluster now packs 121 exaflops of compute across 9,600 chips. The inference chip delivers 80% better performance per dollar and triple the on-chip memory of its predecessor. Citadel Securities is already running agent workflows on the new stack. The takeaway? Training and serving AI are diverging into separate infrastructure categories, and trying to do both on general-purpose chips is becoming an expensive mistake.

What happened

Google released two specialized chips: TPU 8t for training models and TPU 8i for AI agents and real-time inference. A single TPU 8t cluster packs 121 exaflops of compute across 9,600 chips, while the inference-focused TPU 8i delivers 80% better performance per dollar and triple the on-chip memory of its predecessor. Citadel Securities is already running agent workflows on the new stack.
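For a rough sense of scale, the reported figures imply a per-chip number you can work out directly. The sketch below is back-of-envelope arithmetic only; the article does not give list prices or per-chip specs, so everything beyond the 121-exaflop, 9,600-chip, and 80% figures is an assumption.

```python
# Back-of-envelope math from the reported TPU 8t / 8i figures.
# Only the cluster total, chip count, and the 80% perf-per-dollar
# improvement come from the article; the rest is derived.

CLUSTER_EXAFLOPS = 121      # reported compute of one TPU 8t cluster
CHIPS_PER_CLUSTER = 9_600   # reported chip count per cluster

# Per-chip compute implied by the cluster figure (1 exaflop = 1,000 petaflops)
per_chip_petaflops = CLUSTER_EXAFLOPS * 1_000 / CHIPS_PER_CLUSTER
print(f"~{per_chip_petaflops:.1f} PFLOPs per TPU 8t chip")

# "80% better performance per dollar" means the same inference budget
# buys 1.8x the throughput of the predecessor generation.
perf_per_dollar_ratio = 1.0 + 0.80
print(f"Same serving budget buys {perf_per_dollar_ratio:.1f}x the throughput")
```

That works out to roughly 12.6 petaflops per chip, though the precision the vendor intends (and whether the figure is peak or sustained) isn't stated in the article.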

Why it matters

Training and serving AI are diverging into separate infrastructure categories, and trying to do both on general-purpose chips is becoming an expensive mistake.

Sources