Google Cuts AI Training Bandwidth by 99.6% Across Datacenters

Published: April 25, 2026 at 12:28 AM

Updated: April 25, 2026 at 12:28 AM

100-word summary

Google DeepMind's new Decoupled DiLoCo architecture trains frontier AI models across distant datacenters using regular internet connections instead of fiber-optic superhighways. Training a 12-billion-parameter model across four US regions dropped bandwidth needs from 198 Gbps to 0.84 Gbps. The secret: independent "island" clusters sync updates asynchronously, so a chip failure in Virginia doesn't halt work in Oregon. When researchers simulated chaos with high hardware failure rates, the system maintained 88% productivity while traditional methods collapsed to 27%. You can now train cutting-edge models using mismatched hardware generations in the same run, turning idle GPUs gathering dust into useful compute.

What happened

Google DeepMind's new Decoupled DiLoCo architecture trains frontier AI models across distant datacenters using regular internet connections instead of fiber-optic superhighways. Training a 12-billion-parameter model across four US regions dropped bandwidth needs from 198 Gbps to 0.84 Gbps. The secret: independent "island" clusters sync updates asynchronously, so a chip failure in Virginia doesn't halt work in Oregon. When researchers simulated chaos with high hardware failure rates, the system maintained 88% productivity while traditional methods collapsed to 27%.
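The numbers above follow from the mechanics: each island runs many local optimizer steps between syncs, and only a compact "pseudo-gradient" (starting weights minus ending weights) ever crosses the wide-area link. Here is a minimal toy sketch of that DiLoCo-style outer loop in Python with NumPy; the island count, step counts, loss, and learning rates are illustrative assumptions, not Google's actual configuration, and where the published DiLoCo recipe uses a Nesterov-momentum outer optimizer this sketch uses plain outer SGD for brevity.

```python
import numpy as np

# Toy "model": 8 weights; every island minimizes 0.5 * ||w - TARGET||^2.
TARGET = np.ones(8)
rng = np.random.default_rng(0)

def inner_steps(w, steps=50, lr=0.1):
    """Local training on one island -- zero cross-datacenter traffic here."""
    for _ in range(steps):
        grad = (w - TARGET) + 0.01 * rng.normal(size=w.shape)  # noisy grads
        w = w - lr * grad
    return w

w_global = np.zeros(8)
for sync_round in range(5):
    # Each island copies the last synced weights and trains independently.
    deltas = []
    for island in range(4):
        w_local = inner_steps(w_global.copy())
        deltas.append(w_global - w_local)  # pseudo-gradient: start - end

    # Only these small deltas cross the WAN: one vector per island per
    # round, instead of one all-reduce per optimizer step (50x fewer syncs
    # in this toy setup).
    avg_delta = np.mean(deltas, axis=0)
    w_global = w_global - 0.5 * avg_delta  # plain outer SGD (see lead-in)

print(float(np.max(np.abs(w_global - TARGET))))  # shrinks toward 0
```

The bandwidth saving comes from the ratio of inner steps to syncs: here 50 local steps per round means cross-datacenter traffic drops by roughly that factor versus per-step all-reduce, before any compression of the deltas.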

Why it matters

You can now train cutting-edge models using mismatched hardware generations in the same run, turning idle GPUs gathering dust into useful compute.
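One way to picture why mixed hardware generations can share a run: because islands only sync occasionally, and a decoupled variant can apply each island's delta as it arrives, a slower island simply contributes fewer inner steps per sync instead of stalling everyone. A hypothetical two-island sketch follows; the step counts, learning rates, and apply-on-arrival rule are illustrative assumptions for this toy, not the published algorithm.

```python
import numpy as np

# Toy loss per island: 0.5 * ||w - TARGET||^2 over 4 weights.
TARGET = np.ones(4)

def local_train(w, steps, lr=0.1):
    """Inner loop on one island; step count reflects its hardware speed."""
    for _ in range(steps):
        w = w - lr * (w - TARGET)
    return w

w_global = np.zeros(4)
# A fast island (newer GPUs) finishes 100 steps in the time a slow island
# (older GPUs) finishes 25; each syncs when it is done, never waiting.
arrival_order = [("fast", 100), ("slow", 25), ("fast", 100), ("slow", 25)]
for island, steps in arrival_order:
    w_local = local_train(w_global.copy(), steps)
    delta = w_global - w_local         # this island's pseudo-gradient
    w_global = w_global - 0.5 * delta  # applied on arrival -- no barrier

print(float(np.max(np.abs(w_global - TARGET))))  # both islands still help
```

The slow island's updates are smaller but still point the right way, which is the intuition behind letting otherwise-idle older GPUs join a frontier run.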

Sources