Google Cuts AI Training Bandwidth by 99.6% Across Datacenters

Published: April 25, 2026 at 12:28 AM

Updated: April 25, 2026 at 12:28 AM

100-word summary

Google DeepMind's new Decoupled DiLoCo architecture trains frontier AI models across distant datacenters using regular internet connections instead of fiber-optic superhighways. Training a 12-billion-parameter model across four US regions dropped bandwidth needs from 198 Gbps to 0.84 Gbps. The secret: independent "island" clusters sync updates asynchronously, so a chip failure in Virginia doesn't halt work in Oregon. When researchers simulated chaos with high hardware failure rates, the system maintained 88% productivity while traditional methods collapsed to 27%. You can now train cutting-edge models using mismatched hardware generations in the same run, turning idle GPUs gathering dust into useful compute.

What happened

Google DeepMind's new Decoupled DiLoCo architecture trains frontier AI models across distant datacenters using regular internet connections instead of fiber-optic superhighways. Training a 12-billion-parameter model across four US regions dropped bandwidth needs from 198 Gbps to 0.84 Gbps. The secret: independent "island" clusters sync updates asynchronously, so a chip failure in Virginia doesn't halt work in Oregon. When researchers simulated chaos with high hardware failure rates, the system maintained 88% productivity while traditional methods collapsed to 27%.
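The numbers above follow from the mechanics: each island runs many local optimizer steps between syncs, and only a compact "pseudo-gradient" (starting weights minus ending weights) ever crosses the wide-area link. Here is a minimal toy sketch of that DiLoCo-style outer loop in Python with NumPy; the island count, step counts, loss, and learning rates are illustrative assumptions, not Google's actual configuration, and where the published DiLoCo recipe uses a Nesterov-momentum outer optimizer this sketch uses plain outer SGD for brevity.

```python
import numpy as np

# Toy "model": 8 weights; every island minimizes 0.5 * ||w - TARGET||^2.
TARGET = np.ones(8)
rng = np.random.default_rng(0)

def inner_steps(w, steps=50, lr=0.1):
    """Local training on one island -- zero cross-datacenter traffic here."""
    for _ in range(steps):
        grad = (w - TARGET) + 0.01 * rng.normal(size=w.shape)  # noisy grads
        w = w - lr * grad
    return w

w_global = np.zeros(8)
for sync_round in range(5):
    # Each island copies the last synced weights and trains independently.
    deltas = []
    for island in range(4):
        w_local = inner_steps(w_global.copy())
        deltas.append(w_global - w_local)  # pseudo-gradient: start - end

    # Only these small deltas cross the WAN: one vector per island per
    # round, instead of one all-reduce per optimizer step (50x fewer syncs
    # in this toy setup).
    avg_delta = np.mean(deltas, axis=0)
    w_global = w_global - 0.5 * avg_delta  # plain outer SGD (see lead-in)

print(float(np.max(np.abs(w_global - TARGET))))  # shrinks toward 0
```

The bandwidth saving comes from the ratio of inner steps to syncs: here 50 local steps per round means cross-datacenter traffic drops by roughly that factor versus per-step all-reduce, before any compression of the deltas.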

Why it matters

You can now train cutting-edge models using mismatched hardware generations in the same run, turning idle GPUs gathering dust into useful compute.
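One way to picture why mixed hardware generations can share a run: because islands only sync occasionally, and a decoupled variant can apply each island's delta as it arrives, a slower island simply contributes fewer inner steps per sync instead of stalling everyone. A hypothetical two-island sketch follows; the step counts, learning rates, and apply-on-arrival rule are illustrative assumptions for this toy, not the published algorithm.

```python
import numpy as np

# Toy loss per island: 0.5 * ||w - TARGET||^2 over 4 weights.
TARGET = np.ones(4)

def local_train(w, steps, lr=0.1):
    """Inner loop on one island; step count reflects its hardware speed."""
    for _ in range(steps):
        w = w - lr * (w - TARGET)
    return w

w_global = np.zeros(4)
# A fast island (newer GPUs) finishes 100 steps in the time a slow island
# (older GPUs) finishes 25; each syncs when it is done, never waiting.
arrival_order = [("fast", 100), ("slow", 25), ("fast", 100), ("slow", 25)]
for island, steps in arrival_order:
    w_local = local_train(w_global.copy(), steps)
    delta = w_global - w_local         # this island's pseudo-gradient
    w_global = w_global - 0.5 * delta  # applied on arrival -- no barrier

print(float(np.max(np.abs(w_global - TARGET))))  # both islands still help
```

The slow island's updates are smaller but still point the right way, which is the intuition behind letting otherwise-idle older GPUs join a frontier run.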

Sources