Upgrading Your AI Inference? Fix the Plumbing First

May 8, 2026


Published: May 8, 2026 at 12:20 AM


100-word summary

ServiceNow engineers just published a cautionary tale: upgrading vLLM from version 0.8.5 to 0.18.1 broke their reinforcement learning pipeline until they fixed four hidden mismatches. The culprits were subtle (how token log-probabilities are computed, whether the engine caches prompt prefixes, floating-point precision in the final layer) but deadly to model training. Their reward curves diverged completely until every backend detail matched the old setup. The lesson cuts against the usual move-fast instinct: when swapping inference engines mid-training, achieve exact numerical parity before touching anything else. They tracked four metrics (clip rate, KL divergence, entropy, reward) to confirm alignment. Only after those four converged did they dare tweak the learning algorithm itself.
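To make the parity check concrete, here is a minimal sketch of how those four diagnostics could be computed from per-token log-probabilities collected from the trainer and the inference engine. This is not ServiceNow's code; the function name, the k3-style KL estimator, and the Monte Carlo entropy estimate are illustrative assumptions.

```python
import numpy as np

def parity_metrics(trainer_logps, sampler_logps, rewards, clip_eps=0.2):
    """Hypothetical parity diagnostics between a trainer and an inference engine.

    trainer_logps / sampler_logps: log-probabilities of the sampled tokens
    under the training framework and the inference engine, respectively.
    rewards: per-sequence rewards from the RL environment.
    """
    log_ratio = trainer_logps - sampler_logps
    ratio = np.exp(log_ratio)

    # Fraction of tokens whose importance ratio falls outside the PPO clip range.
    clip_rate = np.mean((ratio < 1.0 - clip_eps) | (ratio > 1.0 + clip_eps))

    # Unbiased, nonnegative "k3" estimator of KL(sampler || trainer):
    # E_q[p/q - 1 - log(p/q)] = KL(q || p) for samples drawn from q.
    approx_kl = np.mean(ratio - 1.0 - log_ratio)

    # Monte Carlo entropy estimate: average negative log-prob of sampled tokens.
    entropy = -np.mean(trainer_logps)

    return {
        "clip_rate": float(clip_rate),
        "approx_kl": float(approx_kl),
        "entropy": float(entropy),
        "mean_reward": float(np.mean(rewards)),
    }
```

If the two backends are in exact numerical parity, clip rate and approximate KL should both sit at zero; any persistent gap points at a backend mismatch (log-prob computation, prefix caching, final-layer precision) rather than at the learning algorithm.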

