When Clients Sneak-Attack Your Backend: Lockstep vs. Staggered Polling

Sometimes you don’t get taken down by volume. You get taken down by rhythm. During the same real-time game rollout, one behaviour surprised us: certain client devices weren’t just polling regularly; they were polling in sync.

And when thousands (or millions) of devices all call an API at almost the exact same millisecond, your backend suddenly experiences a perfectly aligned traffic spike. This is called lockstep polling.

What is lockstep?

Lockstep = when many clients execute the same request at the same time. Instead of:

100 requests / second
100 requests / second
100 requests / second

You get:

0
0
10,000 requests in one second

Even if your infra can handle the total load, the burst breaks you.
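
To see why the burst is the problem, here's a toy simulation (just a sketch: the 10,000 clients and the 30-second window are assumed numbers, not data from the rollout). It compares the busiest one-second bucket when every client fires together versus when each client picks a random offset.

import random
from collections import Counter

CLIENTS = 10_000   # assumed fleet size for the example
WINDOW = 30        # assumed polling interval, in seconds

# Lockstep: every client fires in the same second of the window.
lockstep = Counter(0 for _ in range(CLIENTS))

# Staggered: every client picks a random second within the window.
staggered = Counter(random.randrange(WINDOW) for _ in range(CLIENTS))

print("worst second, lockstep: ", max(lockstep.values()))   # 10000 in one second
print("worst second, staggered:", max(staggered.values()))  # ~333 per second, plus noise

Same total traffic either way; only the peak changes.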

Why it happens

Most polling logic is written like:

wait_for_response()
sleep(30 seconds)
poll_again()

So if backend latency spikes temporarily, clients accidentally "sync up": they all wait out the same delay, then retry together.
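
In code, that naive loop looks roughly like this (a minimal Python sketch; the requests library, the url parameter, and the 30-second interval are illustrative assumptions, not the actual client code):

import time

import requests  # illustrative; any HTTP client behaves the same way here

POLL_INTERVAL = 30  # seconds, hard-coded and identical on every device

def poll_forever(url):
    while True:
        requests.get(url, timeout=10)  # blocks until the server responds
        time.sleep(POLL_INTERVAL)      # fixed wait starts only after the response

Because the timer starts only after the response arrives, one server-side latency spike releases every waiting client at the same moment, and from then on their identical 30-second countdowns keep them aligned.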

Staggering (jitter): the fix

A better pattern:

sleep(30 seconds + random jitter)
poll()

Example jitter:

30s ± 3s
60s ± 5s

On errors, add exponential backoff on top of the jitter, so failing clients spread out their retries too.

This keeps clients spread out over time instead of drifting back into alignment, protecting the backend from synchronised bursts.
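
Here's a sketch of the staggered version in the same style, with assumed values (30s base interval, ±3s jitter, a five-minute backoff cap):

import random
import time

import requests  # same illustrative HTTP client as before

BASE_INTERVAL = 30  # seconds
JITTER = 3          # +/- seconds, matching the 30s ± 3s example
MAX_BACKOFF = 300   # cap so the error backoff can't grow without bound

def poll_forever(url):
    interval = BASE_INTERVAL
    while True:
        try:
            requests.get(url, timeout=10).raise_for_status()
            interval = BASE_INTERVAL                   # success: back to the normal cadence
        except requests.RequestException:
            interval = min(interval * 2, MAX_BACKOFF)  # exponential backoff on error
        # Random jitter around the interval keeps clients from re-aligning.
        time.sleep(interval + random.uniform(-JITTER, JITTER))

Jitter keeps healthy clients spread across the window; backoff keeps an unhealthy backend from being hammered by synchronised retries.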

Key Takeaway

In live-polling systems, the real risk isn’t latency; it’s synchronisation. When clients begin polling in lockstep, the backend experiences artificial traffic spikes that look like a coordinated attack instead of natural load. Healthy systems embrace timing chaos, not uniformity. Random jitter, staggered retries, and asynchronous behaviour keep requests distributed, preventing thundering-herd patterns.

As your user base grows, behaviour becomes infrastructure: architecture alone won’t save you if every client acts the same at the same time. Chaos scales, uniformity strains.