Retries Create Traffic Multipliers

Retries are designed to improve reliability. Under stress, they do the opposite.

When a system slows down, retry logic increases load. Increased load slows the system further. This feedback loop collapses capacity.

1. The Intuition

A request fails. The client retries.

If the failure was transient, the retry succeeds.

But if the failure was caused by overload, the retry increases overload.

Reliability logic becomes a load amplifier.

Reference: Amazon Builders' Library – Timeouts, Retries, and Backoff with Jitter

2. Baseline Capacity Model

Assume:

Service capacity = 10,000 RPS
Incoming traffic = 8,000 RPS
Average latency = 120ms

System is stable.

Utilization = 80%.

In queueing theory terms (M/M/1 approximation), with arrival rate λ and service rate μ:

ρ = λ / μ
ρ = 8000 / 10000 = 0.8

As utilization approaches 1, latency increases non-linearly.
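To make the non-linearity concrete, here is a minimal sketch under the same M/M/1 approximation. In that model the mean response time equals the unloaded service time multiplied by 1 / (1 - ρ). The function name `latency_multiplier` is illustrative and not taken from any library.

```python
def latency_multiplier(arrival_rps: float, capacity_rps: float) -> float:
    """Mean M/M/1 response time relative to the unloaded service time.

    Follows from W = 1 / (mu - lambda) = (1 / mu) * 1 / (1 - rho).
    """
    rho = arrival_rps / capacity_rps
    if rho >= 1:
        raise ValueError("queue is unstable once utilization reaches 1")
    return 1 / (1 - rho)


# Same 10,000 RPS capacity as the baseline above.
for rps in (5_000, 8_000, 9_000, 9_900):
    print(f"{rps} RPS -> {latency_multiplier(rps, 10_000):.0f}x")
```

Going from 50% to 80% utilization raises the multiplier from roughly 2x to 5x. Going from 90% to 99% raises it from roughly 10x to 100x.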

3. The Retry Multiplier Effect

Now assume 5% of requests time out due to a latency spike, and each timed-out request is retried once.

8,000 × 5% = 400 retries per second

Effective traffic becomes:

8,000 + 400 = 8,400 RPS

Utilization:

ρ = 8400 / 10000 = 0.84

Latency increases. More requests exceed timeout.

Suppose timeout rate increases to 10%.

8,000 × 10% = 800 retries per second
Effective load = 8,800 RPS
ρ = 0.88

This is a positive feedback loop: retries raise load, load raises latency, and latency raises the timeout rate.
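The arithmetic in this section can be reproduced with a short sketch. It assumes each timed-out request is retried exactly once and that retries themselves are not retried. The names `effective_load`, `CAPACITY_RPS`, and `BASE_RPS` are illustrative.

```python
CAPACITY_RPS = 10_000
BASE_RPS = 8_000


def effective_load(base_rps: float, timeout_rate: float) -> float:
    """Offered load when each timed-out request is retried exactly once."""
    return base_rps * (1 + timeout_rate)


for timeout_rate in (0.0, 0.05, 0.10):
    load = effective_load(BASE_RPS, timeout_rate)
    utilization = load / CAPACITY_RPS
    print(f"timeout {timeout_rate:.0%}: {load:,.0f} RPS, utilization {utilization:.2f}")
```

The sketch takes the timeout rate as an input. In a real system the timeout rate is itself a function of utilization, which is what closes the loop.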

4. Tail Latency Amplification

Retries target slow requests, and the slow requests are the ones already in the tail, typically at P95 or P99.

That means retries add pressure on tail latency rather than on average latency.

Google SRE notes that tail latency dominates user experience.

Reference: Google SRE – Handling Overload

5. Retry Storm Scenario

Now add:

3 retry attempts
No exponential backoff
No jitter

Worst-case multiplier, when every attempt fails and is retried immediately:

Original: 8,000 RPS
1st retry: 8,000
2nd retry: 8,000
3rd retry: 8,000

Total potential = 32,000 RPS

A system built for 10,000 RPS now receives 32,000.

In that worst case, collapse is deterministic.
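The 4× figure is the upper end of a range. As a rough sketch, assume each attempt fails independently with the same probability p and is retried immediately, up to a fixed number of retries. The expected number of sends per request is then the geometric sum 1 + p + p² + ... The independence assumption and the name `expected_attempts` are simplifications introduced here. Under real overload, failures are correlated, which pushes the result toward the worst case.

```python
def expected_attempts(failure_prob: float, max_retries: int) -> float:
    """Expected sends per request: 1 + p + p^2 + ... + p^max_retries."""
    return sum(failure_prob ** k for k in range(max_retries + 1))


BASE_RPS = 8_000
for p in (0.05, 0.5, 1.0):
    load = BASE_RPS * expected_attempts(p, max_retries=3)
    print(f"failure probability {p:.2f}: about {load:,.0f} RPS offered")
```

With p = 1.0 and 3 retries the multiplier is exactly 4, which gives the 32,000 RPS above. With p = 0.5 the multiplier is 1.875, which already offers 15,000 RPS to a 10,000 RPS service.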

6. Modeling the Collapse

In an M/M/1 queue, the average response time is:

W = 1 / (μ - λ)

If:

μ = 10000 requests/s
λ = 9500 requests/s

W = 1 / 500 s = 2 ms

If λ increases to 9900:

W = 1 / 100 s = 10 ms

Latency becomes 5× worse after a load increase of about 4%.

Retries accelerate this non-linear region.
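A small sketch shows how retry traffic moves the queue along this curve. It uses the same M/M/1 formula and treats retries as a fixed fraction of a 9,500 RPS base load. The function name `mm1_response_time` and the retry fractions are illustrative.

```python
def mm1_response_time(service_rate: float, arrival_rate: float) -> float:
    """Mean M/M/1 response time in seconds: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("arrival rate must stay below service rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10_000
BASE_RPS = 9_500

for retry_fraction in (0.0, 0.02, 0.04):
    arrival_rate = BASE_RPS * (1 + retry_fraction)
    w_ms = mm1_response_time(SERVICE_RATE, arrival_rate) * 1000
    print(f"retries {retry_fraction:.0%}: {arrival_rate:,.0f} RPS, W = {w_ms:.1f} ms")
```

In this model, 4% of extra retry traffic raises the mean response time from 2 ms to a little over 8 ms.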

7. Production Mitigations

1. Exponential Backoff

delay = base * 2^attempt

2. Add Jitter

delay = random(0, base * 2^attempt)

3. Circuit Breaker

Stop sending traffic once the failure rate exceeds a threshold.

4. Retry Budgets

Limit total retries to a fixed percentage of baseline traffic.

Reference: Google SRE – Cascading Failures
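Mitigations 1, 2, and 4 can be sketched in a few lines. This is a minimal illustration and not a production implementation. The `cap` parameter, the default values, and the `RetryBudget` class are assumptions introduced here. The budget counts over the whole process lifetime, where a real one would likely use a sliding window. The circuit breaker is omitted for brevity.

```python
import random


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Exponential backoff with jitter, in seconds.

    Picks uniformly from [0, min(cap, base * 2^attempt)]. The cap is an
    added assumption that keeps late attempts from waiting unboundedly.
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


class RetryBudget:
    """Permit retries only while they stay under a fraction of first attempts."""

    def __init__(self, ratio: float = 0.1) -> None:
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        """Count one first attempt."""
        self.requests += 1

    def try_acquire_retry(self) -> bool:
        """Return True and count the retry if the budget permits it."""
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True


budget = RetryBudget(ratio=0.1)
for _ in range(100):
    budget.record_request()
granted = sum(budget.try_acquire_retry() for _ in range(50))
print(f"{granted} of 50 retries allowed after 100 requests")
print(f"delay before retry 3: {backoff_delay(3):.3f} s")
```

A budget bounds the amplification by construction. With a 10% ratio, retries can add at most about 10% on top of baseline traffic, whatever the failure rate.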

8. Conclusion

Retries are not free.

Under normal conditions, they improve reliability. Under stress, they multiply traffic.

Systems do not collapse because of a single failure. They collapse because feedback loops amplify load.

Retry logic is a load amplifier. Design it like one.

Redis Production Series (4/8)