We had a feature that allowed administrators to target a specific group of users for a time-based campaign. The selected user IDs were configured from an internal admin portal.
The requirement was simple: on every login, determine whether the user belonged to the targeted campaign group.
In development, everything worked perfectly.
All targeted user IDs were stored inside a single Redis key as a serialized JSON array.
Key: campaign:2026-02-12
Value: [1, 5, 20, 500, 999, ...]
On every login request, the service fetched the whole array from Redis, deserialized it, and checked locally whether the user ID was present.
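In code, the login-time check looked roughly like this (a minimal sketch; a plain dict stands in for the Redis client, and the names are illustrative, not from the production code):

```python
import json

# A dict stands in for the Redis client in this sketch.
fake_redis = {"campaign:2026-02-12": json.dumps([1, 5, 20, 500, 999])}

def is_targeted(user_id: int) -> bool:
    # 1. Fetch the ENTIRE serialized array (GET campaign:2026-02-12).
    raw = fake_redis.get("campaign:2026-02-12")
    if raw is None:
        return False
    # 2. Deserialize every ID into a fresh list on each login.
    ids = json.loads(raw)
    # 3. Linear search: O(n) work per request.
    return user_id in ids
```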
In staging, the list usually contained 20–100 users. Latency was negligible. No one questioned the design.
One day, an administrator added more than 30,000 user IDs into the campaign group.
Unfortunately, this happened during peak traffic hours.
Within minutes, login latency spiked and application memory climbed.
At first glance, it looked like a memory leak. It wasn’t.
Every login was doing the same work:

1. GET campaign:2026-02-12
2. Deserialize 30,000 IDs
3. Linear search for the user ID

This is an O(n) operation executed on every login.
Under high concurrency, we were repeatedly:

- transferring the same large payload from Redis to every application instance
- deserializing 30,000 IDs into short-lived objects
- scanning the array linearly, then discarding it all
The cost wasn’t obvious at small scale. At peak traffic, it became catastrophic.
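The scale effect is easy to reproduce locally: membership in a 30,000-element list is a linear scan, while a set lookup is a single hash probe (illustrative only; absolute timings vary by machine):

```python
import timeit

ids_list = list(range(30_000))   # the JSON-array model
ids_set = set(ids_list)          # the set model

# Worst case for the list: the probed ID sits at the end.
list_t = timeit.timeit(lambda: 29_999 in ids_list, number=2_000)
set_t = timeit.timeit(lambda: 29_999 in ids_set, number=2_000)

print(f"list scan: {list_t:.4f}s, set lookup: {set_t:.4f}s")
```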
All users were hitting the exact same Redis key simultaneously.
This created:

- a single hot key that all traffic converged on
- network saturation from shipping the same large value over and over
- load that no amount of application scaling could spread out
We unintentionally turned Redis into a bottleneck.
Redis is a data structure server. We ignored that.
We treated Redis like:

"Give me everything, I will filter locally."

Instead of:

"Ask Redis the exact question you need answered."
It was not a true leak.
It was the combination of:

- fetching a large value from Redis on every login
- deserializing 30,000 IDs per request
- the allocation churn of building and discarding those arrays
Under peak load, this pattern amplified resource consumption dramatically.
Key: campaign:2026-02-12
Type: SET
Members: userId
On login:
SISMEMBER campaign:2026-02-12 userId
Benefits:

- SISMEMBER is O(1), regardless of how many members the set holds
- only a yes/no answer crosses the network
- nothing to deserialize on the application side
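With redis-py this becomes a single `sismember` call; the sketch below models the same access pattern with a Python set so it runs standalone (the Redis commands each line corresponds to are shown in comments):

```python
# A Python set stands in for the Redis SET here; in production these
# would be SADD / SISMEMBER calls against campaign:2026-02-12.
campaign = set()

def add_members(*user_ids: int) -> None:
    campaign.update(user_ids)        # SADD campaign:2026-02-12 id [id ...]

def is_targeted(user_id: int) -> bool:
    # SISMEMBER: O(1), one yes/no over the wire, nothing to deserialize.
    return user_id in campaign       # SISMEMBER campaign:2026-02-12 user_id
```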
Key: user:{userId}:campaign
Value: 1
TTL: 24h
On login:
EXISTS user:{userId}:campaign
This removes the hot key issue entirely: every user reads a different key, so load spreads across the keyspace instead of converging on one value.
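A runnable sketch of the per-user variant (a dict of key to expiry deadline stands in for Redis; in production this would be a SETEX when the campaign is configured and an EXISTS on login):

```python
import time

store: dict[str, float] = {}  # key -> expiry deadline (stand-in for Redis)

def mark_targeted(user_id: int, ttl_s: int = 24 * 3600) -> None:
    # SETEX user:{userId}:campaign 86400 1
    store[f"user:{user_id}:campaign"] = time.monotonic() + ttl_s

def is_targeted(user_id: int) -> bool:
    # EXISTS user:{userId}:campaign -- every user touches a different
    # key, so no single key becomes hot.
    key = f"user:{user_id}:campaign"
    deadline = store.get(key)
    if deadline is None:
        return False
    if deadline <= time.monotonic():
        del store[key]               # emulate TTL expiry
        return False
    return True
```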
| Design | Time Complexity | Network Payload | Concurrency Risk |
|---|---|---|---|
| JSON Array | O(n) | High | High |
| Redis Set | O(1) | Minimal | Low |
| Per-user Key | O(1) | Minimal | Very Low |
Redis was never the problem.
The problem was asking the system the wrong question.
The difference between:
"Give me the whole list."
and
"Does this member exist?"
is the difference between surviving peak traffic and crashing in production.
Fast systems are not built by faster code. They are built by correct data modeling.