Distributed systems promise many things.
High availability. Horizontal scale. Event-driven architecture.
And sometimes:
Exactly-once delivery.
That promise is usually misunderstood.
“Exactly-once” can refer to multiple things, among them how many times a message is delivered and how many times its effects are applied.
These are not equivalent.
Delivery guarantees do not equal effect guarantees.
In distributed systems, failures are indistinguishable from delays.
If a consumer processes a message and crashes before acknowledging, what happened?
The broker cannot know.
Therefore it must retry.
Retry implies potential duplicate delivery.
This is fundamental. Not implementation-specific.
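To make the retry argument concrete, here is a minimal in-memory sketch. `ToyBroker` and `run_consumer` are hypothetical names invented for illustration, not any real broker's API. The consumer "crashes" once after doing its work but before acknowledging, and the broker, seeing no ack, hands out the same message again.

```python
class ToyBroker:
    """In-memory stand-in for a broker that redelivers unacknowledged messages."""

    def __init__(self, messages):
        self.pending = list(messages)  # messages not yet acknowledged
        self.deliveries = []           # log of every delivery attempt

    def poll(self):
        # Always hand out the oldest unacknowledged message.
        if not self.pending:
            return None
        msg = self.pending[0]
        self.deliveries.append(msg)
        return msg

    def ack(self, msg):
        self.pending.remove(msg)


def run_consumer(broker, processed):
    crashed = False
    while (msg := broker.poll()) is not None:
        processed.append(msg)  # the work is done here
        if not crashed:
            crashed = True
            continue  # simulated crash: the ack is never sent
        broker.ack(msg)


broker = ToyBroker(["m1", "m2"])
processed = []
run_consumer(broker, processed)
print(processed)
```

The first message is processed twice. The broker had no way to tell "processed, ack lost" apart from "never processed".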
Many systems advertise exactly-once at a specific layer.
Kafka's exactly-once semantics (EOS) is one example.
But these guarantees are scoped.
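Kafka EOS ensures that records produced and offsets committed inside a transaction take effect atomically within Kafka.
It does not guarantee that effects outside Kafka, such as database or Redis writes, happen exactly once.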
Consider this flow:
1. Consume event
2. Update database
3. Write to Redis
4. Commit offset
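If a crash happens after step 3 but before step 4, the offset is never committed. On restart the consumer receives the same event again and repeats the database update and the Redis write.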
Even with Kafka EOS, your external side effects remain vulnerable.
Exactly-once inside Kafka. At-least-once outside.
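The crash window in that flow can be reproduced with a small simulation. This is only a sketch: plain dicts stand in for the database and Redis, an integer stands in for the committed offset, and `handle` and `Crash` are made-up names.

```python
class Crash(Exception):
    """Simulated process crash."""


def handle(event, db, cache, crash_after_step_3=False):
    # Step 2: update database (a non-idempotent increment)
    db["balance"] = db.get("balance", 0) + event["amount"]
    # Step 3: write to Redis (a dict stands in for it here)
    cache["events_seen"] = cache.get("events_seen", 0) + 1
    if crash_after_step_3:
        raise Crash()


log = [{"id": "evt-1", "amount": 100}]
db, cache = {}, {}
committed = 0  # last committed offset

try:
    handle(log[committed], db, cache, crash_after_step_3=True)  # Steps 1-3
    committed += 1                                              # Step 4, never reached
except Crash:
    pass  # the offset was not committed

# Restart: resume from the last committed offset, so evt-1 is consumed again.
handle(log[committed], db, cache)
committed += 1

print(db, cache, committed)
```

One event, one committed offset, and the balance has been incremented twice.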
Production systems do not rely on exactly-once.
They rely on idempotency.
Idempotency means:
Applying the same operation more than once yields the same result as applying it once.
Example:
INSERT INTO payments (id, amount)
VALUES (:event_id, :amount)
ON CONFLICT (id) DO NOTHING;
Duplicate event? No duplicate payment.
This is correctness through design.
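The same statement can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the table schema and the `record_payment` helper are invented for illustration, and the `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, amount INTEGER NOT NULL)")


def record_payment(conn, event_id, amount):
    # The primary key on id turns a redelivered event into a no-op.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO payments (id, amount) VALUES (:event_id, :amount) "
            "ON CONFLICT (id) DO NOTHING",
            {"event_id": event_id, "amount": amount},
        )


record_payment(conn, "evt-42", 100)
record_payment(conn, "evt-42", 100)  # redelivered duplicate

count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone()
print(count, total)
```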
1. Use Idempotency Keys
Every external request should carry a unique identifier.
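One way this might look for an outbound call is sketched below. `PaymentService` is a hypothetical downstream service, not a real API. The caller generates the key once per logical operation and reuses it on every retry, and the service replays the stored response instead of charging again.

```python
import uuid


class PaymentService:
    """Hypothetical downstream service that deduplicates by idempotency key."""

    def __init__(self):
        self.responses = {}  # idempotency key -> response already returned
        self.charges = []    # charges actually performed

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]  # replay, no new charge
        self.charges.append(amount)
        response = {"status": "charged", "amount": amount}
        self.responses[idempotency_key] = response
        return response


def charge_with_retries(service, amount, attempts=3):
    key = str(uuid.uuid4())  # generated once, reused on every retry
    result = None
    for _ in range(attempts):
        # Pretend each response was lost in transit, forcing a retry.
        result = service.charge(key, amount)
    return result


service = PaymentService()
result = charge_with_retries(service, 100)
print(result, service.charges)
```

The key has to be created outside the retry loop. A fresh key per attempt would defeat the deduplication.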
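2. Make Side Effects Idempotent
Applying an effect twice should leave the same state as applying it once, as the ON CONFLICT insert above does.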
3. Separate Processing from Effects
Store the event's processing result first. Apply side effects only after that state is durable.
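A minimal sketch of that separation follows, using sqlite3 as the durable store. The table names and the `process` and `apply_effects` helpers are assumptions made for illustration. The result and a pending-effect marker are written in one transaction, and a separate pass applies effects from what was stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (event_id TEXT PRIMARY KEY, outcome TEXT NOT NULL);
    CREATE TABLE pending_effects (
        event_id TEXT PRIMARY KEY,
        applied INTEGER NOT NULL DEFAULT 0
    );
""")


def process(conn, event_id, outcome):
    # Result and pending-effect marker commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO results (event_id, outcome) VALUES (?, ?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event_id, outcome),
        )
        conn.execute(
            "INSERT INTO pending_effects (event_id) VALUES (?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event_id,),
        )


def apply_effects(conn, effect):
    rows = conn.execute(
        "SELECT event_id FROM pending_effects WHERE applied = 0"
    ).fetchall()
    for (event_id,) in rows:
        # A crash between these two steps reruns the effect later,
        # so the effect itself still has to be idempotent.
        effect(event_id)
        with conn:
            conn.execute(
                "UPDATE pending_effects SET applied = 1 WHERE event_id = ?",
                (event_id,),
            )


sent = []
process(conn, "evt-1", "ok")
process(conn, "evt-1", "ok")      # duplicate delivery, no new rows
apply_effects(conn, sent.append)
apply_effects(conn, sent.append)  # nothing left to apply
print(sent)
```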
4. Accept At-Least-Once Reality
Distributed systems naturally converge to:
At-least-once delivery + idempotent processing.
Exactly-once delivery is a scoped optimization.
It is not a universal guarantee.
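Real systems achieve correctness through idempotency keys, idempotent side effects, durable state before effects, and an honest acceptance of at-least-once delivery.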
This series was never about Redis.
It was about understanding where systems actually break.