Production Lesson #03 — When a Read Query Blocked an Entire Write-Heavy System

Sometimes, the system isn’t slow. It’s waiting.

Table of Contents

Context

Our system is write-heavy. On average, more than 20,000 new records are inserted into the main table every single day.

The read side supports dynamic filtering. Users can filter by multiple fields — status, category, owner, flags, and more. Different combinations are possible.

That flexibility came with a trade-off:

Composite indexes couldn’t realistically cover every filter combination. And adding too many indexes would significantly hurt write performance.

In a write-heavy system, every additional index is a tax on inserts.

The Business Request

One day, business requested a new feature:

“List all data — without any date range filter.”

This meant:

In development, it worked fine. In production, it didn’t.

What Happened in Production

Once deployed:

Write requests started queuing. Some were blocked for more than 15 seconds. Monitoring alerts flooded in.

The application was healthy. The database was not.

Why AsNoTracking Didn’t Help

On the read side, we already used AsNoTracking().

That reduces EF Core change tracking overhead. But it does NOT change database isolation behavior.

Under default READ COMMITTED isolation:

A long-running full-table read holds shared locks long enough to starve write operations.

In a write-heavy system, that’s catastrophic.

The Indexing Constraint

Could we just add more indexes?

Not realistically.

Because:

In write-heavy systems, indexing strategy is not just about query speed — it’s about protecting write throughput.

The Tactical Fix: READ UNCOMMITTED

Given:

We applied READ UNCOMMITTED to this specific query.

Effect:

The blocking stopped immediately. Write latency returned to normal. Alerts disappeared.

Consistency was slightly relaxed. System stability was restored.

Why Snapshot Isolation Might Be the Strategic Fix

READ UNCOMMITTED solved the immediate problem. But it allows dirty reads.

A more robust alternative is Snapshot Isolation.

Instead of locking reads, it uses row versioning.

However, nothing is free.

Snapshot Isolation introduces:

In write-heavy environments, row versioning cost must be carefully monitored.

READ UNCOMMITTED is a tactical fix. Snapshot Isolation is a strategic design choice.

Concurrency control is never about right or wrong. It is about choosing which cost you are willing to pay.

Why Not Just Use a Search Engine?

A common question:

“If full-table reads are dangerous, why not move this to a search engine?”

That’s a valid architectural option. But context matters.

1. Data Freshness

Introducing a search engine means:

You trade lock contention for replication delay.

2. Operational Overhead

A search engine is an operational system:

For one listing feature, that may be architectural escalation.

3. Write Amplification

In a write-heavy system (20k+ inserts/day), every write would now:

That increases system complexity and write amplification.

4. Dynamic Filtering Still Exists

Even in a search engine, dynamic filters still create many query shapes.

You move the scalability challenge — you don’t eliminate it.

Architectural escalation should follow bottleneck patterns — not feature pressure.

The Real Lesson

In write-heavy systems:

In high-write systems, isolation level is not a database setting. It’s a business decision.

And sometimes, the most dangerous query is the one that “just reads everything.”