Most engineers blame the database.
In production incidents, the database is often still healthy.
What fails first is the connection pool.
An incident happens. The error message:
Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool.
The immediate reaction: blame the database.
But database CPU is at 40%. Query latency is normal.
The failure is upstream.
A connection pool is a bounded resource.
Example:
Max pool size = 100 connections
If 100 requests simultaneously need a DB connection, the 101st request waits.
If the wait exceeds the timeout, the request fails.
The database can still accept more connections. The application cannot.
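For reference, both limits live in the application's connection configuration. A minimal sketch, assuming Microsoft.Data.SqlClient; the server and database names are placeholders:

using Microsoft.Data.SqlClient;

// Max Pool Size caps pooled connections; Connect Timeout caps the wait for one.
// Server, database, and values are illustrative.
var connectionString =
    "Server=db.example.internal;Database=orders;Integrated Security=true;" +
    "Max Pool Size=100;" +   // the bounded resource: at most 100 pooled connections
    "Connect Timeout=15";    // seconds to wait for a free connection before timing out

using var connection = new SqlConnection(connectionString);
await connection.OpenAsync(); // this is the call that throws the pool timeout error above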
Little's Law:
L = λ × W
Where:
L = average number of requests in flight
λ = arrival rate (requests per second)
W = average time each request holds a connection (seconds)
Suppose:
λ = 2000 requests per second
W = 50ms (0.05s) per DB call
Then:
L = 2000 × 0.05 = 100 inflight DB calls
Pool size = 100. System is stable.
Now latency increases to 100ms.
L = 2000 × 0.1 = 200 inflight
Required pool size doubled.
Traffic did not change.
Latency changed.
From the previous article: tail latency increases non-linearly near capacity.
If P99 latency spikes to 500ms:
L = 2000 × 0.5 = 1000 inflight
With pool size 100, 900 requests wait.
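The same arithmetic as a quick check, using the numbers above (the helper function is just an illustration):

using System;

// Little's Law: inflight = arrival rate × time each request holds a connection.
static double Inflight(double requestsPerSecond, double dbLatencySeconds)
    => requestsPerSecond * dbLatencySeconds;

Console.WriteLine(Inflight(2000, 0.05)); // 100  -> fits a pool of 100
Console.WriteLine(Inflight(2000, 0.10)); // 200  -> required pool size doubles
Console.WriteLine(Inflight(2000, 0.50)); // 1000 -> 900 requests queue behind a pool of 100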
Waiting increases latency further.
Latency → inflight growth → pool exhaustion → timeouts.
The database may still be under 60% CPU.
This is resource starvation at the application boundary.
1. Right-size pool using Little's Law
Estimate expected inflight at peak latency and size the pool above it.
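A back-of-the-envelope helper; the 20% headroom factor is an assumption, not a rule:

using System;

// Size the pool for the inflight load Little's Law predicts at peak, plus headroom.
static int SuggestedPoolSize(double peakRequestsPerSecond, double peakDbLatencySeconds)
{
    double inflight = peakRequestsPerSecond * peakDbLatencySeconds; // L = λ × W
    return (int)Math.Ceiling(inflight * 1.2);                       // 20% headroom (assumption)
}

Console.WriteLine(SuggestedPoolSize(2000, 0.10)); // 240 connections for 2000 req/s at 100ms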
2. Limit concurrency before DB
Use a semaphore or a bounded channel.
using System.Threading;

// Allow at most 100 concurrent DB calls, at or below the pool size,
// so excess requests queue here instead of exhausting the pool.
var semaphore = new SemaphoreSlim(100);

await semaphore.WaitAsync();
try
{
    await ExecuteDbCall();
}
finally
{
    // Always release, even when the DB call throws.
    semaphore.Release();
}
3. Fail fast
Reject requests instead of letting them wait indefinitely.
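With the semaphore above, that means bounding the wait and rejecting when it expires. A sketch; the 250ms budget and the exception type are illustrative choices:

// Wait briefly for a slot; if none frees up, reject instead of queueing indefinitely.
// A web service would typically return 503 here rather than throw.
if (!await semaphore.WaitAsync(TimeSpan.FromMilliseconds(250)))
{
    throw new TimeoutException("Concurrency limit reached; rejecting instead of queueing.");
}

try
{
    await ExecuteDbCall();
}
finally
{
    semaphore.Release();
}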
4. Monitor inflight, not just CPU
CPU does not measure saturation. Queue length does.
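A minimal way to make saturation visible, assuming DB calls can be wrapped at one choke point; the class and member names are illustrative:

using System;
using System.Threading;
using System.Threading.Tasks;

// Gauge of requests currently holding a DB connection.
// Export Inflight to your metrics system and alert as it approaches the pool size.
static class DbInflight
{
    private static int _inflight;

    public static int Inflight => Volatile.Read(ref _inflight);

    public static async Task<T> Measure<T>(Func<Task<T>> dbCall)
    {
        Interlocked.Increment(ref _inflight);
        try { return await dbCall(); }
        finally { Interlocked.Decrement(ref _inflight); }
    }
}

Alert when Inflight trends toward the pool size; at that point CPU can still look healthy.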
Databases fail loudly. Connection pools fail silently.
Latency increases inflight. Inflight exhausts the pool. Pool exhaustion causes timeouts.
Most incidents blamed on databases are actually queueing problems.