CPU is at 55%.
Memory looks fine.
But latency is climbing.
Thread count is increasing.
Everything “looks” like the system is adapting.
It is not.
Modern runtimes auto-scale worker threads.
When tasks block, new threads are injected.
This feels like elasticity.
It is actually delay compensation.
ThreadPool scaling does not remove bottlenecks. It masks them.
In .NET, the ThreadPool uses a hill-climbing algorithm.
It increases worker threads gradually based on throughput measurement.
It samples throughput at short intervals, nudges the worker count, and keeps the change only if completions per second improve.
It does not instantly add 200 threads.
It probes.
Under sudden load, scaling lags behind demand.
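You can watch the lag directly. A minimal sketch, assuming .NET Core 3.0+ for the ThreadPool.ThreadCount and ThreadPool.PendingWorkItemCount counters:

```csharp
using System;
using System.Threading;

class InjectionLagDemo
{
    static void Main()
    {
        // Queue 200 work items that each block a pool thread for 5 s.
        for (int i = 0; i < 200; i++)
            ThreadPool.QueueUserWorkItem(_ => Thread.Sleep(5000));

        // Sample once per second: the worker count climbs in small
        // steps while items sit queued; it does not jump to 200.
        for (int s = 0; s < 15; s++)
        {
            Console.WriteLine(
                $"t={s}s threads={ThreadPool.ThreadCount} " +
                $"queued={ThreadPool.PendingWorkItemCount}");
            Thread.Sleep(1000);
        }
    }
}
```

That gap between queued work and injected threads is the lag.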
ThreadPool does not break Little’s Law.
L = λ × W
Requests in flight (L) equal arrival rate (λ) times time in system (W).
If arrival rate increases or work duration increases, inflight grows.
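A quick worked example: at λ = 200 requests/s and W = 0.5 s, L = 100 requests in flight. If a slow dependency pushes W to 2 s, L jumps to 400 without any change in traffic.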
Adding threads reduces wait time only if CPU is the bottleneck.
If threads are blocking on:
- database calls
- downstream HTTP calls
- locks and shared resources
Adding threads increases contention.
Scenario:
More requests block on DB.
ThreadPool injects new threads.
New threads:
- block on the same database
- deepen connection-pool contention
- consume more stack memory
Latency increases further.
ThreadPool reacts again.
This is a positive feedback loop.
Eventually:
- hundreds of threads, most of them blocked
- latency still climbing
- CPU still below 70%
Stack Memory
Each thread reserves stack memory (1 MB by default in .NET).
Five hundred blocked threads hold roughly 500 MB of reserved stack.
More threads → more memory pressure and fragmentation.
Context Switching
OS scheduler overhead grows non-linearly with thread count.
Garbage Collection
More inflight tasks → more allocations.
More allocations → higher GC frequency.
Higher GC frequency → more time spent in pauses.
Increased pause → higher latency.
Feedback loop.
1. Measure Queue Length
Monitor:
- ThreadPool.PendingWorkItemCount (work waiting for a thread)
- ThreadPool.ThreadCount (current worker count)
- request latency alongside CPU
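A minimal sampler sketch (PoolMonitor is an illustrative name; the counters are real .NET APIs, available since .NET Core 3.0):

```csharp
using System;
using System.Threading;

sealed class PoolMonitor : IDisposable
{
    private readonly Timer _timer;

    public PoolMonitor(TimeSpan interval)
    {
        // In production, emit these as metrics instead of console lines.
        _timer = new Timer(_ => Console.WriteLine(
            $"queued={ThreadPool.PendingWorkItemCount} " +
            $"threads={ThreadPool.ThreadCount} " +
            $"completed={ThreadPool.CompletedWorkItemCount}"),
            null, TimeSpan.Zero, interval);
    }

    public void Dispose() => _timer.Dispose();
}
```

A queue that grows while CPU stays flat is the signature described above.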
2. Limit Concurrency Explicitly
Use bounded concurrency at boundaries.
Do not rely on implicit ThreadPool behavior.
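One way to do this is a SemaphoreSlim gate at each dependency boundary. A sketch; DbGate and the limit of 32 are illustrative, size the limit to what the dependency can actually serve:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class DbGate
{
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(32, 32);

    public static async Task<T> RunAsync<T>(Func<Task<T>> query)
    {
        // Excess callers wait here, visibly, instead of silently
        // growing the ThreadPool.
        await Gate.WaitAsync();
        try { return await query(); }
        finally { Gate.Release(); }
    }
}
```

Waiters queue at the gate, where you can observe and cap them, rather than inside the driver.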
3. Prefer Async I/O
Avoid blocking threads on network calls.
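The difference in thread cost, sketched with HttpClient:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

class Calls
{
    private static readonly HttpClient Client = new HttpClient();

    // Blocking: parks a pool thread for the whole round trip;
    // exactly the pattern that triggers thread injection.
    public static string FetchBlocking(string url) =>
        Client.GetStringAsync(url).Result;

    // Async: the thread returns to the pool while the request
    // is in flight; no extra threads are needed.
    public static Task<string> FetchAsync(string url) =>
        Client.GetStringAsync(url);
}
```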
4. Apply Backpressure
Reject requests when saturation approaches.
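One way to implement this is a bounded queue at intake. A sketch using System.Threading.Channels; Intake, WorkItem, and the capacity of 1000 are illustrative:

```csharp
using System.Threading.Channels;

record WorkItem(string Payload);

class Intake
{
    // The default FullMode (Wait) makes TryWrite return false when full.
    private readonly Channel<WorkItem> _queue =
        Channel.CreateBounded<WorkItem>(1000);

    // False means the system is saturated: fail fast instead of
    // letting latency absorb the overload.
    public bool TryAccept(WorkItem item) => _queue.Writer.TryWrite(item);

    // A consumer drains the reader at its own bounded pace.
    public ChannelReader<WorkItem> Reader => _queue.Reader;
}
```

A false return becomes an immediate 503, which is cheaper than a timeout.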
ThreadPool scaling is reactive.
It hides saturation temporarily.
It cannot fix external bottlenecks.
If your system only survives because ThreadPool keeps adding threads, you are already overloaded.