Skip to main content

Command Palette

Search for a command to run...

Cache stampede

A form of Thundering herd problem

Updated
8 min read

Modern distributed systems rely heavily on caching for performance. But caching introduces its own class of failure patterns. One of the most important and commonly discussed in system design is the cache stampede, also known as a form of the thundering herd problem.

I will try to break it down clearly: why it happens, how it becomes dangerous, and how to prevent it.

Why Do We Need Cache in the First Place?

Databases maintains your state i.e. they hold persistent data and enforce consistency. They are designed for correctness and durability, not for handling massive concurrent read bursts at low latency.

Scaling databases horizontally is difficult because:

  • State must be replicated.

  • Reads and writes must behave consistently across replicas.

  • Coordination overhead increases.

This stateful nature of databases is one of the core reasons why we create our own servers where we keep our business logic that interacts with the data and changes the state in database. These application servers need to be stateless as they need to scale out and handle traffic surges and spikes. Stateless services scale easily because they do not carry persistent state; they only process requests and interact with the database.

However, every time an application server talks to a database:

  • A connection is established (authentication, handshake, allocation).

  • Locking and isolation rules apply as the database needs to maintain ACID properties.

  • Connection pools are limited and each connection consumes memory and CPU on DB server.

Repeatedly querying the database for frequently requested data is expensive. This is where caching comes in.

What is Cache?

A cache stores data as key-value pairs. Each key typically has a TTL (Time-To-Live). After the TTL expires, the key is deleted or considered invalid, and the next request results in a cache miss.

In steady state:

  • Client sends request.

  • Application checks cache.

  • Cache hit → fast response.

  • Database remains protected.

This works well until expiration becomes synchronized.

What is cache stampede?

Consider a popular key with high traffic. If that key expires and thousands of requests arrive at nearly the same time:

  • All requests miss the cache.

  • All attempt to recompute the value.

  • All hit the database simultaneously.

  • Database load spikes.

  • Latency increases.

  • Timeouts and failures may follow.

This is called cache stampede. More generally, this is an instance of the thundering herd problem. Multiple processes wake up and compete for the same resource at the same time.

Key Insights : Cache stampede is not about high traffic but rather synchronized failures/contention.

Types of cache stampede

Stampeded does not occur in only one way. There may be different trigger reasons.

1. Cold start stampede:

Occurs when:

  • Service deployment

  • Cache flush

  • Redis restart

  • New node joins cluster

When the cache is empty, every incoming request becomes a miss. The system immediately fails back to the database under full traffic load. There is a sudden surge instead of gradual increase.

2. Expiry stampede:

A popular key reaches TTL expiry and at the same time high traffic hits system. All request miss together and attempt re-computation simultaneously. This is the classic hot-key expiration problem.

Node restart stampede:

If application nodes use local in-memory cache and one node restarts:

  • Its cache becomes empty.

  • That node suddenly generates heavy database traffic.

  • This may affect the entire system indirectly.

Even if the distributed cache is healthy, local cache resets can create localized herd behavior.

Why cache stampede is dangerous?

The database overload is not the real danger. We can handle overloads by scaling up the server. The root problem is failure amplification.

The sequence typically looks like:

  • Many cache misses occur.

  • Database load increases sharply.

  • Database latency increases.

  • Requests take longer to complete.

  • More concurrent requests accumulate.

  • Connection pools become exhausted.

  • Timeouts begin.

  • Retries increase load further.

What simply started as synchronized expiration quickly escalated into cascading system failures. This is why it is a critical production risk.

We need to understand that user activity is probabilistic and non-deterministic we never know when request surge will come in. For scalable system TTL is of immense importance and it needs to decided strategically for keys.

Strategies for avoiding cache stampede (thundering herd)

1. Jitter solution

Instead of expiring all keys at the same time, you add a random offset called jitter to each key's expiration time. This spreads expiration events and avoids synchronized loads.

Base TTL = 600 seconds
Actual TTL = 600 + random(0–120 seconds)

Since jitters are random, keys will not expire at the same fixed interval everyday (for example, not always at 8 AM).

This is very simple to implement and smooths out traffic spikes at a larger scale. However, jitter helps in tackling the problem but does not eliminate it completely. It is usually combined with other techniques.

Jitter mainly protects against mass/synchronized expiry across multiple keys but does not protect against a single hot key expiring under heavy load.

2. Stale-while-revalidate (SWR)

This solution serves the same stale value to the clients after TTL expires. Meanwhile one request in the background refreshes the cache with fresh data. Once revalidated subsequent requests get the fresh data.

After TTL expiry:

  • Continue serving stale data.

  • Trigger a background refresh.

  • Update cache asynchronously.

Users never wait for re-computation. Availability and latency remain stable. So instead of synchronized failures you get continuous responses + controlled refresh. This is the prime example of prioritizing availability over consistence. It is widely used in read-heavy systems and CDN architectures. SWR along with jitter makes a good and efficient solution for cache stampede.

3.Probabilistic Early Computation (PEC)

Instead of waiting for the cache to expire, we refresh it with a certain probability before expiry. The refresh load spreads across time rather than concentrating at the exact expiration moment.

When a request arrives and TTL is near expiry:

  • Decide probabilistically whether to refresh.

This may look same as jitter since both aim to desynchronize the cache refresh but the intent is different. Jitter randomizes expiration time when setting the cache key. Probabilistic early refresh triggers refresh before expiry to prevent hammering the database at expiry time. Instead of one unlucky request doing all the heavy lifting at expiration, multiple requests share the burden over time. However, the probabilistic function needs to be chosen carefully. Poor tuning can lead to redundant refreshes and wasted CPU/memory.

4. Mutex (Cache Locking)

A mutex is a resource locking mechanism used for synchronization between threads or processes sharing a common resource.

We can use this mechanism to create a lock over database requests from the server layer, meaning only one request can recompute the value at a time.

When a key expires:

  • The first request acquires a lock.

  • Only that request queries the database.

  • Others wait to acquire the lock.

  • Once cache is updated, lock is released waiting requests check the cache again.

This ensures only one re-computation happens. Only a single request is made to the database even if thousands of concurrent requests hit the server. Mutex-based refresh centralizes responsibility: only one actor rebuilds the cache while others wait or fallback.

It’s a classic concurrency control pattern applied to caching. However, distributed locking introduces complexity:

  • Lock timeouts

  • Deadlocks

  • Reliability of lock storage

5.Request coalescing

Mutex and request coalescing are related but not exactly the same.

Request coalescing ensures that only one request recomputes a missing or expired cache entry, while others wait and reuse the result once it is ready. Multiple requests are coalesced (merged) into one. It is about shared computation rather than locking.

Instead of explicit locking:

  • The first request triggers computation.

  • Other requests attach/subscribe to the same in-flight operation.

  • All share the result once available.

This avoids duplicate work without blocking threads via traditional locks.

However, you need to track in-flight requests and failures cases properly. If the first request hangs, others may be stuck. It requires careful concurrency handling, especially in distributed systems.

6.Cache warming

A proactive approach:

  • Preload frequently accessed keys.

  • Warm the cache during deployment.

  • Use scheduled background refresh jobs.

This prevents cold-start stampede and reduces startup latency spikes. It adds operational overhead but improves stability. This ensure the cache is hot and ready to serve requests immediately. You can write scripts or background jobs that periodically refresh popular entries and warm them pre-emptively

Final perspective

Thundering herd problem can occur when multiple process, threads or client are all waiting for the same resource or event and when it becomes available, they all wake up and try to access it simultaneously causing contention, overload and wasted work. In caching systems, this manifests as cache stampede.

It is not primarily a traffic problem. It is a synchronization problem.

Mitigation techniques aim to:

  • Desynchronize expiration,

  • Ensure only one re-computation occurs,

  • Maintain availability,

  • Protect backend systems from cascading failure.