Three-Layer Cache
A cache rarely sits in only one place; it usually stacks across several layers. Closer to the client is faster; closer to the DB is more accurate.
1. L1 — In-memory cache
The layer aimed at the shortest response time. It can be an LRU map inside the same process, or a separate process such as Redis or Memcached.
| Tool | Origin |
|---|---|
| Memcached | 2003, Brad Fitzpatrick (LiveJournal) |
| Redis | 2009, Salvatore Sanfilippo |
| Hazelcast / Infinispan | JVM distributed in-memory |
| Caffeine | JVM library (Ben Manes) |
The hallmark is volatility. The default assumption is that data disappears on restart or node replacement (Redis's RDB and AOF can preserve some).
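As a concrete picture of the "LRU map inside the same process", here is a minimal TypeScript sketch (the class and its names are mine, not any library's); it leans on the fact that a JavaScript `Map` iterates in insertion order:

```ts
// Minimal in-process LRU cache. A Map iterates in insertion order,
// so deleting and re-inserting a key marks it as most recently used.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key); // refresh recency
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict the least recently used entry (first in iteration order)
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```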
2. L2 — Persistent cache
Still a cache, but one where losing the contents is awkward. PostgreSQL cache tables, materialized views, or object storage like S3 sit here.
Representative patterns:
- Load external API responses into a PostgreSQL table and manage expiry with a TTL column.
- Freeze expensive aggregate query results into a materialized view and `REFRESH` it periodically.
- Place static assets (images, PDFs) in object storage and expose them through a CDN.
When L1 dies, L2 acts as a fallback. The flow "L1 empty → query L2 → fill L1 with the result" (cache-aside) is common.
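A sketch of that fallback flow, with `queryL2` as a hypothetical stand-in for the persistent tier:

```ts
// Two-tier read: try L1 (in-memory), fall back to L2
// (e.g. a PostgreSQL cache table), and warm L1 on the way back.
const l1 = new Map<string, string>();

async function queryL2(key: string): Promise<string | undefined> {
  // stand-in for something like
  // "SELECT value FROM cache_table WHERE key = $1 AND expires_at > now()"
  return undefined;
}

async function read(key: string): Promise<string | undefined> {
  const hit = l1.get(key);
  if (hit !== undefined) return hit;             // L1 hit
  const fromL2 = await queryL2(key);             // L1 miss -> query L2
  if (fromL2 !== undefined) l1.set(key, fromL2); // fill L1 with the result
  return fromL2;                                 // still empty -> caller goes to origin
}
```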
3. L3 — Framework cache
Cache provided by web frameworks at the route or fetch level.
- Next.js `unstable_cache` and the `next: { revalidate }` option of `fetch` — cache data-fetch results aligned to the route lifecycle (sketched after this list).
- Next.js Full Route Cache — keeps render results of static and dynamic routes (behavior varies by version and configuration).
- HTTP cache headers — `Cache-Control`, `ETag`, `Last-Modified`, interpreted by clients, intermediate proxies, and CDNs.
- CDN cache — Cloudflare, Fastly, Akamai store responses at the edge.
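As a sketch of the first item (the URL and the 60-second window are illustrative):

```ts
// Next.js server code: cache this fetch result and revalidate it
// at most every 60 seconds. (App Router extension of fetch; exact
// behavior depends on the Next.js version.)
const res = await fetch("https://api.example.com/products", {
  next: { revalidate: 60 },
});
const products = await res.json();

// The plain-HTTP analogue, set on whatever serves the response:
//   Cache-Control: public, max-age=60
//   ETag: "abc123"
```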
The hallmark of this layer is that it works with almost no code changes. The flip side is debugging difficulty: knowledge of which response sits in which cache, and when it gets invalidated, ends up scattered across layers.
4. Cache-aside (lazy loading)
The most common pattern.
```
read:
    v = cache.get(k)
    if v is null:
        v = db.query(k)
        cache.set(k, v, ttl)
    return v

write:
    db.update(k, v)
    cache.delete(k)   # or set(k, v)
```
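The same pattern in runnable TypeScript, tracking TTL as an expiry timestamp; the `db` object is a hypothetical stand-in for a real client:

```ts
type Entry<V> = { value: V; expiresAt: number };
const cache = new Map<string, Entry<string>>();

// Hypothetical origin; imagine a real DB client here.
const db = {
  query: async (k: string) => `value-for-${k}`,
  update: async (_k: string, _v: string) => {},
};

async function readAside(key: string, ttlMs: number): Promise<string> {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) return entry.value; // hit
  const value = await db.query(key);                             // miss -> origin
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

async function writeAside(key: string, value: string): Promise<void> {
  await db.update(key, value);
  cache.delete(key); // invalidate; the next read repopulates
}
```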
The advantage is simplicity; the limits are the extra origin load on every cache miss and the consistency gap between the DB write and the cache invalidation.
5. Write-through, write-behind, refresh-ahead
Write-through updates both cache and DB on writes.
```
write:
    cache.set(k, v)
    db.update(k, v)
```
Write consistency improves, but every update also lands in the cache, so a frequently updated key adds cache load too.
Write-behind writes to the cache only and lets an asynchronous worker push changes to the DB. It suits high-throughput paths, but there is a data-loss risk if the cache dies before the flush.
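A minimal write-behind sketch (`db` is again a hypothetical stand-in): writes land in the cache plus a dirty buffer, and a periodic worker drains the buffer to the DB. The loss risk is visible in the code: whatever sits in the buffer when the process dies never reaches the DB.

```ts
// Hypothetical DB client; imagine a real async driver here.
const db = { update: async (_k: string, _v: string): Promise<void> => {} };

const store = new Map<string, string>(); // the cache itself
const dirty = new Map<string, string>(); // writes waiting to be flushed

function writeBehind(key: string, value: string): void {
  store.set(key, value); // the cache is the write target
  dirty.set(key, value); // the DB write is deferred
}

// Flush worker: push buffered writes to the DB asynchronously.
// Anything still in `dirty` when the process dies is lost.
setInterval(async () => {
  for (const [key, value] of dirty) {
    await db.update(key, value);
    dirty.delete(key);
  }
}, 1000);
```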
Refresh-ahead refreshes items in the background as TTL nears expiry. It cuts down cache misses on user requests. Caffeine on the JVM side supports it directly.
6. TTL and stale-while-revalidate
TTL is the promise of "how stale this value can be." Too short weakens cache value; too long delivers stale results. Decide based on the data's change cadence and user tolerance.
The stale-while-revalidate pattern returns an expired value and pulls the refresh in the background. HTTP standardizes it as `Cache-Control: stale-while-revalidate=...`.
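An application-side sketch of the same idea, with `fetchFresh` standing in for the origin call: serve the stale value immediately and refresh in the background.

```ts
type Entry = { value: string; expiresAt: number };
const swrCache = new Map<string, Entry>();
const refreshing = new Set<string>();

async function fetchFresh(key: string): Promise<string> {
  return `fresh-${key}`; // hypothetical origin call
}

async function getSwr(key: string, ttlMs: number): Promise<string> {
  const entry = swrCache.get(key);
  if (entry) {
    if (entry.expiresAt <= Date.now() && !refreshing.has(key)) {
      refreshing.add(key); // expired: answer stale, refresh in background
      fetchFresh(key)
        .then(v => swrCache.set(key, { value: v, expiresAt: Date.now() + ttlMs }))
        .finally(() => refreshing.delete(key));
    }
    return entry.value; // stale or fresh, answer immediately
  }
  const value = await fetchFresh(key); // cold start: must wait once
  swrCache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```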
7. Cache stampede
When the TTL of a hot key expires, the many concurrent requests for it all miss at once, hit the origin together, and load spikes. Mitigations usually bundle one or two of the following.
- Add a small random jitter to TTL.
- Use a distributed lock (Redis `SET NX`) so only one request refreshes the origin.
- Early refresh — start the refresh before TTL expires.
- Request coalescing — merge concurrent requests for the same key into one (see the sketch after this list).
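A sketch combining the jitter and coalescing items (`loadOrigin` is hypothetical): concurrent misses share one in-flight promise, and the TTL gets a small random jitter so hot keys do not expire in lockstep.

```ts
const inFlight = new Map<string, Promise<string>>();
const values = new Map<string, { value: string; expiresAt: number }>();

async function loadOrigin(key: string): Promise<string> {
  return `origin-${key}`; // hypothetical expensive call
}

async function getCoalesced(key: string, ttlMs: number): Promise<string> {
  const hit = values.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const pending = inFlight.get(key);
  if (pending) return pending; // piggyback on the request already running

  const promise = loadOrigin(key)
    .then(value => {
      const jitter = Math.random() * 0.1 * ttlMs; // up to +10% of the TTL
      values.set(key, { value, expiresAt: Date.now() + ttlMs + jitter });
      return value;
    })
    .finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```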
8. Key naming conventions
- `<service>:<entity>:<id>` — for example, `users:profile:42`.
- Put a version in the prefix to invalidate everything at once: `v3:users:profile:42`.
- Environment prefix: `prod:`, `staging:`.
Long keys carry memory overhead, which is worth considering on in-memory stores like Redis.
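A trivial helper keeps the convention consistent (all names illustrative):

```ts
const ENV = "prod";   // environment prefix
const VERSION = "v3"; // bump to invalidate everything at once

function cacheKey(service: string, entity: string, id: string | number): string {
  return `${ENV}:${VERSION}:${service}:${entity}:${id}`;
}

cacheKey("users", "profile", 42); // "prod:v3:users:profile:42"
```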
9. Serialization and monitoring
- JSON — human-readable, but serialization cost is high.
- MessagePack, CBOR — binary, smaller.
- Protobuf, Avro — schema-based, multilingual.
If we serialize and deserialize cache responses frequently, the format choice shows up in response time.
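A rough illustration, assuming the `@msgpack/msgpack` package (actual sizes depend on the payload):

```ts
import { encode, decode } from "@msgpack/msgpack";

const payload = { id: 42, name: "profile", tags: ["a", "b"], active: true };

const asJson = new TextEncoder().encode(JSON.stringify(payload));
const asMsgpack = encode(payload); // Uint8Array

console.log(asJson.byteLength, asMsgpack.byteLength); // msgpack is usually smaller

const roundTrip = decode(asMsgpack); // back to a plain object
```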
Monitoring items:
- Hit rate — too low signals key design or TTL issues.
- Memory usage, key count, eviction count.
- Origin (DB, external API) call count — measures the cache impact.
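Hit rate is cheap to measure in-process; a minimal sketch with hand-rolled counters:

```ts
const counters = new Map<string, string>();
let hits = 0;
let misses = 0;

function instrumentedGet(key: string): string | undefined {
  const value = counters.get(key);
  if (value === undefined) misses += 1;
  else hits += 1;
  return value;
}

function hitRate(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total; // watch for this drifting low
}
```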
10. Common pitfalls
Partial invalidation — caching one user's info in 10 places makes invalidation hard. Key conventions and tags are needed from the start.
Caching null — decide whether to cache "no result." If we do not, every request goes to the DB; if we do, newly created items may not appear.
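One common compromise: cache the "no result" too, but with a much shorter TTL so newly created items surface quickly. A sketch with an assumed sentinel value:

```ts
const NULL_SENTINEL = "__none__"; // assumed marker for "no result"
const negCache = new Map<string, { value: string; expiresAt: number }>();

async function findUser(id: string): Promise<string | null> {
  return null; // hypothetical DB lookup that found nothing
}

async function getUser(id: string): Promise<string | null> {
  const hit = negCache.get(id);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value === NULL_SENTINEL ? null : hit.value;
  }
  const user = await findUser(id);
  negCache.set(id, {
    value: user ?? NULL_SENTINEL,
    // short TTL for misses (30s) vs. normal TTL for hits (10min)
    expiresAt: Date.now() + (user === null ? 30_000 : 600_000),
  });
  return user;
}
```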
Infinite TTL — persistent data lingers in cache and conflicts with new code. Explicit expiry or a version prefix is safer.
Read-after-write — the client cannot see its own change. Write-through or client-side read-your-writes handling is needed.
L3 build-time caches — Next.js's static cache is tied to build time or revalidate intervals. It can clash with a CMS's instant-publish requirement.
Closing thoughts
The more cache we add, the harder invalidation gets. Starting in places where "who creates this cache and who clears it" is clear is safer. Phil Karlton's joke ("There are only two hard things in Computer Science: cache invalidation and naming things") gets quoted for a reason.
Next
- redis-roles
- data-pipeline
References: HTTP Caching (MDN), RFC 5861 — stale-while-revalidate, Redis caching patterns, Next.js Caching, Caffeine GitHub.