Redis Caching in Node.js: The Patterns That Actually Hold Up in Production
Three-layer caching architecture for Node.js in 2026 — in-process LRU, Redis patterns, and CDN edge — with real invalidation strategies and stampede prevention.
Senior Developer

The benchmark is always impressive. Before Redis: 840 ms average response time. After Redis: 80 ms on cache hits. A 60% reduction in overall latency across the full traffic mix, and the support queue for "the platform is slow" goes to zero.
What the benchmark does not show is the three days spent debugging stale data, the afternoon untangling a cache stampede that took down the database during a traffic spike, and the cache key naming scheme that had to be refactored mid-sprint because it was not composable enough for prefix-based invalidation.
This guide covers both sides of the ledger. The patterns that get you the gains, and the operational reality that keeps those gains from becoming liabilities.
Why Three Layers, Not One
Caching in a Node.js backend is not a single decision — it is a three-layer architecture, with each layer optimized for a different access pattern:
Layer 1 — In-process LRU: Data lives in Node.js heap memory. No network round-trip. Sub-millisecond access. Per-process and non-shared — each server instance has its own copy. Use for: hot reference data (feature flags, config, lookup tables) that changes rarely and can tolerate brief per-instance staleness.
Layer 2 — Redis: Data lives in a shared, external in-memory store. ~1 ms network round-trip. Shared across all server instances. Use for: session data, user state, shared counters, computed results that must be consistent across your fleet.
Layer 3 — CDN edge cache: Data lives at geographically distributed edge nodes. Sub-10 ms for most users globally. Serves HTTP responses, not application data. Use for: public API responses, static assets, anything that can carry a Cache-Control header.
Most production applications need all three, applied deliberately to different data categories. The mistake is treating Redis as the answer to every caching question — some data is better served by a process-local LRU, and some data should never be cached at all.
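One way to picture how the first two layers compose on a read path is a lookup that falls through from the fastest store to the slowest. The sketch below is illustrative only: `layeredGet`, the `local` Map, and the `loadFromDb` callback are hypothetical names, and `shared` is assumed to be any client exposing node-redis-style `get` and `setEx` methods. The CDN layer sits in front of the application and is not shown.

```javascript
// Hypothetical helper: read through an in-process Map, then a shared
// Redis-like store, then the source of truth. All names are illustrative.
async function layeredGet(key, { local, shared, loadFromDb, localTtlMs = 5000, sharedTtlSec = 300 }) {
  // Layer 1: process-local entry, valid only until its own deadline
  const entry = local.get(key);
  if (entry && entry.expiresAt > Date.now()) return entry.value;

  // Layer 2: shared store (assumed to expose get/setEx like node-redis)
  const raw = await shared.get(key);
  let value;
  if (raw !== null && raw !== undefined) {
    value = JSON.parse(raw);
  } else {
    // Source of truth: only reached when both caches miss
    value = await loadFromDb(key);
    if (value === null || value === undefined) return null;
    await shared.setEx(key, sharedTtlSec, JSON.stringify(value));
  }

  // Refill the local layer with a short TTL to bound per-instance staleness
  local.set(key, { value, expiresAt: Date.now() + localTtlMs });
  return value;
}
```

The short local TTL is the design lever here: it caps how long one instance can disagree with the rest of the fleet after a write.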
Layer 1: In-Process LRU with lru-cache
The fastest cache is the one that never leaves your process. For data accessed thousands of times per minute that changes infrequently, an in-process LRU eliminates not just database round-trips but Redis round-trips too.
import { LRUCache } from 'lru-cache';

// Cache for feature flags: changes rarely, read on every request
const featureFlagCache = new LRUCache({
  max: 500,
  ttl: 1000 * 60, // 60-second TTL
  updateAgeOnGet: false, // Don't reset TTL on read
  fetchMethod: async (flagKey) => {
    // Called automatically on cache miss; deduplicates concurrent misses
    return await featureFlagService.getFlag(flagKey);
  },
});

// Cache for database query results: medium-frequency reads
const queryCache = new LRUCache({
  max: 500,
  ttl: 1000 * 60 * 5, // 5-minute TTL
  updateAgeOnGet: false,
  allowStale: false,
});

// Usage
async function getFeatureFlag(key) {
  // fetchMethod handles the miss automatically; no manual miss handling needed
  return await featureFlagCache.fetch(key);
}

async function getCachedQueryResult(queryKey, queryFn) {
  const cached = queryCache.get(queryKey);
  if (cached !== undefined) return cached;
  const result = await queryFn();
  queryCache.set(queryKey, result);
  return result;
}

The fetchMethod option on lru-cache is worth highlighting. When multiple concurrent requests hit a cache miss for the same key simultaneously, fetchMethod deduplicates them: only one call to the underlying data source is made, and all waiting promises resolve with the same result. This is request coalescing built into the cache. Note that it applies only to reads that go through fetch(); the manual get/set path in getCachedQueryResult gets no such protection, so concurrent misses there each run queryFn.
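What that deduplication amounts to can be sketched without the library. This is a conceptual illustration, not lru-cache's actual implementation; `createCoalescer` and `loadFn` are hypothetical names.

```javascript
// Hypothetical sketch of request coalescing: concurrent callers for the
// same key share one in-flight promise instead of each hitting the source.
function createCoalescer(loadFn) {
  const inFlight = new Map(); // key -> pending promise

  return function load(key) {
    const pending = inFlight.get(key);
    if (pending) return pending;

    const promise = Promise.resolve()
      .then(() => loadFn(key))
      // Clear the slot on success or failure so the next call loads afresh
      .finally(() => inFlight.delete(key));

    inFlight.set(key, promise);
    return promise;
  };
}
```

Under these assumptions, wrapping a query function with `createCoalescer` before passing it to a manual get/set helper would give that path the same single-flight behavior on concurrent misses.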
What belongs in-process:
Feature flags and A/B test assignments
Application configuration
Static lookup tables (country codes, category lists)
JWT public keys / JWKS
Compiled regular expressions or validation schemas
What does not belong in-process:
Session data (users would lose sessions when a server restarts or when the load balancer routes them to a different instance)
Anything that must be consistent across multiple server instances immediately after a write
Layer 2: Redis — Patterns That Work in Production
Pattern 1: Cache-Aside (Lazy Loading)
The most common and most appropriate pattern for application data. The cache is populated on demand: miss, fetch, store.
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: fetch from database
  // (assumes a node-postgres-style client whose query() resolves to { rows })
  const { rows } = await db.query(
    'SELECT id, name, email, role FROM users WHERE id = $1',
    [userId]
  );
  const user = rows[0];
  if (!user) return null;

  // 3. Store in cache with TTL
  await redis.setEx(cacheKey, 300, JSON.stringify(user)); // 5-minute TTL
  return user;
}

// Invalidation: call this from any write path that modifies a user
async function invalidateUser(userId) {
  await redis.del(`user:${userId}`);
}

The invalidation discipline is as important as the caching logic. Every write path in your application that modifies a user must call invalidateUser. If you add a new endpoint that updates user email without calling invalidation, users will see stale data until the TTL expires. Build invalidation alongside every write, not as an afterthought.
Pattern 2: Write-Through
Every write updates both the cache and the database. Reads are always served from cache. Suited for data that is written and read at similar frequency, where read-after-write consistency matters.
async function updateUserProfile(userId, updates) {
  // 1. Write to database first
  // (assumes a node-postgres-style client whose query() resolves to { rows };
  // returns the same columns the read path caches under user:<id>)
  const { rows } = await db.query(
    'UPDATE users SET name = $1, bio = $2, updated_at = NOW() WHERE id = $3 RETURNING id, name, email, role',
    [updates.name, updates.bio, userId]
  );
  const updated = rows[0];
  if (!updated) return null; // No such user: nothing to cache

  // 2. Immediately update cache
  const cacheKey = `user:${userId}`;
  await redis.setEx(cacheKey, 300, JSON.stringify(updated));
  return updated;
}

Write-through is more expensive per write but eliminates the window between a write and cache population where a read would generate a cache miss and hit the database. For user profile updates where the user immediately sees their own edits, write-through provides a better experience.
Pattern 3: Cache Keys as a System
Cache key design is the most under-discussed aspect of production caching. A key scheme that is not composable makes bulk invalidation impossible — you end up with either over-invalidation (deleting too much) or stale data.
// Structured key conventions
const CacheKeys = {
  user: (id) => `user:${id}`,
  userOrders: (userId) => `user:${userId}:orders`,
  userOrderPage: (userId, page) => `user:${userId}:orders:page:${page}`,
  product: (id) => `product:${id}`,
  productsByCategory: (categoryId) => `products:category:${categoryId}`,
  // Prefix pattern for bulk invalidation: every key under user:<id>:
  userPattern: (userId) => `user:${userId}:*`,
};

// Bulk invalidation using SCAN (never use KEYS in production: it's O(N) and blocks Redis)
async function invalidateAllUserData(userId) {
  const pattern = CacheKeys.userPattern(userId);
  // Cursor is kept as a string so the loop terminates whether the client
  // returns it as a number (node-redis v4) or as a string (v5)
  let cursor = '0';
  do {
    const result = await redis.scan(cursor, {
      MATCH: pattern,
      COUNT: 100,
    });
    cursor = String(result.cursor);
    if (result.keys.length > 0) {
      await redis.del(result.keys);
    }
  } while (cursor !== '0');
  // The pattern user:<id>:* does not match the base key user:<id>, so delete it explicitly
  await redis.del(CacheKeys.user(userId));
}

The SCAN command with COUNT is the correct way to search for keys by pattern. A full SCAN iteration still visits the whole key space, but it does so in small batches, so other commands run between calls. The KEYS command (redis.keys('user:*')) is O(N) across your entire key space in a single call; it blocks the Redis event loop for the entire duration, causing latency spikes across every client connected to that Redis instance. Never use KEYS in production.
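An alternative to scanning, sketched here as an assumption rather than something the pattern above requires, is to record each user's cache keys in a Redis set at write time, so invalidation reads one set instead of walking the key space. `setTracked`, `invalidateTracked`, and the `tag:` prefix are hypothetical; `client` is assumed to expose node-redis v4's `setEx`, `sAdd`, `expire`, `sMembers`, and `del`.

```javascript
// Hypothetical tag index: every cached key is also added to a per-tag set.
// indexTtlSec is assumed to be at least as long as the longest entry TTL,
// so the index never expires before the keys it tracks.
async function setTracked(client, tag, key, ttlSec, value, indexTtlSec = 24 * 60 * 60) {
  const tagKey = `tag:${tag}`;
  await client.setEx(key, ttlSec, JSON.stringify(value));
  await client.sAdd(tagKey, key);
  // Give the index its own TTL so it cannot grow forever
  await client.expire(tagKey, indexTtlSec);
}

async function invalidateTracked(client, tag) {
  const tagKey = `tag:${tag}`;
  const keys = await client.sMembers(tagKey);
  // Delete the tracked keys and the index itself in one call
  await client.del([...keys, tagKey]);
  return keys.length;
}
```

The trade-off is an extra write per cached entry in exchange for invalidation cost proportional to one user's keys rather than the whole key space. In Redis Cluster, the multi-key DEL would additionally require the keys to share a hash slot.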
Pattern 4: Preventing Cache Stampedes
The classic production failure: a heavily cached key expires. A thousand concurrent requests all see a miss simultaneously and all query the database at once. The database falls over.
This is the thundering herd problem, and it gets more dangerous as your traffic grows.
Solution A: Probabilistic Early Expiration (XFetch)
Recompute the cache value before it expires, with probability proportional to how close to expiry the value is. By the time the key actually expires, it has already been refreshed.
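The decision rule at the heart of this approach can be isolated as a pure function, which makes it easier to reason about and to test. This is a sketch under stated assumptions: `shouldRefreshEarly` is a hypothetical name, all times are in seconds, and the random draw is passed in as a parameter purely so the rule is deterministic under test.

```javascript
// Hypothetical pure form of the XFetch decision. All times are in seconds.
// rand is expected to lie in (0, 1); it is a parameter so the rule is testable.
function shouldRefreshEarly({ now, expiry, delta, beta = 1.0, rand = Math.random() }) {
  // Math.log(rand) is negative, so headStart is a random positive number
  // scaled by the recompute cost (delta) and the aggressiveness (beta).
  const headStart = -delta * beta * Math.log(rand);
  return now + headStart >= expiry;
}
```

For a sense of scale: with delta = 2 seconds and beta = 1, a draw of rand = 0.05 gives a head start of about 6 seconds, so that particular request would refresh once the key is within roughly 6 seconds of expiry. Most draws give a much smaller head start, which is why only a few requests refresh early.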
async function getCachedWithXFetch(key, fetchFn, ttl, beta = 1.0) {
  const raw = await redis.get(key);
  if (raw) {
    const { value, expiry, delta } = JSON.parse(raw);
    const now = Date.now() / 1000;
    // Probabilistically recompute before expiry
    // Higher beta = more aggressive early refresh
    if (now - delta * beta * Math.log(Math.random()) < expiry) {
      return value;
    }
  }

  // Cache miss or early refresh triggered
  const start = Date.now();
  const value = await fetchFn();
  const delta = (Date.now() - start) / 1000; // Time taken to compute, in seconds
  const expiry = Date.now() / 1000 + ttl;
  await redis.setEx(key, ttl, JSON.stringify({ value, expiry, delta }));
  return value;
}

Solution B: Mutex Lock on Cache Miss
When a cache miss occurs, acquire a distributed lock before fetching. Other requests wait for the lock holder to populate the cache rather than all racing to the database.
async function getCachedWithLock(key, fetchFn, ttl) {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const lockKey = `lock:${key}`;
  const lockAcquired = await redis.set(lockKey, '1', {
    NX: true, // Only set if not exists
    EX: 10, // Lock expires in 10 seconds
  });

  if (lockAcquired) {
    try {
      // This instance won the lock: fetch and populate
      const value = await fetchFn();
      await redis.setEx(key, ttl, JSON.stringify(value));
      return value;
    } finally {
      await redis.del(lockKey);
    }
  } else {
    // Another instance is populating: wait briefly and retry
    await new Promise(resolve => setTimeout(resolve, 50));
    const populated = await redis.get(key);
    return populated ? JSON.parse(populated) : getCachedWithLock(key, fetchFn, ttl);
  }
}

Use probabilistic early expiration for frequently accessed keys where you can afford the occasional early refresh. Use mutex locks for expensive operations (external API calls, heavy aggregations) where parallel execution would be harmful.
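One gap in a simple mutex of this kind is an unconditional release: if the fetch outlasts the lock's expiry, the lock holder can end up deleting a lock that another instance has since acquired. A common remedy is to store a unique token in the lock and release it with a compare-and-delete script. The sketch below assumes a client exposing node-redis v4's `set` with `NX`/`EX` options and `eval(script, { keys, arguments })`; `withLock`, `RELEASE_SCRIPT`, and `defaultToken` are hypothetical names.

```javascript
// Lua script: delete the lock only if it still holds our token.
const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
end
return 0
`;

// Hypothetical helper. Resolves to { acquired: false } when someone else
// holds the lock; otherwise runs fn and releases the lock only if still owned.
async function withLock(client, lockKey, ttlSec, fn, makeToken = defaultToken) {
  const token = makeToken();
  const ok = await client.set(lockKey, token, { NX: true, EX: ttlSec });
  if (!ok) return { acquired: false };
  try {
    return { acquired: true, value: await fn() };
  } finally {
    // Compare-and-delete: a lock re-acquired by another instance is left alone
    await client.eval(RELEASE_SCRIPT, { keys: [lockKey], arguments: [token] });
  }
}

function defaultToken() {
  // Illustrative uniqueness only; a UUID would be a more robust choice
  return `${process.pid}-${Date.now()}-${Math.random().toString(36).slice(2)}`;
}
```

A caller that receives `{ acquired: false }` would wait and re-read the cache, as the retry branch above does; capping the number of retries is a sensible addition in either version.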
Layer 3: CDN Edge Caching for API Responses
For public API endpoints that return the same response for many users, CDN caching eliminates origin load entirely. A well-configured CDN can absorb 90%+ of read traffic for public content.
// Express middleware that sets aggressive caching headers for public endpoints
function setCacheHeaders(maxAge, staleWhileRevalidate = 60) {
  return (req, res, next) => {
    // Only cache GET requests
    if (req.method !== 'GET') return next();
    res.set({
      'Cache-Control': `public, max-age=${maxAge}, stale-while-revalidate=${staleWhileRevalidate}`,
      'Vary': 'Accept-Encoding', // Separate cache for gzip vs non-gzip
    });
    next();
  };
}

// Product listing: cache for 5 minutes, serve stale for 60 seconds while revalidating
app.get('/api/products',
  setCacheHeaders(300, 60),
  async (req, res) => {
    const products = await productService.list(req.query);
    res.json(products);
  }
);

// User-specific data: never cache at CDN
app.get('/api/user/profile',
  (req, res, next) => {
    res.set('Cache-Control', 'private, no-cache');
    next();
  },
  authenticate,
  async (req, res) => {
    const profile = await userService.getProfile(req.user.id);
    res.json(profile);
  }
);

stale-while-revalidate is the most useful cache directive for API responses. It tells the CDN to serve a stale response immediately while fetching a fresh one in the background. From the user's perspective, the response is always fast. From your origin's perspective, requests arrive at a controlled, background rate rather than all at once when cache entries expire.
The critical Vary header: If you serve compressed and uncompressed responses from the same endpoint (which all production servers should), Vary: Accept-Encoding ensures the CDN maintains separate cache entries for each encoding. Without it, compressed responses get served to clients that cannot decompress them.
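A related safeguard, offered as a sketch rather than as part of the middleware above, is to derive cacheability from the request itself so that a credentialed response is never marked public by accident. `cacheControlFor` is a hypothetical helper; it takes a plain headers object with lowercase names, as Node's `req.headers` provides.

```javascript
// Hypothetical helper: choose a Cache-Control value from the request.
function cacheControlFor(method, headers, { maxAge = 300, staleWhileRevalidate = 60 } = {}) {
  // Anything other than GET/HEAD is not a candidate for shared caching
  if (method !== 'GET' && method !== 'HEAD') return 'no-store';
  // Credentials imply a per-user response: keep it out of shared caches
  if (headers.authorization || headers.cookie) return 'private, no-cache';
  return `public, max-age=${maxAge}, stale-while-revalidate=${staleWhileRevalidate}`;
}
```

In an Express handler this could be applied as `res.set('Cache-Control', cacheControlFor(req.method, req.headers))`, with the per-route values passed in the options argument.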
Eviction Policies: What Happens When Redis Gets Full
Redis operates entirely in memory. When maxmemory is reached and new keys need to be written, Redis must evict existing keys. The eviction policy determines which keys get removed.
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

Policy comparison:

| Policy | Behavior | Use When |
|---|---|---|
| noeviction | Refuse writes when full; return error | You cannot afford data loss (Redis as primary store) |
| allkeys-lru | Evict least recently used keys from entire keyspace | General-purpose cache; this is the right default |
| volatile-lru | Evict LRU keys that have a TTL set | Mix of cache and persistent data in same Redis instance |
| allkeys-lfu | Evict least frequently used keys | Access patterns with highly skewed popularity |
| volatile-ttl | Evict keys closest to expiry | You want to control what survives memory pressure via TTL |
For a pure cache (all data has TTLs and can be regenerated from the database), allkeys-lru is almost always the right choice.
The unbounded key growth trap: Forgetting to set TTLs on cache keys is one of the most common Redis production mistakes. Without TTLs, your key count grows monotonically until Redis hits maxmemory and starts evicting keys according to your policy. With noeviction, writes start failing. The symptom appears suddenly at a threshold that can be hard to predict.
Always set TTLs. Always.
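One way to make that rule hard to forget is to route cache writes through a helper that refuses to store anything without a TTL. The sketch below also spreads each TTL by a small random jitter so keys written together do not all expire together; the jitter is an addition that is not discussed above. `setWithTtl` is a hypothetical name, and `client` is assumed to expose node-redis-style `setEx`.

```javascript
// Hypothetical wrapper: a TTL is mandatory, and a small random jitter
// spreads expiries so keys written together do not expire together.
async function setWithTtl(client, key, value, ttlSec, { jitter = 0.1, rand = Math.random } = {}) {
  if (!Number.isInteger(ttlSec) || ttlSec <= 0) {
    throw new RangeError(`TTL must be a positive integer number of seconds, got ${ttlSec}`);
  }
  // Scale the TTL by a factor in [1 - jitter, 1 + jitter), never below 1 second
  const factor = 1 + jitter * (2 * rand() - 1);
  const effectiveTtl = Math.max(1, Math.round(ttlSec * factor));
  await client.setEx(key, effectiveTtl, JSON.stringify(value));
  return effectiveTtl;
}
```

If every cache write in the codebase goes through a helper like this, a missing TTL becomes an immediate error in development rather than a memory problem months later.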
Observability: The Metrics That Catch Problems Early
A cache you cannot observe is a cache you cannot trust. These are the signals worth instrumenting from day one.
Cache hit rate is the single most important metric. Below 50% means your strategy is broken — your TTLs are too short, your key space is too fragmented, or you are caching data that changes too frequently.
// Redis INFO stats
const info = await redis.info('stats');
// Parse keyspace_hits and keyspace_misses from the "name:value" lines
const stat = (name) => Number(info.match(new RegExp(`^${name}:(\\d+)`, 'm'))?.[1] ?? 0);
const keyspaceHits = stat('keyspace_hits');
const keyspaceMisses = stat('keyspace_misses');
const total = keyspaceHits + keyspaceMisses;
const hitRate = total === 0 ? 0 : keyspaceHits / total;

Memory usage and fragmentation:
const memoryInfo = await redis.info('memory');
// Watch: used_memory_rss vs used_memory (INFO also reports mem_fragmentation_ratio)
// High fragmentation ratio (>1.5) means Redis is holding onto
// more OS memory than it is actually using

Latency percentiles: Instrument your cache client to record operation latency. A Redis GET that takes 50 ms indicates network problems or a Redis instance under heavy load, both of which degrade your application even when the data is cached.
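Recording latency can be as small as a timing wrapper around each cache call plus a percentile calculation over the samples. In this sketch, `timed` and `percentile` are hypothetical helpers, and `metrics.timing(name, ms, tags)` is an assumed interface rather than any particular library's API. The clock is injectable so the wrapper can be tested deterministically.

```javascript
// Hypothetical wrapper: time any async cache operation and report it in ms.
async function timed(metrics, name, tags, op, now = () => performance.now()) {
  const start = now();
  try {
    return await op();
  } finally {
    // Recorded on success and on failure, so slow errors are visible too
    metrics.timing(name, now() - start, tags);
  }
}

// Nearest-rank percentile over recorded samples (p in 0..100)
function percentile(samples, p) {
  if (samples.length === 0) return undefined;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```

A call site might look like `await timed(metrics, 'cache.get', { namespace: 'user' }, () => redis.get(key))`. In practice most metrics backends compute percentiles themselves; `percentile` is included only to make the calculation concrete.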
Instrument hit/miss at the application layer:
class InstrumentedCache {
  constructor(redisClient, metrics) {
    this.redis = redisClient;
    this.metrics = metrics;
  }

  async get(key, namespace = 'default') {
    const value = await this.redis.get(key);
    if (value) {
      this.metrics.increment('cache.hit', { namespace });
    } else {
      this.metrics.increment('cache.miss', { namespace });
    }
    return value ? JSON.parse(value) : null;
  }
}

Track hit rate per cache namespace. A single low-performing key namespace degrades overall hit rate but is invisible in aggregate metrics.
The Redis Scaling Progression
Start simple. Add complexity only when metrics demand it.
Phase 1 — Single Node: Sufficient for most applications up to significant traffic. Add read replicas when reads become the bottleneck.
Phase 2 — Redis Sentinel: Automatic failover for high availability without horizontal sharding. Suitable up to approximately 50 GB of data. One primary, multiple replicas, Sentinel monitors and promotes a replica if the primary fails.
Phase 3 — Redis Cluster: Automatic horizontal sharding across 3–1,000 nodes with built-in replication. Handles petabyte-scale workloads. Adds operational complexity and some API constraints (multi-key operations across slots require care).
Most applications never need Phase 3. Most applications that think they need Phase 3 actually need Phase 2 combined with better application-level caching and query optimization.
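The multi-key constraint in Phase 3 comes from how Redis Cluster assigns keys to hash slots. When a key contains a `{...}` section with at least one character between the braces, only that section is hashed, so keys that share it land in the same slot and can be used together in one multi-key command. The sketch below shows key builders that use such hash tags, plus a helper that extracts the hashed portion; `ClusterKeys` and `hashTagOf` are hypothetical names.

```javascript
// Hypothetical key builders: the {user:<id>} hash tag keeps all of one
// user's keys in the same Redis Cluster hash slot.
const ClusterKeys = {
  user: (id) => `{user:${id}}`,
  userOrders: (id) => `{user:${id}}:orders`,
  userOrderPage: (id, page) => `{user:${id}}:orders:page:${page}`,
};

// Returns the portion of the key that Redis Cluster hashes: the text between
// the first "{" and the first "}" after it if non-empty, else the whole key.
function hashTagOf(key) {
  const open = key.indexOf('{');
  if (open === -1) return key;
  const close = key.indexOf('}', open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}
```

The cost of hash tags is skew: everything sharing a tag lives on one shard, so a tag should group only the keys that genuinely need to be operated on together.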
The Anti-Patterns Worth Naming
Caching everything once you see the results. After the first successful caching implementation improves response times dramatically, the temptation is to cache every expensive operation. Resist it. Understand the data's update lifecycle — who writes it, how often, what triggers a change — before deciding to cache it. Caching data with complex invalidation requirements without a clear invalidation plan turns a performance win into a correctness bug.
Using KEYS in production. KEYS is O(N) across the entire key space and blocks Redis for its entire duration. If a KEYS call over 1 million keys takes 100 ms, every other command from every connected client queues behind it and can see up to 100 ms of additional latency. Use SCAN with a cursor.
Missing TTLs on any key. Set them. Every one.
A cache hit rate below 50%. This is not a caching problem — it is a sign that your keys are either not matching real request patterns (key construction is wrong), your TTL is shorter than your query interval (keys expire before they get read again), or you are caching data that changes faster than it is read (which means you should not be caching it).
What Good Caching Looks Like
Good caching is designed from the beginning, not bolted on after a performance incident. The key scheme is deliberate and composable. Invalidation is built into every write path at the same time as caching is added to the read path. TTLs are set based on how frequently the underlying data changes, not based on a default that someone copy-pasted. And the hit rate is monitored continuously — not checked once during load testing and then forgotten.
The gains are real. A well-implemented Redis caching layer reduces database load by 60–90%, dramatically improves response latency, and lets your database infrastructure scale much further before requiring sharding or read replicas.
The operational discipline to maintain those gains over time is what separates a caching implementation that helps from one that eventually causes a 3 AM incident.
Don't cache what you don't understand. Caching data with complex invalidation requirements without a clear invalidation strategy introduces subtle bugs that are hard to debug in production.