Redis Caching in Node.js: The Patterns That Actually Hold Up in Production

Three-layer caching architecture for Node.js in 2026 — in-process LRU, Redis patterns, and CDN edge — with real invalidation strategies and stampede prevention.

#Redis #Caching #Node.js #Backend Engineering #performance #app #production #JavaScript #System Design
ZyVOP

Senior Developer

June 8, 2026

The benchmark is always impressive. Before Redis: 840 ms average response time. After Redis: 80 ms on cache hits. A 60% reduction in overall latency across the full traffic mix, and the support queue for "the platform is slow" goes to zero.

What the benchmark does not show is the three days spent debugging stale data, the afternoon untangling a cache stampede that took down the database during a traffic spike, and the cache key naming scheme that had to be refactored mid-sprint because it was not composable enough for prefix-based invalidation.

This guide covers both sides of the ledger. The patterns that get you the gains, and the operational reality that keeps those gains from becoming liabilities.


Why Three Layers, Not One

Caching in a Node.js backend is not a single decision — it is a three-layer architecture, with each layer optimized for a different access pattern:

Layer 1 — In-process LRU: Data lives in Node.js heap memory. No network round-trip. Sub-millisecond access. Per-process and non-shared — each server instance has its own copy. Use for: hot reference data (feature flags, config, lookup tables) that changes rarely and can tolerate brief per-instance staleness.

Layer 2 — Redis: Data lives in a shared, external in-memory store. ~1 ms network round-trip. Shared across all server instances. Use for: session data, user state, shared counters, computed results that must be consistent across your fleet.

Layer 3 — CDN edge cache: Data lives at geographically distributed edge nodes. Sub-10 ms for most users globally. Serves HTTP responses, not application data. Use for: public API responses, static assets, anything that can carry a Cache-Control header.

Most production applications need all three, applied deliberately to different data categories. The mistake is treating Redis as the answer to every caching question — some data is better served by a process-local LRU, and some data should never be cached at all.
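The way the first two layers cooperate on a single read can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: `createTieredReader`, `loadFromSource`, and `redisTtlSeconds` are hypothetical names, `local` is assumed to be anything with `get`/`set` (an `LRUCache` instance fits), and `redis` is assumed to be a connected node-redis v4 client.

```javascript
// Tiered read: process-local cache first, then Redis, then the source of truth.
// Both caches are passed in, so nothing here depends on global state.
function createTieredReader({ local, redis, redisTtlSeconds = 300 }) {
  return async function read(key, loadFromSource) {
    // Layer 1: in-process, no network hop
    const inProcess = local.get(key);
    if (inProcess !== undefined) return inProcess;

    // Layer 2: shared Redis (GET resolves to null on a miss)
    const raw = await redis.get(key);
    if (raw !== null && raw !== undefined) {
      const value = JSON.parse(raw);
      local.set(key, value); // warm the local layer for the next read
      return value;
    }

    // Miss on both layers: load, then fill both on the way back
    const value = await loadFromSource();
    if (value !== undefined && value !== null) {
      await redis.setEx(key, redisTtlSeconds, JSON.stringify(value));
      local.set(key, value);
    }
    return value;
  };
}
```

The local layer should carry a much shorter TTL than the Redis layer, since a write on one instance cannot reach the in-process copies held by the others.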


Layer 1: In-Process LRU with lru-cache

The fastest cache is the one that never leaves your process. For data accessed thousands of times per minute that changes infrequently, an in-process LRU eliminates not just database round-trips but Redis round-trips too.

import { LRUCache } from 'lru-cache';

// Cache for feature flags — changes rarely, read on every request
const featureFlagCache = new LRUCache({
  max: 500,
  ttl: 1000 * 60,          // 60-second TTL
  updateAgeOnGet: false,    // Don't reset TTL on read
  fetchMethod: async (flagKey) => {
    // Called automatically on cache miss — deduplicates concurrent misses
    return await featureFlagService.getFlag(flagKey);
  },
});

// Cache for database query results — medium-frequency reads
const queryCache = new LRUCache({
  max: 500,
  ttl: 1000 * 60 * 5,      // 5-minute TTL
  updateAgeOnGet: false,
  allowStale: false,
});

// Usage
async function getFeatureFlag(key) {
  // fetchMethod handles miss automatically — no manual miss handling needed
  return await featureFlagCache.fetch(key);
}

async function getCachedQueryResult(queryKey, queryFn) {
  const cached = queryCache.get(queryKey);
  if (cached !== undefined) return cached;

  const result = await queryFn();
  queryCache.set(queryKey, result);
  return result;
}

The fetchMethod option on lru-cache is worth highlighting. When multiple concurrent requests call cache.fetch() for the same missing key, lru-cache deduplicates them: only one call to fetchMethod is made, and all waiting promises resolve with the same result. This is request coalescing built into the cache. It applies only to fetch(); the manual get/set path in getCachedQueryResult does not coalesce concurrent misses.
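To make the idea concrete, here is a hand-rolled sketch of request coalescing. It is not how lru-cache implements it internally; `createCoalescer` and `coalesce` are hypothetical names used only to show the mechanism of sharing one in-flight promise per key.

```javascript
// Request coalescing: concurrent callers for the same key share one in-flight promise.
function createCoalescer() {
  const inFlight = new Map(); // key -> promise of the pending load

  return function coalesce(key, load) {
    const pending = inFlight.get(key);
    if (pending) return pending; // someone is already loading this key

    const promise = Promise.resolve()
      .then(load)
      .finally(() => inFlight.delete(key)); // allow a fresh load once this one settles
    inFlight.set(key, promise);
    return promise;
  };
}
```

Wrapping the queryFn call in getCachedQueryResult with something like this would give the get/set path the same protection that fetch() provides.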

What belongs in-process:

  • Feature flags and A/B test assignments

  • Application configuration

  • Static lookup tables (country codes, category lists)

  • JWT public keys / JWKS

  • Compiled regular expressions or validation schemas

What does not belong in-process:

  • Session data (users would lose sessions when a server restarts or when the load balancer routes them to a different instance)

  • Anything that must be consistent across multiple server instances immediately after a write


Layer 2: Redis — Patterns That Work in Production

Pattern 1: Cache-Aside (Lazy Loading)

The most common and most appropriate pattern for application data. The cache is populated on demand: miss, fetch, store.

import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // 1. Check cache
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: fetch from database
  // Assumes a pg-style client, where query() resolves to { rows: [...] }
  const { rows } = await db.query(
    'SELECT id, name, email, role FROM users WHERE id = $1',
    [userId]
  );
  const user = rows[0];

  if (!user) return null;

  // 3. Store in cache with TTL
  await redis.setEx(cacheKey, 300, JSON.stringify(user)); // 5-minute TTL

  return user;
}

// Invalidation: call this from any write path that modifies a user
async function invalidateUser(userId) {
  await redis.del(`user:${userId}`);
}

The invalidation discipline is as important as the caching logic. Every write path in your application that modifies a user must call invalidateUser. If you add a new endpoint that updates user email without calling invalidation, users will see stale data until the TTL expires. Build invalidation alongside every write, not as an afterthought.
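One way to make that discipline harder to skip is to wrap each write function so the invalidation travels with it. The sketch below is only an illustration: `withInvalidation` and `keysFor` are hypothetical names, and `redis` is assumed to be a connected node-redis v4 client.

```javascript
// Wrap a write function so the matching cache keys are always dropped afterwards.
// keysFor maps the write's arguments to the cache keys that the write makes stale.
function withInvalidation(redis, writeFn, keysFor) {
  return async function write(...args) {
    const result = await writeFn(...args); // database write first
    const keys = keysFor(...args);
    if (keys.length > 0) {
      await redis.del(keys); // then drop the stale entries
    }
    return result;
  };
}
```

A new endpoint would then be built as, for example, withInvalidation(redis, updateEmailInDb, (userId) => [`user:${userId}`]), where updateEmailInDb stands in for whatever function performs the write.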

Pattern 2: Write-Through

Every write updates both the cache and the database. Reads are always served from cache. Suited for data that is written and read at similar frequency, where read-after-write consistency matters.

async function updateUserProfile(userId, updates) {
  // 1. Write to database first
  // Assumes a pg-style client, where query() resolves to { rows: [...] }
  const { rows } = await db.query(
    'UPDATE users SET name = $1, bio = $2, updated_at = NOW() WHERE id = $3 RETURNING *',
    [updates.name, updates.bio, userId]
  );
  const updated = rows[0];
  if (!updated) return null; // No such user: nothing to cache

  // 2. Immediately update cache
  const cacheKey = `user:${userId}`;
  await redis.setEx(cacheKey, 300, JSON.stringify(updated));

  return updated;
}

Write-through is more expensive per write but eliminates the window between a write and cache population where a read would generate a cache miss and hit the database. For user profile updates where the user immediately sees their own edits, write-through provides a better experience.

Pattern 3: Cache Keys as a System

Cache key design is the most under-discussed aspect of production caching. A key scheme that is not composable makes bulk invalidation impossible — you end up with either over-invalidation (deleting too much) or stale data.

// Structured key conventions
const CacheKeys = {
  user: (id) => `user:${id}`,
  userOrders: (userId) => `user:${userId}:orders`,
  userOrderPage: (userId, page) => `user:${userId}:orders:page:${page}`,

  product: (id) => `product:${id}`,
  productsByCategory: (categoryId) => `products:category:${categoryId}`,

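  // Prefix-based invalidation: matches every key nested under a user.
  // The pattern does not match the bare `user:${userId}` key itself.
  userPattern: (userId) => `user:${userId}:*`,
};

// Bulk invalidation using SCAN (never use KEYS in production: it's O(N) and blocks Redis)
async function invalidateAllUserData(userId) {
  const pattern = CacheKeys.userPattern(userId);
  let cursor = '0';

  // Delete the base key explicitly, since the `:*` pattern only covers nested keys
  await redis.del(CacheKeys.user(userId));

  do {
    const result = await redis.scan(cursor, {
      MATCH: pattern,
      COUNT: 100,
    });

    // The cursor comes back as a number in node-redis v4 and a string in v5
    cursor = String(result.cursor);

    if (result.keys.length > 0) {
      await redis.del(result.keys);
    }
  } while (cursor !== '0');
}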

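The SCAN command with a cursor is the correct way to search for keys by pattern. It still visits the whole key space over a full iteration, but in small steps (COUNT is a hint for how much work each call does), so other clients are served between calls. The KEYS command (redis.keys('user:*')) is O(N) across your entire key space and runs as a single command: it blocks the Redis event loop for the entire duration, causing latency spikes for every client connected to that Redis instance. Never use KEYS in production.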

Pattern 4: Preventing Cache Stampedes

The classic production failure: a heavily cached key expires. A thousand concurrent requests all see a miss simultaneously and all query the database at once. The database falls over.

This is the thundering herd problem, and it gets more dangerous as your traffic grows.

Solution A: Probabilistic Early Expiration (XFetch)

Recompute the cache value before it expires, with a probability that rises sharply as expiry approaches and scales with how long the value takes to recompute (delta). On a busy key, some request almost always refreshes the value before it actually expires.

async function getCachedWithXFetch(key, fetchFn, ttl, beta = 1.0) {
  const raw = await redis.get(key);

  if (raw) {
    const { value, expiry, delta } = JSON.parse(raw);
    const now = Date.now() / 1000;

    // Probabilistically recompute before expiry
    // Higher beta = more aggressive early refresh
    if (now - delta * beta * Math.log(Math.random()) < expiry) {
      return value;
    }
  }

  // Cache miss or early refresh triggered
  const start = Date.now();
  const value = await fetchFn();
  const delta = (Date.now() - start) / 1000; // Time taken to compute, in seconds
  const expiry = Date.now() / 1000 + ttl;

  await redis.setEx(key, ttl, JSON.stringify({ value, expiry, delta }));

  return value;
}
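To see how the early-refresh probability behaves, the decision can be pulled out into a pure function and sampled. This is a sketch for building intuition; `shouldRefreshEarly` and `refreshProbability` are hypothetical helpers, and the injectable `rand` parameter exists only so the behaviour can be inspected deterministically.

```javascript
// The XFetch decision in isolation: refresh when "now plus random jitter" reaches expiry.
function shouldRefreshEarly({ now, expiry, delta, beta = 1.0, rand = Math.random }) {
  // -log(u) for u in (0, 1) is a non-negative, exponentially distributed value,
  // so the effective "now" is pushed forward by a random multiple of delta * beta.
  const jitter = -delta * beta * Math.log(rand());
  return now + jitter >= expiry;
}

// Estimate how often a refresh fires at a given distance from expiry (all values in seconds)
function refreshProbability(secondsToExpiry, delta, beta = 1.0, trials = 100000) {
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    if (shouldRefreshEarly({ now: 0, expiry: secondsToExpiry, delta, beta })) hits++;
  }
  return hits / trials;
}
```

The estimate should track exp(-t / (delta * beta)), where t is the time left before expiry. With a 0.5-second recompute time, a request arriving 0.5 seconds before expiry refreshes roughly 37% of the time, while one arriving 5 seconds before expiry almost never does.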

Solution B: Mutex Lock on Cache Miss

When a cache miss occurs, acquire a distributed lock before fetching. Other requests wait for the lock holder to populate the cache rather than all racing to the database.

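import { randomUUID } from 'node:crypto';

// Compare-and-delete: release the lock only if this caller still owns it
const RELEASE_LOCK = `
  if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
  end
  return 0`;

async function getCachedWithLock(key, fetchFn, ttl, retries = 50) {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const lockKey = `lock:${key}`;
  const token = randomUUID(); // Unique per attempt, so we never delete someone else's lock
  const lockAcquired = await redis.set(lockKey, token, {
    NX: true,      // Only set if not exists
    EX: 10,        // Lock expires in 10 seconds
  });

  if (lockAcquired) {
    try {
      // This instance won the lock: fetch and populate
      const value = await fetchFn();
      await redis.setEx(key, ttl, JSON.stringify(value));
      return value;
    } finally {
      // If fetchFn outlived the 10-second lock, another caller may hold it now
      await redis.eval(RELEASE_LOCK, { keys: [lockKey], arguments: [token] });
    }
  }

  // Another instance is populating: wait briefly and retry, but not forever
  if (retries <= 0) throw new Error(`Timed out waiting for cache lock on ${key}`);
  await new Promise(resolve => setTimeout(resolve, 50));
  return getCachedWithLock(key, fetchFn, ttl, retries - 1);
}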

Use probabilistic early expiration for frequently accessed keys where you can afford the occasional early refresh. Use mutex locks for expensive operations (external API calls, heavy aggregations) where parallel execution would be harmful.


Layer 3: CDN Edge Caching for API Responses

For public API endpoints that return the same response for many users, CDN caching eliminates origin load entirely. A well-configured CDN can absorb 90%+ of read traffic for public content.

// Express middleware that sets aggressive caching headers for public endpoints
function setCacheHeaders(maxAge, staleWhileRevalidate = 60) {
  return (req, res, next) => {
    // Only cache GET requests
    if (req.method !== 'GET') return next();

    res.set({
      'Cache-Control': `public, max-age=${maxAge}, stale-while-revalidate=${staleWhileRevalidate}`,
      'Vary': 'Accept-Encoding',  // Separate cache for gzip vs non-gzip
    });

    next();
  };
}

// Product listing: cache for 5 minutes, serve stale for 60 seconds while revalidating
app.get('/api/products',
  setCacheHeaders(300, 60),
  async (req, res) => {
    const products = await productService.list(req.query);
    res.json(products);
  }
);

// User-specific data: never cache at CDN
app.get('/api/user/profile',
  (req, res, next) => {
    res.set('Cache-Control', 'private, no-cache');
    next();
  },
  authenticate,
  async (req, res) => {
    const profile = await userService.getProfile(req.user.id);
    res.json(profile);
  }
);

stale-while-revalidate is the most useful cache directive for API responses. It tells the CDN to serve a stale response immediately while fetching a fresh one in the background. From the user's perspective, the response is always fast. From your origin's perspective, requests arrive at a controlled, background rate rather than all at once when cache entries expire.

The critical Vary header: If you serve compressed and uncompressed responses from the same endpoint (which all production servers should), Vary: Accept-Encoding ensures the CDN maintains separate cache entries for each encoding. Without it, compressed responses get served to clients that cannot decompress them.
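The timeline behind max-age and stale-while-revalidate can be written down as a small classification function. This sketch only models the directive's intended semantics for a cache that honours it; `cacheState` is a hypothetical helper, and actual support differs between CDNs.

```javascript
// How a shared cache that honours stale-while-revalidate treats a stored response, by age.
function cacheState(ageSeconds, maxAge, staleWhileRevalidate) {
  // Within max-age: served from cache, origin untouched
  if (ageSeconds < maxAge) return 'fresh';
  // Within the extra window: served stale immediately, refreshed in the background
  if (ageSeconds < maxAge + staleWhileRevalidate) return 'stale-revalidating';
  // Past both windows: the cache must go to origin before responding
  return 'expired';
}
```

For the product listing above (max-age=300, stale-while-revalidate=60), a response aged 320 seconds is still served instantly while the edge refetches it, and only a response older than 360 seconds makes a user wait on the origin.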


Eviction Policies: What Happens When Redis Gets Full

Redis operates entirely in memory. When maxmemory is reached and new keys need to be written, Redis must evict existing keys. The eviction policy determines which keys get removed.

# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

Policy comparison:

| Policy       | Behavior                                            | Use When                                                  |
|--------------|-----------------------------------------------------|-----------------------------------------------------------|
| noeviction   | Refuse writes when full; return error               | You cannot afford data loss (Redis as primary store)      |
| allkeys-lru  | Evict least recently used keys from entire keyspace | General-purpose cache; this is the right default          |
| volatile-lru | Evict LRU keys that have a TTL set                  | Mix of cache and persistent data in same Redis instance   |
| allkeys-lfu  | Evict least frequently used keys                    | Access patterns with highly skewed popularity             |
| volatile-ttl | Evict keys closest to expiry                        | You want to control what survives memory pressure via TTL |

For a pure cache (all data has TTLs and can be regenerated from the database), allkeys-lru is almost always the right choice.

The unbounded key growth trap: Forgetting to set TTLs on cache keys is one of the most common Redis production mistakes. Without TTLs, your key count grows monotonically until Redis hits maxmemory. With an allkeys-* policy, Redis then starts evicting keys, possibly ones you still want. With noeviction, or with a volatile-* policy when no key carries a TTL, writes start failing. The symptom appears suddenly at a threshold that can be hard to predict.

Always set TTLs. Always.
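A periodic audit can catch keys that slipped through without an expiry. The following is a rough sketch under stated assumptions: `findKeysWithoutTtl` is a hypothetical helper, `redis` is assumed to be a connected node-redis v4 client, and the check relies on TTL returning -1 for a key that exists but has no expiry.

```javascript
// Walk the keyspace with SCAN and report keys that have no expiry set.
async function findKeysWithoutTtl(redis, match = '*', limit = 100) {
  const offenders = [];
  let cursor = '0';

  do {
    const result = await redis.scan(cursor, { MATCH: match, COUNT: 100 });
    cursor = String(result.cursor); // number in node-redis v4, string in v5

    for (const key of result.keys) {
      // TTL is -1 when the key exists but carries no expiry
      if ((await redis.ttl(key)) === -1) offenders.push(key);
      if (offenders.length >= limit) return offenders; // stop early once we have enough evidence
    }
  } while (cursor !== '0');

  return offenders;
}
```

Run it against a replica or during quiet hours. It issues one TTL call per key, which is fine for an occasional audit but not something to put on a request path.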


Observability: The Metrics That Catch Problems Early

A cache you cannot observe is a cache you cannot trust. These are the signals worth instrumenting from day one.

Cache hit rate is the single most important metric. Below 50% means your strategy is broken — your TTLs are too short, your key space is too fragmented, or you are caching data that changes too frequently.

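// Redis INFO stats
const info = await redis.info('stats');

// Parse keyspace_hits and keyspace_misses from the "field:value" lines
const stat = (name) => Number(info.match(new RegExp(`^${name}:(\\d+)`, 'm'))?.[1] ?? 0);
const keyspaceHits = stat('keyspace_hits');
const keyspaceMisses = stat('keyspace_misses');
const total = keyspaceHits + keyspaceMisses;
const hitRate = total === 0 ? 0 : keyspaceHits / total; // Avoid NaN on a fresh instance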

Memory usage and fragmentation:

const memoryInfo = await redis.info('memory');
// Watch: used_memory_rss vs used_memory
// High fragmentation ratio (>1.5) means Redis is holding onto
// more OS memory than it is actually using
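Turning that INFO output into a number you can alert on might look like the sketch below. `parseRedisInfo` and `memoryHealth` are hypothetical helpers. The only assumptions about Redis are that INFO returns "field:value" lines with "#" section headers and that the memory section includes used_memory and used_memory_rss.

```javascript
// Parse the text returned by INFO into a plain object of string values.
function parseRedisInfo(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    if (!line || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx === -1) continue;
    fields[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return fields;
}

// Compare resident memory with what Redis reports as used and flag a high ratio.
function memoryHealth(infoText, maxRatio = 1.5) {
  const f = parseRedisInfo(infoText);
  const used = Number(f.used_memory);
  const rss = Number(f.used_memory_rss);
  const ratio = used > 0 ? rss / used : NaN;
  return { used, rss, ratio, fragmented: ratio > maxRatio };
}
```

Calling memoryHealth(await redis.info('memory')) on a schedule and exporting the ratio as a gauge is usually enough. Very small instances can show high ratios that mean little, so treat the 1.5 threshold as a starting point.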

Latency percentiles: Instrument your cache client to record operation latency. A Redis GET that takes 50 ms indicates network problems or a Redis instance under heavy load — both situations that degrade your application even when the data is cached.
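A minimal way to collect those percentiles without pulling in a metrics library is sketched below. `createLatencyRecorder` is a hypothetical helper that keeps a bounded window of samples in memory and uses the nearest-rank method. A production setup would more likely feed a histogram in whatever metrics system is already in place.

```javascript
// Time each cache call and report percentiles from a bounded window of recent samples.
function createLatencyRecorder(maxSamples = 1000) {
  const samples = [];

  function record(ms) {
    samples.push(ms);
    if (samples.length > maxSamples) samples.shift(); // keep only the most recent window
  }

  // Wrap any async operation, for example () => redis.get(key)
  async function time(operation) {
    const start = performance.now();
    try {
      return await operation();
    } finally {
      record(performance.now() - start); // record failures too, since they also cost time
    }
  }

  // Nearest-rank percentile over the current window; null when there are no samples
  function percentile(p) {
    if (samples.length === 0) return null;
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
  }

  return { record, time, percentile };
}
```

Watching p99 alongside p50 matters because a healthy median can hide the occasional 50 ms GET that the paragraph above warns about.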

Instrument hit/miss at the application layer:

class InstrumentedCache {
  constructor(redisClient, metrics) {
    this.redis = redisClient;
    this.metrics = metrics;
  }

  async get(key, namespace = 'default') {
    const value = await this.redis.get(key);

    // GET resolves to null on a miss; an empty string is still a hit
    const hit = value !== null;

    if (hit) {
      this.metrics.increment('cache.hit', { namespace });
    } else {
      this.metrics.increment('cache.miss', { namespace });
    }

    return hit ? JSON.parse(value) : null;
  }
}

Track hit rate per cache namespace. A single low-performing key namespace degrades overall hit rate but is invisible in aggregate metrics.


The Redis Scaling Progression

Start simple. Add complexity only when metrics demand it.

Phase 1 — Single Node: Sufficient for most applications up to significant traffic. Add read replicas when reads become the bottleneck.

Phase 2 — Redis Sentinel: Automatic failover for high availability without horizontal sharding. Suitable up to approximately 50 GB of data. The topology is one primary and multiple replicas; Sentinel monitors them and promotes a replica if the primary fails.

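Phase 3 — Redis Cluster: Automatic horizontal sharding across 3–1,000 nodes with built-in replication. Handles datasets and throughput well beyond what a single primary can hold. Adds operational complexity and some API constraints: a multi-key command works only when all of its keys hash to the same slot, which you control with hash tags such as {user:42}.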
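The slot constraint is easier to reason about with the mapping in hand. Redis Cluster assigns each key to one of 16,384 slots using CRC16 of the key, or of the hash tag inside the first {...} when that tag is non-empty. The sketch below reimplements that mapping for illustration. `crc16` and `keySlot` are hypothetical local helpers, and CLUSTER KEYSLOT is the authoritative source on a real cluster.

```javascript
// CRC16, XMODEM variant (polynomial 0x1021, initial value 0), as used for Redis Cluster key slots
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
      crc &= 0xffff; // keep the register at 16 bits
    }
  }
  return crc;
}

// Slot for a key: hash only the text between the first "{" and the next "}" when it is non-empty
function keySlot(key) {
  let hashed = key;
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close !== -1 && close !== open + 1) {
      hashed = key.slice(open + 1, close);
    }
  }
  return crc16(Buffer.from(hashed)) % 16384;
}
```

Keys named {user:42}:orders and {user:42}:profile land on the same slot, so a multi-key DEL or a transaction across them stays legal in a cluster. The colon-delimited keys used earlier in this article carry no hash tag, so keys such as user:42:orders and user:42 are hashed independently.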

Most applications never need Phase 3. Most applications that think they need Phase 3 actually need Phase 2 combined with better application-level caching and query optimization.


The Anti-Patterns Worth Naming

Caching everything once you see the results. After the first successful caching implementation improves response times dramatically, the temptation is to cache every expensive operation. Resist it. Understand the data's update lifecycle — who writes it, how often, what triggers a change — before deciding to cache it. Caching data with complex invalidation requirements without a clear invalidation plan turns a performance win into a correctness bug.

Using KEYS in production. KEYS pattern is O(N) across the entire key space and blocks Redis for its entire duration. If KEYS takes 100 ms to walk 1 million keys, every command from every other client queues behind it for up to 100 ms. Use SCAN with a cursor.

Missing TTLs on any key. Set them. Every one.

A cache hit rate below 50%. This is not a caching problem — it is a sign that your keys are either not matching real request patterns (key construction is wrong), your TTL is shorter than your query interval (keys expire before they get read again), or you are caching data that changes faster than it is read (which means you should not be caching it).


What Good Caching Looks Like

Good caching is designed from the beginning, not bolted on after a performance incident. The key scheme is deliberate and composable. Invalidation is built into every write path at the same time as caching is added to the read path. TTLs are set based on how frequently the underlying data changes, not based on a default that someone copy-pasted. And the hit rate is monitored continuously — not checked once during load testing and then forgotten.

The gains are real. A well-implemented Redis caching layer can reduce database load by 60–90%, dramatically improve response latency, and let your database infrastructure scale much further before requiring sharding or read replicas.

The operational discipline to maintain those gains over time is what separates a caching implementation that helps from one that eventually causes a 3 AM incident.

Don't cache what you don't understand. Caching data with complex invalidation requirements without a clear invalidation strategy introduces subtle bugs that are hard to debug in production.
