Graceful Shutdown in Node.js: Stop Dropping Requests on Every Deploy

Here is what happens when a Node.js app shuts down the wrong way:

Your CI pipeline pushes a new deploy. Docker sends SIGTERM to the container. The process exits immediately. Any request in flight at that moment is dropped: the client gets no response, only a connection that is cut off mid-request. If you are deploying five times a day, users notice.

Graceful shutdown is the difference between a deploy that users never feel and one that generates a spike of errors in your monitoring. It is also one of the more neglected production patterns in Node.js — not because it is hard, but because it only fails in production under real traffic, which means it often goes unfixed until it causes a real incident.

What Graceful Shutdown Actually Means

When your process receives a termination signal (SIGTERM from Docker/Kubernetes, SIGINT from Ctrl+C), it should:

1. Stop accepting new connections
2. Finish handling all in-flight requests
3. Drain background job workers
4. Close database connections cleanly
5. Flush any buffered logs
6. Exit with code 0

If it takes too long (say, more than 30 seconds), something is stuck and the process should force-exit with code 1.
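The sequence above boils down to "run the cleanup steps in order, but give up after a deadline". A minimal sketch of that idea, assuming a hypothetical helper named runCleanupWithDeadline that returns the exit code rather than calling process.exit() itself, which keeps it easy to test:

```javascript
// Sketch only: runCleanupWithDeadline is a hypothetical helper, not a Node.js API.
// steps:      array of async functions, run one after another
// deadlineMs: how long the whole cleanup may take before we give up
async function runCleanupWithDeadline(steps, deadlineMs) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), deadlineMs);
  });

  const cleanup = (async () => {
    for (const step of steps) {
      await step();   // e.g. () => db.end()
    }
    return 'done';
  })();

  try {
    const outcome = await Promise.race([cleanup, timedOut]);
    return outcome === 'done' ? 0 : 1;   // 0 = clean exit, 1 = deadline hit
  } catch {
    return 1;   // a step threw: treat it as an unclean shutdown
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would pass the returned value to process.exit(). The full handler in the next section inlines the same logic with a force-exit timer.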

The Full Implementation

// src/server.js
import http from 'http';
import app from './app.js';
import db from './lib/db.js';
import redis from './lib/redis.js';
import logger from './lib/logger.js';
import { emailWorker, reportWorker } from './workers/index.js';

const PORT = process.env.PORT || 3000;
const server = http.createServer(app);

// Track whether we are shutting down
// Used by health check to signal load balancers to stop sending traffic
let isShuttingDown = false;

// Track active connections so we can drain them
const activeConnections = new Set();

server.on('connection', (socket) => {
  activeConnections.add(socket);
  socket.once('close', () => activeConnections.delete(socket));
});

// ─────────────────────────────────────────────────────
// Graceful shutdown handler
// ─────────────────────────────────────────────────────
async function gracefulShutdown(signal, exitCode = 0) {
  if (isShuttingDown) return;   // Prevent double-shutdown if multiple signals arrive
  isShuttingDown = true;

  logger.info({ signal }, 'Shutdown signal received, starting graceful shutdown');

  // Force exit if shutdown takes too long
  // 30s is generous; tune down to 10-15s if your requests are typically fast
  const forceExitTimer = setTimeout(() => {
    logger.error('Graceful shutdown timed out, forcing exit');
    process.exit(1);
  }, 30_000);
  forceExitTimer.unref();   // Don't let this timer keep the process alive

  try {
    // Step 1: Stop accepting new connections
    // server.close() stops listening right away, but its callback only fires
    // once every existing connection has ended, so it is awaited in step 2
    const serverClosed = new Promise((resolve, reject) => {
      server.close((err) => {
        if (err) reject(err);
        else resolve();
      });
    });
    logger.info('HTTP server no longer accepting connections');

    // Step 2: Wait for in-flight requests to finish
    // Idle keep-alive sockets would hold the server open, so close them as they
    // go idle (closeIdleConnections() needs Node.js 18.2+; skipped if missing)
    const closeIdle = () => server.closeIdleConnections?.();
    closeIdle();
    const idleSweep = setInterval(closeIdle, 100);
    if (activeConnections.size > 0) {
      logger.info({ count: activeConnections.size }, 'Waiting for active connections to drain');
    }
    try {
      await serverClosed;
    } finally {
      clearInterval(idleSweep);
    }
    logger.info('All connections drained');

    // Step 3: Close BullMQ workers (finish the current job, stop picking up new ones)
    logger.info('Closing background workers...');
    await Promise.all([
      emailWorker.close(),
      reportWorker.close(),
    ]);
    logger.info('Workers closed');

    // Step 4: Close database pool
    await db.end();
    logger.info('Database pool closed');

    // Step 5: Close Redis connection
    await redis.quit();
    logger.info('Redis connection closed');

    clearTimeout(forceExitTimer);
    logger.info('Graceful shutdown complete');
    process.exit(exitCode);

  } catch (err) {
    logger.error({ error: err.message }, 'Error during graceful shutdown');
    process.exit(1);
  }
}

// Listen for termination signals
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));  // Docker, Kubernetes
process.on('SIGINT',  () => gracefulShutdown('SIGINT'));   // Ctrl+C in dev

// ─────────────────────────────────────────────────────
// Unhandled promise rejections and exceptions
// ─────────────────────────────────────────────────────
process.on('unhandledRejection', (reason, promise) => {
  logger.error({ reason, promise }, 'Unhandled promise rejection');
  // In production, treat this as fatal: clean up, then exit non-zero
  // so the process manager sees a failure and restarts the app
  gracefulShutdown('unhandledRejection', 1);
});

process.on('uncaughtException', (err) => {
  logger.error({ error: err.message, stack: err.stack }, 'Uncaught exception');
  gracefulShutdown('uncaughtException', 1);
});

// ─────────────────────────────────────────────────────
// Start server
// ─────────────────────────────────────────────────────
server.listen(PORT, () => {
  logger.info({ port: PORT }, 'Server started');
});

The Health Check Endpoint

A health check that returns 200 during shutdown is actively harmful — your load balancer keeps sending traffic to a process that is trying to shut down. Your health check must respect the shutdown state.

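// src/routes/health.js
import { Router } from 'express';
import db from '../lib/db.js';
import redis from '../lib/redis.js';
// Not shown in this article; the import paths below are assumptions:
// import { authenticate } from '../middleware/auth.js';
// import { emailQueue } from '../queues/index.js';
// isShuttingDown is declared in server.js and must be shared with this module,
// for example through a small state module that both files import.

const router = Router();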

// Lightweight liveness check — just "is the process running?"
// Used by Docker/Kubernetes to know if the container should be restarted
router.get('/health/live', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting_down' });
  }
  res.json({ status: 'ok' });
});

// Readiness check — "is the app ready to serve traffic?"
// Load balancers use this. Return 503 to drain traffic before shutdown.
router.get('/health/ready', async (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({
      status: 'shutting_down',
      message: 'Draining traffic',
    });
  }

  // Check actual dependencies — don't lie to the load balancer
  const checks = await Promise.allSettled([
    db.query('SELECT 1'),           // Database reachable?
    redis.ping(),                   // Redis reachable?
  ]);

  const dbOk = checks[0].status === 'fulfilled';
  const redisOk = checks[1].status === 'fulfilled';
  const healthy = dbOk && redisOk;

  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks: {
      database: dbOk ? 'ok' : 'error',
      redis: redisOk ? 'ok' : 'error',
    },
  });
});

// Deep health check: more expensive, used for alerting rather than load balancing
router.get('/health/detail', authenticate, async (req, res) => {
  // Named emailCounts so it does not shadow the emailQueue instance it reads from
  const [emailCounts] = await Promise.all([
    emailQueue.getJobCounts(),
  ]);

  res.json({
    status: 'ok',
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    queues: { email: emailCounts },
    version: process.env.APP_VERSION || 'unknown',
  });
});

export default router;
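The route handlers above read isShuttingDown, which server.js declares but does not export. One way to share it is a small state object that both the shutdown handler and the routes receive. This is a sketch under assumptions: createShutdownState and readinessGate are hypothetical names, not part of Express or of the code above.

```javascript
// Hypothetical shared state; in a real app this would live in its own module
// that both server.js and the health routes import.
function createShutdownState() {
  let shuttingDown = false;
  return {
    begin() { shuttingDown = true; },           // called first thing in the shutdown handler
    isShuttingDown() { return shuttingDown; },  // read by the health routes
  };
}

// Express-style middleware: answer 503 while shutting down, otherwise continue.
function readinessGate(state) {
  return (req, res, next) => {
    if (state.isShuttingDown()) {
      res.status(503).json({ status: 'shutting_down' });
      return;
    }
    next();
  };
}
```

With this in place, a route could be registered as router.get('/health/ready', readinessGate(state), handler), and the shutdown handler would call state.begin() instead of assigning the flag directly.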

Docker and Kubernetes Config

For Docker Compose:

services:
  app:
    image: your-app
    stop_grace_period: 30s    # Give the app time to shut down before force kill
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health/live"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s

For Kubernetes:

spec:
  containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /health/live
        port: 3000
      initialDelaySeconds: 10
      periodSeconds: 10

    readinessProbe:
      httpGet:
        path: /health/ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5

  terminationGracePeriodSeconds: 30   # Must be >= your shutdown timeout

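The readiness probe is the critical one for zero-downtime deployments. Kubernetes takes a pod out of a Service's endpoints once the readiness probe has failed failureThreshold times in a row (3 by default, so up to about 15 seconds with periodSeconds: 5), and it also removes a terminating pod from the endpoints when deletion starts. Neither update reaches every load balancer instantly, so requests can still arrive for a short time after SIGTERM. Setting isShuttingDown = true as the handler's first action makes /health/ready report 503 right away, but traffic only drains before the server closes if the handler waits briefly before calling server.close(), or if a preStop hook delays the SIGTERM.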
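Because the shutdown handler calls server.close() as soon as the signal arrives, the readiness endpoint only gets a chance to report 503 if there is a pause first. The sketch below is an assumption layered on top of the handler, not part of it: drainThenClose, markNotReady, closeServer, and the 5-second default are hypothetical names and values.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// markNotReady: flips the flag that /health/ready checks (e.g. isShuttingDown = true)
// closeServer:  returns a promise that settles when server.close() has finished
async function drainThenClose({ markNotReady, closeServer, drainDelayMs = 5000 }) {
  markNotReady();              // readiness starts returning 503 immediately
  await sleep(drainDelayMs);   // keep serving while traffic is routed elsewhere
  await closeServer();         // only now stop accepting connections
}
```

The delay counts against both the force-exit timer and terminationGracePeriodSeconds, so it needs to fit inside that budget.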

Common Mistakes

Not calling server.close(): A common shortcut is to handle SIGTERM and call process.exit() directly. This drops all in-flight requests. Always close the server first.

Setting the force-exit timer too high: A 5-minute timeout means a stuck process holds a deployment slot for 5 minutes. Keep it at 15–30 seconds.

Health check ignoring shutdown state: If /health/ready returns 200 during shutdown, load balancers keep routing traffic to the instance, and those requests fail once the server stops accepting connections. Always check isShuttingDown in your readiness endpoint.

Not closing the DB pool: Open database connections keep the Node.js event loop alive, so a process that relies on exiting naturally hangs until the force-exit timer fires, and you get exit(1) instead of exit(0). Calling db.end() also gives the pool a chance to close its connections cleanly instead of having them cut off by process.exit().

Not waiting for BullMQ workers: A worker killed mid-job leaves the job in an indeterminate state. BullMQ will re-queue stalled jobs, but it is cleaner to call worker.close() and let the current job finish.
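The timing rules in this article (a bounded force-exit timer, and a grace period at least as long as the shutdown timeout) can be checked at startup. A minimal sketch, assuming a hypothetical helper named checkShutdownBudget. The strict comparison is a conservative choice of this sketch: when the two values are equal, the force-exit timer races with the orchestrator's SIGKILL.

```javascript
// Hypothetical helper: compares the app's force-exit timeout with the grace
// period configured in Docker (stop_grace_period) or Kubernetes
// (terminationGracePeriodSeconds).
function checkShutdownBudget({ forceExitMs, gracePeriodSeconds }) {
  const allowedMs = gracePeriodSeconds * 1000;
  return {
    allowedMs,
    headroomMs: allowedMs - forceExitMs,
    ok: forceExitMs < allowedMs,   // strictly less: leave room before SIGKILL
  };
}
```

For example, a 30_000 ms force-exit timer against a 30-second grace period leaves zero headroom, while a 25_000 ms timer leaves five seconds for the final log lines to be written.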

Testing Your Shutdown

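# Terminal 1: start the app and print its exit code when it stops
node src/server.js; echo "Exit code: $?"

# Terminal 2: send a request that takes a while
curl -X POST http://localhost:3000/api/slow-endpoint &

# While that's running, send SIGTERM to the process listening on port 3000
# (-sTCP:LISTEN keeps lsof from also matching the curl client)
kill -TERM $(lsof -t -i:3000 -sTCP:LISTEN)

# The slow request should complete before the process exits,
# and terminal 1 should print "Exit code: 0"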

If your slow request completes and you see Graceful shutdown complete in the logs before the process exits, your implementation is correct.
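The manual check can also be automated in-process. This is a rough sketch using only Node's built-in http module. The 200 ms slow handler and the function name are made up for illustration, and it exercises server.close() directly rather than the gracefulShutdown() handler from this article.

```javascript
// Sketch: verify that server.close() lets an in-flight request finish.
async function checkInFlightRequestSurvivesClose() {
  // Dynamic import keeps this snippet loadable as either CommonJS or ESM.
  const http = await import('node:http');

  // Hypothetical slow endpoint: responds after 200 ms, then ends the connection.
  const server = http.createServer((req, res) => {
    setTimeout(() => {
      res.writeHead(200, { 'Content-Type': 'text/plain', Connection: 'close' });
      res.end('done');
    }, 200);
  });

  // Port 0 asks the OS for any free port.
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();

  // Start the slow request...
  const responseBody = new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, agent: false }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });

  // ...give it time to reach the server, then begin closing while it is pending.
  await new Promise((resolve) => setTimeout(resolve, 50));
  const closed = new Promise((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });

  const body = await responseBody;   // the response still arrives after close()
  await closed;                      // resolves once the connection has ended
  return body;
}
```

Calling checkInFlightRequestSurvivesClose() resolves with the response body if the request survived the close, and rejects if the connection was dropped.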

ZyVOP
