Graceful Shutdown in Node.js: Stop Dropping Requests on Every Deploy
How to drain connections, close workers, and signal your load balancer before the process exits — on every single deploy
Senior Developer

Here is what happens when a Node.js app shuts down the wrong way:
Your CI pipeline pushes a new deploy. Docker sends SIGTERM to the container. The process exits immediately. Any request in flight at that moment gets dropped — no response, no error, just a hanging connection from the client's perspective. If you are deploying five times a day, users notice.
Graceful shutdown is the difference between a deploy that users never feel and one that generates a spike of errors in your monitoring. It is also one of the more neglected production patterns in Node.js — not because it is hard, but because it only fails in production under real traffic, which means it often goes unfixed until it causes a real incident.
What Graceful Shutdown Actually Means
When your process receives a termination signal (SIGTERM from Docker/Kubernetes, SIGINT from Ctrl+C), it should:
Stop accepting new connections
Finish handling all in-flight requests
Drain background job workers
Close database connections cleanly
Flush any buffered logs
Exit with code 0
If it takes too long (say, more than 30 seconds), something is stuck and the process should force-exit with code 1.
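The sequence and deadline above can be sketched without any framework. The helper below is a hypothetical illustration, not part of any library: runShutdownSteps and the step names are made up for this example. It runs cleanup steps in order and resolves to the exit code the process should use: 0 if every step finished in time, 1 if a step failed or the deadline passed.

```javascript
// Hypothetical helper: run async cleanup steps in order, bounded by a deadline.
// Resolves to the exit code the caller should pass to process.exit().
function runShutdownSteps(steps, timeoutMs) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve(1), timeoutMs); // took too long: exit code 1
  });

  const work = (async () => {
    for (const [name, step] of steps) {
      await step(); // each step must finish before the next one starts
      console.log(`shutdown step done: ${name}`);
    }
    return 0; // every step finished: exit code 0
  })().catch((err) => {
    console.error('shutdown step failed:', err.message);
    return 1; // a failed step also means a non-zero exit
  });

  // Whichever settles first decides the exit code; always clear the timer
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

A signal handler could then call something like `process.exit(await runShutdownSteps([['http', closeServer], ['db', closeDb]], 30_000))`, where closeServer and closeDb stand in for your own cleanup functions.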
The Full Implementation
// src/server.js
import http from 'http';
import app from './app.js';
import db from './lib/db.js';
import redis from './lib/redis.js';
import logger from './lib/logger.js';
import { emailWorker, reportWorker } from './workers/index.js';
const PORT = process.env.PORT || 3000;
const server = http.createServer(app);
// Track whether we are shutting down
// Used by health check to signal load balancers to stop sending traffic
let isShuttingDown = false;
// Track active connections so we can drain them
const activeConnections = new Set();
server.on('connection', (socket) => {
activeConnections.add(socket);
socket.once('close', () => activeConnections.delete(socket));
});
// ─────────────────────────────────────────────────────
// Graceful shutdown handler
// ─────────────────────────────────────────────────────
async function gracefulShutdown(signal) {
if (isShuttingDown) return; // Prevent double-shutdown if multiple signals arrive
isShuttingDown = true;
logger.info({ signal }, 'Shutdown signal received, starting graceful shutdown');
// Force exit if shutdown takes too long
// 30s is generous — tune down to 10-15s if your requests are typically fast
const forceExitTimer = setTimeout(() => {
logger.error('Graceful shutdown timed out, forcing exit');
process.exit(1);
}, 30_000);
forceExitTimer.unref(); // Don't let this timer keep the process alive
try {
// Step 1: Stop accepting new connections
// server.close() stops listening immediately; its callback fires only after
// every existing connection has ended
const serverClosed = new Promise((resolve, reject) => {
server.close((err) => {
if (err) reject(err);
else resolve();
});
});
logger.info({ count: activeConnections.size }, 'No longer accepting connections, draining active ones');
// Step 2: Wait for in-flight requests to finish
// Idle keep-alive sockets have no request to finish and would otherwise hold
// the server open until their keep-alive timeout, so close them explicitly
// (requires Node.js 18.2+)
server.closeIdleConnections();
await serverClosed;
logger.info('HTTP server closed, all connections drained');
// Step 3 — Close BullMQ workers (finish current job, reject new ones)
logger.info('Closing background workers...');
await Promise.all([
emailWorker.close(),
reportWorker.close(),
]);
logger.info('Workers closed');
// Step 4 — Close database pool
await db.end();
logger.info('Database pool closed');
// Step 5 — Close Redis connection
await redis.quit();
logger.info('Redis connection closed');
clearTimeout(forceExitTimer);
logger.info('Graceful shutdown complete');
// Exit 0 after a normal signal, or the non-zero code set by the fatal-error handlers below
process.exit(process.exitCode ?? 0);
} catch (err) {
logger.error({ error: err.message }, 'Error during graceful shutdown');
process.exit(1);
}
}
// Listen for termination signals
process.on('SIGTERM', () => gracefulShutdown('SIGTERM')); // Docker, Kubernetes
process.on('SIGINT', () => gracefulShutdown('SIGINT')); // Ctrl+C in dev
// ─────────────────────────────────────────────────────
// Unhandled promise rejections and exceptions
// ─────────────────────────────────────────────────────
process.on('unhandledRejection', (reason, promise) => {
logger.error({ reason, promise }, 'Unhandled promise rejection');
// In production, treat this as fatal: clean up, then exit non-zero
// so the process manager sees a failure and restarts the process
process.exitCode = 1;
gracefulShutdown('unhandledRejection');
});
process.on('uncaughtException', (err) => {
logger.error({ error: err.message, stack: err.stack }, 'Uncaught exception');
process.exitCode = 1;
gracefulShutdown('uncaughtException');
});
// ─────────────────────────────────────────────────────
// Start server
// ─────────────────────────────────────────────────────
server.listen(PORT, () => {
logger.info({ port: PORT }, 'Server started');
});
The Health Check Endpoint
A health check that returns 200 during shutdown is actively harmful — your load balancer keeps sending traffic to a process that is trying to shut down. Your health check must respect the shutdown state.
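// src/routes/health.js
// Assumes router (an Express Router), db, redis, emailQueue, and authenticate are
// imported above, and that isShuttingDown is shared with src/server.js
// (for example through a small state module) rather than being a local variable there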
// Lightweight liveness check — just "is the process running?"
// Used by Docker/Kubernetes to know if the container should be restarted
router.get('/health/live', (req, res) => {
if (isShuttingDown) {
return res.status(503).json({ status: 'shutting_down' });
}
res.json({ status: 'ok' });
});
// Readiness check — "is the app ready to serve traffic?"
// Load balancers use this. Return 503 to drain traffic before shutdown.
router.get('/health/ready', async (req, res) => {
if (isShuttingDown) {
return res.status(503).json({
status: 'shutting_down',
message: 'Draining traffic',
});
}
// Check actual dependencies — don't lie to the load balancer
const checks = await Promise.allSettled([
db.query('SELECT 1'), // Database reachable?
redis.ping(), // Redis reachable?
]);
const dbOk = checks[0].status === 'fulfilled';
const redisOk = checks[1].status === 'fulfilled';
const healthy = dbOk && redisOk;
res.status(healthy ? 200 : 503).json({
status: healthy ? 'ok' : 'degraded',
checks: {
database: dbOk ? 'ok' : 'error',
redis: redisOk ? 'ok' : 'error',
},
});
});
// Deep health check — more expensive, used for alerting not load balancing
router.get('/health/detail', authenticate, async (req, res) => {
// Use a distinct name: destructuring into `emailQueue` would shadow the queue itself
const [emailCounts] = await Promise.all([
emailQueue.getJobCounts(),
]);
res.json({
status: 'ok',
uptime: process.uptime(),
memory: process.memoryUsage(),
queues: { email: emailCounts },
version: process.env.APP_VERSION || 'unknown',
});
});
Docker and Kubernetes Config
For Docker Compose:
services:
  app:
    image: your-app
    stop_grace_period: 30s # Give the app time to shut down before force kill
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health/live"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 15s
For Kubernetes:
spec:
  containers:
    - name: app
      livenessProbe:
        httpGet:
          path: /health/live
          port: 3000
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
  terminationGracePeriodSeconds: 30 # Must be >= your shutdown timeout
The readiness probe is the critical one for zero-downtime deployments. Kubernetes stops routing traffic to a pod once /health/ready fails its probe, which happens after failureThreshold consecutive failed checks (3 by default) rather than instantly. Because the shutdown handler sets isShuttingDown = true as its first action, the pod starts reporting 503 right away, and requests already in flight still finish because server.close() waits for them.
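Because readiness is polled, there is a window between the flag flipping and the load balancer actually reacting. One way to account for it, sketched here under assumptions, is to flip the flag, keep serving for a short delay, and only then close the server. The names createDrainingShutdown, closeServer, and drainDelayMs are hypothetical, and the right delay depends on your probe interval and failure threshold.

```javascript
// Hypothetical sketch: flip the readiness flag, give the load balancer time
// to observe the 503, then stop accepting connections.
function createDrainingShutdown({ closeServer, drainDelayMs }) {
  const state = { isShuttingDown: false };

  async function shutdown() {
    if (state.isShuttingDown) return false; // already in progress
    state.isShuttingDown = true;            // readiness endpoint should return 503 from now on

    // Keep serving while probes catch up. Pick the delay from your probe
    // settings and keep it well inside the termination grace period.
    await new Promise((resolve) => setTimeout(resolve, drainDelayMs));

    await closeServer(); // e.g. a promisified server.close()
    return true;
  }

  return { state, shutdown };
}
```

The readiness route would read `state.isShuttingDown`, and the signal handler would call `shutdown()` before closing workers and database connections. The trade-off is a slower shutdown in exchange for fewer refused connections during the handover.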
Common Mistakes
Not calling server.close(): A common shortcut is to handle SIGTERM by calling process.exit() directly. This drops every in-flight request. Always close the server first.
Setting the force-exit timer too high: A 5-minute timeout means a stuck process holds a deployment slot for 5 minutes. Keep it at 15–30 seconds.
Health check ignoring shutdown state: If /health returns 200 during shutdown, load balancers keep sending traffic and requests keep arriving, so the shutdown never drains. Always check isShuttingDown in your health endpoint.
Not closing the DB pool: Open pool connections keep the event loop alive, so a process that relies on exiting naturally hangs until the force-exit timer fires, and you get exit(1) instead of exit(0). With most pool implementations, db.end() also lets queries that are still running finish before the connections close.
Not waiting for BullMQ workers: A worker killed mid-job leaves the job in an indeterminate state. BullMQ will re-queue stalled jobs, but it is cleaner to call worker.close() and let the current job finish.
Testing Your Shutdown
# Start your app in the background and remember its PID
node src/server.js &
APP_PID=$!
sleep 2
# Send a request that takes a while
curl -X POST http://localhost:3000/api/slow-endpoint &
sleep 1
# While that's running, send SIGTERM to the server (not to curl)
kill -TERM "$APP_PID"
# The slow request should complete before the process exits
# wait returns the server's exit status, which should be 0
wait "$APP_PID"
echo "Exit code: $?"
If your slow request completes and you see Graceful shutdown complete in the logs before the process exits, your implementation is correct.
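The same check can be automated so it runs in CI. The sketch below uses only Node.js built-ins and assumes Node.js 18+ for the global fetch. The function name checkGracefulShutdown and its options are hypothetical, and the fixed sleeps are a simplification; a real test would poll a health endpoint instead.

```javascript
// Sketch of an automated shutdown check. All names here are illustrative.
async function checkGracefulShutdown({ command, args, url, startupMs = 1000 }) {
  // Dynamic import keeps the snippet usable from both CommonJS and ESM files
  const { spawn } = await import('node:child_process');

  const child = spawn(command, args, { stdio: 'inherit' });
  const exited = new Promise((resolve) => {
    child.once('exit', (code, signal) => resolve({ code, signal }));
  });

  // Crude startup wait (assumption: the server is listening by then)
  await new Promise((resolve) => setTimeout(resolve, startupMs));

  // Start the slow request, then send SIGTERM while it is still in flight
  const inFlight = fetch(url, { method: 'POST' }).then(
    (res) => res.text().then(() => res.status), // read the body, keep the status
    () => null, // connection dropped or refused
  );
  await new Promise((resolve) => setTimeout(resolve, 200));
  child.kill('SIGTERM');

  const [status, exit] = await Promise.all([inFlight, exited]);
  return { status, exitCode: exit.code, signal: exit.signal };
}
```

Called with `{ command: 'node', args: ['src/server.js'], url: 'http://localhost:3000/api/slow-endpoint' }`, a working implementation should resolve with a non-null status and an exitCode of 0. A null status means the request was dropped, and a null exitCode with signal set to SIGTERM means the process died without handling the signal.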