Cron Jobs and Scheduled Tasks in Node.js: The Right Way to Run Recurring Work
Three approaches — BullMQ repeatable jobs, Redis distributed locking, and Postgres-tracked execution — and honest guidance on when each one is the right call
Senior Developer

Every application eventually needs work that runs on a schedule. Clean up expired sessions at midnight. Send a weekly digest on Monday morning. Retry failed payment charges every four hours. Generate monthly invoices on the first of each month.
The naive approach is setInterval or a cron library that runs inside your main API process. That works until you scale to two servers and the job runs twice. Or you deploy and the job misses its window. Or your API pod restarts mid-job and the work is half-done.
This guide covers the right architecture: jobs defined in code, state tracked in Postgres, distributed locking via Redis so only one instance runs at a time, and a clean separation between the schedule and the work.
The Problem With Simple Cron Libraries
node-cron, node-schedule, and similar libraries are fine for single-server apps. They parse cron expressions and fire a callback at the right time. The issues surface at scale:
Duplicate execution. If two instances of your app are running, both run the cron job. Your "send daily digest to all users" job sends two emails to every user.
No persistence. If your server restarts at 11:59 PM, the midnight job never runs. The library has no memory of what ran and when.
No visibility. You cannot see which jobs ran, which failed, or how long they took without building that yourself.
No retry logic. If the job throws, it is gone. The library does not retry it.
The solution is to use BullMQ's repeat functionality for jobs that need reliability, combined with a distributed lock for jobs that must run on only one instance at a time.
Approach 1: BullMQ Repeatable Jobs (Recommended for Most Cases)
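BullMQ stores repeatable job schedules in Redis. When an instance registers a repeatable job, BullMQ derives a key for the schedule from the job name, the repeat options, and the jobId (the exact key format depends on the BullMQ version). A later call from another instance with the same values resolves to the same key, so it does not create a second schedule. Each scheduled occurrence is a single job in the queue, so only one worker picks it up.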
// src/jobs/scheduleJobs.ts
import { emailQueue, reportQueue, maintenanceQueue } from '../queues';
export async function scheduleAllJobs() {
  // Weekly digest: every Monday at 8 AM UTC
  await emailQueue.add(
    'weekly-digest',
    { type: 'weekly-digest' },
    {
      jobId: 'weekly-digest-recurring', // Stable ID, so repeated calls from multiple instances are idempotent
      repeat: {
        pattern: '0 8 * * 1', // Cron: minute hour day-of-month month day-of-week
        tz: 'UTC',
      },
      removeOnComplete: 10,
      removeOnFail: 50,
    }
  );
  // Daily cleanup: every day at 2 AM UTC
  await maintenanceQueue.add(
    'cleanup-expired-sessions',
    {},
    {
      jobId: 'cleanup-sessions-recurring',
      repeat: { pattern: '0 2 * * *', tz: 'UTC' },
    }
  );
  // Hourly payment retry
  await maintenanceQueue.add(
    'retry-failed-payments',
    {},
    {
      jobId: 'retry-payments-recurring',
      repeat: { pattern: '0 * * * *', tz: 'UTC' }, // Every hour on the hour
    }
  );
  // Monthly invoices: 1st of the month at 6 AM UTC
  await reportQueue.add(
    'generate-monthly-invoices',
    {},
    {
      jobId: 'monthly-invoices-recurring',
      repeat: { pattern: '0 6 1 * *', tz: 'UTC' },
    }
  );
  console.log('Recurring jobs scheduled');
}
Call this once at startup:
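// src/server.ts
import { scheduleAllJobs } from './jobs/scheduleJobs';
// Schedule jobs after the server starts
server.listen(PORT, async () => {
  logger.info({ port: PORT }, 'Server started');
  try {
    await scheduleAllJobs();
  } catch (err) {
    // Without this catch, a Redis outage at startup surfaces as an unhandled rejection
    logger.error({ error: (err as Error).message }, 'Failed to schedule recurring jobs');
  }
});
The workers that process these jobs are the same workers you already have, so no new infrastructure is needed.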
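When several recurring jobs share a queue, one way "the same workers" can handle them is to route on the job name. The sketch below is only an illustration: BullMQ jobs do expose `name` and `data`, but the `JobLike` type, the handler registry, and the return values are hypothetical stand-ins for your real work.

```typescript
// Minimal shape of the job a processor receives. Only `name` and `data`
// are used here; the type itself is a hypothetical stand-in.
interface JobLike {
  name: string;
  data: Record<string, unknown>;
}

type Handler = (data: Record<string, unknown>) => Promise<string>;

// Hypothetical handlers keyed by the job names used when scheduling.
const handlers: Record<string, Handler> = {
  'weekly-digest': async () => 'digest sent',
  'cleanup-expired-sessions': async () => 'sessions cleaned',
  'retry-failed-payments': async () => 'payments retried',
  'generate-monthly-invoices': async () => 'invoices generated',
};

// Route a job to its handler by name. Throwing on an unknown name makes the
// failure visible instead of silently dropping the job.
async function processJob(job: JobLike): Promise<string> {
  const handler = handlers[job.name];
  if (!handler) {
    throw new Error(`No handler registered for job "${job.name}"`);
  }
  return handler(job.data);
}
```

A function like `processJob` could be passed as the processor of an existing worker, so adding a new recurring job means adding one schedule entry and one handler.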
Approach 2: Distributed Locking for One-Shot Jobs
Some jobs do not fit a repeatable pattern — they are triggered by a condition, not a time. Or they are time-based but need a guarantee that exactly one instance runs, not "usually one."
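That guarantee rests on two rules: acquiring succeeds only when no unexpired lock exists, and releasing succeeds only for the current holder. A minimal in-memory model of those rules follows, as a sketch only. The class and token names are hypothetical, time is passed in explicitly to keep the example deterministic, and a real deployment needs a shared store so that separate processes see the same lock.

```typescript
// In-memory model of lock semantics, for illustration only.
type LockEntry = { token: string; expiresAt: number };

class InMemoryLockStore {
  private locks = new Map<string, LockEntry>();

  // Set-if-absent with a TTL: succeeds only if no live lock exists.
  acquire(key: string, token: string, ttlMs: number, now: number): boolean {
    const current = this.locks.get(key);
    if (current && current.expiresAt > now) {
      return false; // Held by someone else and not yet expired
    }
    this.locks.set(key, { token, expiresAt: now + ttlMs });
    return true;
  }

  // Compare-and-delete: only the current holder's token can release.
  release(key: string, token: string, now: number): boolean {
    const current = this.locks.get(key);
    if (!current || current.expiresAt <= now || current.token !== token) {
      return false;
    }
    this.locks.delete(key);
    return true;
  }
}

const store = new InMemoryLockStore();
// Instance A takes the lock at t=0 with a 1000 ms TTL
const a = store.acquire('daily-report', 'token-a', 1000, 0);
// Instance B is refused while A's lock is still live
const b = store.acquire('daily-report', 'token-b', 1000, 500);
// A's lock has expired by t=1500, so B acquires it
const b2 = store.acquire('daily-report', 'token-b', 1000, 1500);
// A finishes late and tries to release: ignored, because B now holds the lock
const staleRelease = store.release('daily-report', 'token-a', 1600);
// B releases its own lock successfully
const holderRelease = store.release('daily-report', 'token-b', 1700);
```

The stale release is the case the unique token exists for: without the comparison, instance A would delete a lock that now belongs to instance B.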
A distributed lock via Redis ensures only one process can hold the lock at a time:
// src/lib/distributedLock.ts
import redis from './redis';
import { randomUUID } from 'crypto';
interface Lock {
  release: () => Promise<void>;
}
/**
 * Acquire a distributed lock. Returns null if the lock is already held.
 *
 * @param key Lock identifier, unique per job type
 * @param ttlSeconds How long the lock is held before auto-expiry (safety net)
 */
export async function acquireLock(
  key: string,
  ttlSeconds: number
): Promise<Lock | null> {
  const lockKey = `lock:${key}`;
  const lockValue = randomUUID(); // Unique value, so only the holder can release it
  // SET key value EX ttl NX is atomic: it only sets the key if it does not exist.
  // Redis accepts the options in either order; EX before NX is used here on the
  // assumption that the client is ioredis, whose recent typings expect that order.
  const acquired = await redis.set(lockKey, lockValue, 'EX', ttlSeconds, 'NX');
  if (!acquired) {
    return null; // Lock is held by another instance
  }
  return {
    release: async () => {
      // Lua script: only delete if we still hold the lock.
      // Prevents releasing a lock that expired and was re-acquired by another instance.
      const script = `
        if redis.call("get", KEYS[1]) == ARGV[1] then
          return redis.call("del", KEYS[1])
        else
          return 0
        end
      `;
      await redis.eval(script, 1, lockKey, lockValue);
    },
  };
}
Usage:
async function runDailyReport() {
  // Try to acquire a 10-minute lock. The TTL must exceed the job's worst-case
  // runtime; otherwise the lock can expire while the job is still running.
  const lock = await acquireLock('daily-report', 600);
  if (!lock) {
    logger.info('Daily report already running on another instance, skipping');
    return;
  }
  try {
    logger.info('Starting daily report generation');
    await generateDailyReport();
    logger.info('Daily report complete');
  } catch (err) {
    logger.error({ error: (err as Error).message }, 'Daily report failed');
    throw err;
  } finally {
    await lock.release();
  }
}
Approach 3: Database-Tracked Jobs (Full Audit Trail)
For jobs where you need to know exactly what ran, when, by which instance, and what the result was — store job state in Postgres.
CREATE TYPE job_status AS ENUM ('pending', 'running', 'completed', 'failed');
CREATE TABLE scheduled_jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_type TEXT NOT NULL,
  scheduled_at TIMESTAMPTZ NOT NULL,
  started_at TIMESTAMPTZ,
  completed_at TIMESTAMPTZ,
  status job_status NOT NULL DEFAULT 'pending',
  instance_id TEXT, -- Which server ran it
  result JSONB,
  error TEXT,
  duration_ms INTEGER
);
CREATE INDEX idx_scheduled_jobs_type_scheduled
  ON scheduled_jobs(job_type, scheduled_at);
CREATE INDEX idx_scheduled_jobs_status
  ON scheduled_jobs(status);
// src/lib/trackedJob.ts
import { randomUUID } from 'crypto';
import { db } from './db'; // Assumed path: wherever your Postgres client is exported
const INSTANCE_ID = `${process.env.HOSTNAME || 'unknown'}-${randomUUID().slice(0, 8)}`;
export async function runTrackedJob<T>(
  jobType: string,
  fn: () => Promise<T>
): Promise<T | null> {
  // Claim the job with an atomic check-and-update
  const claim = await db.query(`
    UPDATE scheduled_jobs
    SET
      status = 'running',
      started_at = NOW(),
      instance_id = $1
    WHERE id = (
      SELECT id FROM scheduled_jobs
      WHERE job_type = $2
        AND status = 'pending'
        AND scheduled_at <= NOW()
      ORDER BY scheduled_at ASC
      LIMIT 1
      FOR UPDATE SKIP LOCKED -- Skip rows locked by other instances
    )
    RETURNING id
  `, [INSTANCE_ID, jobType]);
  if (!claim.rows[0]) {
    return null; // Nothing due for this type, or another instance already claimed it
  }
  const jobId = claim.rows[0].id;
  const start = Date.now();
  try {
    const result = await fn();
    await db.query(`
      UPDATE scheduled_jobs
      SET
        status = 'completed',
        completed_at = NOW(),
        result = $1,
        duration_ms = $2
      WHERE id = $3
    `, [JSON.stringify(result), Date.now() - start, jobId]);
    return result;
  } catch (err) {
    await db.query(`
      UPDATE scheduled_jobs
      SET
        status = 'failed',
        completed_at = NOW(),
        error = $1,
        duration_ms = $2
      WHERE id = $3
    `, [(err as Error).message, Date.now() - start, jobId]);
    throw err;
  }
}
The FOR UPDATE SKIP LOCKED clause is the key detail: it lets multiple instances compete for the same job row without blocking one another. The first instance to lock the row claims the job; the others skip it and find nothing to run.
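runTrackedJob only claims rows that already exist with status 'pending', so something has to insert them. One possible sketch follows. It assumes a unique index on (job_type, scheduled_at), which the schema above does not define, and it uses a hypothetical `Queryable` interface modeled on a `query(text, values)` call in place of the real database client.

```typescript
// Hypothetical minimal interface for the database client.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Assumption: a unique index exists so that a duplicate insert is a no-op, e.g.
//   CREATE UNIQUE INDEX ON scheduled_jobs (job_type, scheduled_at);
async function enqueueScheduledJob(
  db: Queryable,
  jobType: string,
  scheduledAt: Date
): Promise<boolean> {
  const result = await db.query(
    `INSERT INTO scheduled_jobs (job_type, scheduled_at)
     VALUES ($1, $2)
     ON CONFLICT (job_type, scheduled_at) DO NOTHING`,
    [jobType, scheduledAt.toISOString()]
  );
  // rowCount is 1 when this call created the row, 0 when it already existed
  return result.rowCount === 1;
}
```

With the unique index in place, every instance can safely call this for the next scheduled slot; only the first insert creates the pending row, and runTrackedJob then decides which instance executes it.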
Practical Cron Expressions Reference
Cron syntax has five fields: minute hour day-of-month month day-of-week.
# Every minute
* * * * *
# Every hour at :00
0 * * * *
# Every day at 2:30 AM UTC
30 2 * * *
# Every Monday at 8 AM UTC
0 8 * * 1
# Every first of the month at 6 AM UTC
0 6 1 * *
# Every 15 minutes
*/15 * * * *
# Weekdays at 9 AM UTC (Monday-Friday)
0 9 * * 1-5
# Every 6 hours
0 */6 * * *
Always use UTC in your cron expressions and convert to local time for display. Daylight saving time causes jobs defined in local time to shift by an hour twice a year.
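To sanity-check the five-field expressions listed above against a specific UTC timestamp, a small matcher is enough. This is a simplified sketch, not the parser your scheduler uses: the function names are hypothetical, month and weekday names are not supported, and the day-of-month/day-of-week rule follows classic cron behavior, which a given library may implement differently in edge cases.

```typescript
// Parse a non-negative integer, rejecting empty strings and stray characters.
function toInt(text: string): number {
  return /^\d+$/.test(text) ? Number(text) : NaN;
}

// Expand one field ("*", "*/15", "1-5", "0,30", "7") into the set of values
// it allows within [min, max].
function expandField(field: string, min: number, max: number): Set<number> {
  const allowed = new Set<number>();
  for (const part of field.split(',')) {
    const pieces = part.split('/');
    const range = pieces[0] ?? '';
    const step = pieces.length > 1 ? toInt(pieces[1] ?? '') : 1;
    let start = min;
    let end = max;
    if (range !== '*') {
      const bounds = range.split('-');
      start = toInt(bounds[0] ?? '');
      // "5" means just 5, "5/10" means 5 through max, "1-5" is an explicit range
      end = bounds.length > 1 ? toInt(bounds[1] ?? '') : pieces.length > 1 ? max : start;
    }
    const valid =
      pieces.length <= 2 && Number.isInteger(step) && step >= 1 &&
      Number.isInteger(start) && Number.isInteger(end) &&
      start >= min && end <= max && start <= end;
    if (!valid) throw new Error(`Invalid cron field: ${field}`);
    for (let value = start; value <= end; value += step) allowed.add(value);
  }
  return allowed;
}

// True if the five-field expression fires at the given instant, evaluated in UTC.
function matchesCron(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('Expected five fields');
  const [minute, hour, dom, month, dow] = fields as [string, string, string, string, string];
  const minuteOk = expandField(minute, 0, 59).has(date.getUTCMinutes());
  const hourOk = expandField(hour, 0, 23).has(date.getUTCHours());
  const monthOk = expandField(month, 1, 12).has(date.getUTCMonth() + 1);
  const domOk = expandField(dom, 1, 31).has(date.getUTCDate());
  const dowSet = expandField(dow, 0, 7); // 0 and 7 both mean Sunday
  const weekday = date.getUTCDay();
  const dowOk = dowSet.has(weekday) || (weekday === 0 && dowSet.has(7));
  // Classic cron rule: when both day fields are restricted, either may match
  const dayOk = dom !== '*' && dow !== '*' ? domOk || dowOk : domOk && dowOk;
  return minuteOk && hourOk && monthOk && dayOk;
}

// 2024-01-01 was a Monday
const firesMonday = matchesCron('0 8 * * 1', new Date('2024-01-01T08:00:00Z'));
const firesTuesday = matchesCron('0 8 * * 1', new Date('2024-01-02T08:00:00Z'));
```

Here `firesMonday` is true and `firesTuesday` is false, which matches the "every Monday at 8 AM UTC" reading of the expression.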
Verify your expressions before deploying:
# Use crontab.guru — paste your expression and see plain English
# https://crontab.guru
Monitoring Scheduled Jobs
Jobs that silently fail are worse than jobs that visibly break. Add a watchdog:
// src/jobs/watchdog.ts (run every 15 minutes)
async function checkJobHealth() {
  // Find jobs that should have run recently but haven't.
  // Note: the 26-hour threshold suits daily jobs only; weekly or monthly job types
  // need their own threshold, and a job type that has never completed is not reported.
  const overdueJobs = await db.query(`
    SELECT job_type, MAX(completed_at) AS last_run
    FROM scheduled_jobs
    WHERE status = 'completed'
    GROUP BY job_type
    HAVING MAX(completed_at) < NOW() - INTERVAL '26 hours' -- Daily jobs overdue
  `);
  for (const job of overdueJobs.rows) {
    logger.error({
      jobType: job.job_type,
      lastRun: job.last_run,
    }, 'Scheduled job appears to be stalled');
    // Alert via your notification channel
    await alertSlack(`⚠️ Scheduled job *${job.job_type}* has not run since ${job.last_run}`);
  }
  // Find jobs stuck in 'running' for too long
  const stuckJobs = await db.query(`
    SELECT id, job_type, started_at, instance_id
    FROM scheduled_jobs
    WHERE status = 'running'
      AND started_at < NOW() - INTERVAL '2 hours'
  `);
  for (const job of stuckJobs.rows) {
    logger.error({ job }, 'Job stuck in running state, possible crashed instance');
    // Reset to pending so it can be retried, clearing the stale claim details
    await db.query(
      `UPDATE scheduled_jobs
       SET status = 'pending', started_at = NULL, instance_id = NULL
       WHERE id = $1`,
      [job.id]
    );
  }
}
The Architecture Decision
Use BullMQ repeatable jobs when:
The job needs retry logic on failure
You want it to show up in Bull Board with history
The job does real work that could fail (sending emails, generating reports)
Use distributed locking when:
The job is triggered by a condition, not a pure schedule
You need the simplest possible implementation with minimal Redis usage
The job is fast and failure consequences are low
Use database-tracked jobs when:
You need a full audit trail of what ran, when, and on which instance
The job handles regulated work (billing, compliance exports)
You need to query job history for debugging
For most teams, BullMQ repeatable jobs cover 80% of cases. Add distributed locking for the edge cases. Use database tracking only when the audit trail is non-negotiable.