Zero-Downtime Database Migrations: How to Change Your Schema Without Taking Down Your App
The expand-contract pattern — the only migration strategy that works when you deploy multiple times a day
Senior Developer

Database migrations are the most dangerous routine operation in backend development. Not because they are complex — most are simple ALTER TABLE statements — but because the gap between what your running application expects and what your database schema looks like is where outages live.
The naive approach: take the app down, run the migration, bring the app back up. That works at 3 AM with no users. It does not work when you have traffic, paying customers, and a team that deploys multiple times a day.
This guide covers the expand-contract pattern — the only migration strategy that is both safe and compatible with continuous deployment.
Why Naive Migrations Break Things
Consider a simple rename: users.name → users.full_name.
The wrong way:
-- Migration
ALTER TABLE users RENAME COLUMN name TO full_name;You run this migration, then deploy the new code that uses full_name. But for the 30 seconds between the migration and the deploy, your running application is still selecting name — a column that no longer exists. Every query fails. Every user gets a 500.
Even if you deploy code and migration simultaneously, there is still a window. Migrations take time. Old pods are still running during a rolling deploy. The window might be 10 seconds or 10 minutes depending on your setup, but it exists.
The fix is not to move faster. It is to make the migration itself backwards-compatible.
The Expand-Contract Pattern
Every schema change that could break a running application is split into three separate deployments:
Phase 1 — Expand: Add the new thing without removing the old thing. The database supports both old and new code simultaneously.
Phase 2 — Migrate: Deploy the new code that writes to both old and new structures. Backfill existing data.
Phase 3 — Contract: Once all code is writing to the new structure and no old code is running, remove the old thing.
It sounds slow. In practice each phase is a tiny deployment, and the total time is less than the time it takes to plan an outage window.
Example 1 — Renaming a Column
The goal: rename users.name to users.full_name.
Phase 1: Expand — Add the new column
-- Migration 001: add_full_name_to_users.sql
ALTER TABLE users ADD COLUMN full_name TEXT;
-- Copy existing data into the new column
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Keep them in sync with a trigger while both columns exist
CREATE OR REPLACE FUNCTION sync_user_name()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' OR NEW.name IS DISTINCT FROM OLD.name THEN
NEW.full_name := NEW.name;
END IF;
IF TG_OP = 'INSERT' OR NEW.full_name IS DISTINCT FROM OLD.full_name THEN
NEW.name := NEW.full_name;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER sync_name_columns
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_name();Deploy this. Your running app still uses name — that is fine, the trigger keeps full_name in sync.
Phase 2: Migrate — Deploy code that uses full_name
Update your application code to read and write full_name. Deploy. Verify in production that everything works.
Phase 3: Contract — Remove the old column
-- Migration 002: remove_name_from_users.sql
-- Only run this after all code is using full_name
DROP TRIGGER sync_name_columns ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN name;No downtime. No outage window. Three small deploys instead of one dangerous one.
Example 2 — Adding a NOT NULL Column
Adding a non-nullable column to a large table is one of the most common causes of migration-induced downtime. A naive ALTER TABLE ADD COLUMN name TEXT NOT NULL on a 10-million-row table locks the entire table while Postgres rewrites it.
The wrong way:
-- Locks the table for minutes on large tables
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW();The right way:
Phase 1: Add as nullable
-- Instant — no table rewrite, no lock
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMPTZ;Phase 2: Backfill in batches
-- Backfill in small batches to avoid lock contention
-- Run this as a script, not a single migration
DO $$
DECLARE
batch_size INT := 10000;
last_id UUID := '00000000-0000-0000-0000-000000000000';
updated INT;
BEGIN
LOOP
UPDATE orders
SET processed_at = created_at -- Or whatever the correct value is
WHERE id > last_id
AND processed_at IS NULL
RETURNING id INTO last_id;
GET DIAGNOSTICS updated = ROW_COUNT;
EXIT WHEN updated < batch_size;
-- Small pause between batches to not hammer the DB
PERFORM pg_sleep(0.1);
END LOOP;
END $$;Phase 3: Add the NOT NULL constraint — without a rewrite
-- In Postgres 12+, you can add a NOT NULL constraint using a check constraint
-- This validates without a full table rewrite if all rows already have values
ALTER TABLE orders
ADD CONSTRAINT orders_processed_at_not_null
CHECK (processed_at IS NOT NULL) NOT VALID;
-- Validate in the background — acquires only a SHARE UPDATE EXCLUSIVE lock
ALTER TABLE orders VALIDATE CONSTRAINT orders_processed_at_not_null;
-- Once validated, you can make it a real NOT NULL
ALTER TABLE orders ALTER COLUMN processed_at SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_processed_at_not_null;Example 3 — Adding an Index Without Locking
A regular CREATE INDEX locks writes to the table for the duration of the build. On a large table, that can be minutes.
-- BAD: Locks writes
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- GOOD: Builds in the background, no write lock
-- Takes longer but is safe for production
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);CONCURRENTLY cannot run inside a transaction block. If your migration tool wraps everything in a transaction, you need to run index creation separately or configure your tool to skip the transaction for that migration.
Migration Tooling
With raw SQL and node-pg-migrate
npm install node-pg-migrate pg// package.json scripts
{
"scripts": {
"migrate:up": "node-pg-migrate up",
"migrate:down": "node-pg-migrate down",
"migrate:create": "node-pg-migrate create"
}
}// migrations/1716400000000_add-full-name-to-users.js
exports.up = async (pgm) => {
pgm.addColumn('users', {
full_name: { type: 'text' }
});
// Backfill
pgm.sql('UPDATE users SET full_name = name WHERE full_name IS NULL');
};
exports.down = async (pgm) => {
pgm.dropColumn('users', 'full_name');
};With Drizzle ORM
npm install drizzle-orm drizzle-kit pg// src/db/schema.ts
import { pgTable, text, uuid, timestamp } from 'drizzle-orm/pg-core';
export const users = pgTable('users', {
id: uuid('id').primaryKey().defaultRandom(),
email: text('email').notNull().unique(),
fullName: text('full_name'), // Added in Phase 1
createdAt: timestamp('created_at').notNull().defaultNow(),
});# Generate migration from schema diff
npx drizzle-kit generate:pg
# Apply migration
npx drizzle-kit push:pgRunning Migrations at Deploy Time
# In your GitHub Actions deploy workflow
- name: Run database migrations
uses: appleboy/ssh-action@v1
with:
host: ${{ secrets.VPS_HOST }}
username: ${{ secrets.VPS_USER }}
key: ${{ secrets.VPS_SSH_KEY }}
script: |
cd /opt/app
# Run migrations before deploying new code
docker compose run --rm app npm run migrate:up
# Then deploy
docker compose up -d --no-deps appMigrations run before new code deploys. Since Phase 1 migrations are always backwards-compatible, the old code running during migration is unaffected.
The Rules That Prevent Migration Incidents
✅ Never rename a column in one step
✅ Never add a NOT NULL column without a default or backfill first
✅ Always use CREATE INDEX CONCURRENTLY on large tables
✅ Never DROP a column until zero code references it
✅ Always run migrations before deploying new code, not after
✅ Test migrations on a production-size data clone before running live
✅ Always have a down migration — and test it
✅ Never run a migration inside the same transaction as application codeThe expand-contract pattern adds two extra deploys to what used to be one. Those two deploys are the cost of never having a schema-induced outage again. For any team deploying more than once a week, it is an obvious trade.
Comments (0)
Login to post a comment.