Zero-Downtime DB Migrations for SaaS 2026
TL;DR
Most SaaS boilerplates run prisma migrate deploy in CI and pray. That works until it doesn't. Dropping a column while the old code is still running causes instant 500 errors. Adding a non-nullable column without a default fails outright on any table with existing rows. Renaming a column is a multi-deploy operation. This guide covers the patterns that let you migrate without downtime: expand/contract, shadow databases, and coordinated deploy/migrate sequencing. None of this is exotic — just disciplined sequencing.
Key Takeaways
- Never drop or rename columns in the same deploy that changes code — multi-step expand/contract pattern
- New non-nullable columns need a default, or the migration fails outright on any table with existing rows
- Prisma migrate deploy vs dev: `deploy` is for production (forward-only), `dev` is for local (creates migration files)
- Shadow database: Prisma needs a second DB to generate migration files safely — use a separate schema in Postgres
- Lock timeouts: add `lock_timeout` to prevent long migrations from blocking reads
- Rollback plan: always know how to undo the last migration before deploying
The Expand/Contract Pattern
The fundamental pattern for zero-downtime schema changes:
Instead of a single risky migration:
-- ❌ DANGEROUS: Rename column in one step
ALTER TABLE users RENAME COLUMN full_name TO display_name;
-- ^ Instantly breaks any code reading `full_name` that's still deployed
Use three separate deploys:
Deploy 1: Expand (add new structure, keep old)
→ Schema: ADD COLUMN display_name
→ Code: writes to BOTH full_name AND display_name
Deploy 2: Migrate data + transition (if needed)
→ Backfill: UPDATE users SET display_name = full_name WHERE display_name IS NULL
→ Code: reads from display_name, writes to both
Deploy 3: Contract (remove old structure)
→ Schema: DROP COLUMN full_name
→ Code: only uses display_name
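The dual-write and read-fallback logic from Deploys 1-2 can be sketched as pure helpers. This is a minimal sketch: the `UserRow` shape and function names are illustrative, not from a real schema or ORM.

```typescript
// Illustrative row shape for the transition; the snake_case column
// names mirror the example above, not a real schema.
type UserRow = { full_name: string | null; display_name: string | null };

// Deploys 1-2: every write populates BOTH columns, so old and new
// code see consistent data while both are running.
function buildWrite(name: string): UserRow {
  return { full_name: name, display_name: name };
}

// Deploy 2: reads prefer the new column, falling back to the old one
// for rows the backfill has not reached yet.
function readName(row: UserRow): string | null {
  return row.display_name ?? row.full_name;
}
```

Once the Contract deploy ships, `readName` collapses to reading `display_name` alone and the fallback disappears.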
Common Migrations and How to Do Them Safely
Adding a Non-Nullable Column
-- ❌ WRONG: fails outright on tables with existing rows (no default to apply)
-- prisma/migrations/xxx_add_plan.sql
ALTER TABLE users ADD COLUMN plan VARCHAR NOT NULL;
✅ CORRECT: add the column with a default, then make it stricter later.
Step 1: Add the column with a default value:
// schema.prisma — add with default
model User {
// ...
plan String @default("free")
}
-- Generated migration is safe (default means no table lock):
ALTER TABLE "users" ADD COLUMN "plan" TEXT NOT NULL DEFAULT 'free';
Step 2: Later, if you want to remove the default (to force explicit values):
-- Safe to do after all existing rows have values:
ALTER TABLE "users" ALTER COLUMN "plan" DROP DEFAULT;
Renaming a Column
Deploy 1 (Expand): Add new column, write to both
Deploy 2 (Migrate): Backfill new column, switch reads to new
Deploy 3 (Contract): Drop old column
// Step 1: Add display_name alongside full_name
model User {
fullName String
displayName String? // nullable initially
}
// Application code in Step 1 — writes to both:
await db.user.update({
where: { id },
data: {
fullName: name, // old column (still being read)
displayName: name, // new column (being populated)
},
});
-- Step 2: Backfill migration (batched to avoid long-held locks).
-- Caveat: a DO block runs inside a single transaction, so row locks
-- accumulate until the block commits; for very large tables, run each
-- batch as its own transaction from a script instead.
DO $$
DECLARE
  batch_size INTEGER := 1000;
  rows_updated INTEGER;
BEGIN
  LOOP
    -- Each pass picks up the next rows that still need backfilling, so
    -- no OFFSET is needed (updated rows drop out of the WHERE clause;
    -- an OFFSET that grows each pass would skip rows).
    UPDATE users
    SET display_name = full_name
    WHERE id IN (
      SELECT id FROM users
      WHERE display_name IS NULL
      LIMIT batch_size
    );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated < batch_size;
    PERFORM pg_sleep(0.01); -- Brief pause to reduce contention
  END LOOP;
END $$;
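Because the DO block above commits as a single transaction, an alternative for very large tables is to drive the batches from the application, committing one transaction per batch so locks release between batches. A minimal sketch, assuming an injected `runBatch` function (hypothetical; with Prisma it might wrap a `$executeRaw` UPDATE keyed by `id IN (SELECT ... LIMIT n)`) that resolves with the number of rows it updated:

```typescript
// Drives a backfill one batch at a time. `runBatch` is assumed to run
// ONE batched UPDATE in its own transaction and return the row count.
async function backfillInBatches(
  runBatch: (batchSize: number) => Promise<number>,
  batchSize = 1000,
  pauseMs = 10,
): Promise<number> {
  let total = 0;
  for (;;) {
    const updated = await runBatch(batchSize);
    total += updated;
    // A short batch means the remaining rows fit in this pass: done.
    if (updated < batchSize) return total;
    // Pause between transactions so other writers can get in.
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}
```

The injected-function shape keeps the loop testable without a database and makes the per-batch transaction boundary explicit.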
// Step 3: Remove old column
model User {
displayName String // now required
// fullName removed
}
Adding an Index to a Large Table
Regular index creation locks the table for writes. Use CONCURRENTLY:
-- ❌ Locks table during creation:
CREATE INDEX idx_users_email ON users(email);
-- ✅ Non-blocking (takes longer but doesn't block writes):
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
In Prisma migrations, you can't use CONCURRENTLY directly via the schema. Use a raw SQL migration:
// prisma/migrations/xxx_add_email_index/migration.sql
-- This migration was created manually
-- CreateIndex
-- Using CONCURRENTLY to avoid locking the users table:
CREATE INDEX CONCURRENTLY "users_email_idx" ON "users"("email");
Mark it in schema.prisma after creation (as a plain @@index — adding @unique as well would make Prisma expect a second, unique index on the same column):
model User {
email String
@@index([email])
}
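One concurrent-index gotcha worth automating: if CREATE INDEX CONCURRENTLY fails partway through, it does not roll back cleanly — it leaves behind an INVALID index that must be dropped and recreated. A sketch of detecting this, with `query` an injected raw-SQL executor (hypothetical signature; with Prisma it could be backed by $queryRawUnsafe):

```typescript
// pg_index.indisvalid is false for indexes left behind by a failed
// concurrent build; this query lists their names.
const INVALID_INDEX_SQL = `
  SELECT c.relname
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indexrelid
  WHERE NOT i.indisvalid`;

// `query` is an assumed executor, injected so the check is testable.
async function findInvalidIndexes(
  query: (sql: string) => Promise<{ relname: string }[]>,
): Promise<string[]> {
  const rows = await query(INVALID_INDEX_SQL);
  return rows.map((r) => r.relname);
}
```

Running this after a concurrent index migration (in CI or a post-deploy check) catches half-built indexes before they silently ship to production.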
Setting Up Lock Timeouts
Prevent a slow migration from blocking the entire app:
-- Add to the start of long-running migrations:
SET lock_timeout = '5s'; -- Fail fast if we can't get a lock
SET statement_timeout = '30s'; -- Fail fast if the statement takes too long
-- Now run your migration:
ALTER TABLE users ADD COLUMN last_seen_at TIMESTAMPTZ;
In Prisma, add these to the beginning of the migration file:
-- prisma/migrations/xxx/migration.sql
SET lock_timeout = '5s';
SET statement_timeout = '60s';
ALTER TABLE "users" ADD COLUMN "last_seen_at" TIMESTAMP(3);
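Since lock_timeout makes the DDL fail fast rather than queue behind long-running transactions, the statement may need a retry loop. A sketch with an injected executor (`runDdl` is a hypothetical function, not a Prisma or pg API; it would run the statement, including the SET lock_timeout preamble):

```typescript
// Retries a fail-fast DDL statement with exponential backoff, returning
// how many attempts it took; rethrows once maxAttempts is exhausted.
async function runDdlWithRetry(
  runDdl: () => Promise<void>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    try {
      await runDdl();
      return attempt;
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up, surface the error
      // Back off before re-attempting the lock acquisition.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

The combination — short lock_timeout plus retries — trades one long blocking wait for several short, non-disruptive attempts.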
Production Deploy Sequence
The correct order for code + schema changes:
When adding new columns/tables (backward-compatible):
1. Run migration (new column exists, old code doesn't use it yet)
2. Deploy new code (now reads/writes new column)
Adding columns before deploying new code is safe because the old code simply ignores columns it doesn't know about.
When removing columns/tables (requires expand/contract):
1. Deploy new code (stops reading the old column, writes to new)
2. Wait for all instances to update (rolling deploy completes)
3. Run migration (drop the old column — code no longer references it)
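One lightweight way to keep this ordering honest in CI is to classify each migration as additive (safe to run before the code deploy) or destructive (must run after, via expand/contract). A toy heuristic — regex matching only, not a real SQL parser:

```typescript
// Destructive statements (DROP/RENAME) must ship after the code deploy;
// everything else is treated as additive and safe to run first.
function migrationKind(sql: string): "expand" | "contract" {
  const destructive = /\b(DROP\s+(COLUMN|TABLE)|RENAME\s+(COLUMN|TO))\b/i;
  return destructive.test(sql) ? "contract" : "expand";
}
```

A CI step could run this over new migration files and require an explicit override label before letting a "contract" migration merge.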
Setting Up in CI/CD (GitHub Actions + Vercel)
# .github/workflows/deploy.yml
name: Deploy
on:
push:
branches: [main]
jobs:
migrate-and-deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install dependencies
run: npm ci
- name: Run database migrations
env:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
run: npx prisma migrate deploy
# migrate deploy is safe for production:
# - Only runs pending migrations
# - Never creates new migration files
# - Idempotent (safe to run multiple times)
# - Fails fast if migration is incompatible
- name: Deploy to Vercel
# Deploy runs AFTER migration — new code matches new schema
uses: amondnet/vercel-action@v25
with:
vercel-token: ${{ secrets.VERCEL_TOKEN }}
vercel-org-id: ${{ secrets.VERCEL_ORG_ID }}
vercel-project-id: ${{ secrets.VERCEL_PROJECT_ID }}
vercel-args: '--prod'
Setting Up Shadow Database
Prisma needs a shadow database to generate migration files safely (it creates + drops the shadow DB to diff schemas):
# .env (development only)
DATABASE_URL="postgresql://user:pass@localhost:5432/myapp"
SHADOW_DATABASE_URL="postgresql://user:pass@localhost:5432/myapp_shadow"
-- Create shadow DB once:
CREATE DATABASE myapp_shadow;
// schema.prisma
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
shadowDatabaseUrl = env("SHADOW_DATABASE_URL") // Only for dev
}
For Supabase, you can use a separate schema instead of a separate database:
SHADOW_DATABASE_URL="postgresql://user:pass@db.xxx.supabase.co:5432/postgres?schema=shadow"
What to Do When a Migration Goes Wrong
1. Check which migrations have been applied:
$ npx prisma migrate status
2. If a migration is stuck behind a lock, find the blocking queries:
-- Find blocking queries:
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
-- Kill a specific blocking query:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = <blocking_pid>;
3. Mark a failed migration as rolled back in Prisma:
$ npx prisma migrate resolve --rolled-back <migration_name>
4. If you manually fixed the schema and want to mark the migration as applied:
$ npx prisma migrate resolve --applied <migration_name>
Migration Checklist Before Deploying
Pre-migration:
[ ] Migration is backward-compatible with current code
[ ] New non-nullable columns have default values
[ ] Indexes use CONCURRENTLY if table has >100K rows
[ ] Lock timeout added for DDL operations
[ ] Tested in staging with production data size
Deploy sequence:
[ ] Merge code PR that still works with OLD schema
[ ] Run `prisma migrate deploy` against production DB
[ ] Verify migration completed successfully
[ ] Deploy new code to production
Post-migration:
[ ] Monitor error rates for 15 minutes post-deploy
[ ] Verify key user flows work (login, payment, core feature)
[ ] Check slow query log for new query patterns
Why Migrations Break in Production
Most migration problems in production share a common root cause: the developer tested the migration against a development database with a few hundred rows, but the production database has millions. Operations that complete in milliseconds on small datasets can lock tables for minutes at production scale. The impact is immediate: every new write to the table blocks, connection pools fill up, and your application starts returning 500 errors.
The ALTER TABLE command in PostgreSQL acquires an ACCESS EXCLUSIVE lock — the strictest lock level — for most column-level changes. This lock blocks both reads and writes on the table until the operation completes. On PostgreSQL versions before 11, ADD COLUMN with a default value rewrote the entire table to backfill existing rows, holding the lock for the duration of the rewrite. On a table with 10 million rows, that can take 30-120 seconds.
Since PostgreSQL 11, adding a column with a constant (non-volatile) default no longer requires rewriting the table — PostgreSQL stores the default in the catalog and returns it at read time until rows are rewritten. This means ADD COLUMN plan TEXT DEFAULT 'free' is now safe on large tables in modern PostgreSQL. The exceptions: volatile defaults (such as gen_random_uuid()) still force a full table rewrite, adding a NOT NULL column with no default at all still fails outright on a non-empty table, and ALTER COLUMN ... SET NOT NULL on an existing column still requires a full validation scan.
What the Expand/Contract Pattern Solves
The expand/contract pattern (also called the parallel change pattern) exists because schema changes and code changes often can't be deployed atomically. Your deploy process runs the migration, then rolls out new code. During the rollout, both old code and new code are running simultaneously — if the schema change breaks the old code's assumptions, you get errors.
The pattern creates a deployment window where both old and new schemas are valid. The Expand deploy adds new structure without removing the old — both code versions work. The final Contract deploy removes the old structure only after every instance is running code that no longer uses it. The intermediate state is slightly wasteful (writing data to two columns) but safe.
Teams using rolling deployments or blue-green deployments particularly benefit from this approach. A rolling deploy means new code and old code run side by side for several minutes. A blue-green deploy means the old environment stays running until the new one is verified healthy. In both cases, schema changes that break the old code's queries cause errors during the transition window.
Boilerplate-Specific Considerations
Different SaaS boilerplates handle the database migration layer differently, and understanding these differences affects how you apply the patterns above.
Boilerplates using Prisma (ShipFast, Supastarter, SaaSBold) generate SQL migration files from schema changes. The prisma migrate dev command diffs your current schema against the database and generates the SQL. Review the generated SQL before running it in production — Prisma doesn't always generate the safest possible migration. For example, Prisma will generate ALTER TABLE ... ADD COLUMN ... NOT NULL for non-nullable fields without a default, which is safe only if the table is empty or if you add a default in the schema.
Boilerplates using Drizzle (parts of ShipFast, some indie starters) give you more control over the migration SQL but also more responsibility. Drizzle's drizzle-kit generate creates a migration file; you're expected to review and potentially modify it before running.
Neither ORM prevents you from writing a dangerous migration. The responsibility for applying the expand/contract pattern and reviewing for lock issues sits with the developer.
When to Use a Migration Tool Like atlas or pgroll
For teams running high-traffic SaaS applications where any table lock causes visible user impact, dedicated migration tools provide safety checks that Prisma and Drizzle don't. pgroll (by xataio, open source) runs schema migrations in phases automatically — it handles the expand/contract pattern internally, making both old and new schemas available during the transition. This eliminates the multi-deploy pattern for many common changes.
atlas provides migration linting — it analyzes your migration files and warns about dangerous operations like missing indexes, non-concurrent index creation on large tables, and destructive schema changes. Atlas integrates with GitHub Actions and can block PRs that include dangerous migrations. For teams scaling past 100,000 rows per table, migration linting pays for its setup cost quickly.
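A toy version of the kind of rule such a linter applies — flagging non-concurrent index creation — might look like the following. This is a regex heuristic only; atlas's real analyzers parse the SQL properly and cover far more cases:

```typescript
// Warns when a migration creates an index without CONCURRENTLY,
// which would block writes on the table for the duration of the build.
function lintMigration(sql: string): string[] {
  const warnings: string[] = [];
  // Match CREATE [UNIQUE] INDEX not immediately followed by CONCURRENTLY.
  if (/CREATE\s+(UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)/i.test(sql)) {
    warnings.push(
      "CREATE INDEX without CONCURRENTLY blocks writes on large tables",
    );
  }
  return warnings;
}
```

Even a crude check like this, wired into CI, stops the most common lock-inducing migration from reaching production unreviewed.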
The migration tooling landscape continues to evolve, and these tools work alongside Prisma and Drizzle rather than replacing them.
Practical Advice for SaaS Boilerplate Developers
The boilerplates that include Docker Compose configurations (Epic Stack, Makerkit, Supastarter) make it easy to run a local Postgres instance that mirrors production settings. Use this to test migrations with realistic data volumes before deploying to production — run pgbench or load a data dump to scale your local database to production-representative sizes.
For boilerplates without Docker Compose (T3 Stack, ShipFast), connect your local migration tooling to a staging Neon or Supabase database branch. Both Neon and Supabase support branch databases — you get a copy of production data structure in an isolated branch, make your schema changes, verify the migration time, and then promote the change to production. This is the most reliable way to validate migration safety without actual production risk.
The migration patterns in this guide — expand/contract, concurrent index creation, lock timeouts, and staged deploy/migrate sequencing — apply regardless of which SaaS boilerplate or ORM you start from. The boilerplate determines the ORM and initial schema structure; it doesn't determine how safely you evolve that schema once real users depend on it. Building the discipline of writing backward-compatible migrations from the very first database change is substantially easier than retrofitting safe patterns after a production incident forces the issue.
Most SaaS applications hit the first non-trivial migration challenge between 5,000 and 50,000 users, when the user table starts to matter for query performance and downtime is no longer acceptable. Starting with the expand/contract pattern and lock timeout discipline from the beginning means your team already knows how to handle that moment before it arrives under pressure.
Migration safety is a compound discipline. The checklist above covers the mechanical steps; the cultural side is equally important — making it normal to ask "is this migration backward-compatible?" in code review, making it routine to add lock timeouts to DDL operations, and making it expected that database changes and code changes travel as separate deploy steps. Teams that build this into their review process rarely have migration-related incidents. Teams that treat it as optional always learn the hard way.
See the self-hosted vs cloud Supabase guide for database hosting options that support branching for migration testing, and the Drizzle vs Prisma comparison for boilerplates for ORM-level migration tooling differences including how each handles rollbacks. For a production deployment checklist covering migrations alongside environment variables and CI/CD pipeline setup, see the SaaS production deployment guide.