Alembic migrations without tears
A practical guide to running Postgres migrations with Alembic in production — autogeneration, manual edits, and zero-downtime patterns.
What Alembic gets right
Alembic generates a migration file from your model changes. You run it. The schema updates. Most of the time this works.
The rest of the time you need to know what is happening underneath.
The everyday flow
alembic revision --autogenerate -m "add user.phone_number"
alembic upgrade head
Read the generated file before running it. Autogeneration misses:
- Server-side default changes, unless compare_server_default=True is set in env.py
- Check constraints
- Index types (BTREE vs GIN vs GiST)
- Enum changes
- Sometimes, primary key changes
These need manual edits. Get into the habit.
Migration etiquette
- One conceptual change per migration. A migration that adds three unrelated columns is harder to revert.
- Migrations are append-only. Never edit a migration that has run in production. Write a new one.
- Test the downgrade. If you cannot run alembic downgrade -1 cleanly, you have a one-way migration. Sometimes unavoidable; usually a smell.
- Run migrations as a separate deploy step. Not in the app entrypoint. Otherwise N replicas race to migrate.
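The downgrade check from the list above can be scripted as a round trip: upgrade, step back one revision, upgrade again. This is a minimal sketch that shells out to the alembic CLI; the function name check_round_trip and the injectable run parameter are illustrative choices, and it assumes the script runs from the project directory against a disposable database.

```python
import subprocess
from typing import Callable

# Apply everything, revert the latest revision, then reapply it.
ROUND_TRIP = (
    ("alembic", "upgrade", "head"),
    ("alembic", "downgrade", "-1"),
    ("alembic", "upgrade", "head"),
)


def check_round_trip(run: Callable[..., object] = subprocess.run) -> None:
    """Run each step in order.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on the first command that exits non-zero.
    """
    for command in ROUND_TRIP:
        run(list(command), check=True)
```

Calling check_round_trip() in a CI job fails the build as soon as a downgrade breaks. The run parameter exists only so the sequence can be exercised without a database.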
Zero-downtime migrations
The painful ones:
- Renaming a column: ship in three deploys. Deploy 1: add new column, write to both. Deploy 2: backfill, read from new. Deploy 3: drop old.
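- Adding a NOT NULL column with no default: Postgres rejects this on a table that already has rows, and ALTER TABLE holds an ACCESS EXCLUSIVE lock while it runs. Add the column as nullable, backfill, then add the constraint.
- Adding an index on a large table: use CREATE INDEX CONCURRENTLY. Alembic supports this: set postgresql_concurrently=True and run the operation inside op.get_context().autocommit_block(), because a concurrent build cannot run inside a transaction block.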
Test the migration on a copy of production data before running it. pg_dump + restore + alembic upgrade. We have caught hours of downtime this way.
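The rehearsal can be scripted along these lines. This is a sketch under assumptions, not a finished tool: the function names, the dump path, and the scratch database name are hypothetical, and it assumes env.py reads the connection string from a DATABASE_URL environment variable. It also times the upgrade step, since duration is usually what the rehearsal is meant to reveal.

```python
import os
import subprocess
import time


def rehearsal_steps(prod_url, scratch_db, dump_path="prod.dump"):
    """Return the commands that copy production into a scratch database
    and then migrate it."""
    return [
        ["pg_dump", "--format=custom", f"--file={dump_path}", prod_url],
        ["createdb", scratch_db],
        ["pg_restore", "--no-owner", f"--dbname={scratch_db}", dump_path],
        ["alembic", "upgrade", "head"],
    ]


def rehearse(prod_url, scratch_db, scratch_url, run=subprocess.run):
    """Run the rehearsal and return how long the upgrade took, in seconds."""
    # Assumption: env.py picks up the target database from DATABASE_URL.
    env = {**os.environ, "DATABASE_URL": scratch_url}
    *prepare, migrate = rehearsal_steps(prod_url, scratch_db)
    for command in prepare:
        run(command, check=True, env=env)
    started = time.monotonic()
    run(migrate, check=True, env=env)
    return time.monotonic() - started
```

A run that takes minutes on the copy will take at least that long in production, with locks held, so the returned duration is worth checking before scheduling the deploy.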
When to use raw SQL
Anything Alembic does poorly: complex constraint changes, data backfills, anything Postgres-specific. op.execute("...") is fine. The migration file is for humans; readability beats abstraction.
What we will not do
We will not auto-run migrations on production from a CI job without human approval. The cost of a bad migration is too high. A "click to run" gate in CI is the right amount of friction.