AI-assisted Postgres schema surgery
May 15, 2025
By Danny Hermes
This post describes a zero-downtime schema change to a high contention table — guided (and greatly accelerated) by ChatGPT.
We made the wrong assumption
On our (real) `address` table in the database, we put a constraint on the `street` column directly in the definition: `street VARCHAR(50)`. (Yes, we've read the PostgreSQL "Don't Do This" wiki.) Unfortunately, a street being capped at 50 characters is a falsehood programmers (may) believe about addresses. We ran into legitimate data from a user with a length of 51: 12345 Doctor Martin Luther King Junior Street North1 in St. Petersburg, FL.
We wanted to support this user, so we needed to relax this constraint and get the data into the database. However, the `address` table is a high-contention table in our application: it receives a constant stream of reads and writes. This makes it much more challenging to just change a column's type!
Online migrations and table contention
In our application, we have active users at all times of the day. This means there is no opportunity for taking downtime without impacting users. As a result, we strictly use online migrations for evolving our PostgreSQL schema. This also allows our team to be nimble and make changes throughout the workday to add new features or improve existing ones.
In order to resolve the character length issue, we needed to change the column type of `street` (to either `VARCHAR(N>50)` or to `TEXT`). Changing the table schema will always require an `ACCESS EXCLUSIVE` lock, which means all queries to the table will need to wait for the schema change to complete. For a high-contention table that receives multiple reads per second and close to one write per second, an `ACCESS EXCLUSIVE` lock can cause the entire application to stall. This is not acceptable to us or our users! Luckily, most table schema changes resolve in a few microseconds after acquiring a lock, due to steady improvements in PostgreSQL over the last 15+ years.
However, changing a column's type is a special kind of change: it may require a full table rewrite. For a large table like `address`, a table rewrite is a very long and costly operation, which would mean holding the `ACCESS EXCLUSIVE` lock for a long time. Changing from `VARCHAR(50)` to `TEXT` is a binary-coercible column change and should not incur a full table rewrite on modern PostgreSQL.
I get by with a little help from my (AI) friend
Although the column types are binary-coercible, we wanted to be VERY sure our application remained stable, so we chose to avoid the direct type change2 and instead:
- Introduce a new column `street_new` with the correct type
- Add a `CHECK` constraint on `street_new` that can constrain text length but be more easily changed in the future without a column type change
- Copy data from `street` into the new column (in batches) and mirror writes
- Swap names after the backfill is complete
- Drop the `street_new` column
I knew conceptually how to do all of this, but I leaned heavily on ChatGPT to just do all the actual work of writing DDL, SQLAlchemy, and Alembic code while I did the thinking. Even as a daily user of LLM tools, I was really surprised how perfectly ChatGPT nailed this and helped me go really fast:
Add new column
As a first step, we'll add the new column. We elected to do this with DDL directly rather than use Alembic migrations, which gave us the maximum level of control while managing this sensitive change.
Note that it's crucial to cap the amount of time we're willing to wait on (and hold) a lock, as well as the total amount of time a statement will take. Additionally, we print out before/after timestamps for auditing in case any issues occur3:
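A minimal sketch of this step (the exact timeout values here are illustrative, not necessarily the ones we used):

```sql
-- new-column.sql (sketch)
SELECT NOW();  -- audit: before

-- Give up quickly instead of queueing behind long-running queries.
SET lock_timeout = '5s';
SET statement_timeout = '10s';

-- Adding a nullable column with no default is a metadata-only change:
-- no table rewrite, just a brief ACCESS EXCLUSIVE lock.
ALTER TABLE address ADD COLUMN street_new TEXT;

SELECT NOW();  -- audit: after
```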
Constrain new column
Adding a `CHECK` constraint still requires an `ACCESS EXCLUSIVE` lock on the table, so we get it out of the way as soon as we can. (For large tables, it may make more sense to add this constraint as `NOT VALID` and then `VALIDATE CONSTRAINT` after.)
Note that `CHAR_LENGTH(street_new)` refers to the new column, but we want it to refer to `street` when things are renamed and all is said and done. Luckily, PostgreSQL will track this when the column gets renamed!
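A sketch of the constraint (the constraint name and the 255-character cap are assumptions for illustration):

```sql
-- add-length-constraint.sql (sketch; timeouts and SELECT NOW() omitted, see footnote 3)
-- A CHECK constraint can be dropped and re-added far more cheaply than
-- changing a VARCHAR(N) column type.
ALTER TABLE address
    ADD CONSTRAINT address_street_length_check
    CHECK (CHAR_LENGTH(street_new) <= 255);
```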
Mirror writes
In order to ensure that we can swap `street` and `street_new`, we need them both to have the same data! To do this WITHOUT locking the table for an extended period of time, we need to be able to backfill data in batches and ensure future writes get mirrored to both columns. We can safely assume that `street_new` NEVER receives writes directly: the application has only ever heard of `street`.
It's important that `street_new` is a "stable" name4 in the `mirror_street_to_new()` function. After we swap the columns, we'll leave the `sync_street_to_street_new` trigger in place to keep the data in sync for final validation. So if `street_new` becomes an invalid name, the function will begin to cause all writes to fail.
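A sketch of the trigger, using the function and trigger names above (the `BEFORE INSERT OR UPDATE` timing is an assumption, though the function must run `BEFORE` to be able to modify `NEW`):

```sql
-- add-trigger-to-table.sql (sketch)
CREATE OR REPLACE FUNCTION mirror_street_to_new() RETURNS TRIGGER AS $$
BEGIN
    -- "Generic" (not table-specific) reference: street_new must remain a
    -- valid column name on address for as long as this trigger is installed.
    NEW.street_new := NEW.street;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_street_to_street_new
    BEFORE INSERT OR UPDATE ON address
    FOR EACH ROW EXECUTE FUNCTION mirror_street_to_new();
```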
Backfill
Recall that the primary reason we chose column mirroring was to avoid a table rewrite. To that end, we need to backfill data into `street_new` in batches. Luckily, we use `created_at` and `updated_at` metadata columns in ALL of our tables, so we can define batches based on most recent update. For example:
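One way a batch might look (the `id` primary key column and the batch size of 1000 are assumptions); run it repeatedly until it reports `UPDATE 0`:

```sql
-- One backfill batch (sketch).
UPDATE address
SET street_new = street
WHERE id IN (
    SELECT id
    FROM address
    WHERE street IS DISTINCT FROM street_new
    ORDER BY updated_at  -- batch by update recency, per our column convention
    LIMIT 1000
);
```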
The `updated_at` time is a moving target because addresses can be updated during our backfill process. However, that's OK, because any newer updates will already be in sync due to our `sync_street_to_street_new` trigger.
Validate mirrored data
After completing the backfill, we need to ensure that `street` and `street_new` are identical. Note that `street = street_new` is not a sufficient predicate, because NULLs will not compare equal with the `=` operator, so we use `IS DISTINCT FROM`:
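A sketch of the validation query; we expect a count of zero:

```sql
-- Count rows where the two columns differ (NULL-safe comparison).
SELECT COUNT(*)
FROM address
WHERE street IS DISTINCT FROM street_new;
```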
Column (name) swap
In order to swap `street` and `street_new`, we first need to rename `street` to a different column name, since both columns can't use the same name at once.
Recall that we want to continue to use the `street_new` name: at this point in the migration, mirroring is still enabled, and `mirror_street_to_new()` has a "generic" (not table-specific) reference to `NEW.street_new`:
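A sketch of the three renames, done in one transaction so no query ever sees the table mid-swap (the `street_old` intermediate name is hypothetical):

```sql
-- swap-column-names.sql (sketch)
-- Renames are metadata-only, but each still briefly takes ACCESS EXCLUSIVE.
BEGIN;
ALTER TABLE address RENAME COLUMN street TO street_old;
ALTER TABLE address RENAME COLUMN street_new TO street;
-- The old VARCHAR(50) column takes over the street_new name so that
-- mirror_street_to_new() keeps working until final validation is done.
ALTER TABLE address RENAME COLUMN street_old TO street_new;
COMMIT;
```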
Say goodbye
At this point, after a final validation, we can stop mirroring and drop the `street_new` column.
Notice that dropping the column may take slightly longer than some of the other migrations. However, the primary issue is acquiring the table lock (which is highly variable depending on current application activity):
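A sketch of the teardown, assuming the trigger and function names from earlier:

```sql
-- drop-old-column.sql (sketch)
BEGIN;
-- Stop mirroring before the column disappears, or every write would fail.
DROP TRIGGER sync_street_to_street_new ON address;
DROP FUNCTION mirror_street_to_new();
-- Dropping a column is metadata-only; the space is reclaimed lazily.
ALTER TABLE address DROP COLUMN street_new;
COMMIT;
```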
Migration (Alembic)
Though we ran this migration via many small and careful steps, it's still useful to document the change for future team members. Our team uses Alembic for the large majority of migrations. (The raw DDL escape hatch is only necessary in cases where we need to make a sensitive change that'll require locks that are potentially problematic for the application.)
To document the change at a conceptual level:
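A sketch of what that migration might look like (the revision IDs, constraint name, and 255-character cap are all hypothetical):

```python
"""Widen address.street from VARCHAR(50) to TEXT (conceptual record)."""
import sqlalchemy as sa
from alembic import op

# Hypothetical revision identifiers.
revision = "a1b2c3d4e5f6"
down_revision = "f6e5d4c3b2a1"


def upgrade() -> None:
    # Conceptual record only: the real change was applied via raw DDL above.
    op.alter_column(
        "address", "street", type_=sa.TEXT(), existing_type=sa.VARCHAR(50)
    )
    op.create_check_constraint(
        "address_street_length_check", "address", "CHAR_LENGTH(street) <= 255"
    )


def downgrade() -> None:
    op.drop_constraint("address_street_length_check", "address")
    op.alter_column(
        "address", "street", type_=sa.VARCHAR(50), existing_type=sa.TEXT()
    )
```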
We always use `alembic stamp` to memorialize changes made outside of a typical `alembic upgrade` operation:
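Assuming the hypothetical revision ID from the sketch above:

```sh
alembic stamp a1b2c3d4e5f6
```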
Model changes (SQLAlchemy)
In addition to changing the `Column(TEXT)` in the model definition, the extra `CHECK` constraint must be tracked in `__table_args__`:
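A sketch of the relevant slice of the model (the surrounding columns and the constraint name are assumptions):

```python
from sqlalchemy import TEXT, CheckConstraint, Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Address(Base):
    __tablename__ = "address"
    __table_args__ = (
        # Keep the ORM in sync with the database-level constraint.
        CheckConstraint(
            "CHAR_LENGTH(street) <= 255", name="address_street_length_check"
        ),
    )

    id = Column(Integer, primary_key=True)  # other columns omitted
    street = Column(TEXT)  # was Column(VARCHAR(50))
```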
Conclusion
Fixing schema constraints in high-contention tables like `address` is rarely straightforward. But with a careful, methodical approach, it's absolutely achievable without user-facing downtime. It's important to combine a deep understanding of PostgreSQL internals with thoughtful engineering practices like online backfills and trigger-based mirroring. Just as importantly, this is a reminder that tools like LLMs can meaningfully augment engineering workflows: not by replacing decision-making, but by accelerating safe and accurate implementation. Schema changes may always be risky, but they don't have to be scary!
Footnotes
1. OK, not actually 12345, but a 5-digit street number! ↩
2. Early in the process, we attempted to change the column type and kept getting timeouts, some of which had slight impacts on application performance. This may very well have been bad luck with lock contention. Rather than wait to find out, we elected to go with the safer route and totally avoid the chance of a table rewrite. ↩
3. In future examples, we'll hide the `SELECT NOW()` and the setting of timeouts, but it's crucial to always track these when doing database operations. ↩
4. Ask me how I know this! ↩