Postgres Partition Conversion: minimal downtime

kyle Hailey
Apr 8, 2023
14 min read

The red spikes in the graph signify anomalies, with the dips in the top left indicating a significant decline in transaction rates, and the spike on the top left representing a substantial increase in latency.

Rearchitecting a database online can be both challenging and exciting. In the weeks leading up to the conversion, I experienced a persistent shoulder cramp, likely due to the stress of the impending process. One of the primary concerns was not only the conversion itself but also the potential consequences if things didn't go smoothly. In the worst-case scenario, the application could be down for an entire day, necessitating a labor-intensive and time-consuming rebuild process.

Having advised production DBAs for years on managing high-throughput Oracle databases, I now found myself in a similar situation but with PostgreSQL. I had an 8TB table handling 8,000 queries per second that had become unmanageable and required partitioning.

The challenge at hand was, "How can I partition this table while minimizing downtime?"

Looks easy : Look Mom, no hands

I searched online for information and only found minimal information. The best URL I found was

https://rodoq.medium.com/partition-an-existing-table-on-postgresql-480b84582e8d

ALTER TABLE payments RENAME TO payments_old;
CREATE TABLE payments (
       payment_id int NOT NULL DEFAULT nextval('public.payments_id_seq'::regclass)
     , created timestamptz NOT NULL
     , updated  timestamptz NOT NULL DEFAULT now()
     , amount float
     , status varchar DEFAULT 'new')
PARTITION BY RANGE (created);
ALTER TABLE payments_old ADD CONSTRAINT payments_old
   CHECK  (created >= '2000-01-01' AND created <'2021-04-01');
ALTER TABLE payments ATTACH PARTITION payments_old
    FOR VALUES FROM ('2000-01-01') TO ('2021-04-01');

Basically there are 4 steps:

Rename old table
Create new table
1. with partitioning
Add constraint on old table for it's proposed partition ranges
Attach old table as a partition to new partition table

Sounds easy right?

What about indexes? What about Triggers?

And guess what, there are other traps to watch out for!

Inheriting Indexes

Well there is an easy solution for indexes. Create the new partition table using "LIKE payements_old" and it inherits indexes:

CREATE TABLE payments ( LIKE payments_old INCLUDING ALL ) PARTITION BY RANGE (created);

New partitions will have indexes of the table name used in the LIKE clause.

Triggers

My experience, and maybe I did something wrong, is that the triggers were not inherited. On the other hand creating triggers on the new partitioned table were inherited on newly added partitions. I didn't find this out until after we did the conversion and the application was running and we were missing data due to missing triggers. I hastily created the triggers on the partition table. They were there on the old payments_old by not new partitions. Once the triggers were created on the partition table, then they were inherited by new partitions.

You can get the trigger creation statements from the original table with

SELECT pg_get_triggerdef(oid) from pg_trigger  where tgrelid ='payments_old'::regclass::oid;

Then create those triggers on the new partition table and then new partitions added to the table will have those triggers.

Primary Key

What about primary keys?

Did your table already have a primary key? Is the partition field different from the primary key? That was my case. I had a primary key on "ID" and was adding a partition on a date "created".

Guess what ? The primary key has to have the partition key in it. There can be only one primary key. Thus you have to drop the old primary key and add a new one.

In my case we are talking about an 8TB table. Creating an index takes between 3-12 hours depending whether using "CONCURRENTLY" and what level of parallelism. I could get the first scan of create index to run in parallel, but in the case of "CONCURRENTLY" the second pass ran in single thread mode. What do we do? Create a new non primary key UNIQUE index , and in my case do it concurrently to run with live load and in my case it took 12 hours:

CREATE UNIQUE INDEX CONCURRENTLY payments_id_create_idx ON  payments(id,create);

Now when we are ready to convert to a partition table, we have an index ready to become the new primary key.

The following will find the primary key and then drops it. (don't do this now, do it when you are ready with everyting)

select concat('alter table jobs drop constraint ', constraint_name) as drop_constraint 
from information_schema.table_constraints 
where table_name ='payment' 
  and constraint_type ='PRIMARY KEY' \gset
\echo select :'drop_constraint';
\set drop_constraint 'begin execute '   :'drop_constraint''; end;'
\echo :drop_constraint;
do  :'drop_constraint';

Command to promote the new UNIQUE index with partiton key as the primary key (again only do this when you are ready with everything)

alter table payments add constraint payments_id_create_pk primary key using index payments_id_create_idx;

Foreign Key

Now for a fun part that I have yet to find a solution to. Foreign Keys have to reference the primary key, and the new the primary key has an extra field. Thus you have to drop the Foreign Key references when putting in place the new primary key. If you want to recreate these foreign keys, you have to add the partition key to the Foreign Key to mirror the Primary Key. In my case the "CREATE" date is not in the Foreign Key tables, so for the time being I DON'T have a solution. I just dropped the foreign keys and now we are running along with out the constraints.

Unique Indexes

Any other unique indexes need to be recreated with the partition key.

You can pre-create them and then drop the original unique indexes.

Date Range Constraint

Now for some other fun stuff ... in order to add the old payments table to the new partition payments table, you need to have a data range constraint on the table that matches the data range of the partition.

Adding a constraint will cause the who table to be scanned. Not sure why when there is an index on the constraint field but it does. Frustrating.

The trick here is to add the constraint as NOT VALID and avoid the full scan.

Adding this constraint does take out a lock and will have to wait until all transactions complete. If there a long running transaction it will block. If Autovacuum is in the middle of a vacuuming the table it will block. Luckily one can kill the Autovacuum process and it will just restart up, so if you launch the command, kill Autovacuum, the command will get the lock and continue and Autovacuum will start back up. On the other hand all new transactions will back up on the lock that the ALTER is trying to get. If the ALTER gets it right away, all good, it's immediate. If ALTER has to wait then there will be a backup. This was a dicy move for me. I ended up doing this after the instances application was down with a trick that that allowed me to validate it immediately.

(NOTE using "\gset" messes up copy and pasting to psql. If coping and pasting to psql you haver to run the \gset line by itself and manually hit return. Not sure why. The multiple commands work fine in a script)

select (min(create)::timestamp::date+ interval '0' hour )  as mn,        ((max(create) +  interval '1' day)::timestamp::date + interval '0' hour )as mx from payment  \gset

\echo :'mn', :'mx'

ALTER TABLE payment 
ADD CONSTRAINT payment_check_create 
CHECK (    create IS NOT NULL
AND        create >= :'mn'::timestamp without time zone
AND        create <  :'mx'::timestamp without time zone )
NOT VALID;

Even better watch out for timezones. I put this constraint on at midnight but did not specify the timezone. I targeted midnight but midnight was in UTC which for me on the west coast in PST was 5pm. At 5pm on the day I was planning the conversion for midnight the database came to a screeching halt until I removed the constraint and limped away with my tail between my legs postponing the conversion to another day.

Validating the Date Range Constraint

Once the constraint is created we can validate it concurrently which means it will run online without taking out locks. Unfortunately this runs in single threaded without parallel option. Concurrently also does two passes, so in my case this was a 12 hour operation. Now there is a trick to do it immediately and avoid the full table scan. That's to change the data dictionary directly:

 UPDATE pg_constraint 
 SET    convalidated = true  
 WHERE  conrelid = 'payment'::regclass 
  AND   conname  = 'payment_check_create';

Now unfortunately, on Google Cloud SQL, this does not work because the manage databases permissions are not sufficient for an end users. Well as a fairly large GCP user, I got on the support line and raised a P1. They might not like me sharing this publicly, but the were awesome and got me an procedure that day! Soooo thankful:

SELECT do_set_convalidated('payment'::regclass, 'payment_check_create', true);

As we can see earlier, validating the constraint takes along time and worse, once it is in place you have to do the partition convert before 5pm PST if the constraint is on UTC midnight. Now with this procedure, I could avoid validating the constraint and even wait to create the constraint until the application was down during the small time window for conversion.

In my case, once the unique index on "ID,CREATE" was finished, I was ready to do the convert. I had no other UNIQUE indexes that I needed to convert.

The next step was to pick a time window and bring the application down, convert the table, then bring the application backup.

The vacuum process was a real pain. The vacuum was running all the time on this table and it blocked all of my commands so I was constantly having to kill it. I wanted to avoid having to kill it over and over. The only way I could find to stop it completely was to change a startup parameter that could only be changed with a db bounce:

autovacuum off

Considering I would be under enormous pressure during the convert, I wanted as much automated and scripted as possible to avoid manual typing and on the fly decision making, so I looked into a way to change startup parameters via GCPs API:

curl -X PATCH \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json" \
     -d '     
     {
  "settings": {
    "databaseFlags": [
      { "name": "autovacuum", "value": "off" },
      { "name": "autovacuum_max_workers", "value": "10" },
      { "name": "cloudsql.enable_pg_cron", "value": "on" },
      { "name": "cloudsql.enable_auto_explain", "value": "on" },
      { "name": "idle_in_transaction_session_timeout", "value": "60000" },
      { "name": "log_checkpoints", "value": "on" },
      { "name": "log_min_duration_statement", "value": "-1" },
      { "name": "maintenance_work_mem", "value": "1048576" },
      { "name": "max_connections", "value": "2000" },
      { "name": "max_replication_slots", "value": "20" },
      { "name": "max_wal_senders", "value": "20" },
      { "name": "max_wal_size", "value": "16000" },
      { "name": "max_worker_processes", "value": "96" },
      { "name": "random_page_cost", "value": "1" },
      { "name": "track_activity_query_size", "value": "32768" },
      { "name": "work_mem", "value": "131072" }
    ]
  }
}
 ' \
"https://sqladmin.googleapis.com/sql/v1beta4/projects/${PROJECT_ID}/instances/${INSTANCE_NAME}"

I took advantage of the down time to change a bunch of parameters.

WARNING: this bounces the database. Unlike Amazon RDS where you can change parameters without a bounce to take affect when ever there is a bounce, in GCP CLOUD SQL there is no such option.

Testing, Testing , Testing

I found many issues by testing on a clone.

I actually should have tested much more.

One issue that the clone brought up was an issue with the constraint on payments_old.

I put a time range constraint on payments_old that matched the time constraints of the partition it was going to become. The science of the constraint is that when the constraint matches the partition constraint and is valid, the database will swap in the table immediately. If there is no constraint, then the database will scan the whole table to make sure it conforms to the partition time range. Well, I tried this over and over. I would add the constraint NOT VALID, run the google procedure to make it valid in the data dictionary, then attach it as a partition and and and ... it would hang. I was like "what, is the google procedure changing the wrong data?" I want as far as validating the whole table , i.e. 12 hours, then swapping it in and it hung again, i.e. scanning the whole table. Now I'm not sure what happened. I had done this before and it had worked. I knew it could work. Thus I looked at the partition definition and it had a NOT NULL constraint as well as "without time zone". I'm not sure what the issue was, I just made the CONSTRAINT look exactly like the PARTITION definition and then it worked.

Ready to go

New Primary Key candidate UNIQUE Index has finished and took 12 hours

CREATE UNIQUE INDEX CONCURRENTLY payment_id_created_idx ON  payment(id,created);

OK lets go.

Bring Application down
Run the GCP API command to turn off AUTOVACUUM
Rename table to old
Create new partition table from old
Create the constraint old to match it's planned partition range
Modify data dicitonary to set constraint to valid
Drop old primary key
Create new primary key with unique index on "id,created"
Add old table to new partition.
Create a few extra partitions for upcoming days

Script


-- turn application off
-- turn REPLICATION off
-- bounce DB with GCP API script
-- run script
-- Bounce DB with GCP API script to turn autovacuum on
-- Bring application up
-- if satisfied its running correct, turn REPLICATION back on

\echo -- BEGIN  set time constraint range
ALTER TABLE payment DROP CONSTRAINT payment_check_created;
select 
   (min(created)::timestamp::date + interval '0' hour ) as mn, 
  ((max(created) + interval '1' day)::timestamp::date + interval '0' hour )as mx 
from payment  \gset
\echo :'mn', :'mx' ;
-- set a contraint on payment on the min and max dates 
\echo  ALTER TABLE payment ADD CONSTRAINT payment_check_created
\echo  CHECK ( created IS NOT NULL AND 
\echo         created >= :'mn'::timestamp without time zone AND 
\echo         created <  :'mx'::timestamp without time zone )
\echo  NOT VALID;

ALTER TABLE payment ADD CONSTRAINT payment_check_created
CHECK ( created IS NOT NULL AND 
        created >= :'mn'::timestamp without time zone AND 
        created <  :'mx'::timestamp without time zone )
 NOT VALID;
\echo -- END  set time constraint range
 
\prompt 'Continue ', response

\echo -- BEGIN  VALIDATE time constraint google code
-- SET time period constraint to true
SELECT do_set_convalidated('payment'::regclass, 'payment_check_created', true);
\echo -- END  VALIDATE time constraint

\echo
\prompt 'Continue ', response

\echo -- BEGIN DROP FOREIGN KEYS
-- DROP FOREIGN KEYS
\echo  ALTER TABLE orders DROP CONSTRAINT orders_id_fkey ;
       ALTER TABLE orders DROP CONSTRAINT orders_id_fkey ;
\echo -- END DROP FOREIGN KEYS

\echo
\echo
\prompt 'Continue ', response

-- DROP INVALID INDEXES
-- take the opportunity to clean up will application id down
-- drop indexes frustratingly takes locks that block the applicaton
\echo ----  BEGIN DROP INVALID INDEXES
\echo drop index payment_v5_rating_idx;
      drop index payment_v5_rating_idx;
\echo ----  END DROP INVALID INDEXES

\echo
\echo
\prompt 'Continue ', response

-- drop Primary Key
\echo ----  BEGIN drop Primary Key
select concat('alter table payment drop constraint ', constraint_name) as drop_constraint
from information_schema.table_constraints where table_name = 'payment' and constraint_type = 'PRIMARY KEY' \gset
\echo select :'drop_constraint';
\set drop_constraint 'begin execute '   :'drop_constraint'  '; end;'
\echo :drop_constraint;
do  :'drop_constraint';
\echo ----  END drop Primary Key

\echo   
\echo   
\prompt 'Continue ', response

\echo ----  BEGIN ADD Primary Key
-- add  Primary Key  new
\echo alter table payment add constraint payment_id_created_pk primary key using index payment_id_created_idx;
alter table payment add constraint payment_id_created_pk primary key using index payment_id_created_idx;
\echo ----  END ADD Primary Key

\echo   
\prompt 'Continue ', response

-- take advantage of downtime to modify table with a new column
\echo alter table payment add column archive boolean;
alter table payment add column archive boolean;
\echo 

\echo ----  BEGIN  rename payment to payment OLD
-- rename payment to payment OLD
\echo ALTER TABLE payment RENAME TO payment_old;
      ALTER TABLE payment RENAME TO payment_old;
--   END  rename payment to payment OLD

\echo   
\echo   
\prompt 'Continue ', response

-- create PARTRITION
\echo ----  BEGIN create PARTRITION
\echo CREATE TABLE payment( LIKE payment_old INCLUDING ALL ) PARTITION BY RANGE (created);
      CREATE TABLE payment( LIKE payment_old INCLUDING ALL ) PARTITION BY RANGE (created);
\echo ----  END create PARTRITION

\echo   
\echo   
\prompt 'Continue ', response

\echo ----  BEGIN ATTACH old payment
\echo ALTER TABLE payment ATTACH PARTITION payment_old FOR VALUES FROM (:'mn') TO (:'mx');
      ALTER TABLE payment ATTACH PARTITION payment_old FOR VALUES FROM (:'mn') TO (:'mx');
\echo ----  END ATTACH old payment

\echo   
\echo   
\prompt 'Continue ', response

select (:'mx')::timestamp::date  + interval '0' day as mn , (:'mx')::timestamp::date  + interval '1' day as mx \gset
 SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') as pname \gset
\echo :'pname'
\echo CREATE TABLE  :pname PARTITION OF payment FOR VALUES FROM (:'mn') TO (:'mx');
      CREATE TABLE  :pname PARTITION OF payment FOR VALUES FROM (:'mn') TO (:'mx');


select (:'mx')::timestamp::date  + interval '0' day as mn , (:'mx')::timestamp::date  + interval '1' day as mx \gset
 SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') as pname \gset
\echo :'pname'
\echo CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');
      CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');


select (:'mx')::timestamp::date  + interval '0' day as mn , (:'mx')::timestamp::date  + interval '1' day as mx \gset
 SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') as pname \gset
\echo :'pname'
\echo CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');
      CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');

select (:'mx')::timestamp::date  + interval '0' day as mn , (:'mx')::timestamp::date  + interval '1' day as mx \gset
 SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') as pname \gset
\echo :'pname'
\echo CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');
      CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES FROM (:'mn') TO (:'mx');

Follow up

The convert was a success. It took maybe 10 minutes at max. The actual script above without a hick up runs in seconds. Bouncing the database as fast but still took minute.

Biggest issue is that application didn't completely stop and there were a swarm of inserts that blocked my changes.

Starting and stopping the application irrespective of the database took the most time, like 40 minutes (20 minutes down, 20 minutes up) . All and all we were done in 50 minutes.

Not so Quick

The next day Friday, I was taking a break in the sunshine in my back yard. I came back to my home desk at 5pm and looked at my monitor ... "What??" ... all transactions stopped on the database. I knew there were errors but our setup was missing actually errors in the dashboard. It turns out it was that time constraint on the payments_old. I had created on the original payments and then renamed it to payments_old, meaning when I create the new payments like payments_old it inherited the constraint and at midnight UTC i.e. 5 PM PST , all the new records were violating that constraint. Once dropped , everything worked, but I lost another 30 minutes. With that 30 minutes , plus the 15 minutes the day before for the same thing and an hour down time, it managed to get Forbes attention and give us a call to see if everything was alright ( that comes across like they care, but hey it's could be a great article)

Furthermore ... adding partitions each day

I'm still not out of the woods. I still need to set up automatic partition creation. I want to learn what the advantages of pg_partman. I'm thinking using cron would be easy with a partition add script. My first attempts at adding a partition were hair raising. Of course I waited till 4pm the day the last partition was about to run out and guess what?! adding a partition locked!!! I was like "what?! am I going to have to lock out the application for another few minutes?!"

Well turns out there is a trick we already used.

Don't just add a partition.

-- old method will lock under load
-- CREATE TABLE  :pname PARTITION OF public.payment FOR VALUES
--      FROM (:'mn') TO (:'mx');

Create a new table like payment, then the trick is, add a constraint for the time range of the new partition, the attach that table with the constraint as a partition:

CREATE TABLE payment_23_04_13 (LIKE payment INCLUDING ALL);

ALTER TABLE payment_23_04_13 ADD CONSTRAINT payment_23_04_13_constraint
CHECK ( created IS NOT NULL AND
created >= '2023-04-13 00:00:00'::timestamp without time zone AND
created < '2023-04-14 00:00:00'::timestamp without time zone ) ;

ALTER TABLE payment
ATTACH PARTITION payment_23_04_13
FOR VALUES FROM ('2023-04-13 00:00:00') TO ( '2023-04-14 00:00:00');

putting this into an automated script

select max(part_dates.part_date)  as mx from (
 select
        (regexp_matches(pg_get_expr(pt.relpartbound, pt.oid),
        '\((.+?)\).+\((.+?)\)' ))[2] as part_date
from   pg_class      base_tb
  join pg_inherits   i        on i.inhparent = base_tb.oid
  join pg_class      pt       on pt.oid = i.inhrelid
where base_tb.oid = 'public.payment'::regclass) as part_dates
\gset

\echo :'mx' ;

-- set mn and mx from mx
select (:'mx')::timestamp::date  + interval '0' day as mn , (:'mx')::timestamp::date  + interval '1' day as mx \gset

-- create new job partition name
SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') as pname \gset

-- create new job partition constraint name
SELECT  'payment_' ||  TO_CHAR(:'mn':: DATE, 'yy_mm_dd') || '_constraint' as pname_constraint \gset

\echo
\echo :'pname'

\echo
\echo :'pname_constraint'

\echo
\echo CREATE TABLE :pname  (LIKE payment INCLUDING ALL);
CREATE TABLE :pname  (LIKE payment INCLUDING ALL);

\echo
\echo ALTER TABLE :pname  ADD CONSTRAINT :pname_constraint
\echo CHECK ( created IS NOT NULL AND
\echo created >= :'mn'::timestamp without time zone AND
\echo created < :'mx'::timestamp without time zone ) ;
ALTER TABLE :pname  ADD CONSTRAINT :pname_constraint
CHECK ( created IS NOT NULL AND
created >= :'mn'::timestamp without time zone AND
created < :'mx'::timestamp without time zone ) ;


\echo
\echo ALTER TABLE payment
\echo ATTACH PARTITION :pname
\echo FOR VALUES FROM (:'mn') TO ( :'mx');
ALTER TABLE payment
ATTACH PARTITION :pname
FOR VALUES FROM (:'mn') TO ( :'mx');

\echo

As Connor McDonald points out, this can be done online in one command on Oracle. Here is a different but similar change on Oracle

Or to put it more simply at Tim Hall says

https://oracle-base.com/articles/12c/online-conversion-of-a-non-partitioned-table-to-a-partitioned-table-12cr2