alter partitioned table, using work disk and pt-online-schema-change

I have a very large partitioned table that I need to add some new columns too. It will take days to migrate and I can’t have the system offline while I do it. Perfect job for pt-online-schema-change!

Normally this means I need double the space on my disk (old copy + new copy) to do the modification with pt-online-schema-change. We don’t have enough space currently and we could add it, but then if we add it to the file system (xfs) with the database tables on it we can’t give it back when we are done. You can’t reduce the size of an xfs file system. There are multiple TB of data that the table uses.

The big idea was to add the column(s) and modify the tablespace (new tablespace on new disk) at the same time thus writing the table on a new disk using pt-online-schema-change. Then in a second go just modify the tablespace back to the original tablespace/disk (‘TABLESPACE = innodb_file_per_table’). Leaving the temporary work disk empty again so we can release it back to the wild to be used for other things.

Here is my alter statement

alter table vmv_base
ADD COLUMN transcript_id VARCHAR(20) DEFAULT NULL AFTER gene_id,
REORGANIZE PARTITION pSynonymous,pNonSynonymous,pNoCall INTO (
PARTITION pSynonymous VALUES IN (0) TABLESPACE = tmp1,
PARTITION pNonSynonymous VALUES IN (1) TABLESPACE = tmp1,
PARTITION pNoCall VALUES IN (2) TABLESPACE = tmp1)

The Problem: You can’t combine other alter statements with a REORGANIZE PARTITION alter.

So I can move the table or alter the table but I can’t do both at the same time. In my case one w/o the other does not work I will blow out the disk space or just move the table without altering it.

I looked into the the ‘DATA DIRECTORY’ options on the partitions but that does not seem to be supported at all for alter table(just create), plus the alter statement also uses the problematic REORGANIZE PARTITION in the same was as above with the same limitation as my statement above.

Adding column(s) with mysql’s ALTER statement with algorithm=online results in the table being rebuilt in in the same directory (needing twice the disk space too) so that is a non starter.

I tried using two –alter clauses with pt-online-schema-change, but that only executes the second alter.

In my mind that would be the cleanest way to make this all work is allow multiple alter statements to pt-online-schema-change to get around the rule that says you can’t do both these alterations in one go.

Before I go off and basically emulate the pt-online-schema-change tool on my own, is there an easy solution I am overlooking? Some other tool maybe…

  1. create new table on new tablespace
  2. write my own script to loop over the data and insert it
  3. stop processing
  4. rename the tables.
  5. drop the old table.

Does this helps? [url]https://www.percona.com/blog/2016/11/04/changing-tablespace-directory-pt-online-schema-change/[/url]

This is close but not quite working. For my partitioned table it generates the following…

CREATE TABLE `variantdb`.`_vmv_base_new` (
.
.
.
) /*!50100 TABLESPACE `innodb_file_per_table` */ ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (partition_key)
(PARTITION pSynonymous VALUES IN (0) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB,
PARTITION pNonSynonymous VALUES IN (1) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB,
PARTITION pNoCall VALUES IN (2) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB) */ [B]DATA DIRECTORY='/Users/jrule/mysql_extra' [/B]

This code does not execute. What I need is…

CREATE TABLE `variantdb`.`_vmv_base_new` (
.
.
.
) /*!50100 TABLESPACE `innodb_file_per_table` */ ENGINE=InnoDB DEFAULT CHARSET=utf8 [B]DATA DIRECTORY='/Users/jrule/mysql_extra' [/B]
/*!50100 PARTITION BY LIST (partition_key)
(PARTITION pSynonymous VALUES IN (0) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB,
PARTITION pNonSynonymous VALUES IN (1) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB,
PARTITION pNoCall VALUES IN (2) TABLESPACE = `innodb_file_per_table` ENGINE = InnoDB) */ 

This then works and generates a table that looks thus…


/*!50100 TABLESPACE `innodb_file_per_table` */ ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (partition_key)
(PARTITION pSynonymous VALUES IN (0) TABLESPACE = `innodb_file_per_table` [B]DATA DIRECTORY = '/Users/jrule/mysql_extra/' [/B]ENGINE = InnoDB,
PARTITION pNonSynonymous VALUES IN (1) TABLESPACE = `innodb_file_per_table` [B]DATA DIRECTORY = '/Users/jrule/mysql_extra/' [/B]ENGINE = InnoDB,
PARTITION pNoCall VALUES IN (2) TABLESPACE = `innodb_file_per_table` [B]DATA DIRECTORY = '/Users/jrule/mysql_extra/' [/B]ENGINE = InnoDB) */

I need to move it back when I am done, and your current extension even if modified to work does not allow me to specify the DATA_DIR as the target. I could make an new directory on the same disk but outside of the DATA_DIR, but would rather not. However, I can use the following modify statement go get it back to the home database directory…

--alter '
REORGANIZE PARTITION pSynonymous,pNonSynonymous,pNoCall INTO (
PARTITION pSynonymous VALUES IN (0) TABLESPACE = innodb_file_per_table,
PARTITION pNonSynonymous VALUES IN (1) TABLESPACE = innodb_file_per_table,
PARTITION pNoCall VALUES IN (2) TABLESPACE = innodb_file_per_table
)
'

Any suggestions on what we can do, the real rub here is this is not just a regular table, but a partitioned table.

I am afraid that pt-osc won’t help in this case.
According to this MySQL bug: https://bugs.mysql.com/bug.php?id=32091,

Well, I am running a very recent version (mysql Ver 14.14 Distrib 5.7.20-19) and they clearly do accept data directory option at the table level when creating. It is just applied to all the partitions. As you can see from my successful example above. They just apply the table level directive to each partition. I did not try to mix and match them and both levels.

My question boils down to:

  1. can you extend the data directory option to support partitions as the do allow data directory specification at the partition level.
  2. can you allow for multiple --alter statements (I think this would be the best option) then you could fully support the range of modify statements allowed as some types of modify statements for a table and partition options are mutually exclusive and can not be executed in a single go.

As for now. I went ahead and just whipped up some scripts to emulate the basic pt-osc functionality. But anyone working with partitioned tables is going to keep bumping up against these kinds of problems as you can’t modify a table and modify it’s partition strategy in a single go.

I found a work around for this problem. During the modify you can use the ‘PARTITION BY …’ clause with the other modify statements. You can not use the REORGANIZE PARTITION and some others PARTITION modifiers with other column modifiers.

Since I am building a new empty table as we go simply fully restating the the partitioning with the PARTITION BY clause is OK.

You also need to use the TABLESPACE xxx clause at the table level as it just sets the default tablespace for the table. This is useful as the rebuild of the partitions picks up and uses the new default tablespace.

To move the data and modify the table’s tablespace at the same time you do the following…


create tablespace tmp1 '/path/to/tablespace/tmp1.ibd' ENGINE = INNODB;

Then in pt-osc tool you can use the following --alter clause to move to the new tables and alter the table in a single go. This sets the default tablespace for the table to tmp1 at the table leve and the recreates all the partitions (using the new default tablespace).


--alter '
tablespace tmp1,
add column my_new_column integer not null default 'abc'
PARTITION BY RANGE COLUMNS (col1, col2) (
p1 values less then (1,1),
p2 values less then (maxvalue, maxvalue)
)'

When you are done, you will have fully recreated the partitions using the new default tablespace of tmp1, populated the table and swapped and dropped the temporary table with the original table as pt-osc is so good at.

To move it back (as I needed to do). You just set the default tablespace to innodb_file_per_table and recreate the partitions one more time.

You just call pt-osc one more time as follows.


--alter '
tablespace innodb_file_per_table
PARTITION BY RANGE COLUMNS (col1, col2) (
p1 values less then (1,1),
p2 values less then (maxvalue, maxvalue)
) '

And when this finishes the partitions have been recreated back in the default directory due to the default tablespace now being innodb_file_per_table.

The the tmp1 tablespace is now empty again and can be dropped.


drop tablespace tmp1;

Then the disk that the tmp1 tablespace was living on can be discarded.