pt-online-schema-change: how to handle very large tables

This tool has been very helpful in our environment for a variety of reasons. One common use is rebuilding tables with row_format=compressed to help control disk space used by reporting tables. This works really well, except for the part where the old table is dropped. (This is with innodb_file_per_table enabled, by the way.) A global lock is held inside MySQL while the operating system deletes the old table's file.
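For anyone checking that the rebuild actually took effect, a quick way to confirm the row format and on-disk size is a query along these lines (just a sketch; 'database' and 'bigtable' stand in for your own schema and table names):

mysql> SELECT table_name, row_format, ROUND(data_length/1024/1024/1024, 1) AS data_gb
    -> FROM information_schema.tables
    -> WHERE table_schema = 'database' AND table_name = 'bigtable';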

Everything hangs while the OS takes its time if the file is very large. We had a hang in excess of three minutes after dropping an 80G file.

Slightly better than the default behavior would be to set the --no-drop-old-table option so that we could defer the drop until a time when the impact on the users would be minimized.

However, it would be best if there were a way to avoid a lengthy hang altogether. I ran a test where I simply zeroed out the old table's ibd file before issuing the drop, and it didn't raise any errors. Can anyone think of a reason why doing this wouldn't be OK:

For large table FOO

$ pt-online-schema-change --execute --no-drop-old-table --alter "ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4" D=database,t=bigtable

when that is finished

$ cp /dev/null bigtable_old.ibd

and then

mysql> drop table bigtable_old;

Nothing would be using bigtable_old in between the schema change job and the other steps.
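One extra check I would run before truncating anything, just to back up the "nothing is using it" assumption: make sure no foreign keys still reference the old table. A sketch of that check (adjust the table name to whatever old-table suffix your version of the tool leaves behind):

mysql> SELECT table_schema, table_name, constraint_name
    -> FROM information_schema.key_column_usage
    -> WHERE referenced_table_name = 'bigtable_old';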

This worked in a test with a 1G table file, but the production tables where it would be useful are much larger.

@cmcgrail, I'm looking to do something similar with large tables and am curious where you ended up with this.

Also curious whether you're using replication at all while making these changes.

Thanks in advance.

Yes, the databases where we've been doing this are replicated. At first the user running the job didn't have privileges on both the master and the replica, so the tool couldn't check whether replication was lagging, and on large tables the replica often fell hours behind. Once we got to systems where that was not tolerable, I changed the credentials so the user running the job could connect to both the master and the replica, and the tool would pause whenever lag began to appear. This slowed the overall job noticeably, but it minimized the chance that anyone would notice anything amiss with user data.
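For reference, this is roughly the shape of the invocation once the lag check is in place (a sketch only; the thresholds here are made up, and the flags worth tuning depend on your pt-online-schema-change version and topology):

$ pt-online-schema-change --execute --no-drop-old-table \
    --alter "ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4" \
    --recursion-method processlist \
    --max-lag 60 --check-interval 5 \
    D=database,t=bigtable

The tool pauses copying whenever Seconds_Behind_Master on a discovered replica exceeds --max-lag, which is why the job's user also needs enough privilege on the replica to run SHOW SLAVE STATUS (REPLICATION CLIENT is sufficient for that).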