We have a large database managed in a cluster of 3 hosts: one primary and 2 replicas.
A recent maintenance required us to drop some old partitions, ~250GB in file size. I had already created hardlinks of the partition files, as I know the file deletion is often the most time-consuming step and heavily impacts IO.
So the process was as follows:
- moved the ProxySQL alias from the incoming master to a replica
- killed all sleeping connections on the master
- paused the applications
3.1. identified that the primary server was still being written to
3.2. dropped the alias write address from the outgoing master
3.3. verified there were no more writes being performed and that the incoming master was fully synced
- attempted to drop the partition of item_occurrence; this took ~10 mins to complete, with the process showing the state:
committing alter table to storage engine
There were no additional threads or connections on the incoming master, only the table alter and the binlog process.
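For reference, the state can be watched with something like the following (a minimal sketch; nothing here is specific to this setup):

-- watch what the ALTER thread is doing and for how long
SHOW PROCESSLIST;

-- or query performance_schema, which is lighter on a busy server
SELECT THREAD_ID, PROCESSLIST_STATE, PROCESSLIST_TIME, PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_INFO LIKE 'ALTER TABLE%';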
Why would this take so long to drop? Dropping the exact same partition on a test host with the same physical database is instantaneous. So I am assuming that there is some background function, such as the AHI, that needs to be manipulated as part of the partition drop, or is there something else?
Regards
Darren
Running version 8.0.35.27
What were CPU/Disk like during these 10m? Yes, AHI scanning/purging, dirty LRU scanning/purging, regular LRU scanning/purging, all take place during a drop table. Correct, this is usually instant.
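If you want to rule the AHI in or out, a quick sketch (assuming you are comfortable toggling it during a maintenance window; the variable is dynamic and can be switched back on afterwards):

-- check whether the adaptive hash index is currently enabled
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index';

-- optionally disable it for the window, then re-enable it once the drop is done
SET GLOBAL innodb_adaptive_hash_index = OFF;
SET GLOBAL innodb_adaptive_hash_index = ON;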
CPU/disk appeared OK and the host was fully responsive during this period.
I was wondering if a FLUSH TABLES WITH READ LOCK might also be pertinent, but this is a secondary thing, as this particular process was being performed on a host that was not being written to or accessed, but had been used as a read replica prior to the change being performed.
What is the FTWRL for? You don’t need to lock anything. Stop writes, ALTER TABLE … DROP PARTITION should be it.
The underlying issue is that we have a quite heavily used database, so we are getting blocked when making table alters to manage partitions, I think purely because it is so loaded and is being used for both read and write functions.
Because there are combinations of locks in place, I believe our only recourse is to stop all applications/microservices, which will in turn release those locks, and then perform the alter.
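A rough way to see which sessions actually hold metadata locks on the table before stopping everything (a sketch; it assumes the wait/lock/metadata/sql/mdl instrument is enabled in performance_schema):

SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_STATUS, OWNER_THREAD_ID
FROM performance_schema.metadata_locks
WHERE OBJECT_NAME = 'item_occurrence';   -- our table name; adjust as needed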
I guess I also misunderstood the advantages of DML, as I thought this would manage the drop better with less manipulation.
Your best bet here is to revert to “old school” ALTERs, where you stop replication on a replica, drop the partition on the replica, repeat for the other replicas, then promote a replica to be the new source. Then drop on the old source.
Or, implement quick maintenance windows where you stop apps for 10m to do the drop.
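Per replica, that would look roughly like this (a sketch only, assuming the old partition is no longer being written to; the partition name is a placeholder):

-- on each replica in turn
STOP REPLICA SQL_THREAD;
SET SESSION sql_log_bin = 0;   -- keep the drop out of this host's own binlog
ALTER TABLE item_occurrence DROP PARTITION p_old;
SET SESSION sql_log_bin = 1;
START REPLICA SQL_THREAD;
-- then promote a replica to be the new source and drop on the old source last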
I guess I also misunderstood the advantages of DML.
What do you mean by this? Advantages of DML?
I thought that by upgrading to MySQL 8 from 5.7 we would gain an advantage with alters. I misspoke when mentioning DML, but I thought we would have had some real-time benefits without having to take down the entire pipeline to make changes.
I have discussed the possibility of dropping the partitions on the replica. My concern here is that the primary host may still be making updates to the partition, which will cause replication to break (I believe), because the replica will attempt to process the changes from the binlog entries, and when it hits an action against the now-deleted partition it will error and pause.
Although, re-reading your comment, what you are saying is to pause replication (and I assume also writes to the primary), then make the change, and then restart writes against one of the replicas.
If I were to do this locally, I would possibly approach it this way…
create a cascading replication chain A-B-C (where C replicates through B)
pause the applications/microservices, thus stopping writes
drop the partition from the replicas by dropping it on host B
enable the applications and point them at host B (I do this with floating addresses, so I can apply the required floating address to host B)
Then recover host A from a snapshot of host C.
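In 8.0 syntax, re-pointing host C at host B and then doing the drop on B would look roughly like this (a sketch; hostnames and the partition name are placeholders, and GTID-based replication is assumed):

-- on host C: replicate through host B instead of directly from A
STOP REPLICA;
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'host-b',
  SOURCE_PORT = 3306,
  SOURCE_AUTO_POSITION = 1;
START REPLICA;

-- once the apps are paused and B is fully caught up, on host B:
ALTER TABLE item_occurrence DROP PARTITION p_old;   -- replicates down to C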
Some alters, yes. Something like ALTER TABLE … ADD COLUMN foo varchar(20) can be done using the INSTANT algorithm. But not all DDL is supported for INSTANT, just like not all DDL is supported for 5.7’s “online” alters. In fact, if you look at the doc page I linked, you’ll see that DROP PARTITION cannot be done online.
In MySQL 8 (same URL, just change the version), there’s some support for INPLACE, which prevents an entire table copy. You might try ALTER TABLE foo DROP PARTITION X, ALGORITHM=INPLACE, LOCK=NONE
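Spelled out, the two cases mentioned above (foo, bar and X are just placeholders):

-- supported as INSTANT in 8.0: a metadata-only change, no table rebuild
ALTER TABLE foo ADD COLUMN bar VARCHAR(20), ALGORITHM=INSTANT;

-- DROP PARTITION cannot use INSTANT, but INPLACE avoids a full table copy
ALTER TABLE foo DROP PARTITION X, ALGORITHM=INPLACE, LOCK=NONE;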
Ah, yes, I assumed that the reason for dropping the partitions is that they were no longer in use by the application.
Yeah, unfortunately the application isn’t aware of the partitions, just the base table schema.
Well, a couple of items here… You mention a “cluster of 3 hosts, 2 replicas and one primary”, so just to make sure: this is not a CLUSTER (PXC)?
I have this issue with the Zabbix partition scripts as well, and it is actually the rollup process, if you will…
There are a couple of items to think about that have helped me, though I run only a master/master setup and have moved to PXC for most of it now, so I deal with it in a slightly different format. Mine are geographically diverse, so I also have to account for WAN latency.
- I make quite a few days’ partitions at one time
- I make sure I only have one replica (worker) thread running on the “slave” (the one not running the partition script)
- I have a load balancer that all reads go to (and make sure they allow for, in this case, dirty reads)
- I set the replica threads to LOGICAL_CLOCK versus DATABASE (see the sketch after this list)
- I make sure to turn off the relevant sync settings during the process (take care and know what you are doing here so you do not get things out of sync)
- I disable or alter the CRC checksums
- In SOME cases I found it less intrusive to run it on the “read” side of my masters and let the worker thread take care of some of the locking…
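For what it’s worth, in 8.0 the settings above map to roughly the following (a sketch; adjust with care, put the values back after the window, and note the parallel settings require the replica applier to be stopped):

-- single worker thread, and LOGICAL_CLOCK versus DATABASE scheduling
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_parallel_workers = 1;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
START REPLICA SQL_THREAD;

-- relax the sync settings for the window (know what you are doing here)
SET GLOBAL sync_binlog = 0;
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- disable the CRC32 binlog checksums
SET GLOBAL binlog_checksum = NONE;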
For me, it took some playing around to get things to work right and be as non-intrusive as possible. Keep an eye on the IO stats and also, depending on the storage, on the fs syncs going on, as the IO method might be part of the issue.
The part that puzzles me, though, is that a simple DROP should not be the issue. I would be somewhat curious whether, if you stopped the two replicas and tried to drop the partition, it still takes so long. I am wondering if the worker threads are holding this open for you… If so, what is the size of the binlogs and the binlog caches? And where are those binlog files being written to? You might even just try it by doing a STOP SLAVE and dropping the partition(s).
Key items, the more I think about it:
- Slave worker settings and binlog sync settings
- Binlog caches
- InnoDB type
- OS swappiness and file system type
- Binlog file locations and the stats going with those (quick checks below)
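Quick ways to check the binlog-related items above (nothing here is specific to this setup):

SHOW GLOBAL VARIABLES LIKE 'binlog_cache_size';
SHOW GLOBAL VARIABLES LIKE 'max_binlog_size';
SHOW GLOBAL VARIABLES LIKE 'log_bin_basename';   -- where the binlog files live
SHOW GLOBAL STATUS LIKE 'Binlog_cache%';         -- Binlog_cache_use vs Binlog_cache_disk_use
SHOW BINARY LOGS;                                -- current files and their sizes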
Something seems wrong here, as a simple drop should be very, very quick… Maybe even leverage something like lsof to see what is going on with the file systems. Also leverage and look around in PMM, as I am assuming you have the MySQL monitors going, and that will really help you to see what is going on with the 10-minute committing aspect…
So, as mentioned, when I performed the table alter on the replica there were no additional threads open except the slave replication thread, and nothing was connected to the host, as the applications were not aware of it.
So in theory this would be close to the test node I performed the drop on, and I thought it should have been much quicker, as the test node showed it should only take a few seconds. But for some reason we were hung in the committing alter table to storage engine state, which it did eventually recover from, and the cascaded replica also successfully dropped the partition.
This isn’t PXC; it is standard Percona Server running 8.0.35.27.