We tripped up on a weird issue with pt-osc (pt-online-schema-change) on MySQL 5.7. We have a clustered setup working across data centres. We are also experiencing network issues outside our control, which is why we tried the following:
We needed to create 2 indices on a large table (60m rows), so we did the usual and used pt-osc. This caused some errors on the slaves when run across the network, so we went for the one-server-at-a-time approach.
We have 4 servers in AWS and 3 at a hosting provider. We added --set-vars="SQL_LOG_BIN=OFF" to stop the commands being replicated. All worked well, except for 2 servers in the hosted area, which arbitrarily started replicating the pt-osc instructions.
After investigation, we narrowed this down to a high-load scenario. We fixed this by shifting the load to other servers, running the script, and then shifting the load back (thanks, HAProxy).
However, this is less than optimal. It became apparent that a timeout is kicking in and dropping the connection, and the tool then reconnects to MySQL on the next instruction. The new connection loses the setting from --set-vars, which is session-based.
Is there a way to avoid this happening? Is it a known issue with a fix we are unaware of? We hunted high and low and found nothing. I fully accept that keeping a session variable alive between reconnects is not simple, but I would have thought that a check on the thread ID within pt-osc, and saving and reloading the session vars if it has changed, should not be too difficult. I could easily be wrong, though.
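To make the thread-check idea concrete, here is a rough Python sketch of what I have in mind. It is not how pt-osc works internally, and the names FakeConnection and SessionGuard are made up for illustration. The fake connection stands in for a real MySQL session, where the thread ID would come from SELECT CONNECTION_ID() and the variable would be set with SET SESSION.

```python
class FakeConnection:
    """Hypothetical stand-in for a MySQL connection (illustration only)."""

    def __init__(self):
        self.thread_id = 1
        self.session_vars = {}

    def connection_id(self):
        # Real MySQL equivalent: SELECT CONNECTION_ID()
        return self.thread_id

    def set_session_var(self, name, value):
        # Real MySQL equivalent: SET SESSION <name> = <value>
        self.session_vars[name] = value

    def simulate_reconnect(self):
        # A dropped connection comes back with a new thread ID
        # and all session variables reset to their defaults.
        self.thread_id += 1
        self.session_vars = {}


class SessionGuard:
    """Hypothetical helper: reapplies session vars when the thread ID changes."""

    def __init__(self, conn, wanted_vars):
        self.conn = conn
        self.wanted_vars = dict(wanted_vars)
        self.known_thread_id = None
        self.ensure()  # apply the variables on the initial connection

    def ensure(self):
        """Return True if the session vars had to be (re)applied."""
        current = self.conn.connection_id()
        if current == self.known_thread_id:
            return False
        for name, value in self.wanted_vars.items():
            self.conn.set_session_var(name, value)
        self.known_thread_id = current
        return True


conn = FakeConnection()
guard = SessionGuard(conn, {"sql_log_bin": "OFF"})

conn.simulate_reconnect()      # timeout drops the session, settings are lost
reapplied = guard.ensure()     # call this before every statement
print(reapplied, conn.session_vars)
```

The guard would have to run before each statement, and I realise there is still a window where the connection could drop between the check and the statement itself, so this narrows the problem rather than closing it.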
Any ideas? Has anyone else run into this?