Pt online schema change is very slow on prod server

i have a mysql server which is always having 1000+ connections and runing threads >100 when i tries to execute the pt-osc on this server for modifying the datatype for one of the column dry run will run preety fast but --execute option will take 5-10 min to create a new table itself and for an hour it copied only 90k records from original table to the new table
want to know any parameter do i need to use here appreciate your reply

1 Like

Hello @devaraj,
What you describe sounds correct. If your server already has lots of load (threads_running > 100), pt-osc will add even more load to your system. If your disks cannot handle all of the normal traffic + pt-osc, then you will experience slowdowns everywhere. This is normal. Best thing you can do is use pt-ocs during low traffic periods, or use traditional slave-promotion to affect schema changes.

1 Like

Thanks @matthewb
I tried the same in low peak hours then also the same issue, were hardly see threads running >50 during that time, but even then also the pt-osc was every slow when i start the pt-osc for for first 5 mins it wont show any output after some times i can see from output it created a new table and triggers

But while doing dryrun the response is veryfast while doing execution why creating table itself takes 10-15 mins even when there is less load on server

1 Like

@devaraj,
A dry-run doesn’t actually do anything; that’s why it’s fast. It only tells you what it plans on doing so you can check the steps.

Open another session and check SHOW PROCESSLIST and see what else is going on in the system.

1 Like

Thanks @matthewb
Checked the same i can see the user which i was using with pt-osc in show processlist which is in sleep state apart from that i can see normal application queries

1 Like

Next time you run pt-osc, run it like this:

$ PTDEBUG=1 pt-online-schema-change ...

This will produce LOTS AND LOTS of output, but it will tell you exactly what pt-osc is doing. See if you can find out why it is sleeping/waiting.

1 Like

Sure @matthewb
Thank you so much for helping me here, i will collect the logs using debug mode and will update here

1 Like

From what I recall of the 10K+ lines of code of ptosc, there is definitely code that explicitely backs off when there is load. The mysql user actually needed the PROCESS privilege just for that polling of process list. The nibbling copy is throttled accordingly.

Also, depending on the mode (drop swap or rebuild constraints) there is a phase where a table lock must be obtained, and this is difficult with ongoing threads. The lock is granted by FIFO iirc, and if ptosc was holding on to this hope to get the lock, it would force all other queries to pile up behind it. So the code sets a much MUCH shorter lock wait timeout to be the 1st victim and avoid piling up other queries behind.

That mean, ptosc yields in presence of other queries, and it could take A LOT of time to see a lull for ptosc to acquire the needed table lock.

I filed a RFE for tuning that: [PT-1928] Add option to set lock timeouts distinctly for various statements - Percona JIRA

1 Like

Thank you @bob
will look into the section that you have mentioned

1 Like