Recently we have faced issue with pt-online-schmea-change in our environment. We have master-master (M1->M2) setup, where M1 is active side.Since the alter on one of the busy table where we are doing alter using pt-osc, had failed multiple times due to “dead lock error”, we decided to execute alter on M2 which is inactive side. After we started alter on inactive side, replication was broken with duplicate key error on both M1& M2.
The below issue have occurred when executing alter using pt-online on inactive side rather on active side.
Last_SQL_Error: Could not execute Write_rows event on table myntra_wms._item_new; Duplicate entry ‘100259858717’ for key ‘PRIMARY’, Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event’s master log pas-wmsdb0.002680, end_log_pos 391514628
pt-online-schema-change --alter “ADD COLUMN xxxxxxxxx,ADD colum yyyyyyyy, Add index xxxxx” D=xxxxx,t=xxxx–host localhost --port 3306 --ask-pass --user xxxxx --check-slave-lag h=xxxxxx,u=xxxxx,p=xxxxx --max-lag 5 --check-interval 25 --check-slave-lag h=xxxxx,u=xxxxxx,p=xxxxxxx --max-lag 5 --check-interval 20 --alter-foreign-keys-method auto --critical-load Threads_running=200 --dry-run[/B]
We have found one more issue during the alter. Lag was never monitored and alter was continued even after replication was broken on both sides.
Please help us with solution.
- Does the above syntax require any changes for monitoring one or more slaves?
- Does Using pt-online in master-master setup needs extra filters to avoid the above error.