While executing pt-osc i can see lot of deadlock errors

I was running a ptosc for adding an index on a table while running Ptosc i seen many deadlock errors was logging into the log when i check those are triggred by pt-osc
Can someone help me how to avoid this

mysql tables in use 2, locked 2
LOCK WAIT 7 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 1
MySQL thread id 5574896, OS thread handle 47371953706752, query id 34640533083 172.31.13.123 cart-rw update
REPLACE INTO `chart`.`__________charts_new` (`id`, `userid`, `subtotal`, `shippingcharges`, `destpin`, `carrierid`, `codcharges`,  `total` ,`created`, `migrationid`) VALUES (NEW.`id`, NEW.`userid`, NEW.`subtotal`, NEW.`shippingcharges`, NEW.`destpin`, NEW.`carrierid`,  NEW.`total`,  NEW.`created`, NEW.`m
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `chart`.`__________charts_new` trx id 12740293280 lock mode AUTO-INC waiting
*** (2) TRANSACTION:
TRANSACTION 12740293273, ACTIVE 1 sec inserting
mysql tables in use 2, locked 2
8 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 3
MySQL thread id 5574356, OS thread handle 47431366022912, query id 34640532979 172.31.46.96 cart-rw update
REPLACE INTO `chart`.`__________charts_new` (`id`, `userid`, `subtotal`, `shippingcharges`, `destpin`, `carrierid`, `codcharges`, `total`, `created`,`migrationid` ) VALUES (NEW.`id`, NEW.`userid`, NEW.`subtotal`, NEW.`shippingcharges`, NEW.`destpin`, NEW.`carrierid`,  NEW.`total`,  NEW.`created`, NEW.`mi
*** (2) HOLDS THE LOCK(S):
TABLE LOCK table `chart`.`__________charts_new` trx id 12740293273 lock mode AUTO-INC
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 424 page no 18245 n bits 752 index unique_user_id of table `chart`.`__________charts_new` trx id 12740293273 lock_mode X waiting
Record lock, heap no 673 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 89fb209c; asc     ;;
 1: len 4; hex c2917377; asc   sw;;

1 Like

Could you please share your pt-table-checksum command and output of SHOW CREATE TABLE charts
Please also let us know which verison of pt-table-checksum and MySQL/Percona/MariaDB server you use.

1 Like

Hi @svetasmirnova
Below are the details

pt-online-schema-change --host myhost.com --user=ptuser --password=ptpassword D=chart,t=charts --alter-foreign-keys-method auto --recursion-method=none --no-check-alter  --no-check-plan --no-drop-old-table --no-drop-new-table --progress time,1 --tries copy_rows:500:1 --sleep 0.1  --critical-load Threads_running=100 --print --pause-file=/tmp/pause --alter "ADD COLUMN type varchar(50),ADD UNIQUE INDEX unique_user_id_type (userId,type)" --execute```


```-- Table schema
CREATE TABLE `charts` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `userId` int(10) NOT NULL,
  `subTotal` int(10) NOT NULL,
  `shippingCharges` int(10) NOT NULL,
  `destPin` varchar(100) DEFAULT NULL,
  `carrierId` int(11) DEFAULT NULL,
  `codCharges` int(10) DEFAULT NULL,
  `total` int(10) NOT NULL,
  `created` datetime DEFAULT CURRENT_TIMESTAMP,
  `migrationId` int(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_user_id` (`userId`),
  KEY `userId` (`userId`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1126800166 DEFAULT CHARSET=latin1```

--PT version
pt-online-schema-change 3.0.1
1 Like

This is because pt-online-schema-change creates triggers that execute REPLACE command when your application updates the table you modify. Since the application can fire updates in multiple threads (connections), triggers are also fired in multiple threads. Deadlock happens because your table has two unique indexes. So one transaction, executing REPLACE, holds a lock on the primary key and waits for a lock on the key unique_user_id while another transaction holds a lock on the key unique_user_id and waits for a lock on the primary key.

2 Likes

Thank you @svetasmirnova
while i used to run this application will start throwing error wrt connections any thing that i can use in variables to bypass this or make to run less aggressive and less lock

1 Like

I do not think you can do anything except lowering number of parallel connections in your application. These parallel queries come from your application threads. Depending on the table size it may have sense to schedule downtime and run regular ALTER TABLE statement. Or try gh-ost that uses binary log to make changes online.

1 Like

Thanks @svetasmirnova
I tried using gh-ost but i ended up with data loss as i can see on gho table there is less number of records compare to the original table there was almost 800 records were mismatching
still i’m not sure why there is data mismatch any thing do i need to look up here

+----------+
| count(1) |
+----------+
| 53484877 |
+----------+
1 row in set (6.59 sec)

mysql> select count(1) from _charts_gho;
+----------+
| count(1) |
+----------+
| 53484075 |
+----------+
1 row in set (6.65 sec)

mysql> select count(1) from _charts_gho;
+----------+
| count(1) |
+----------+
| 53484092 |
+----------+
1 row in set (6.64 sec)

mysql> select count(1) from charts;
+----------+
| count(1) |
+----------+
| 53484886 |
+----------+```
1 Like

Do you continue writing to the table after the copy is finished? If yes, it is expected that the original table would have more rows than the copy. If not I need to see the command you use and know the version of both MySQL server and gh-ost.

1 Like

Yes @svetasmirnova the application will be continuously writing to the table as it is one of the busy table so downtime is not possible

Below is the command i have used wrt gh-ost

gh-ost --max-load=Threads_running=25 --critical-load=Threads_running=100 --chunk-size=1000 --max-lag-millis=5000 --user="review_admin" --password="DKpcvXSdZggRM9wz" --host=myhost.com --throttle-control-replicas="myreplica.com" --allow-on-master --database="chart" --table="charts" --initially-drop-ghost-table --alter="ADD UNIQUE KEY unique_user_id_type (userId,type)" --switch-to-rbr --cut-over=default --exact-rowcount --concurrent-rowcount --default-retries=120 --panic-flag-file=/tmp/ghost.panic.flag --postpone-cut-over-flag-file=/home/ghost.postpone.flag --execute

1 Like

Hi @devaraj !

gh-ost should swap tables in the end. If you measure the difference while /home/ghost.postpone.flag is still present it is expected for the original table to have more rows than the copy. You need to find a reliable way to verify how the tool works after tables are swapped.

Thank you @svetasmirnova for the help, will try to read more on gh-ost