How to fix a deadlocked InnoDB transaction - InnoDB History Length keeps rising

Hello, we have an issue with our production PXC cluster: CPU usage on the primary node is almost at 100%. The PXC pod was using around 1.5 CPU, and a few days ago it jumped to nearly 5 CPU within minutes.

The only thing I could find out of the ordinary is that the InnoDB Transaction History Length keeps rising in PMM.

So it led me to your article here: Chasing a Hung MySQL Transaction: InnoDB History Length Strikes Back

I checked SHOW ENGINE INNODB STATUS and found:

--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
4 read views open inside InnoDB
0 RW transactions active inside InnoDB
---OLDEST VIEW---
Read view low limit trx n:o 48866701
Trx read view will not see trx with id >= 48866702, sees < 48866702
Read view individually stored trx ids:
-----------------
Process ID=1, Main thread ID=139765690038016 , state=sleeping
Number of rows inserted 11471887, updated 4318579, deleted 102619, read 484604475075
2.20 inserts/s, 0.00 updates/s, 0.00 deletes/s, 326912.51 reads/s
Number of system rows inserted 759371, updated 46258, deleted 758830, read 845931154
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 116.09 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

There are 2 transactions in the latest detected deadlock, from Dec 27:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-12-27 15:12:12 139765681645312
*** (1) TRANSACTION:
TRANSACTION 47064717, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 12 lock struct(s), heap size 1128, 11 row lock(s), undo log entries 2
MySQL thread id 10713458, OS thread handle 139765246965504, query id 250807995 10-42-166-85.pxc-db-haproxy-replicas.percona.svc.cluster.local 10.42.166.85 user1 updating
update `xxxxxxx` set `priority` = 1, `xxxxxxxx`.`updated_at` = '2023-12-27 15:12:12' where `id` in ('xxxxxxxxxxxxxxxxxxxxxxxxxxxxx') order by `xxx`.`created_at` desc

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 383 page no 220450 n bits 320 index traces_priority_index of table `tablex`.`xxx` trx id 47064717 lock_mode X locks rec but not gap
Record lock, heap no 254 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 1; hex 81; asc  ;;
 1: len 30; hex 39656565366463392d366534622d343963382d396162332d323834643736; asc 9eee6dc9-6e4b-49c8-9ab3-284d76; (total 36 bytes);


*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 383 page no 221958 n bits 280 index traces_priority_index of table `tablex`.`xxx` trx id 47064717 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 199 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 1; hex 81; asc  ;;
 1: len 30; hex 30616265623132652d343265372d346666352d626365332d306339656333; asc 0abeb12e-42e7-4ff5-bce3-0c9ec3; (total 36 bytes);

I see the usual answer is to kill the thread_id or kill the query, but neither works: MySQL reports that the query and thread_id from the logs no longer exist.
So I'm not sure how to recover InnoDB and lower the CPU usage. Or is it safe to simply restart the pxc1 pod? The cluster reports it is replicated, and the other 2 replicas use resources normally.

Thank you for any help; it is urgent for me to solve this somehow.

Hi Slavisa,

The “latest detected deadlock” section only shows the last deadlock that happened, and it is not necessarily related to currently ongoing transactions.

You will need to check which connections currently have transactions running (for instance with SHOW FULL PROCESSLIST) in order to get the thread_id you need to kill.
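For example, something along these lines (the 1-hour cutoff is just an illustrative value, adjust it to your case) lists the oldest open transactions together with the connection id that KILL expects:

-- transactions that have been open for more than an hour, oldest first
SELECT t.trx_id, t.trx_state, t.trx_started, t.trx_mysql_thread_id, p.USER, p.HOST, t.trx_query
FROM information_schema.INNODB_TRX t
JOIN information_schema.PROCESSLIST p ON p.ID = t.trx_mysql_thread_id
WHERE t.trx_started < NOW() - INTERVAL 1 HOUR
ORDER BY t.trx_started;

-- once you have confirmed a connection is safe to terminate,
-- pass its trx_mysql_thread_id to KILL:
KILL <trx_mysql_thread_id>;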

The last deadlock was on Dec 27th, and the history list length increasing issues started on Jan 3rd according to your PMM screenshot, so I’d say they are not linked to each other.
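If you want to double-check the history list length on the node itself (outside of PMM), it is printed in the TRANSACTIONS section of SHOW ENGINE INNODB STATUS, and, assuming the trx_rseg_history_len counter is enabled (it should be by default), you can also query it directly:

SELECT NAME, `COUNT` AS history_list_length
FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';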


Thank you for your answer. I was also wondering whether they were related, because of the different dates, but I wasn't able to find anything else that differs from the other 2 nodes.
So the part of the report where it says InnoDB will not see the latest transactions doesn't matter?

---OLDEST VIEW---
Read view low limit trx n:o 48866701
Trx read view will not see trx with id >= 48866702, sees < 48866702
Read view individually stored trx ids:

I also noticed in PMM there are around 50% idle threads, but the process list only shows a few system processes like wsrep plus several clients, while PMM shows a greater number of threads. So I think SHOW FULL PROCESSLIST is not showing all threads.
I can send a screenshot later.
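I guess the difference could be background threads. Assuming performance_schema is enabled (the default), something like this should count both foreground (client) and background threads, whereas SHOW FULL PROCESSLIST only shows sessions (clients and wsrep system threads), not InnoDB background threads:

SELECT TYPE, COUNT(*) AS threads
FROM performance_schema.threads
GROUP BY TYPE;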

Do you think it would be safe to simply restart pxc1 since the data is replicated?

@Agustin_G thank you, you were right. I killed some of the threads I noticed were hung (they were showing up in the process list all the time), and CPU usage went down immediately, and so did the transaction history length.

Hi Slavisa,

Glad to hear you were able to solve it following those steps :slight_smile:
