Hi there,
We’ve been having trouble with a 3 node galera cluster, where seemingly out of nowhere one of the nodes will decide it doesn’t like a query and hang indefinitely. It will lock a table and hold that table. At this point, all tables start to lock and all queries end up creating new table locks. The process list will simply accumulate processes, existing processes will increment their time, and the open tables will just keep incrementing. Killing the process (or all processes) does not help. The only way to recover seems to be to cycle mysqld.
Here’s an example of one hung query from SHOW PROCESSLIST. I’ll spare the full list, unless you think it would help.
| 8 | xxxx_system_user | localhost | xxxx_system | Killed | 49634 | updating | UPDATE leases SET version=version+1,holder_id=1511216099643476245,expires=NOW() + INTERVAL 6930000 M | 0 | 0 |
Here’s a related row from SHOW OPEN TABLES.
| xxxx_system | leases | 3 | 0 |
And here’s a bit from the SHOW ENGINE INNODB STATUS that looks suspect:
—TRANSACTION 421965858011080, not started sleeping before entering InnoDB
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
—TRANSACTION 421965858013496, not started sleeping before entering InnoDB
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
—TRANSACTION 421965858009872, not started sleeping before entering InnoDB
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
—TRANSACTION 421965858008664, not started sleeping before entering InnoDB
mysql tables in use 1, locked 1
We’re using pxc_strict_mode=ENFORCING and we’re using a read/write splitting proxy (MaxScale).
We have tried everything we know of, to both avoid the situation, as well as to recover from it without a restart.
Please let me know if any other information would be helpful in this case.
Cheers,
Clinton