We have experienced some crashes lately; when this happens, all nodes in the cluster go down.
The cluster worked fine until this started, so something in our workload appears to be triggering it, but unfortunately we have not found a way to reproduce it yet.
Could we have stumbled on a bug that is fixed in the latest release, or have we started doing something to the cluster that we shouldn't?
Is it possible to send more information to Percona for further debugging? (A sketch of what we could collect on our side follows the log excerpt below.)
A common message after each crash is:
InnoDB: ###### Diagnostic info printed to the standard error stream
2018-12-13T08:46:39.425918+01:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2018-12-13 08:46:39 0x7f9227b24700 InnoDB: Assertion failure in thread 140265707947776 in file ut0ut.cc line 943
and there are a lot of entries like these:
---TRANSACTION 17150185739, ACTIVE 691 sec updating or deleting
mysql tables in use 1, locked 1
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 394295, OS thread handle 140302384207616, query id 11664555 … tx updating
UPDATE database.table SET column = NOW() WHERE id = '7280'
---TRANSACTION 17150185805, ACTIVE 479 sec starting index read
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 394338, OS thread handle 140302390064896, query id 11668684 … … updating
update tfl.test set tid=NOW() where id=1
…
--Thread 140265423623936 has waited at dict0stats.cc line 2376 for 876.00 seconds the semaphore:
X-lock on RW-latch at 0x2c1d608 created in file dict0dict.cc line 1197
a writer (thread id 140302403643136) has reserved it in mode wait exclusive
number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
Last time read locked in file row0purge.cc line 865
Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-redhat-binary/label/centos6-64/rpmbuild/BUILD/Percona-XtraDB-Cluster-5.7.21-29.26/storage/innobase/row/row0mysql.cc line 4807
…
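To answer our own question about what extra information we could provide: the next time a node starts showing long semaphore waits (i.e. before the 600-second timeout crashes the server), we could capture snapshots like the following and attach them. This is only a minimal sketch of standard MySQL statements run from the mysql client; the exact set of queries is our own guess at what would be useful, not something Percona has asked for:

-- Full InnoDB monitor output (SEMAPHORES and TRANSACTIONS sections, the same data as in the log above)
SHOW ENGINE INNODB STATUS\G

-- Everything currently running, to see which threads are stuck and for how long
SHOW FULL PROCESSLIST;

-- Galera/wsrep state of this node (cluster size, flow control, local state)
SHOW GLOBAL STATUS LIKE 'wsrep%';

-- Open InnoDB transactions, to spot long-running UPDATEs like the ones above
SELECT trx_id, trx_state, trx_started, trx_query
  FROM information_schema.INNODB_TRX
 ORDER BY trx_started;

Repeating these snapshots a minute or two apart should show whether the semaphore waits keep growing until the assertion fires.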