PXC node assertion failure

Hi everyone,

I’m having a weird problem with only one of my 3 PXC nodes. For the past couple of months it has been crashing at random intervals: the mysql service starts up OK and receives a full SST from the donor, but then, sometimes after a couple of days and sometimes after a week, it crashes with the exact same error every time. The log is as follows:

2015-10-07 01:21:51 7fc7e8ff7700 InnoDB: Assertion failure in thread 140496584275712 in file fsp0fsp.cc line 1509
InnoDB: Failing assertion: frag_n_used > 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
23:21:51 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=0
max_threads=2502
thread_count=3
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1258688 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc7cc000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc7e8ff6a60 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x90013e]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x698714]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fc967f78cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7fc9673ce0d5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7fc9673d183b]
/usr/sbin/mysqld[0xaa97e7]
/usr/sbin/mysqld[0xab08d2]
/usr/sbin/mysqld[0xa15ffa]
/usr/sbin/mysqld[0xa19a7c]
/usr/sbin/mysqld[0xa13bd7]
/usr/sbin/mysqld[0xa14868]
/usr/sbin/mysqld[0xa148f2]
/usr/sbin/mysqld[0x91f798]
/usr/sbin/mysqld[0x927ef4]
/usr/sbin/mysqld(_Z13ha_commit_lowP3THDbb+0x112)[0x5e5262]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG26process_commit_stage_queueEP3THDS1_+0x36a)[0x8b2aba]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x441)[0x8ba281]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x56c)[0x8ba90c]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDbb+0x34f)[0x5e5aef]
/usr/sbin/mysqld(_Z12trans_commitP3THD+0x47)[0x7a32e7]
/usr/sbin/mysqld(_Z15wsrep_commit_cbPvjPK14wsrep_trx_metaPbb+0x20e)[0x5df42e]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x150)[0x7fc930223ed0]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x2a4)[0x7fc930231d64]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM22request_state_transferEPvRK10wsrep_uuidlPKvl+0x681)[0x7fc930233921]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM19process_conf_changeEPvRK15wsrep_view_infoiNS_10Replicator5StateEl+0xb49)[0x7fc930226249]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x67b)[0x7fc9301fe0cb]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5e)[0x7fc9301ff14e]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x78)[0x7fc930226628]
/usr/lib/libgalera_smm.so(galera_recv+0x1e)[0x7fc930238b5e]
/usr/sbin/mysqld[0x5df881]
/usr/sbin/mysqld(start_wsrep_THD+0x2f8)[0x5c73b8]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fc967f70e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc96748b8bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
151007 01:21:52 mysqld_safe Number of processes running now: 0
151007 01:21:52 mysqld_safe WSREP: not restarting wsrep node automatically
151007 01:21:52 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

I searched before posting, but every report I could find describes this kind of failure at service startup, not days later. I also ran a full hardware check on the node and nothing came up. I even ran fsck on both members of my RAID1 array with no errors whatsoever, and smartctl came back clean as well. I find it hard to believe this is an InnoDB tablespace integrity problem, since the other 2 nodes hold the exact same data and have not thrown a single error in all this time.
Does anyone have any clues?


EDIT: the cluster consists of 3 nodes running “5.6.24-72.2-56-log Percona XtraDB Cluster (GPL), Release rel72.2, Revision 43abf03, WSREP version 25.11, wsrep_25.11” with Galera 3.11(r93aca2d)
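
For anyone who wants to confirm that each crash hits the same assertion, a small script along these lines could tally the assertion sites recorded in the error log. This is only a minimal sketch in Python: the function name count_assertion_sites is hypothetical, the log path is passed on the command line because its location depends on the installation, and the pattern assumes the "Assertion failure in thread ... in file ... line ..." format shown in the log above.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ... InnoDB: Assertion failure in thread 140496584275712 in file fsp0fsp.cc line 1509
ASSERT_RE = re.compile(
    r"InnoDB: Assertion failure in thread \d+ in file (\S+) line (\d+)"
)


def count_assertion_sites(lines):
    """Count InnoDB assertion failures per (source file, line number)."""
    sites = Counter()
    for line in lines:
        match = ASSERT_RE.search(line)
        if match:
            sites[(match.group(1), int(match.group(2)))] += 1
    return sites


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (the path is an assumption): python count_asserts.py /path/to/error.log
    with open(sys.argv[1], errors="replace") as log:
        for (src_file, src_line), hits in count_assertion_sites(log).most_common():
            print(f"{src_file}:{src_line} -> {hits} crash(es)")
```

If every crash really is the same one, the output should be a single line for fsp0fsp.cc:1509 with one hit per crash.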