Assertion failure in file lock0lock.cc line 3980

Hi,
Our 3-node cluster keeps crashing with the following error:
InnoDB: Assertion failure in thread 140546941892352 in file lock0lock.cc line 3980

2014-10-02 12:55:33 17003 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.74439S), skipping check
2014-10-02 13:08:43 7fd3a28bb700 InnoDB: Assertion failure in thread 140546941892352 in file lock0lock.cc line 3980
InnoDB: Failing assertion: lock != ctx->wait_lock
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
11:08:43 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=15
max_threads=202
thread_count=11
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 97010 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x23796fe0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd3a28bad30 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x8fa26b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x673f21]
/usr/lib64/libpthread.so.0(+0xf130)[0x7fd9318b3130]
/usr/lib64/libc.so.6(gsignal+0x39)[0x7fd92fc4d5c9]
/usr/lib64/libc.so.6(abort+0x148)[0x7fd92fc4ecd8]
/usr/sbin/mysqld[0x9940fa]
/usr/sbin/mysqld[0x99cc71]
/usr/sbin/mysqld[0x99e368]
/usr/sbin/mysqld[0x99eecd]
/usr/sbin/mysqld[0x99fc1c]
/usr/sbin/mysqld[0x9fc043]
/usr/sbin/mysqld[0x9fc4dd]
/usr/sbin/mysqld[0x9fcb24]
/usr/sbin/mysqld[0xa07bf7]
/usr/sbin/mysqld[0x96be37]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPh+0x10f)[0x5b96af]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0x215)[0x6e0dc5]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x10a1)[0x6e6881]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2e30)[0x6fa870]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x5e8)[0x6ffd78]
/usr/sbin/mysqld[0x7005c8]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x11fa)[0x701e1a]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x18f)[0x703fef]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x182)[0x6cd072]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x6cd260]
/usr/sbin/mysqld(pfs_spawn_thread+0x143)[0x9327d3]
/usr/lib64/libpthread.so.0(+0x7df3)[0x7fd9318abdf3]
/usr/lib64/libc.so.6(clone+0x6d)[0x7fd92fd0e01d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fd34005d4e0): is an invalid pointer
Connection ID (thread ID): 3295
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
141002 13:08:43 mysqld_safe Number of processes running now: 0
141002 13:08:43 mysqld_safe WSREP: not restarting wsrep node automatically
141002 13:08:43 mysqld_safe mysqld from pid file /storage/mysql.pid ended

rpm -qa | grep Percona
Percona-XtraDB-Cluster-client-56-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-garbd-3-3.7-1.3254.rhel7.x86_64
Percona-XtraDB-Cluster-server-56-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-galera-3-3.7-1.3254.rhel7.x86_64
Percona-XtraDB-Cluster-56-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-shared-56-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-galera-3-debuginfo-3.7-1.3254.rhel7.x86_64
Percona-XtraDB-Cluster-test-56-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-56-debuginfo-5.6.20-25.7.888.el7.x86_64
Percona-XtraDB-Cluster-full-56-5.6.20-25.7.888.el7.x86_64

After that, the cluster loses quorum and the other nodes fail too.

Can you help me?

Can you check for any GRA* files created around the crash, decode them and show what writes to what tables caused this?
[url]http://www.percona.com/blog/2012/12/19/percona-xtradb-cluster-pxc-what-about-gra_-log-files/[/url]
Also upload your my.cnf file. A fully reproducible test case would be more than appreciated!
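
In case it helps with the search, here is a minimal Python sketch that lists any GRA_*.log files in the data directory that were modified close to the crash time. The helper name `find_gra_files` and the 10-minute window are assumptions made for illustration; the data directory and the crash timestamp are taken from the log above. It only locates the files; decoding them is described in the blog post linked above.

```python
import glob
import os
from datetime import datetime, timedelta


def find_gra_files(datadir, crash_time, window_minutes=10):
    """Return GRA_*.log paths in datadir modified within the window around crash_time.

    crash_time is a naive datetime in the server's local time zone.
    """
    window = timedelta(minutes=window_minutes)
    matches = []
    for path in sorted(glob.glob(os.path.join(datadir, "GRA_*.log"))):
        # File modification time, converted to a naive local datetime.
        mtime = datetime.fromtimestamp(os.path.getmtime(path))
        if abs(mtime - crash_time) <= window:
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Values from the crash report: datadir=/storage, crash at 2014-10-02 13:08:43.
    crash = datetime(2014, 10, 2, 13, 8, 43)
    for gra_path in find_gra_files("/storage", crash):
        print(gra_path)
```

Running this on each node should show quickly whether any node wrote a GRA file around the time of the assertion.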

Hi,
There are no GRA* files around.
I would appreciate any ideas.

[mysql]

# CLIENT

port = 3306
socket = /storage/mysql.sock

[mysqld]

datadir=/storage
user=mysql

# Path to Galera library
wsrep_provider = /usr/lib64/galera3/libgalera_smm.so

# Cluster connection URL contains the IPs of node#1, node#2 and node#3
wsrep_cluster_address=gcomm://dba1,dba2,dba3

# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW

# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB

# This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
wsrep_slave_threads = 1

# Node #1 address
wsrep_node_address=dba3
wsrep_provider_options='gcache.size=1G'

# SST method
wsrep_sst_method=xtrabackup-v2
wsrep_slave_threads = 8

# Cluster name
wsrep_cluster_name=inquiry

# Authentication for SST method
wsrep_sst_auth="user:pass"

# GENERAL

user = mysql
#default-storage-engine = InnoDB
socket = /storage/mysql.sock
#pid-file = /storage/mysql.pid

# MyISAM

key-buffer-size = 16M
tmpdir = /tmp/mysqltmp
default_tmp_storage_engine = MyISAM
#myisam-recover = FORCE,BACKUP

# SAFETY

max-allowed-packet = 16M
max-connect-errors = 1000000
connect-timeout =360

# DATA STORAGE

datadir = /storage/

# BINARY LOGGING

log-bin = /storage/mysql-bin
expire-logs-days = 1
sync-binlog = 1

# CACHES AND LIMITS

tmp-table-size = 128M
max-heap-table-size = 256M
query-cache-type = 0
query-cache-size = 0
max-connections = 200

open-files-limit = 65535
table-definition-cache = 2048

#thread_stack = 192K
#thread_cache_size = 16
#read_rnd_buffer_size = 128K
#sort_buffer_size = 1M

#join_buffer_size = 8M
#read_buffer_size = 128K
table_open_cache = 1024

# INNODB

innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup = ON

innodb_thread_concurrency = 8
innodb_read_io_threads = 4
innodb_write_io_threads = 4

innodb_log_buffer_size = 128M
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 128M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 19G
innodb_file_format = Barracuda

# LOGGING

log-error = /var/log/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 1
long_query_time = 2
slow-query-log-file = /var/log/mysql-slow.log

We are getting this error between 1 and 5 times a day on a 3-node cluster. There is a bug report on Launchpad here:

[url]https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1318866[/url]

Unfortunately it never stores any GRA* files on any of the nodes when this crash happens.

What else can we do to generate more data on this crash to troubleshoot it?
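
One thing that might add detail to the bug report: several frames in the backtrace above are bare addresses with no function name, for example `/usr/sbin/mysqld[0x9940fa]`. Those could possibly be resolved to function names with `addr2line` from binutils. Below is a minimal Python sketch that pulls the unresolved addresses out of the error log text and builds the command line. The helper names `unresolved_frames` and `addr2line_command` are hypothetical, and whether `addr2line` finds symbols depends on how the debuginfo package is installed on the node, so treat this as a starting point only.

```python
import re

# Matches frames without a symbol name, e.g. "/usr/sbin/mysqld[0x9940fa]".
# Frames such as "/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x673f21]"
# already carry a symbol and are skipped.
FRAME_RE = re.compile(r"^(?P<binary>/[^\s(\[]+)\[(?P<addr>0x[0-9a-fA-F]+)\]$")


def unresolved_frames(backtrace_text):
    """Return (binary, address) pairs for frames that have no symbol name."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("binary"), match.group("addr")))
    return frames


def addr2line_command(frames, binary="/usr/sbin/mysqld"):
    """Build an addr2line argument list for the frames that belong to binary."""
    addresses = [addr for frame_binary, addr in frames if frame_binary == binary]
    # -f prints function names, -C demangles C++ names, -e selects the executable.
    return ["addr2line", "-f", "-C", "-e", binary] + addresses


if __name__ == "__main__":
    sample = """/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x673f21]
/usr/sbin/mysqld[0x9940fa]
/usr/sbin/mysqld[0x99cc71]"""
    print(" ".join(addr2line_command(unresolved_frames(sample))))
```

The resolved function names for the frames between `abort` and `ha_write_row` could then be attached to the Launchpad report.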