All nodes in cluster failed.

Hello,

Today my two-node cluster unexpectedly went offline. Both nodes were down with the following error in the error log:

12:07:53 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at [url]https://bugs.launchpad.net/percona-xtradb-cluster[/url]

key_buffer_size=2147483648
read_buffer_size=20971520
max_used_connections=54
max_threads=153
thread_count=21
connection_count=21
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8366310 K bytes of memory
Hope that’s ok; if not, decrease some variables in the equation.

Thread pointer: 0xc1d3db0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong…
stack_bottom = 7fc52c43dd28 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x901e65]
/usr/sbin/mysqld(handle_fatal_signal+0x4c4)[0x67fc34]
/lib64/libpthread.so.0(+0xfae0)[0x7fc55a946ae0]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification10do_test_v3EPNS_9TrxHandleEb+0x1b1)[0x7fc538a4f571]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification7do_testEPNS_9TrxHandleEb+0x36f)[0x7fc538a52d1f]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification4testEPNS_9TrxHandleEb+0x28)[0x7fc538a52ee8]
/usr/lib64/libgalera_smm.so(_ZN6galera13Certification10append_trxEPNS_9TrxHandleE+0x8a)[0x7fc538a52f8a]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM4certEPNS_9TrxHandleE+0x8b)[0x7fc538a7f3fb]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10pre_commitEPNS_9TrxHandleEP14wsrep_trx_meta+0x59)[0x7fc538a7f8d9]
/usr/lib64/libgalera_smm.so(galera_pre_commit+0x148)[0x7fc538a90728]
/usr/sbin/mysqld(_Z22wsrep_run_wsrep_commitP3THDP10handlertonb+0x9a2)[0x7ba0f2]
/usr/sbin/mysqld[0x7baa83]
/usr/sbin/mysqld(_Z14ha_prepare_lowP3THDb+0x8c)[0x5c37bc]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDbb+0x27c)[0x5c50dc]
/usr/sbin/mysqld(_Z12trans_commitP3THD+0x49)[0x79ff59]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3571)[0x7045a1]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x608)[0x7072e8]
/usr/sbin/mysqld[0x707411]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1ad4)[0x709694]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1e3)[0x70aa63]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x17f)[0x6d430f]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x6d44e7]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb3b63a]
/lib64/libpthread.so.0(+0x7ddb)[0x7fc55a93eddb]
/lib64/libc.so.6(clone+0x6d)[0x7fc55907ca1d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc2e803b660): is an invalid pointer
Connection ID (thread ID): 9004
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
[url]http://www.percona.com/software/percona-xtradb-cluster/[/url]. You may find information
in the manual which will help you identify the cause of the crash.
140214 12:07:54 mysqld_safe Number of processes running now: 0
140214 12:07:54 mysqld_safe WSREP: not restarting wsrep node automatically
140214 12:07:54 mysqld_safe mysqld from pid file /var/lib/mysql/ip-10-1-7-144.pid ended

datadir=/var/lib/mysql
user=mysql
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_address=gcomm://10.1.7.144,10.1.8.218
binlog_format=ROW
default_storage_engine=InnoDB
innodb_locks_unsafe_for_binlog=1
innodb_buffer_pool_size=6G
key_buffer_size = 2048M
max_allowed_packet = 50M
table_open_cache = 1024
sort_buffer_size = 20M
read_buffer_size = 20M
read_rnd_buffer_size = 80M
myisam_sort_buffer_size = 64M
thread_cache_size = 32
query_cache_size = 32M
thread_concurrency = 4
innodb_flush_method=O_DIRECT
innodb_log_file_size=1G
innodb_buffer_pool_size=6G
innodb_autoinc_lock_mode=2
wsrep_node_address=10.1.7.144
wsrep_sst_method=xtrabackup
wsrep_cluster_name=my_centos_cluster
wsrep_sst_auth="sstuser:s3cret"
[mysql]
prompt=\u@\h [\d]>\_

It is strange that both nodes went offline at the same time. Can someone suggest how to avoid this in the future, please?

There could be many causes for this type of problem. Make sure you are not setting values so high that the server could run out of resources (memory, disk space, etc.).

In your config above you have specified

For more info you can go through the links below (see if altering the values helps):
[url]http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/[/url]
[url]http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/[/url]
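
To put rough numbers on the memory concern, here is a minimal sketch that redoes the arithmetic from the crash log using the values posted in this thread. The formula is the one printed in the log; adding innodb_buffer_pool_size on top is my own assumption about what else to count, and the amount of RAM on the nodes is not stated in the thread, so compare the result against your own hardware. The per-thread buffers are only allocated when needed, so treat this as an upper bound, not as a diagnosis of the signal 11.

```python
# Rough worst-case memory estimate from the values posted in this thread.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

key_buffer_size = 2147483648        # crash log (2048M in my.cnf)
read_buffer_size = 20971520         # crash log (20M in my.cnf)
sort_buffer_size = 20 * MIB         # my.cnf
max_threads = 153                   # crash log
innodb_buffer_pool_size = 6 * GIB   # my.cnf


def worst_case_kib(key_buf, read_buf, sort_buf, threads):
    """Formula as printed in the crash log, result in KiB."""
    return (key_buf + (read_buf + sort_buf) * threads) // KIB


# Estimate using only the terms that the log message names.
logged_estimate_kib = worst_case_kib(
    key_buffer_size, read_buffer_size, sort_buffer_size, max_threads
)

# Assumption: the InnoDB buffer pool is not part of the logged formula,
# so it is added separately here.
with_buffer_pool_kib = logged_estimate_kib + innodb_buffer_pool_size // KIB

print(f"formula from log:      {logged_estimate_kib} KiB "
      f"({logged_estimate_kib / (MIB):.2f} GiB)")
print(f"plus InnoDB pool (6G): {with_buffer_pool_kib} KiB "
      f"({with_buffer_pool_kib / (MIB):.2f} GiB)")
```

The first figure comes out at 8364032 K, slightly below the 8366310 K printed in the log, so the server evidently counts a small extra term that the message does not spell out. Either way, that is roughly 8 GiB before the 6G buffer pool is added.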

Thank you for your response, I will check these links. As for