Cluster node crashed with strange error

Hello!

We have a Percona XtraDB Cluster with 3 nodes.
Debian 6.0
Percona percona-xtradb-cluster-server-5.5 (Version: 5.5.33-23.7.6-496.squeeze)

Every few days a server node crashes with an error. We tried upgrading the software to the latest version from the repos, but this didn’t solve the problem.

Configuration file:

/etc/mysql/my.cnf
[mysqld]

datadir=/var/lib/mysql
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_address=gcomm://176.X.X.66,95.X.X.218

wsrep_slave_threads=16
wsrep_cluster_name=xxx_cluster
wsrep_node_name=node_2

wsrep_sst_method=rsync
wsrep_sst_auth=xxxx:xxxxxxxxxx
log_error=/var/log/mysql/error.log


innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

innodb_buffer_pool_size=4G
innodb_log_file_size=128M
innodb_log_buffer_size=4M
innodb-file-per-table

query_cache_size = 0
innodb_flush_log_at_trx_commit = 0

max_connect_errors = 10000

Error log from the crashed node:


131008 19:27:19 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
131008 19:27:19 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
131008 19:27:22 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://176.X.X.66:4567
131008 19:27:23 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
131008 19:27:36 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
131008 19:27:36 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
131008 19:27:37 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, OPERATIONAL, view_id(REG,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97)) suspecting node: 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2
131008 19:27:37 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, OPERATIONAL, view_id(REG,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:38 [Note] WSREP: view(view_id(NON_PRIM,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97) memb {
42794e56-300e-11e3-9431-5a351eb5fbb0,
} joined {
} left {
} partitioned {
20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,
255f747f-2ebf-11e3-ab78-ef46ed89d801,
})
131008 19:27:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
131008 19:27:38 [Note] WSREP: view(view_id(NON_PRIM,42794e56-300e-11e3-9431-5a351eb5fbb0,98) memb {
42794e56-300e-11e3-9431-5a351eb5fbb0,
} joined {
} left {
} partitioned {
20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,
255f747f-2ebf-11e3-ab78-ef46ed89d801,
})
131008 19:27:38 [Note] WSREP: Flow-control interval: [16, 16]
131008 19:27:38 [Note] WSREP: Received NON-PRIMARY.
131008 19:27:38 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1526)
131008 19:27:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
131008 19:27:38 [Note] WSREP: Flow-control interval: [16, 16]
131008 19:27:38 [Note] WSREP: New cluster view: global state: 18f7464c-2cee-11e3-0800-6a67365ca6cb:1526, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
131008 19:27:38 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
131008 19:27:38 [Note] WSREP: Received NON-PRIMARY.
131008 19:27:38 [Note] WSREP: New cluster view: global state: 18f7464c-2cee-11e3-0800-6a67365ca6cb:1526, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
131008 19:27:38 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
131008 19:27:39 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://176.X.X.66:4567
131008 19:27:40 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
131008 19:27:41 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
131008 19:27:43 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
131008 19:27:43 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:44 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:44 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:45 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:46 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:47 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:49 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
131008 19:27:53 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
131008 19:27:54 [Note] WSREP: declaring 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 stable
131008 19:27:54 [Note] WSREP: declaring 255f747f-2ebf-11e3-ab78-ef46ed89d801 stable
131008 19:27:54 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://95.X.X.218:4567

16:27:54 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=20
max_threads=153
thread_count=17
connection_count=17
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 343054 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7f51d5]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6c17a4]
/lib/libpthread.so.0(+0xeff0)[0x7fb8f4b74ff0]
/lib/libc.so.6(+0x72146)[0x7fb8f37b8146]
/lib/libc.so.6(+0x73758)[0x7fb8f37b9758]
/lib/libc.so.6(cfree+0x6c)[0x7fb8f37bcb8c]
/usr/lib/libgalera_smm.so(_ZN5gcomm13AsioTcpSocket13write_handlerERKN4asio10error_codeEm+0x358)[0x7fb8f1ff0b48]
/usr/lib/libgalera_smm.so(_ZN4asio6detail8write_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS4_EEEEN5boost5arrayINS_12const_bufferELm2EEENS0_14transfer_all_tENS8_3_bi6bind_tIvNS8_4_mfi3mf2IvN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSD_5list3INSD_5valueINS8_10shared_ptrISI_EEEEPFNS8_3argILi1EEEvEPFNSS_ILi2EEEvEEEEEEclESL_mi+0xee)[0x7fb8f1ff986e]
/usr/lib/libgalera_smm.so(_ZN4asio6detail23reactive_socket_send_opINS0_17consuming_buffersINS_12const_bufferEN5boost5arrayIS3_Lm2EEEEENS0_8write_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceISB_EEEES6_NS0_14transfer_all_tENS4_3_bi6bind_tIvNS4_4_mfi3mf2IvN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSG_5list3INSG_5valueINS4_10shared_ptrISL_EEEEPFNS4_3argILi1EEEvEPFNSV_ILi2EEEvEEEEEEEE11do_completeEPNS0_15task_io_serviceEPNS0_25task_io_service_operationESM_m+0x2bf)[0x7fb8f1ffa32f]
/usr/lib/libgalera_smm.so(_ZN4asio6detail15task_io_service3runERNS_10error_codeE+0x45a)[0x7fb8f201924a]
/usr/lib/libgalera_smm.so(_ZN5gcomm12AsioProtonet10event_loopERKN2gu8datetime6PeriodE+0x1d6)[0x7fb8f2012e76]
/usr/lib/libgalera_smm.so(_ZN9GCommConn3runEv+0x57)[0x7fb8f202c757]
/usr/lib/libgalera_smm.so(_ZN9GCommConn6run_fnEPv+0x9)[0x7fb8f2030989]
/lib/libpthread.so.0(+0x68ca)[0x7fb8f4b6c8ca]
/lib/libc.so.6(clone+0x6d)[0x7fb8f3815b6d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

131008 19:27:54 mysqld_safe Number of processes running now: 0
131008 19:27:54 mysqld_safe WSREP: not restarting wsrep node automatically
131008 19:27:54 mysqld_safe mysqld from pid file /var/lib/mysql/server.my.pid ended

What could be the reason for such crashes?
Thank you!

Hi,

I do not see the “wsrep_node_address” option in your configuration file.
Could you please set it on each node and check whether the crash happens again?

http://www.percona.com/doc/percona-xtradb-cluster/wsrep-system-index.html#wsrep_node_address
http://www.codership.com/wiki/doku.php?id=mysql_galera_configuration#wsrep_node_address
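
As a rough sketch, the option would go in the [mysqld] section of /etc/mysql/my.cnf on each node. The address below is only a placeholder, since the real address of node_2 does not appear in the post; replace it with the address the other two nodes use to reach this node. Port 4567 is the Galera default and matches the port seen in the error log, so the port part can normally be left out.

```ini
[mysqld]
# Placeholder address (assumption): use this node's own reachable IP,
# not the addresses of the other nodes listed in wsrep_cluster_address.
wsrep_node_address=192.0.2.10:4567
```

Each node needs its own value here, and mysqld has to be restarted for the change to take effect.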