XtraDB Cluster node failure

How do you diagnose what caused an XtraDB cluster node to fail?

What are ALL the log files etc that I should be looking at?

Thanks.

Dom

A node (6 node cluster, split three members on two sites (1000MB or more between sites) failed at Nov 2 00:55:14 and failed to restart, several times, mysql_safe seems to have captured the failure but failed to restart it.

We had to do a HARD OS restart and a manual mysql restart as you can see from the mysql error log.

There was no user activity on the db at the time of the failure.

We lost a second node 6 hours after the first and a third about 5 hours later, again nothing seems to explain it.

Environment:
scientific 6.2 64bit
vmware 5
2vcpu
8GB ram
4GB db
5.9GB innoDB buffer
Version 5.5.24-23.6

my.cnf file:

[client]
port = 3306
socket = /var/lib/mysql/mysql.sock

[mysqld]
skip-name-resolv
character-set-server=utf8
collation-server=utf8_bin
datadir=/var/lib/mysql
user=mysql
expire_logs_days=3
port = 3306
socket = /var/lib/mysql/mysql.sock
back_log = 50
max_connections = 250
max_connect_errors = 10
table_open_cache = 2048
max_allowed_packet = 32M
thread_cache_size = 8

query_cache_size = 0
query_cache_type = 0
ft_min_word_len = 4

default-storage-engine = InnoDB
thread_stack = 192K
transaction_isolation = REPEATABLE-READ
tmp_table_size = 64M

log-bin=mysql-bin
binlog_format=ROW
log_slave_updates
slow_query_log=1
slow_query_log_file=/var/log/mysql/slow.log
long_query_time = 2

general_log=0
general_log_file=/var/log/mysql/general.log
log-error=/var/log/mysql/error.log

server-id=183371624
key_buffer_size = 32M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover

innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 6000M
innodb_data_file_path = ibdata1:10M:autoextend
innodb_thread_concurrency = 0
innodb_flush_log_at_trx_commit = 0
innodb_doublewrite=1
innodb_log_buffer_size = 8M
innodb_log_file_size = 384M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 90
innodb_file_per_table
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1

wait_timeout=300

wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_provider_options = “gcache.size=4G; gmcast.listen_addr=tcp://0.0.0.0:4010;evs.send_window=512; evs.user_send_window=512;”
wsrep_slave_threads=8
wsrep_cluster_name=lptlive
wsrep_sst_method=xtrabackup
wsrep_sst_auth=replic:5vaeNeIiTYBRU
wsrep_notify_cmd=/usr/bin/wsrep_notify.sh

[xtrabackup]
target-dir = /backup

[mysqldump]
quick
max_allowed_packet = 32M

[mysql]
no-auto-rehash

[myisamchk]
key_buffer_size = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[mysqlhotcopy]
interactive-timeout

[mysqld_safe]
open-files-limit = 8192
wsrep_urls= gcomm://192.168.100.101:4010,gcomm://192.168.100.102:4010,gc omm://192.168.100.103:4010,gcomm://192.168.101.104:4010,gcom m://192.168.101.105:4010,gcomm://192.168.101.106:4010,gcomm: //

I have checked the mysql error log and it says:

121102 00:55:14 mysqld_safe Number of processes running now: 0
121102 00:55:14 mysqld_safe mysqld restarted
121102 0:55:15 [Note] Flashcache bypass: disabled
121102 0:55:15 [Note] Flashcache setup error is : open flash device failed

121102 0:55:15 [Note] WSREP: Read nil XID from storage engines, skipping position init
121102 0:55:15 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
121102 0:55:15 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info&#64;codership.com> loaded succesfully.
121102 0:55:15 [Note] WSREP: Found saved state: 72de1b90-022d-11e2-0800-484cdf256d28:-1
121102 0:55:15 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
121102 0:55:15 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages
_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16;
gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; g
mcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121102 0:55:15 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121102 0:55:15 [Note] WSREP: wsrep_sst_grab()
121102 0:55:15 [Note] WSREP: Start replication
121102 0:55:15 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121102 0:55:15 [Note] WSREP: protonet asio version 0
121102 0:55:15 [Note] WSREP: backend: asio
121102 0:55:15 [Note] WSREP: GMCast version 0
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) listening at tcp://0.0.0.0:4010
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) multicast: , ttl: 1
121102 0:55:15 [Note] WSREP: EVS version 0
121102 0:55:15 [Note] WSREP: PC version 0
121102 0:55:15 [Note] WSREP: gcomm: connecting to group ‘lptlive’, peer ‘192.168.100.101:4010’
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.102:4010 tcp://192.168.100.139:4010
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) cleaning up duplicate 0x28b0660 after established 0x289c490
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) turning message relay requesting off
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) cleaning up duplicate 0x28ae630 after established 0x28dc140
121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, ‘tcp://0.0.0.0:4010’) cleaning up duplicate 0x28b0000 after established 0x28db420
121102 0:55:19 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
121102 0:55:19 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
121102 0:55:19 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
121102 0:55:19 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
121102 0:55:19 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
121102 0:55:19 [Note] WSREP: declaring a8c0cfb1-190a-11e2-0800-7533cf9b72c7 stable
121102 0:55:19 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,117) memb {
72f89f95-0729-11e2-0800-e80c813e3134,
a8511994-190a-11e2-0800-580d0a80208f,
a882f2cb-190a-11e2-0800-00171a908230,
a8b2f666-190a-11e2-0800-56702e7cb352,
a8beb15d-190a-11e2-0800-81fcb2686439,
a8c0cfb1-190a-11e2-0800-7533cf9b72c7,
f6ba0383-2487-11e2-0800-0e4230df1ac6,
} joined {
} left {
} partitioned {
aae97de1-1831-11e2-0800-ee58636f1b39,
})
121102 0:55:19 [Note] WSREP: gcomm: connected
121102 0:55:19 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121102 0:55:19 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121102 0:55:19 [Note] WSREP: Opened channel ‘lptlive’
121102 0:55:19 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 6, memb_num = 7
121102 0:55:19 [Note] WSREP: Waiting for SST to complete.
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: sent state msg: f95d07c6-2487-11e2-0800-d49782a4e38d
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 0 (garb)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 1 (mysqldbdrv03)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 2 (mysqldbdrv01)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 4 (mysqldb03)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 5 (mysqldb02)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 3 (mysqldbdrv02)
121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 6 (mysqldb01)
121102 0:55:19 [Note] WSREP: ‘garb’ demoted SYNCED->PRIMARY due to gap in history: 8585650 - 8585807
121102 0:55:19 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 112,
members = 5/7 (joined/total),
act_id = 8585807,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
121102 0:55:19 [Note] WSREP: Flow-control interval: [21, 42]
121102 0:55:19 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8585807)
121102 0:55:19 [Note] WSREP: State transfer required:
Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8585807
Local state: 72de1b90-022d-11e2-0800-484cdf256d28:-1
121102 0:55:19 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8585807, view# 113: Primary, number of nodes: 7, my index: 6, protocol version 2
121102 0:55:19 [Warning] WSREP: Gap in state sequence. Need state transfer.
121102 0:55:19 [Note] WSREP: Node 0 (garb) requested state transfer from ‘any’. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
121102 0:55:19 [Note] WSREP: 0 (garb): State transfer from 1 (mysqldbdrv03) complete.
121102 0:55:19 [Note] WSREP: Member 0 (garb) synced with group.
121102 0:55:20 [Warning] WSREP: 1 (mysqldbdrv03): State transfer to 0 (garb) failed: -1 (Operation not permitted)
121102 0:55:20 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
121102 0:55:21 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup ‘joiner’ ‘192.168.100.103’ ‘replic:5vaeNeIiTYBRU’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘4981’ 2>sst.err’
121102 0:55:21 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘127.0.0.1’ (111)
sed: couldn’t flush stdout: Broken pipe
121102 0:55:21 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 6 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a80208f/mysqldbdrv03/192.168.100.
121102 0:55:21 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 6 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a802
121102 0:55:21 [Note] WSREP: Assign initial position for certification: 8585807, protocol version: 2
121102 0:55:21 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.100.103:4011
121102 0:55:21 [Note] WSREP: Node 6 (mysqldb01) requested state transfer from ‘any’. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
121102 0:55:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8585830)
121102 0:55:21 [Note] WSREP: Requesting state transfer: success, donor: 1
121102 0:56:57 [Note] WSREP: 1 (mysqldbdrv03): State transfer to 6 (mysqldb01) complete.
121102 0:56:57 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
121102 0:57:17 [Note] WSREP: SST complete, seqno: 8585950
121102 0:57:17 [Note] Plugin ‘FEDERATED’ is disabled.
121102 0:57:17 InnoDB: The InnoDB memory heap is disabled
121102 0:57:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121102 0:57:17 InnoDB: Compressed tables use zlib 1.2.3
121102 0:57:17 InnoDB: Using Linux native AIO
121102 0:57:17 InnoDB: Initializing buffer pool, size = 5.9G
InnoDB: mmap(6463488000 bytes) failed; errno 12
121102 0:57:17 InnoDB: Completed initialization of buffer pool
121102 0:57:17 InnoDB: Fatal error: cannot allocate memory for the buffer pool
121102 0:57:17 [ERROR] Plugin ‘InnoDB’ init function returned error.
121102 0:57:17 [ERROR] Plugin ‘InnoDB’ registration as a STORAGE ENGINE failed.
121102 0:57:17 [ERROR] Unknown/unsupported storage engine: InnoDB
121102 0:57:17 [ERROR] Aborting

121102 0:57:19 [Note] WSREP: Closing send monitor…
121102 0:57:19 [Note] WSREP: Closed send monitor.
121102 0:57:19 [Note] WSREP: gcomm: terminating thread
121102 0:57:19 [Note] WSREP: gcomm: joining thread
121102 0:57:19 [Note] WSREP: gcomm: closing backend
121102 0:57:19 [Note] WSREP: view(view_id(NON_PRIM,72f89f95-0729-11e2-0800-e80c813e3134,1 17) memb {
f6ba0383-2487-11e2-0800-0e4230df1ac6,
} joined {
} left {
} partitioned {
72f89f95-0729-11e2-0800-e80c813e3134,
a8511994-190a-11e2-0800-580d0a80208f,
a882f2cb-190a-11e2-0800-00171a908230,
a8b2f666-190a-11e2-0800-56702e7cb352,
a8beb15d-190a-11e2-0800-81fcb2686439,
a8c0cfb1-190a-11e2-0800-7533cf9b72c7,
})
121102 0:57:19 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
121102 0:57:19 [Note] WSREP: view((empty))
121102 0:57:19 [Note] WSREP: gcomm: closed
121102 0:57:19 [Note] WSREP: Flow-control interval: [8, 16]
121102 0:57:19 [Note] WSREP: Received NON-PRIMARY.
121102 0:57:19 [Note] WSREP: Shifting JOINER -> OPEN (TO: 8586007)
121102 0:57:19 [Note] WSREP: Received self-leave message.
121102 0:57:19 [Note] WSREP: Flow-control interval: [0, 0]
121102 0:57:19 [Note] WSREP: Received SELF-LEAVE. Closing connection.
121102 0:57:19 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8586007)
121102 0:57:19 [Note] WSREP: RECV thread exiting 0: Success
121102 0:57:19 [Note] WSREP: recv_thread() joined.
121102 0:57:19 [Note] WSREP: Closing slave action queue.
121102 0:57:19 [Note] WSREP: Service disconnected.
121102 0:57:19 [Note] WSREP: rollbacker thread exiting
121102 0:57:20 [Note] WSREP: Some threads may fail to exit.
121102 0:57:20 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 2 threads didn’t exit
121102 00:57:25 mysqld_safe mysqld from pid file /var/lib/mysql/mysqldb01.pid ended
121102 13:02:33 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121102 13:02:33 [Note] Flashcache bypass: disabled
121102 13:02:33 [Note] Flashcache setup error is : open flash device failed

121102 13:02:33 [Note] WSREP: Read nil XID from storage engines, skipping position init
121102 13:02:33 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
121102 13:02:33 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info&#64;codership.com> loaded succesfully.
121102 13:02:33 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121102 13:02:33 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
121102 13:02:33 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121102 13:02:33 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121102 13:02:33 [Note] WSREP: wsrep_sst_grab()
121102 13:02:33 [Note] WSREP: Start replication
121102 13:02:33 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121102 13:02:33 [Note] WSREP: protonet asio version 0
121102 13:02:33 [Note] WSREP: backend: asio
121102 13:02:33 [Note] WSREP: GMCast version 0
121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) listening at tcp://0.0.0.0:4010
121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) multicast: , ttl: 1
121102 13:02:33 [Note] WSREP: EVS version 0
121102 13:02:33 [Note] WSREP: PC version 0
121102 13:02:33 [Note] WSREP: gcomm: connecting to group ‘lptlive’, peer ‘192.168.100.101:4010’
121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.139:4010
121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) cleaning up duplicate 0x13d9290 after established 0x13d05e0
121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) cleaning up duplicate 0x13f3110 after established 0x13ea570
121102 13:02:34 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, ‘tcp://0.0.0.0:4010’) turning message relay requesting off
121102 13:02:34 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
121102 13:02:34 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
121102 13:02:34 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
121102 13:02:34 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
121102 13:02:34 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
121102 13:02:34 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,121) memb {
72f89f95-0729-11e2-0800-e80c813e3134,
914eeace-24ed-11e2-0800-f3b6d7ea9afa,
a8511994-190a-11e2-0800-580d0a80208f,
a882f2cb-190a-11e2-0800-00171a908230,
a8b2f666-190a-11e2-0800-56702e7cb352,
a8beb15d-190a-11e2-0800-81fcb2686439,
} joined {
} left {
} partitioned {
})
121102 13:02:34 [Note] WSREP: gcomm: connected
121102 13:02:34 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121102 13:02:34 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121102 13:02:34 [Note] WSREP: Opened channel ‘lptlive’
121102 13:02:34 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 6
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121102 13:02:34 [Note] WSREP: Waiting for SST to complete.
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: sent state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 0 (garb)
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 2 (mysqldbdrv03)
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 3 (mysqldbdrv01)
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 4 (mysqldbdrv02)
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 5 (mysqldb03)
121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 1 (mysqldb01)
121102 13:02:34 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 116,
members = 5/6 (joined/total),
act_id = 8643168,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
121102 13:02:34 [Note] WSREP: Flow-control interval: [20, 39]
121102 13:02:34 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8643168)
121102 13:02:34 [Note] WSREP: State transfer required:
Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8643168
Local state: 00000000-0000-0000-0000-000000000000:-1
121102 13:02:34 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8643168, view# 117: Primary, number of nodes: 6, my index: 1, protocol version
2
121102 13:02:34 [Warning] WSREP: Gap in state sequence. Need state transfer.
121102 13:02:36 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup ‘joiner’ ‘192.168.100.103’ ‘replic:5vaeNeIiTYBRU’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘22659’ 2>sst.err’
121102 13:02:36 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘127.0.0.1’ (111)
sed: couldn’t flush stdout: Broken pipe
121102 13:02:36 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 1 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,914eeace-24ed-11e 2-0800-f3b6d7ea9afa/mysqldb01/10.238.
121102 13:02:36 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 1 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,914eeace-24ed-11e 2-0800-f3b6d7ea9
121102 13:02:36 [Note] WSREP: Assign initial position for certification: 8643168, protocol version: 2
121102 13:02:36 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (72de1b90-022d-11e2-0800-484cdf256d28): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
121102 13:02:36 [Note] WSREP: Node 1 (mysqldb01) requested state transfer from ‘any’. Selected 2 (mysqldbdrv03)(SYNCED) as donor.
121102 13:02:36 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8643172)
121102 13:02:36 [Note] WSREP: Requesting state transfer: success, donor: 2
121102 13:04:09 [Note] WSREP: 2 (mysqldbdrv03): State transfer to 1 (mysqldb01) complete.
121102 13:04:09 [Note] WSREP: Member 2 (mysqldbdrv03) synced with group.
121102 13:04:32 [Note] WSREP: SST complete, seqno: 8643297
121102 13:04:32 [Note] Plugin ‘FEDERATED’ is disabled.
121102 13:04:32 InnoDB: The InnoDB memory heap is disabled
121102 13:04:32 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121102 13:04:32 InnoDB: Compressed tables use zlib 1.2.3
121102 13:04:32 InnoDB: Using Linux native AIO
121102 13:04:32 InnoDB: Initializing buffer pool, size = 5.9G
InnoDB: mmap(6463488000 bytes) failed; errno 12
121102 13:04:32 InnoDB: Completed initialization of buffer pool
121102 13:04:32 InnoDB: Fatal error: cannot allocate memory for the buffer pool
121102 13:04:32 [ERROR] Plugin ‘InnoDB’ init function returned error.
121102 13:04:32 [ERROR] Plugin ‘InnoDB’ registration as a STORAGE ENGINE failed.
121102 13:04:32 [ERROR] Unknown/unsupported storage engine: InnoDB
121102 13:04:32 [ERROR] Aborting

121102 13:04:34 [Note] WSREP: Closing send monitor…
121102 13:04:34 [Note] WSREP: Closed send monitor.
121102 13:04:34 [Note] WSREP: gcomm: terminating thread
121102 13:04:34 [Note] WSREP: gcomm: joining thread
121102 13:04:34 [Note] WSREP: gcomm: closing backend
121102 13:04:34 [Note] WSREP: view(view_id(NON_PRIM,72f89f95-0729-11e2-0800-e80c813e3134,1 21) memb {
914eeace-24ed-11e2-0800-f3b6d7ea9afa,
} joined {
} left {
} partitioned {
72f89f95-0729-11e2-0800-e80c813e3134,
a8511994-190a-11e2-0800-580d0a80208f,
a882f2cb-190a-11e2-0800-00171a908230,
a8b2f666-190a-11e2-0800-56702e7cb352,
a8beb15d-190a-11e2-0800-81fcb2686439,
})
121102 13:04:34 [Note] WSREP: view((empty))
121102 13:04:34 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
121102 13:04:34 [Note] WSREP: gcomm: closed
121102 13:04:34 [Note] WSREP: Flow-control interval: [8, 16]
121102 13:04:34 [Note] WSREP: Received NON-PRIMARY.
121102 13:04:34 [Note] WSREP: Shifting JOINER -> OPEN (TO: 8643326)
121102 13:04:34 [Note] WSREP: Received self-leave message.
121102 13:04:34 [Note] WSREP: Flow-control interval: [0, 0]
121102 13:04:34 [Note] WSREP: Received SELF-LEAVE. Closing connection.
121102 13:04:34 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8643326)
121102 13:04:34 [Note] WSREP: RECV thread exiting 0: Success
121102 13:04:34 [Note] WSREP: recv_thread() joined.
121102 13:04:34 [Note] WSREP: Closing slave action queue.
121102 13:04:34 [Note] WSREP: Service disconnected.
121102 13:04:34 [Note] WSREP: rollbacker thread exiting
121102 13:04:35 [Note] WSREP: Some threads may fail to exit.
121102 13:04:35 [Note] /usr/sbin/mysqld: Shutdown complete

Error in my_thread_global_end(): 2 threads didn’t exit
121102 13:04:40 mysqld_safe mysqld from pid file /var/lib/mysql/mysqldb01.pid ended
121102 13:11:44 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
121102 13:11:45 [Note] Flashcache bypass: disabled
121102 13:11:45 [Note] Flashcache setup error is : open flash device failed

121102 13:11:45 [Note] WSREP: Read nil XID from storage engines, skipping position init
121102 13:11:45 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
121102 13:11:45 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy <info&#64;codership.com> loaded succesfully.
121102 13:11:45 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
121102 13:11:45 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
121102 13:11:45 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
121102 13:11:45 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
121102 13:11:45 [Note] WSREP: wsrep_sst_grab()
121102 13:11:45 [Note] WSREP: Start replication
121102 13:11:45 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
121102 13:11:45 [Note] WSREP: protonet asio version 0
121102 13:11:45 [Note] WSREP: backend: asio
121102 13:11:45 [Note] WSREP: GMCast version 0
121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, ‘tcp://0.0.0.0:4010’) listening at tcp://0.0.0.0:4010
121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, ‘tcp://0.0.0.0:4010’) multicast: , ttl: 1
121102 13:11:45 [Note] WSREP: EVS version 0
121102 13:11:45 [Note] WSREP: PC version 0
121102 13:11:45 [Note] WSREP: gcomm: connecting to group ‘lptlive’, peer ‘192.168.100.101:4010’
121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, ‘tcp://0.0.0.0:4010’) turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.139:4010
121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, ‘tcp://0.0.0.0:4010’) turning message relay requesting off
121102 13:11:45 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
121102 13:11:45 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
121102 13:11:45 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
121102 13:11:45 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
121102 13:11:45 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
121102 13:11:45 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,123) memb {
72f89f95-0729-11e2-0800-e80c813e3134,
a8511994-190a-11e2-0800-580d0a80208f,
a882f2cb-190a-11e2-0800-00171a908230,
a8b2f666-190a-11e2-0800-56702e7cb352,
a8beb15d-190a-11e2-0800-81fcb2686439,
d9fd24c5-24ee-11e2-0800-044cf4047e4f,
} joined {
} left {
} partitioned {
})
121102 13:11:45 [Note] WSREP: gcomm: connected
121102 13:11:45 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
121102 13:11:45 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
121102 13:11:45 [Note] WSREP: Opened channel ‘lptlive’
121102 13:11:45 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 5, memb_num = 6
121102 13:11:45 [Note] WSREP: Waiting for SST to complete.
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: sent state msg: da2023f8-24ee-11e2-0800-d38b758c56bd
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 0 (garb)
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 1 (mysqldbdrv03)
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 3 (mysqldbdrv02)
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 4 (mysqldb03)
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 2 (mysqldbdrv01)
121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 5 (mysqldb01)
121102 13:11:45 [Note] WSREP: Quorum results:
version = 2,
component = PRIMARY,
conf_id = 118,
members = 5/6 (joined/total),
act_id = 8643880,
last_appl. = -1,
protocols = 0/4/2 (gcs/repl/appl),
group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
121102 13:11:45 [Note] WSREP: Flow-control interval: [20, 39]
121102 13:11:45 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8643880)
121102 13:11:45 [Note] WSREP: State transfer required:
Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8643880
Local state: 00000000-0000-0000-0000-000000000000:-1
121102 13:11:45 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8643880, view# 119: Primary, number of nodes: 6, my index: 5, protocol version 2
121102 13:11:45 [Warning] WSREP: Gap in state sequence. Need state transfer.
121102 13:11:47 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup ‘joiner’ ‘192.168.100.103’ ‘replic:5vaeNeIiTYBRU’ ‘/var/lib/mysql/’ ‘/etc/my.cnf’ ‘12805’ 2>sst.err’
121102 13:11:47 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘127.0.0.1’ (111)
sed: couldn’t flush stdout: Broken pipe
121102 13:11:47 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 5 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a80208f/mysqldbdrv03/192.168.100.
121102 13:11:47 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 5 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a802
121102 13:11:47 [Note] WSREP: Assign initial position for certification: 8643880, protocol version: 2
121102 13:11:47 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (72de1b90-022d-11e2-0800-484cdf256d28): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
121102 13:11:47 [Note] WSREP: Node 5 (mysqldb01) requested state transfer from ‘any’. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
121102 13:11:47 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8643883)
121102 13:11:47 [Note] WSREP: Requesting state transfer: success, donor: 1
121102 13:13:24 [Note] WSREP: 1 (mysqldbdrv03): State transfer to 5 (mysqldb01) complete.
121102 13:13:24 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
121102 13:13:38 [Note] WSREP: SST complete, seqno: 8644014
121102 13:13:38 [Note] Plugin ‘FEDERATED’ is disabled.
121102 13:13:38 InnoDB: The InnoDB memory heap is disabled
121102 13:13:38 InnoDB: Mutexes and rw_locks use GCC atomic builtins
121102 13:13:38 InnoDB: Compressed tables use zlib 1.2.3
121102 13:13:38 InnoDB: Using Linux native AIO
121102 13:13:38 InnoDB: Initializing buffer pool, size = 5.9G
121102 13:13:38 InnoDB: Completed initialization of buffer pool
121102 13:13:38 InnoDB: highest supported file format is Barracuda.
121102 13:13:38 InnoDB: Waiting for the background threads to start
121102 13:13:39 Percona XtraDB (http://www.percona.com) 1.1.8-rel25.3 started; log sequence number 31925464588
121102 13:13:39 [Note] Recovering after a crash using mysql-bin
121102 13:13:40 [Note] Starting crash recovery…
121102 13:13:40 [Note] Crash recovery finished.
121102 13:13:40 [Note] Server hostname (bind-address): ‘(null)’; port: 3306
121102 13:13:40 [Note] - ‘(null)’ resolves to ‘0.0.0.0’;
121102 13:13:40 [Note] - ‘(null)’ resolves to ‘::’;
121102 13:13:40 [Note] Server socket created on IP: ‘0.0.0.0’.
121102 13:13:40 [Warning] ‘proxies_priv’ entry ‘@ root@mysqldb02’ ignored in --skip-name-resolve mode.
121102 13:13:40 [Note] Event Scheduler: Loaded 0 events
121102 13:13:40 [Note] WSREP: Signalling provider to continue.
121102 13:13:40 [Note] WSREP: Received SST: 72de1b90-022d-11e2-0800-484cdf256d28:8644014
121102 13:13:40 [Note] WSREP: SST received: 72de1b90-022d-11e2-0800-484cdf256d28:8644014
121102 13:13:40 [Note] WSREP: 5 (mysqldb01): State transfer from 1 (mysqldbdrv03) complete

The /var/log/messages at the time of failure says:

Nov 2 00:55:12 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4850 from=::ffff:192.168.100.102
Nov 2 00:55:12 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4851 from=::ffff:192.168.100.101
Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4851 duration=1(sec)
Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4850 duration=1(sec)
Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4909 from=::ffff:192.168.100.101
Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4910 from=::ffff:192.168.100.102
Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld cpuset=/ mems_allowed=0
Nov 2 00:55:14 mysqldbsrv01 kernel: Pid: 4457, comm: mysqld Not tainted 2.6.32-220.el6.x86_64 #1
Nov 2 00:55:14 mysqldbsrv01 kernel: Call Trace:
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? dump_header+0x90/0x1b0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? oom_kill_process+0x8a/0x2c0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? select_bad_process+0xe1/0x120
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? out_of_memory+0x220/0x3c0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __alloc_pages_nodemask+0x89e/0x940
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? alloc_pages_current+0xaa/0x110
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __page_cache_alloc+0x87/0x90
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_page_cache_readahead+0xdb/0x210
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? ra_submit+0x21/0x30
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? filemap_fault+0x4c3/0x500
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_fault+0x54/0x510
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? handle_pte_fault+0xf7/0xb50
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? sock_aio_read+0x181/0x190
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? handle_mm_fault+0x1e4/0x2b0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_page_fault+0x139/0x480
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? lock_sock_nested+0xac/0xc0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? _spin_unlock_bh+0x1b/0x20
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? release_sock+0xce/0xe0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? sock_setsockopt+0x16d/0x6f0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? do_page_fault+0x3e/0xa0
Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? page_fault+0x25/0x30
Nov 2 00:55:14 mysqldbsrv01 kernel: Mem-Info:
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA per-cpu:
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32 per-cpu:
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 186, btch: 31 usd: 21
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 186, btch: 31 usd: 46
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal per-cpu:
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 186, btch: 31 usd: 34
Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 186, btch: 31 usd: 53
Nov 2 00:55:14 mysqldbsrv01 kernel: active_anon:1648167 inactive_anon:322976 isolated_anon:0
Nov 2 00:55:14 mysqldbsrv01 kernel: active_file:210 inactive_file:3276 isolated_file:0
Nov 2 00:55:14 mysqldbsrv01 kernel: unevictable:0 dirty:0 writeback:0 unstable:0
Nov 2 00:55:14 mysqldbsrv01 kernel: free:25820 slab_reclaimable:2474 slab_unreclaimable:12998
Nov 2 00:55:14 mysqldbsrv01 kernel: mapped:759 shmem:0 pagetables:7759 bounce:0
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA free:15660kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15268kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 3000 8050 8050
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32 free:45348kB min:25140kB low:31424kB high:37708kB active_anon:2310072kB inactive_anon:578088kB active_file:24kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:24kB shmem:0kB slab_reclaimable:144kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:33 all_unreclaimable? no
Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 0 5050 5050
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal free:42272kB min:42316kB low:52892kB high:63472kB active_anon:4282596kB inactive_anon:713816kB active_file:816kB inactive_file:13104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:0kB writeback:0kB mapped:3184kB shmem:0kB slab_reclaimable:9752kB slab_unreclaimable:51992kB kernel_stack:1352kB pagetables:29400kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:608 all_unreclaimable? yes
Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 0 0 0
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA: 14kB 18kB 016kB 132kB 264kB 1128kB 0256kB 0512kB 11024kB 12048kB 34096kB = 15660kB
Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal: 416
4kB 2828kB 20516kB 14832kB 12264kB 64128kB 28256kB 10512kB 21024kB 02048kB 04096kB = 42272kB
Nov 2 00:55:14 mysqldbsrv01 kernel: 9145 total pagecache pages
Nov 2 00:55:14 mysqldbsrv01 kernel: 5631 pages in swap cache
Nov 2 00:55:14 mysqldbsrv01 kernel: Swap cache stats: add 684333, delete 678702, find 182870270/182890443
Nov 2 00:55:14 mysqldbsrv01 kernel: Free swap = 0kB
Nov 2 00:55:14 mysqldbsrv01 kernel: Total swap = 999992kB
Nov 2 00:55:14 mysqldbsrv01 kernel: 2097136 pages RAM
Nov 2 00:55:14 mysqldbsrv01 kernel: 48654 pages reserved
Nov 2 00:55:14 mysqldbsrv01 kernel: 2079 pages shared
Nov 2 00:55:14 mysqldbsrv01 kernel: 2018022 pages non-shared
Nov 2 00:55:14 mysqldbsrv01 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 453] 0 453 2880 0 1 -17 -1000 udevd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1013] 0 1013 37677 114 0 0 0 vmtoolsd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1266] 0 1266 23306 31 0 -17 -1000 auditd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1282] 0 1282 62714 105 0 0 0 rsyslogd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1432] 0 1432 29321 24 1 0 0 crond
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1460] 0 1460 1029 1 0 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1462] 0 1462 1029 1 0 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1464] 0 1464 3077 0 0 -17 -1000 udevd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1465] 0 1465 3077 0 0 -17 -1000 udevd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1466] 0 1466 1029 1 1 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1468] 0 1468 1029 1 1 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1470] 0 1470 1029 1 1 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3338] 81 3338 7920 1 1 0 0 dbus-daemon
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3366] 0 3366 5767 0 0 0 0 oddjobd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3476] 0 3476 16017 24 1 0 0 sshd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3527] 38 3527 7014 39 0 0 0 ntpd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3894] 0 3894 44097 54 1 0 0 sssd
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3896] 0 3896 635966 456032 0 0 0 sssd_be
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3898] 0 3898 156200 41587 0 0 0 sssd_nss
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3899] 0 3899 41845 28 0 0 0 sssd_pam
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4893] 0 4893 12909 23 1 0 0 master
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4896] 89 4896 13463 17 0 0 0 qmgr
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 5541] 0 5541 33684 8872 0 0 0 ruby
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 6798] 0 6798 1029 1 1 0 0 mingetty
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 8461] 0 8461 5536 32 0 0 0 xinetd
Nov 2 00:55:14 mysqldbsrv01 kernel: [23283] 0 23283 2873 17 0 0 0 mysqld_safe
Nov 2 00:55:14 mysqldbsrv01 kernel: [26543] 0 26543 49177 142 1 0 0 snmpd
Nov 2 00:55:14 mysqldbsrv01 kernel: [15222] 498 15222 3092711 1457974 1 0 0 mysqld
Nov 2 00:55:14 mysqldbsrv01 kernel: [22692] 20504 22692 2614 38 1 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [27036] 89 27036 13445 140 1 0 0 pickup
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4871] 20504 4871 2614 34 0 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4872] 20504 4872 2615 34 1 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4873] 20504 4873 27054 69 1 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4875] 20504 4875 2614 34 0 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4876] 20504 4876 2615 34 1 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4877] 20504 4877 27054 69 0 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4879] 20504 4879 2614 34 0 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4880] 20504 4880 2615 34 1 0 0 nrpe
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4881] 20504 4881 27054 69 0 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4882] 20504 4882 27087 77 1 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4883] 20504 4883 27087 77 0 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4884] 20504 4884 27087 77 0 0 0 pmp-check-mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4906] 20504 4906 49488 391 0 0 0 mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4907] 20504 4907 49488 391 0 0 0 mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4908] 20504 4908 49488 388 0 0 0 mysql
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4909] 99 4909 26521 176 0 0 0 clustercheck
Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4910] 99 4910 26521 177 0 0 0 clustercheck
Nov 2 00:55:14 mysqldbsrv01 kernel: Out of memory: Kill process 15222 (mysqld) score 654 or sacrifice child
Nov 2 00:55:14 mysqldbsrv01 kernel: Killed process 15222, UID 498, (mysqld) total-vm:12370844kB, anon-rss:5830224kB, file-rss:1672kB
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4922 from=::ffff:192.168.100.101
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4923 from=::ffff:192.168.100.102
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4909 duration=1(sec)
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4923 duration=0(sec)
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4922 duration=0(sec)
Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4910 duration=1(sec)
Nov 2 00:55:15 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4983 from=::ffff:192.168.100.101

Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32: 3534kB 1468kB 7116kB 5332kB 3664kB 28128kB 21256kB 20512kB 101024kB 42048kB 0*4096kB = 45348kB

No one able to comment?

Well this is just a guess, but in your log messages I see some notices about running out of memory, so that’s probably the root cause.

121102 0:57:17 InnoDB: Fatal error: cannot allocate memory for the buffer pool

And this point kernel oom killer has step in by killing mysqld:

Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0

I’m currently evaluating Galera cluster by myself and I’m having problem with excessive memory usage compared to standalone mysql server, so maybe these issues are related. http://forum.percona.com/index.php?t=msg&th=2871

Thanks for the response…

The error messages that it had run out of memory are noted but there isn’t any explanation as to why it happened (why I asked were there some other logs that should be looked at) the servers and databases were not under load high IO or had anything that could obviously explain it … the nodes run fine, and then pop and go down… but why?

I also note the apparent memory leak in 5.5.27 cluster and that 5.5.28 standalone doesn’t have the same leak after your post.

Maybe the problems I am seeing are related.

Dom