I have a two-node Percona XtraDB Cluster (v5.5.33) with Galera. Galera has been configured to ignore split-brain.
When I run failover tests on these nodes, I see strange behaviour that I can't get a grip on.
The scenario is as follows:
Node2 was the cluster creator. It has a wsrep_cluster_address of "gcomm://", and node1 has "gcomm://10.0.100.2" when I start this scenario.
Node1 (10.0.100.1) and node2 (10.0.100.2) are both running and only node1 is receiving data.
When I reboot node1, node2 detects this and happily receives data. No problem here. Node1 comes back up and performs an IST to rejoin the cluster.
All still well.
When the IST of node1 has finished and the node is ready, I wait several minutes and then stop node2. As soon as node2 starts shutting down Percona, all transactions on node1 hang.
Running "show status like 'wsrep%';" shows me that node1 'believes' it's still part of the cluster and does not seem to detect that the second node is gone.
I'm using only InnoDB tables and the server is under high load: several TB of data, with a 60 GB InnoDB buffer pool configured.
I also tried "SET GLOBAL wsrep_cluster_address='gcomm://';" to force node2 to be the cluster creator. But alas, it does not solve the issue described above.
Why is the node hanging, and more importantly, how can I fix this?
My my.cnf (the config of node1; node2 is the same apart from the IP addresses, of course) looks like this:
user = mysql
default-storage-engine = innodb
socket = /data/mysql/mysql.sock
pid-file = /data/mysql/mysql.pid
slow-query-log = ON
log-queries-not-using-indexes = ON
innodb_print_all_deadlocks = ON
max_allowed_packet = 120M
max_connect_errors = 2000000000000
sysdate-is-now = 1
innodb = FORCE
innodb-strict-mode = 1
datadir = /data/mysql
tmpdir = /data/mysql-tmp
log-bin = /data/mysql/mysql-bin
expire-logs-days = 5
sync-binlog = 1
log-slave-updates = 1
relay-log = /data/mysql/relay-bin
slave-net-timeout = 60
sync-master-info = 1
sync-relay-log = 1
sync-relay-log-info = 1
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 1000
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 1000
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 60G
server-id = 1
wsrep_provider_options="pc.ignore_sb = yes; evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; evs.inactive_timeout = PT10S; evs.install_timeout = PT10S; gcache.size=32G"
# Desired SST donor name.
# Reject client queries when donating SST (false)
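To make the timing values in wsrep_provider_options easier to compare, here is a minimal sketch that splits the option string and converts the ISO 8601 durations (PT5S and so on) to seconds. The helper names parse_options and duration_seconds are made up for illustration and are not part of any Galera or Percona tooling, and the parser only handles plain PT..H..M..S values.

```python
import re

# Provider options copied from the my.cnf above.
OPTIONS = (
    "pc.ignore_sb = yes; evs.keepalive_period = PT1S; "
    "evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; "
    "evs.inactive_timeout = PT10S; evs.install_timeout = PT10S; gcache.size=32G"
)

# Simple ISO 8601 time duration: PT, then optional hours, minutes, seconds.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_options(text):
    """Split 'key = value; key = value' into a dict of stripped strings."""
    result = {}
    for item in text.split(";"):
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def duration_seconds(value):
    """Return the duration in seconds, or None if value is not a PT... duration."""
    match = _DURATION.match(value)
    if match is None or value == "PT":
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


options = parse_options(OPTIONS)

# Keep only the options whose value is a duration (skips "yes" and "32G").
timeouts = {}
for name, value in options.items():
    secs = duration_seconds(value)
    if secs is not None:
        timeouts[name] = secs

for name, secs in sorted(timeouts.items()):
    print(f"{name}: {secs:g} s")
```

As I read the Galera documentation, evs.suspect_timeout is how long a node may be silent before it is suspected to be dead and evs.inactive_timeout is the hard limit after which it is pronounced dead, so with the values above that would be 5 and 10 seconds respectively.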