Lost server synchronisation in a three node cluster

laetiabi · October 10, 2013, 2:09am

Hi,

I have a three-node cluster, and one of the servers regularly loses synchronisation. Now I have corruption in the InnoDB tablespace. I am uncomfortable running with innodb_force_recovery=6 because I have 2 important production databases on these servers. I retired the failed server from the cluster and installed a new one, but all the symptoms reappeared: synchronisation lost, and corruption once again.
The servers run Ubuntu 10.04.4 LTS with percona-xtradb-cluster-server-5.5 (version 5.5.31-23.7.5-438).

The error log is in the attached file, and the my.cnf file is below:


[client]
password = 'xxxxx'
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld_safe]
wsrep_urls=gcomm://192.168.183.40:4567,gcomm://192.168.183.41:4567,gcomm://192.168.183.42:4567

[mysqld]
datadir=/var/lib/mysql
user=mysql

binlog_format=ROW

wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_slave_threads=2
wsrep_cluster_name=prod_pa
wsrep_sst_method=rsync
wsrep_node_name=lxpadb03

default_storage_engine=InnoDB
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

#tuning
max_allowed_packet = 16M
max_connect_errors = 1000000
skip_name_resolve
query_cache_size=0
query_cache_type=0
tmp_table_size = 32M
max_heap_table_size = 32M
max_connections = 500
thread_cache_size = 50
open_files_limit = 65535
table_definition_cache = 4096
table_open_cache = 4096
# INNODB #
innodb_flush_method = O_DIRECT
innodb_log_files_in_group = 2
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_buffer_pool_size = 3072M

Thanks in advance.
Laeti

lxpadb03.err.zip (60 KB)

przemek · October 16, 2013, 2:21am

So this InnoDB corruption happened only on this single node? And by reinstalling a node, do you mean a new install on the same machine or on a completely different machine?
If only a single machine shows data corruption, I would check dmesg and /var/log/syslog for any signs of disk or memory errors. A memory check would be good to run too.
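A rough sketch of that log check in Python, in case it helps. The pattern list is only an illustrative assumption, not a complete catalogue of kernel messages, and `find_hardware_hints` is a made-up helper name.

```python
import re

# Illustrative patterns only (assumption): extend for your kernel and storage stack.
HW_ERROR_PATTERN = re.compile(
    r"I/O error|EXT[34]-fs error|Machine check|Out of memory|segfault",
    re.IGNORECASE,
)


def find_hardware_hints(lines):
    """Return the log lines that match any of the illustrative patterns."""
    return [line.rstrip("\n") for line in lines if HW_ERROR_PATTERN.search(line)]
```

It could be fed the output of dmesg saved to a file, or the lines of /var/log/syslog, for example `find_hardware_hints(open("/var/log/syslog"))`. An empty result only means none of these particular patterns matched, not that the hardware is healthy.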
If all nodes are experiencing data corruption, I would try taking one of them off the cluster and running mysqldump on all data if possible, probably in one of the innodb_force_recovery modes.
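On the innodb_force_recovery modes: the setting accepts values 1 to 6, each one including the previous ones and being more drastic, so the usual advice is to start at 1 and go up only as far as needed. A minimal sketch of that loop in Python follows. Here `try_dump` is a hypothetical callback you would write yourself (restart mysqld with the given level, run the dump, return True on success), and `dump_command` is just an illustrative helper.

```python
def dump_command(result_file):
    """Build a mysqldump command line that dumps every database to one file."""
    return ["mysqldump", "--all-databases", "--result-file=" + result_file]


def lowest_working_level(try_dump, levels=range(1, 7)):
    """Return the first innodb_force_recovery level for which try_dump(level)
    succeeds, or None if no level works.

    try_dump is a hypothetical callback (assumption): it should restart mysqld
    with innodb_force_recovery=<level>, run the dump, and return True on success.
    """
    for level in levels:
        if try_dump(level):
            return level
    return None
```

The point of going lowest first is to avoid jumping straight to 6 on production data when a milder mode might already let the dump complete.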

laetiabi · October 21, 2013, 4:04am

Yes, the InnoDB corruption happens only on this node. By reinstalling, I mean that I created a new virtual machine and kept only the same hostname and IP address. The three servers are virtual machines.
There is nothing in dmesg and /var/log/syslog.
What I noticed in /var/log/syslog is that the synchronisation is not done for the second database. I see lines like these: rsync to rsync_sst/./mysqlslap, rsync to rsync_sst/./performance_schema, or rsync to rsync_sst/./private_prod, but never one for the toplink_prod database.
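To make that comparison repeatable, here is a small Python sketch that lists which expected databases never show up in the rsync lines. It assumes the lines contain the text exactly as quoted above ("rsync to rsync_sst/./<name>"). The function names are made up for illustration.

```python
import re

# Matches the line shape quoted above, e.g. "rsync to rsync_sst/./private_prod".
SST_LINE = re.compile(r"rsync to rsync_sst/\./([^/\s]+)")


def databases_seen(log_lines):
    """Collect the directory names that appear in rsync SST log lines."""
    seen = set()
    for line in log_lines:
        match = SST_LINE.search(line)
        if match:
            seen.add(match.group(1))
    return seen


def missing_databases(log_lines, expected):
    """Return the expected databases that never appear in the rsync lines."""
    return sorted(set(expected) - databases_seen(log_lines))
```

With the lines from syslog and `expected=["private_prod", "toplink_prod"]`, a result of `["toplink_prod"]` would confirm what was seen by eye.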

Thanks.
Laeti

przemek · October 26, 2013, 4:46am

Are there any other differences between this failing node and the two other nodes? Are they all living on the same host server?
There is also a chance the data is corrupted on the source node from which the SST was performed. I would suggest taking a Percona XtraBackup backup from the lxpadb01 node and checking how preparing and using this backup on another test host works.
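A sketch of the two XtraBackup steps as command lines built in Python. This assumes an XtraBackup release where the xtrabackup binary itself performs the full backup with --backup and --prepare. Older 2.x releases used the innobackupex wrapper instead, so adjust to the installed version. The target directory is a placeholder and `backup_commands` is a made-up helper.

```python
def backup_commands(target_dir):
    """Return the backup and prepare command lines for one target directory.

    Assumption: an XtraBackup release where the xtrabackup binary handles the
    whole backup; connection options (user, password, socket) are left out.
    """
    return [
        # Step 1: run on the source node (lxpadb01) to copy the data files.
        ["xtrabackup", "--backup", f"--target-dir={target_dir}"],
        # Step 2: run against the copied directory on the test host to make it consistent.
        ["xtrabackup", "--prepare", f"--target-dir={target_dir}"],
    ]
```

If the prepare step or the first startup on the test host reports corruption, that would point at the source node rather than at the failing one.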
