Three-node cluster shutting down randomly

Arelag__ (Entrant)
Hello,

We deployed a three-node cluster using PXC last year for one of our clients. It generally works well, but we have had some random crashes affecting the entire cluster. It happened again yesterday: the complete cluster shut down without warning.
The MySQL logs from the first node are attached (galera_logs_1.txt).


There seem to be communication problems between the nodes, as suggested by the first line: turning message relay requesting on, nonlive peers.
I'm not sure what the root cause could be: is this network related or load related (load average, SQL traffic, number of requests)? Are there parameters to adjust?

After that, I tried bootstrapping the cluster but got another shutdown that I don't understand: I bootstrapped the first node, then restarted the second, which initiated an SST.
After the second node was up and running (WSREP state Synced), I restarted the third node, and the two other nodes stopped immediately.
The messages from the error log are attached (galera_logs_2.txt).


It's not the first time I have had to reset a PXC cluster like that, but I don't understand why the last node caused this situation.
Am I missing something?


For information, we are using Debian with the following packages:
ii  percona-xtradb-cluster-56         5.6.29-25.15-1.wheezy             amd64        Percona XtraDB Cluster with Galera
ii  percona-xtradb-cluster-client-5.6 5.6.29-25.15-1.wheezy             amd64        Percona XtraDB Cluster database client binaries
ii  percona-xtradb-cluster-common-5.6 5.6.29-25.15-1.wheezy             amd64        Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii  percona-xtradb-cluster-galera-3   3.14-1.wheezy                     amd64        Metapackage for latest version of galera3.
ii  percona-xtradb-cluster-galera-3.x 3.14-1.wheezy                     amd64        Galera components of Percona XtraDB Cluster
ii  percona-xtradb-cluster-server-5.6 5.6.29-25.15-1.wheezy             amd64        Percona XtraDB Cluster database server binaries

I guess upgrading these versions is a must here.

Configuration file:
[mysqld]

# Cluster configuration
wsrep_provider                  = /usr/lib/libgalera_smm.so
wsrep_forced_binlog_format      = ROW
wsrep_cluster_address           = gcomm://10.16.0.92,10.16.0.93,10.16.0.94
wsrep_slave_threads             = 64
wsrep_sst_method                = xtrabackup-v2
wsrep_sst_auth                  = XXXX:XXXX
wsrep_cluster_name              = galera
wsrep_node_name                 = client
wsrep_node_address              = 10.16.0.92
wsrep_causal_reads              = OFF
wsrep_provider_options          = "gcache.size = 50G; gcs.fc_limit = 64"

wsrep_retry_autocommit          = 1
wsrep_debug                     = 0

Thanks for any information about that case.

Comments

  • jrivera (Percona Support Engineer, Percona Staff)
    2017-06-08 10:32:07 64450 [ERROR] WSREP: Certification failed for TO isolated action: source: 70435979-4b6e-11e7-86e1-ba8a94aea198 version: 3 local: 1 state: CERTIFYI35, d: -1, ts: 9335174392156268)

    Something is making your nodes inconsistent, but it's not clear what from your error logs. You can set wsrep_debug=1 on both nodes, and we may then see more information.
  • Arelag__ (Entrant)
    Thank you for your answer. I did see that line, but as it appears later in the messages, I wasn't sure whether it was the cause.
    I'll try enabling wsrep_debug, but is the performance impact significant? It is a production cluster with quite a lot of traffic, so I have to be careful.

    Besides that, do you think upgrading the PXC and Galera packages could help?
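
As a minimal sketch of the wsrep_debug suggestion in the comments above, the change could look like this in the [mysqld] section of the configuration file posted in the question. The variable name and the value 1 come from the thread. Treating the setting as temporary and reverting it afterwards is an assumption about how one might use it on a busy production cluster, not something the support engineer stated.

```ini
[mysqld]
# Assumption: enable only temporarily on the affected nodes to capture
# more detail in the error log, then set back to 0 once logs are collected.
wsrep_debug                     = 1
```

A change made in my.cnf only takes effect at the next server restart, and the error log will grow faster while debug logging is enabled.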

Copyright ©2005 - 2020 Percona LLC. All rights reserved.