Full SST after clean shutdown of all nodes in the cluster and bootstrapping one node

Hello,

We are running a 3 node PXC cluster on 5.6.21. After gracefully shutting down all nodes in the cluster, we bootstrapped one node with the most advanced seqno in grastate.dat file (/etc/init.d/mysql bootstrap-pxc). When starting up the rest of the nodes (service mysql start), it triggers full SST, which takes a long time. We are wondering how we can avoid full SST and if IST is possible after bootstrapping a node in the cluster. And what is the best way to restart a cluster after gracefully shutting down all nodes?
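For reference, this is the kind of check we do before bootstrapping; a minimal sketch, assuming the default datadir `/var/lib/mysql` (adjust if yours differs). Run it on each node and bootstrap only the node reporting the highest seqno:

```shell
# Read the local seqno from grastate.dat; the datadir path is the
# default and may need adjusting for your installation.
DATADIR=${DATADIR:-/var/lib/mysql}
seqno=$(awk '/^seqno:/ {print $2}' "$DATADIR/grastate.dat")
echo "seqno on $(hostname): $seqno"
# Then, only on the node with the highest seqno:
#   /etc/init.d/mysql bootstrap-pxc
```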

Any help is appreciated.

bootstrap-pxc means that all other cluster members MUST perform SST instead of IST.
On the other hand, it looks like PXC doesn’t have a mechanism to detect that rejoining nodes which already hold data only need to read forward from the bootstrap initiator, which would allow IST instead.

Since PXC 5.6.19, you don’t need to bootstrap the cluster again after a graceful shutdown, if we base our assumptions on the documentation:

[QUOTE]
Percona XtraDB Cluster

Check the complete Percona XtraDB Cluster 5.6.19-25.6 changelog here:

[URL="Percona XtraDB Cluster 5.6.19-25.6"]http://www.percona.com/doc/percona-x...6.19-25.6.html[/URL]
[/QUOTE]

Thanks. Is this the new feature described in
[url]Auto-bootstrapping an all-down cluster: Percona XtraDB Cluster[/url]?

The blog mentions that gvwstate.dat will not exist on a node if it was shut down cleanly, only if mysqld was uncleanly terminated. This file must exist and be identical on all the nodes for the auto-recovery to work.
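A quick sketch for verifying this on each node, assuming the default datadir `/var/lib/mysql`:

```shell
# gvwstate.dat only survives an unclean stop; if it is missing after a
# clean shutdown, auto-recovery cannot kick in and one node must be
# bootstrapped manually. The datadir path is an assumption.
DATADIR=${DATADIR:-/var/lib/mysql}
if [ -f "$DATADIR/gvwstate.dat" ]; then
  echo "gvwstate.dat present - auto-recovery possible"
  cat "$DATADIR/gvwstate.dat"
else
  echo "gvwstate.dat missing (clean shutdown) - bootstrap one node manually"
fi
```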

So it seems the auto-recovery will not work after a graceful shutdown.

In our case, we are running PXC 5.6.21 (> 5.6.19). After all nodes are cleanly shut down, service mysql start does not work; it hangs and eventually times out. One of the nodes has to be bootstrapped to get the cluster restarted. Once we do that, the remaining nodes are recovered by full SST, which takes hours for a large database.

Is it by design that a cluster cannot be cleanly restarted after all nodes are cleanly shut down?

Thanks.

It’s good to keep track of the node being shut down using the MySQL error log (tail -f it). The init script can report a timeout, but that is just the init script’s own timeout; most of the time the shutdown process is still running behind the scenes.
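Something along these lines; the log path is an assumption, so check log_error in your my.cnf first:

```shell
# Watch the error log for shutdown progress so the init-script timeout
# doesn't mislead you; mysqld may still be flushing behind the scenes.
tail -f /var/log/mysqld.log | grep --line-buffered -i 'shutdown'
```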

BTW, let’s organize this thread and check what’s really going on. I’ve got a Galera cluster with three nodes running here on my side. All the version information about what I’m running is shown below:


+------------------------+----------------+
| variable_name          | variable_value |
+------------------------+----------------+
| WSREP_PROTOCOL_VERSION | 6              |
| WSREP_PROVIDER_VERSION | 3.8(rf6147dd)  |
+------------------------+----------------+
mysqld Ver 5.6.21-70.1-56 for Linux on x86_64 (Percona XtraDB Cluster (GPL), Release rel70.1, Revision 938, WSREP version 25.8, wsrep_25.8.r4150)

After some tests, I’d say that, considering the version I’m using, it’s crystal clear that gvwstate.dat only survives a node/cluster crash. Even then, this is no guarantee that the cluster can be brought back online without bootstrapping again. After a clean shutdown the file does not survive and the cluster must be bootstrapped, which is a little bit weird if we recap the docs. All of the cluster’s nodes have pc.recovery at its default (true), and no other configuration was added to wsrep_provider_options in my.cnf.
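For completeness, this is how pc.recovery would be set explicitly; it is already the default (true) in these versions, so this my.cnf fragment is shown only for clarity:

```
[mysqld]
wsrep_provider_options="pc.recovery=true"
```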

Not sure if I have inconsistencies among the cluster’s nodes, so I’m going to keep investigating this problem.