XtraDB Cluster: non-locking option to rebuild 2 nodes without taking Production down

Dear Community,
We have a fairly large cluster that went down due to hardware problems, and we could recover only one node, which is currently running the Production apps.
A few attempts to start the other two nodes resulted in a full SST (using XtraBackup). At the end of that process XtraBackup just hung, and connections on the master node kept creeping up. I assume that is because the app clients were trying to write to the master node, and some of the statements included DDL modifications that created a lock.
We need advice on the least painful method of rebuilding the two remaining nodes that does not lock the master node.

Appreciate your time,
Elena K.

Hi Elena,

We use the following steps to rebuild nodes when SST is not working or keeps failing and we cannot shut down the master node:

  1. Start the second node (do not start the PXC service on it), then stream a hot backup from the first node to the second node using the innobackupex tool.
  2. On the second node, empty the PXC datadir and restore the innobackupex backup into it.
  3. Start mysqld with the --wsrep-recover option to get the correct seqno (it is written to the error log as the recovered position, uuid:seqno).
  4. In the PXC datadir, create a grastate.dat file (copy the contents and format from the first node) and write the seqno into it.
  5. Now start the PXC service. It should join the cluster and do only an IST from the first node :)
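
Steps 3 and 4 can be scripted roughly like this. It is only a sketch: the helper names `recovered_position` and `fill_grastate` are made up for illustration, and it assumes the error log contains a line ending in `WSREP: Recovered position: <uuid>:<seqno>` and that grastate.dat has `uuid:` and `seqno:` lines. Check both against your own files first.

```shell
#!/bin/sh
# Sketch only: helper names and the assumed log/file layouts are illustrative.

# Read a mysqld error log on stdin and print the last "uuid:seqno" reported
# after a --wsrep-recover run.
# Assumed log line: "... WSREP: Recovered position: <uuid>:<seqno>"
recovered_position() {
    sed -n 's/.*WSREP: Recovered position: *//p' | tail -n 1
}

# Read a grastate.dat copied from the first node on stdin and print it with
# the seqno line replaced by the seqno part of position $1 ("uuid:seqno").
# Fails if the uuid in the file differs from the recovered uuid.
fill_grastate() {
    position=$1
    uuid=${position%:*}      # everything before the last ':'
    seqno=${position##*:}    # everything after the last ':'
    template=$(cat)
    file_uuid=$(printf '%s\n' "$template" | sed -n 's/^uuid:[[:space:]]*//p')
    if [ "$file_uuid" != "$uuid" ]; then
        echo "uuid mismatch: file has $file_uuid, recovered $uuid" >&2
        return 1
    fi
    printf '%s\n' "$template" | sed "s/^seqno:.*/seqno:   $seqno/"
}
```

Usage would be something like `pos=$(recovered_position < /path/to/error.log)` followed by `fill_grastate "$pos" < grastate.from-node1 > /path/to/datadir/grastate.dat`, where both paths are placeholders for your own locations. The uuid check is there because a grastate.dat with the wrong cluster uuid would most likely push the node back to a full SST instead of IST.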

Regards

Pavel