Percona Cluster with master-master between clusters

Hello,

I am in the process of configuring master-master replication between 2 Percona Clusters, with the ability to add a 3rd cluster into the mix at a future date.

The Clusters are in different data centres so I do not want to simply create one large cluster.

I believe I have this running correctly on a development cluster, but I would like some input on whether I have done this correctly and whether there is anything I might have missed that could come back to haunt me in the future.

Running Percona XtraDB Cluster 5.7.24-26 over CentOS 7 on all nodes.

I have two 3-node clusters at present - Cluster A and Cluster B - and I have allowed for the option of adding a Cluster C in the future (things like the auto increment settings allow for 9 nodes in total).
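
To make the auto increment scheme concrete, here is how the interleaving works out. The only offset confirmed in this post is 1 for Cluster A Node 1; the other offsets below are just an illustration of one sensible assignment (1-3 for Cluster A, 4-6 for Cluster B, 7-9 for a future Cluster C).

# With auto_increment_increment = 9, each node hands out IDs of the form
# offset + N*9, so every offset owns its own residue class modulo 9 and two
# nodes can never generate the same value:
#   offset 1 (Cluster A Node 1):                1, 10, 19, 28, ...
#   offset 4 (Cluster B Node 1, illustrative):  4, 13, 22, 31, ...
#   offset 7 (Cluster C Node 1, illustrative):  7, 16, 25, 34, ...
# Quick check of what a given node is actually using:
SELECT @@auto_increment_increment, @@auto_increment_offset;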

The same mysql.cnf is used on all nodes, with only node-specific changes, as follows:


# These values are different for each node in each cluster
# Example here is Cluster A Node 1
log-bin = DB-A-01.bin
wsrep_node_address = [IP address of Cluster A node 1]
relay_log = DB-A-01-relay-log
auto_increment_offset = 1

# These settings are common for each node in the same cluster
# The server-id is 1001 on each node in Cluster A and 1002 on each node in Cluster B
server-id = 1001
expire_logs_days = 14
datadir = /var/lib/mysql
user = mysql
log_error = error.log
log_slave_updates = 1
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE

default_storage_engine = InnoDB
binlog_format = ROW
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_autoinc_lock_mode = 2

# Prevent creation of MyISAM tables as this engine is not supported
disabled_storage_engines = "MyISAM"

# IP address of each Node in same Cluster - so always has 3 IPs
wsrep_cluster_address = gcomm://x.y.z.A,x.y.z.B,x.y.z.C
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_log_conflicts = 1
wsrep_slave_threads = 8
# Cluster name different in each Cluster
wsrep_cluster_name = DB-A

# Turn this off as we will manually control auto_increment_increment and auto_increment_offset settings
wsrep_auto_increment_control = OFF
# Allow for up to 3 x clusters of 3 x nodes
auto_increment_increment = 9

# SST method
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "sstuser:sstuserpassword"

# A bunch of other settings for max user connections, FTS indexes, and other settings
# Also a bunch of settings for PMM
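
For comparison, the node-specific part on Cluster B Node 1 would look something along these lines (the server-id of 1002 is as described above; the exact offset value and the DB-B naming are illustrative - the point is that every node across all clusters gets a unique offset between 1 and 9):

# Example for Cluster B Node 1 (offset and naming illustrative)
log-bin = DB-B-01.bin
wsrep_node_address = [IP address of Cluster B node 1]
relay_log = DB-B-01-relay-log
auto_increment_offset = 4

# Common to all nodes in Cluster B
server-id = 1002
wsrep_cluster_name = DB-B
wsrep_cluster_address = gcomm://[IP addresses of the three Cluster B nodes]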

Cluster A has been running for some time (there is both a development/test cluster and a separate production cluster), so the data on those clusters has to remain.

Therefore, to configure and set up replication to the second cluster, I carried out the following steps on the development cluster:

  • Made sure gtid_executed and gtid_purged were the same on all Nodes in Cluster A
  • Got the second cluster installed and running stand-alone to confirm I had it configured correctly
  • Used XtraBackup to do a hot backup of Cluster A Node 1
  • Restored this backup to Cluster B Node 1
  • Bootstrapped Cluster B Node 1 and got Nodes 2 & 3 running
  • Restarted Cluster B Node 1 so it was no longer running as the bootstrap node
  • Started the slave on Cluster B Node 1 pointing to Cluster A Node 1 as its master and verified it was working
  • Started the slave on Cluster A Node 1 pointing to Cluster B Node 1 as its master and verified it was working (a rough sketch of the commands is below)
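
The start-slave step boils down to something like this - a minimal sketch assuming GTID auto-positioning, with placeholder host, user and password values:

# Run on Cluster B Node 1; 'cluster-a-node-1', 'repl' and the password are placeholders
# If the restore left the node's GTID state empty, it can be seeded first from
# the GTID set recorded in xtrabackup_binlog_info:
#   RESET MASTER;
#   SET GLOBAL gtid_purged = '<GTID set from xtrabackup_binlog_info>';
CHANGE MASTER TO
    MASTER_HOST = 'cluster-a-node-1',
    MASTER_PORT = 3306,
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'replpassword',
    MASTER_AUTO_POSITION = 1;
START SLAVE;
# The mirror image runs on Cluster A Node 1, pointing at Cluster B Node 1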

The method I used to get gtid_executed the same on all Nodes of Cluster A was to pick Node 1 as the “master” and then take Nodes 2 and 3 down in turn and have each perform a full SST. They weren’t in sync because the cluster was originally a slave of a 5.6 server, which in turn was a slave of an old 5.5 master-master setup, and used old log file position replication (the same history applies to both development and production - both went from master-master 5.5 → stand-alone 5.6 → Cluster 5.7). I also waited a couple of weeks after migrating everything to the cluster and turning GTID on before setting up the additional cluster, so that the binary logs containing all the non-GTID transactions had been purged.
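
The comparison itself is just the standard GTID variable, run on each node:

# Run on every node in Cluster A and compare - the sets should be identical
# before taking the backup for Cluster B
SELECT @@GLOBAL.gtid_executed;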

Getting gtid_purged set the same required running through the following process on each Node at the same time:


SHOW MASTER LOGS;
# Note down highest log filename
# Run the following at the same time on all 3 nodes
FLUSH LOGS;
PURGE MASTER LOGS TO 'highest log filename + 1';
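
And the corresponding check afterwards, to confirm the purge actually lined things up:

# Run on all 3 nodes once the purge has completed - the values should now match
SELECT @@GLOBAL.gtid_purged;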

So all the above appears to have worked - I can write to any node on any cluster and data is replicated correctly. My manual auto increment settings are all working as expected.
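
For anyone wanting to repeat the verification, the main thing to watch is the slave status on the two nodes running the async channels:

# On Cluster A Node 1 and Cluster B Node 1 - Slave_IO_Running and
# Slave_SQL_Running should both be Yes, and Retrieved_Gtid_Set /
# Executed_Gtid_Set should keep advancing as writes happen on the other side
SHOW SLAVE STATUS\G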

So is the above actually a correct method to set this up? Have I missed anything?

All the examples I’ve seen of doing this online either start with a completely blank cluster setup or use mysqldump to back up the first cluster. I can’t start with a blank setup, and my attempts to use mysqldump ran into issues with setting gtid_purged and with the MyISAM tables in the mysql system database when restoring - basically I couldn’t get mysqldump to work. So is using XtraBackup OK for this?

Also, I’m curious about gtid_purged - does this need to be the same across all the cluster nodes? Again, in the various examples of doing master-master between clusters that I’ve found online there is mention that this needs to be in sync on all Nodes in the first cluster, but no explanation as to why.

But getting gtid_purged in sync might be quite hard on my production cluster as it is getting writes all the time - it was pretty easy on the development cluster because I could just pick a time when no devs/testers were doing anything and run the commands at pretty much the same time.

Note I don’t mean gtid_executed - I understand why that needs to be the same and in sync.

Thanks.