How to install DC-DR for galera cluster?

Hi all,
I’m having a 3 nodes galera cluster in DC. I’m planning to create a new cluster in DR site, it is supposed to handle transactions if DC goes down.
I’ve tested DC-DR architecture by using replica between two nodes, one node in DC, one node in DR. But the replication process can be failed if one of these two nodes down.

So, is there any other solution for this DC-DR architecture ?

Thank you.

Hello @hung_le,
On the DR side, you can configure replication failover so that the replica node in DR will switch to another member of the primary cluster in the event the current source member goes offline.

Hi @matthewb , I can setup failover for sources by using function asynchronous_connection_failover_add_source, but Im not sure I can do failover for replicas (because it requires Group Replication). Is it possible to do it ?

That is not correct. You setup failover for replicas using asynchronous_connection_failover_add_source. On a replica, you would execute asynchronous_connection_failover_add_source to add several source candidates. If the first one fails, the replica picks the next source and attempts to connect.

I meant I can setup failover for sources, but not sure failover for replicas, when a replica in DR goes down, other replica in DR can still replicate data from DC (https://dev.mysql.com/doc/refman/8.0/en/replication-asynchronous-connection-failover-replica.html).

https://dev.mysql.com/doc/refman/8.0/en/replication-asynchronous-connection-failover-source.html

Yes, Ive done it. But Is it possible to do failover for replicas (slaves) also ?

@hung_le,
I have provided 2 different documentation links. One is for source failover, the other link is for replica failover. Using these should accomplish what you want.

Hi @matthewb,
I setup 2 Galera clusters, I install DC first, then use Xtrabackup to generate backup from DC and restore in DR. Now DC and DR use the same source_uuid.
Replication of DC and DR uses GTID, and failover for sources.

This is the my.cnf for DC (DR also the same):

#
# my.cnf template for clustercontroller
# Copyright (C) 2011-2017 severalnines.com
#

[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysql.pid
port=3306
log_error=/var/log/mysql/mysqld.log
# log_output = FILE

tmpdir=/tmp

#Slow logging    
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=2
slow_query_log=OFF
log_slow_admin_statements=ON
log_queries_not_using_indexes=OFF
log_throttle_queries_not_using_indexes=1

### INNODB OPTIONS
innodb_buffer_pool_size=8006M
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path=ibdata1:100M:autoextend
## You may want to tune the below depending on number of cores and disk sub
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_doublewrite=1
innodb_log_file_size=512M
innodb_log_buffer_size=64M
innodb_buffer_pool_instances=8
innodb_log_files_in_group=2
innodb_thread_concurrency=0
# innodb_file_format = barracuda
innodb_flush_method = O_DIRECT
# innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode=2
## avoid statistics update when doing e.g show tables
innodb_stats_on_metadata=0
default_storage_engine=innodb

# CHARACTER SET
# collation_server = utf8_unicode_ci
# init_connect = 'SET NAMES utf8'
# character_set_server = utf8

# REPLICATION SPECIFIC
server_id=19000
binlog_format=ROW
log_bin=binlog
log_slave_updates=ON
gtid_mode=ON
enforce_gtid_consistency=ON
relay_log=relay-bin
binlog_expire_logs_seconds=604800
max_binlog_size=1073741824
log_bin_trust_function_creators=OFF

# Replication safety (from 5.6 and onwards)
relay_log_recovery=ON


# OTHER THINGS, BUFFERS ETC
tmp_table_size = 64M
max_heap_table_size = 64M
max_allowed_packet = 512M
# sort_buffer_size = 256K
# read_buffer_size = 256K
# read_rnd_buffer_size = 512K
# myisam_sort_buffer_size = 8M
skip_name_resolve
memlock=0
sysdate_is_now=1
# max_connections = 500
max_connections=500
thread_cache_size=512
table_open_cache=1024
lower_case_table_names=1
# 5.6 backwards compatibility (FIXME)
# explicit_defaults_for_timestamp = 1
### Allow partial REVOKEs
partial_revokes=1

##
## WSREP options
##

performance_schema = ON
performance_schema_max_mutex_classes = 0
performance_schema_max_mutex_instances = 0

# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_on=ON
wsrep_node_address=192.168.101.11

# Provider specific configuration options
wsrep_provider_options="gcache.size=10240M;gmcast.segment=0;socket.ssl_cert=/etc/mysql/certs/server.crt;socket.ssl_key=/etc/mysql/certs/server.key;socket.ssl_ca=/etc/mysql/certs/server_ca.crt"

# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="PROD-HKL"

# Group communication system handle
wsrep_cluster_address=gcomm://192.168.101.11,192.168.101.12,192.168.101.13

# Human_readable node name (non-unique). Hostname by default.
wsrep_node_name=192.168.101.11

# Address for incoming client connections. Autodetect by default.
# wsrep_node_incoming_address = 192.168.101.11

# How many threads will process writesets from other nodes
wsrep_slave_threads=4

# DBUG options for wsrep provider
#wsrep_dbug_option

# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1

# Location of the directory with data files. Needed for non-mysqldump
# state snapshot transfers. Defaults to mysql_real_data_home.
#wsrep_data_home_dir=

# Maximum number of rows in write set
wsrep_max_ws_rows=131072

# Maximum size of write set
wsrep_max_ws_size=1073741824

# to enable debug level logging, set this to 1
wsrep_debug=0

# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1

# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1

# replicate myisam, not supported in PXC 5.7
wsrep_replicate_myisam=0

# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0

# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status  - new status of this node
# --uuid    - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index   - index of this node in the list
#wsrep_notify_cmd=

##
## WSREP State Transfer options
##

# State Snapshot Transfer method
# ClusterControl currently DOES NOT support wsrep_sst_method=mysqldump
wsrep_sst_method=xtrabackup-v2

# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
#wsrep_sst_receive_address=

# Desired SST donor name.
#wsrep_sst_donor=

# Protocol version to use
# wsrep_protocol_version=

# log conflicts
wsrep_log_conflicts=1

ssl_cert=/etc/mysql/certs/server.crt
ssl_key=/etc/mysql/certs/server.key
ssl_ca=/etc/mysql/certs/server_ca.crt
pxc_encrypt_cluster_traffic=ON
[sst]
encrypt=4
# ssl-key = server-key.pem
# ssl-ca = ca.pem
# ssl-cert = server-cert.pem

[MYSQL]
socket=/var/lib/mysql/mysql.sock
# default_character_set = utf8
[client]
socket=/var/lib/mysql/mysql.sock
# default_character_set = utf8

[mysqldump]
max_allowed_packet = 512M
# default_character_set = utf8

# IMPORTANT: The user/password in wsrep_sst_auth must match
# user/password in [xtrabackup]
[xtrabackup]


databases_exclude=lost+found
ssl_mode=DISABLED
no-server-version-check=1
[MYSQLD_SAFE]
# log_error = /var/log/mysqld.log
pid-file=/var/lib/mysql/mysql.pid
basedir=/usr/
# datadir = /var/lib/mysql

!include /etc/mysql/secrets-backup.cnf

The problem: DR can replicate data from DC, but sometimes, it seems that DR has some internal transactions so that the GTID increases without no transaction from client. This makes the GTID from DR sometimes higher than GTID in DC, some records committed by client to DC does not get replicated to DR, there are some logs in DR:

2024-01-29T15:11:27.206041Z 812 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2024-01-29T15:17:26.399664Z 812 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2024-01-29T15:37:26.095255Z 812 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

This is the mysql.gtid_executed in DC:

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 7534c8d2-be53-11ee-a49c-0b2d1884546e |              1 |        52751 |
| 95c805e8-be53-11ee-8593-37e163c54056 |              1 |            5 |
+--------------------------------------+----------------+--------------+

This is the mysql.gtid_executed in DR:

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 7534c8d2-be53-11ee-a49c-0b2d1884546e |              1 |        52752 |
| 95c805e8-be53-11ee-8593-37e163c54056 |              1 |            5 |
+--------------------------------------+----------------+--------------+

In the binlog file of DR, I got this:

# at 1900
#240129 15:18:25 server id 20000  end_log_pos 1977 CRC32 0x250ab1f3     GTID    last_committed=8        sequence_number=9       rbr_only=no     original_committed_timestamp=1706541505598576 immediate_commit_timestamp=1706541505598576     transaction_length=173
# original_commit_timestamp=1706541505598576 (2024-01-29 15:18:25.598576 UTC)
# immediate_commit_timestamp=1706541505598576 (2024-01-29 15:18:25.598576 UTC)
/*!80001 SET @@session.original_commit_timestamp=1706541505598576*//*!*/;
/*!80014 SET @@session.original_server_version=80035*//*!*/;
/*!80014 SET @@session.immediate_server_version=80035*//*!*/;
SET @@SESSION.GTID_NEXT= '7534c8d2-be53-11ee-a49c-0b2d1884546e:52751'/*!*/;
# at 1977
#240129 15:18:25 server id 20000  end_log_pos 2073 CRC32 0x5600f4cb     Query   thread_id=17    exec_time=0     error_code=0
SET TIMESTAMP=1706541505/*!*/;
FLUSH STATUS
/*!*/;
# at 2073
#240129 15:28:25 server id 20000  end_log_pos 2150 CRC32 0xc5410ff7     GTID    last_committed=9        sequence_number=10      rbr_only=no     original_committed_timestamp=1706542105667892 immediate_commit_timestamp=1706542105667892     transaction_length=173
# original_commit_timestamp=1706542105667892 (2024-01-29 15:28:25.667892 UTC)
# immediate_commit_timestamp=1706542105667892 (2024-01-29 15:28:25.667892 UTC)
/*!80001 SET @@session.original_commit_timestamp=1706542105667892*//*!*/;
/*!80014 SET @@session.original_server_version=80035*//*!*/;
/*!80014 SET @@session.immediate_server_version=80035*//*!*/;
SET @@SESSION.GTID_NEXT= '7534c8d2-be53-11ee-a49c-0b2d1884546e:52752'/*!*/;
# at 2150
#240129 15:28:25 server id 20000  end_log_pos 2246 CRC32 0x62f56369     Query   thread_id=17    exec_time=0     error_code=0
SET TIMESTAMP=1706542105/*!*/;
FLUSH STATUS
/*!*/;
# at 2246
#240129 15:38:25 server id 20000  end_log_pos 2323 CRC32 0xdf8a516c     GTID    last_committed=10       sequence_number=11      rbr_only=no     original_committed_timestamp=1706542705109279 immediate_commit_timestamp=1706542705109279     transaction_length=173
# original_commit_timestamp=1706542705109279 (2024-01-29 15:38:25.109279 UTC)
# immediate_commit_timestamp=1706542705109279 (2024-01-29 15:38:25.109279 UTC)
/*!80001 SET @@session.original_commit_timestamp=1706542705109279*//*!*/;
/*!80014 SET @@session.original_server_version=80035*//*!*/;
/*!80014 SET @@session.immediate_server_version=80035*//*!*/;
SET @@SESSION.GTID_NEXT= '7534c8d2-be53-11ee-a49c-0b2d1884546e:52753'/*!*/;
# at 2323
#240129 15:38:25 server id 20000  end_log_pos 2419 CRC32 0x76ea0d48     Query   thread_id=17    exec_time=0     error_code=0
SET TIMESTAMP=1706542705/*!*/;
FLUSH STATUS
/*!*/;

This is not correct. server_uuid should be different between all servers.

Because I use XtraBackup from DC and restore on DR, now DC and DR have the same source_uuid (in GTID term) which is the same as cluster_uuid. Besides, DC and DR has different server_uuid (node_uuid).
This is from DC:

mysql> select * from mysql.wsrep_cluster_members;
+--------------------------------------+--------------------------------------+----------------+-----------------------+
| node_uuid                            | cluster_uuid                         | node_name      | node_incoming_address |
+--------------------------------------+--------------------------------------+----------------+-----------------------+
| 75342fb7-be53-11ee-9c73-ce509c921e71 | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.101.11 | 192.168.101.11:3306   |
| 9bf32dfe-beb2-11ee-a0a2-23444b91c403 | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.101.12 | 192.168.101.12:3306   |
| a32caee6-bf13-11ee-a592-eec8c667dc04 | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.101.13 | 192.168.101.13:3306   |
+--------------------------------------+--------------------------------------+----------------+-----------------------+
3 rows in set (0.00 sec)

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 7534c8d2-be53-11ee-a49c-0b2d1884546e |              1 |        52814 |
| 95c805e8-be53-11ee-8593-37e163c54056 |              1 |            5 |
+--------------------------------------+----------------+--------------+
2 rows in set (0.00 sec)

And this is from DR:

mysql> select * from mysql.wsrep_cluster_members;
+--------------------------------------+--------------------------------------+----------------+-----------------------+
| node_uuid                            | cluster_uuid                         | node_name      | node_incoming_address |
+--------------------------------------+--------------------------------------+----------------+-----------------------+
| a83014a2-beb2-11ee-9e53-4ff5b14eef7e | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.102.12 | 192.168.102.12:3306   |
| ac92a090-bf13-11ee-bab6-03599d14b81b | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.102.13 | 192.168.102.13:3306   |
| d87c4ba8-be90-11ee-952d-a6dcb09b5025 | 7534c8d2-be53-11ee-a49c-0b2d1884546e | 192.168.102.11 | 192.168.102.11:3306   |
+--------------------------------------+--------------------------------------+----------------+-----------------------+
3 rows in set (0.00 sec)

mysql> select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid                          | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 7534c8d2-be53-11ee-a49c-0b2d1884546e |              1 |        52815 |
| 95c805e8-be53-11ee-8593-37e163c54056 |              1 |            5 |
+--------------------------------------+----------------+--------------+
2 rows in set (0.00 sec)

Hi @matthewb, I dont know why even without any transaction from client, GTID still increase automatically every 10 minutes.

I suggest you reach out to our support team for further assistance: https://percona.com/services/support

So you think my configuration is fine ? DC and DR use the same cluster_uuid is good ? Is there any way to change cluster_uuid when restore to DR from XtraBackup ?