“Galera Cluster: Same server_id on all nodes and different gtid_domain_id — is this safe?”

I have a MariaDB (Galera) cluster that has been running in production for about 5 years without any noticeable issues. The cluster consists of 5 nodes.

However, I recently noticed an unusual configuration:

  • All nodes in the cluster have the same server_id = 1

  • Each node has a different gtid_domain_id (for example: node1 = 11, node2 = 22, etc.)

  • There are external replicas replicating from this cluster

My questions are:

  1. Is it correct or safe to have the same server_id on all nodes in a Galera cluster?

  2. Is it a valid approach to assign different gtid_domain_id values per node in this setup?

  3. What potential problems or risks could arise from this configuration, especially regarding replication and failover?

  4. Could this setup lead to conflicts, data inconsistency, or issues with GTID-based replication in the future?

I would appreciate any clarification or best practices regarding this kind of configuration.

@Sergey_DSV

  1. Is it correct or safe to have the same server_id on all nodes in a Galera cluster?

It’s totally fine. PXC/Galera uses its own certification-based replication to sync changes between nodes rather than the binary logs.

Are you using it for performing any non-Galera-based writes or any kind of multi-source replication on any of the Galera nodes?

Since you are using a MariaDB cluster, let me add a link to the MariaDB manual, which clearly explains the use of server_id in Galera-based setups. I am sure this will clear up your doubt.
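As a quick sanity check (an illustrative query, not from the thread), you can compare these identifiers on each node; in this setup you would expect server_id and wsrep_gtid_domain_id to match across nodes, while gtid_domain_id differs:

```sql
-- Run on each Galera node and compare the results.
SELECT @@server_id,             -- same on all nodes here (1)
       @@gtid_domain_id,        -- unique per node (11, 22, ...)
       @@wsrep_gtid_domain_id,  -- should be identical cluster-wide
       @@wsrep_node_name;
```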

  2. Is it a valid approach to assign different gtid_domain_id values per node in this setup?

Yes, it’s fine to have a different gtid_domain_id. This prevents the node from using the same domain as Galera-based write sets when assigning GTIDs to non-Galera transactions.

On the other hand, wsrep_gtid_domain_id should be the same across all nodes within a cluster so that each node uses the same domain when assigning GTIDs for Galera Cluster-based write sets.

Reference - Using MariaDB GTIDs with MariaDB Galera Cluster | Galera Cluster | MariaDB Documentation
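To make the split concrete, here is a minimal sketch of the per-node GTID settings described above (values are illustrative, mirroring your naming scheme):

```ini
# node1 — gtid_domain_id is unique per node and covers local,
# non-Galera transactions; wsrep_gtid_domain_id is shared by the
# whole cluster and covers Galera write sets.
[mariadb]
gtid_domain_id       = 11
wsrep_gtid_domain_id = 1

# node2 would use gtid_domain_id = 22 with the same
# wsrep_gtid_domain_id = 1, and so on for the other nodes.
```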

  3. What potential problems or risks could arise from this configuration, especially regarding replication and failover?

To ensure the async replica applies all change streams, it is recommended to enable [log_slave_updates/log_replica_updates] on the Galera cluster so the binary logs contain all GTID sets. Since gtid_domain_id is already set to a different value on each node, the transaction origin can still be identified and differentiated.

How exactly are your async replicas connected? Are they always connected to a dedicated Galera node, or is there some failover mechanism to switch between Galera nodes?
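If the replicas use GTID-based replication, repointing one to another Galera node after a failure is usually straightforward, since every node (with log_slave_updates enabled) carries the full GTID history. A hedged sketch, with placeholder host and credentials:

```sql
-- Illustrative failover of an async replica to a surviving Galera node.
-- Host, user, and password are placeholders, not from this thread.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST     = '10.0.0.2',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;  -- resume from the replica's own GTID position
START SLAVE;
```

With file/position-based replication instead, the binlog file names and offsets differ between Galera nodes, so such a switch would require manually locating the matching position on the new source.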

  4. Could this setup lead to conflicts, data inconsistency, or issues with GTID-based replication in the future?

As mentioned, there are some edge cases/requirements (see Using MariaDB Replication with MariaDB Galera Cluster | Galera Cluster | MariaDB Documentation) where having different server_id values makes sense. You can verify whether any of them apply to your setup.

I don’t see any other issue.

Further, can you please clarify how exactly your external replicas are connected to the Galera cluster? Is any multi-source replication topology also involved? Are you using GTID or file/position-based replication?

Please share your Galera/async configuration files as well.

There are replications only from one of the cluster nodes. No data is replicated to the cluster.

I’m attaching the configurations of two cluster nodes. The rest are the same.


[mysqld]
# Don't resolve hostnames. All hostnames are IP's or 'localhost'
skip-name-resolve

# Binary logs will be purged after expire_logs_days days
expire_logs_days = 7

# Binary log will be rotated automatically when the size exceeds this value
max_binlog_size = 100M

# The number of simultaneous clients allowed
max_connections = 3000
max_connect_errors = 1000000
back_log = 512
thread_cache_size = 100

innodb_io_capacity = 2000
innodb_io_capacity_max = 4000


query_cache_type = 0
query_cache_size = 0

thread_stack = 512K
thread_handling = pool-of-threads
thread_pool_size = 16
thread_pool_max_threads = 2000

max_allowed_packet = 512M

[mariadb]
default_time_zone = 'UTC'
sql_mode=ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
key_buffer_size=10M
event_scheduler=ON

innodb_buffer_pool_size = 60G
innodb_buffer_pool_instances = 16
innodb_log_file_size = 2G
innodb_log_files_in_group = 2
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_open_files = 4096
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_thread_concurrency = 16
innodb_autoinc_lock_mode = 2

sort_buffer_size=2M
read_buffer_size=256K
read_rnd_buffer_size=256K
join_buffer_size=256K
max_heap_table_size = 64M
tmp_table_size = 64M

# Slow query
log_output=FILE
slow_query_log
slow_query_log_file=slow-queries.log
long_query_time=1

# Error log
log_error=/var/log/mysql_error.log

# Tells the slave to log the updates from the slave thread to the binary log
log_slave_updates=ON
log_bin=/var/log/mariadb/log.bin

# What form of binary logging the master will use
binlog_format=ROW

default-storage-engine=innodb
innodb_autoinc_lock_mode=2

gtid_domain_id=11
server_id=1

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=1

# Galera Cluster Configuration
wsrep_cluster_name="ringostat_cluster"
wsrep_cluster_address="gcomm://10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5"
# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="10.0.0.1"
wsrep_node_name="dbn1"

[mysqld]
# Don't resolve hostnames. All hostnames are IP's or 'localhost'
skip-name-resolve

# Binary logs will be purged after expire_logs_days days
expire_logs_days = 7

# Binary log will be rotated automatically when the size exceeds this value
max_binlog_size = 100M

# The number of simultaneous clients allowed
max_connections = 3000
max_connect_errors = 1000000
back_log = 512
thread_cache_size = 100

innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

query_cache_type = 0
query_cache_size = 0

thread_stack = 512K
thread_handling = pool-of-threads
thread_pool_size = 16
thread_pool_max_threads = 2000

max_allowed_packet = 512M

[mariadb]
default_time_zone = 'UTC'
sql_mode=ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
key_buffer_size=10M # 2G
event_scheduler=ON

innodb_buffer_pool_size = 60G
innodb_buffer_pool_instances = 16
innodb_log_file_size = 2G
innodb_log_files_in_group = 2
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_open_files = 4096
innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_thread_concurrency = 16
innodb_autoinc_lock_mode = 2

sort_buffer_size=2M
read_buffer_size=256K
read_rnd_buffer_size=256K
join_buffer_size=256K
#max_heap_table_size=16M
max_heap_table_size = 64M
tmp_table_size = 64M

# Slow query
log_output=FILE
slow_query_log
slow_query_log_file=slow-queries.log
long_query_time=1

# Error log
log_error=/var/log/mysql_error.log

# Tells the slave to log the updates from the slave thread to the binary log
log_slave_updates=ON
log_bin=/var/log/mariadb/log.bin

# What form of binary logging the master will use
binlog_format=ROW

default-storage-engine=innodb
innodb_autoinc_lock_mode=2

gtid_domain_id=22
server_id=1

# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=1

# Galera Cluster Configuration
wsrep_cluster_name="ringostat_cluster"
wsrep_cluster_address="gcomm://10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5"
# Galera Synchronization Configuration
wsrep_sst_method=rsync

# Galera Node Configuration
wsrep_node_address="10.0.0.2"
wsrep_node_name="dbn2"

@Sergey_DSV

There are replications only from one of the cluster nodes. No data is replicated to the cluster.

This looks fine.

The other parameters also look rightly placed. Beyond that, I would suggest going through the official docs shared earlier, in case you have any specific use case.

log_slave_updates=ON

wsrep_gtid_domain_id=1

Just a side note: if you plan any configuration changes, always test in a lower/testing environment to better assess behaviour in advance.

I currently have version 10.4.32 and I’m preparing for the upgrade in stages. What issues might I encounter during the upgrade? Did I choose the right upgrade stages?

10.4 → 10.5
10.5 → 10.6
10.6 → 10.11
10.11 → 11.4
11.4 → 11.8