Hello. I’m testing a Percona XtraDB Cluster 8.0.39 setup with two 3-node clusters in different regions. They’re provisioned on AlmaLinux 9 VMs. I also configured asynchronous replication channels with failover for cross-site multi-master replication. However, I could not find any specific information on the best practice for setting this up when there are multiple nodes in a cluster, and it’s hard to formulate a specific Google search for what I’m looking for.
For example, I have 2 XtraDB clusters, A and B. Each cluster has 3 nodes, A1, A2, A3, B1, B2, B3. I have configured the replication channels on A1 to replicate from B1, B2 and B3. Similarly, I have configured B1 to replicate from A1, A2, A3. Now this configuration appears to be local to A1 and B1 only. So in theory, if A1 and/or B1 goes down, then there may be issues in keeping the data up-to-date. Am I supposed to configure A2, A3, B2, B3 in the same exact way so that all 3 nodes for each cluster are all attempting to replicate? Would this cause issues with writes or contention?
Hello @Jimmy_Chen ,
This will probably cause conflicts, and/or duplicate data. You should have only 1 replication channel from cluster A to cluster B, and 1 channel from B to A.
PXC does not use async replication’s GTID. Meaning a commit on A1 is going to replicate as A1, and not replicate as ‘A’. This is going to cause issues when B receives this transaction, commits it locally, and then sends it back to A1. Since the GTID will not be from ‘A’, it will execute again and cause a huge loop.
You need to configure the server_id of A1/2/3 to be the same, and configure B1/2/3 to have the same server_id, but different from A. This way, when A1 gets a replicated transaction from “B”, it will see that the server_id of the trx matches its own and it will ignore it.
Thank you for the reply @matthewb. Just so I fully understand. GTID needs to be turned off. I should configure async replication between A and B clusters, with server-id unique per cluster but identical between the nodes? This would mean I cannot configure multiple replication channels even if it’s on the same node such as B1 → A1, B2 → A1, B3 → A1? And if that’s the case I also cannot configure async replication failover?
No. I didn’t say that. GTID should be on. I was simply pointing out that PXC does not utilize async’s GTID methodology. The way you configure source/source async replication is easier because of GTID. If you are familiar with that, I was letting you know that’s not the case with PXC.
Correct
You would never do this in the first place. As I said earlier, you should have MAX 2 channels. One channel is A1 → B1, and another channel is B1 → A1. That’s it. No other channels anywhere else in the clusters.
Assuming you set up those two channels, a write on A3 would Galera-replicate to A1, write to A1’s binlog, replicate to B1, apply on B1, write to B1’s binlog, and replicate back to A1. Since the trx coming back from B1 to A1 has A1’s server_id, A1 ignores it.
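To make that round trip concrete, here is a minimal toy model in Python. It is only a sketch, not how PXC or MySQL is implemented: the class names, the round_trip helper and the numeric server_id values are all made up for illustration, and it assumes the binlog event keeps the server_id of the node where the write originated.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    server_id: int  # server_id carried by the binlog event (assumed: the originating node's)
    payload: str


@dataclass
class Node:
    name: str
    server_id: int
    applied: list = field(default_factory=list)

    def receive_async(self, event: Event) -> bool:
        """Model the replica rule: skip events that carry our own server_id."""
        if event.server_id == self.server_id:
            return False
        self.applied.append(event.payload)
        return True


def round_trip(a3_id: int, a1_id: int, b1_id: int) -> tuple:
    """Write on A3, then follow it over A1 -> B1 and B1 -> A1.

    Returns (applied_on_b1, reapplied_on_a1).
    """
    a1 = Node("A1", a1_id)
    b1 = Node("B1", b1_id)
    # Galera replication inside cluster A is not modeled; we only assume the
    # event reaches A1's binlog still carrying A3's server_id.
    event = Event(server_id=a3_id, payload="trx-1")
    applied_on_b1 = b1.receive_async(event)
    reapplied_on_a1 = a1.receive_async(event) if applied_on_b1 else False
    return applied_on_b1, reapplied_on_a1


# Same server_id on every node of cluster A: B1 applies it, A1 ignores the echo.
shared = round_trip(a3_id=1, a1_id=1, b1_id=2)

# Hypothetical per-node server_ids: A1 does not recognize A3's write and re-applies it.
distinct = round_trip(a3_id=13, a1_id=11, b1_id=21)
```

In this model, shared comes out as (True, False) and distinct as (True, True), which is the loop described above.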
You can configure this, yes. But this does not mean you have additional channels.
https://dev.mysql.com/doc/refman/8.0/en/replication-asynchronous-connection-failover-replica.html
Even when configuring async replica failover, you still only have 1 channel. That channel is managed/moved by MySQL automatically.
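The "one channel, several candidate sources" idea can be sketched as a small Python model. This is an illustration only: the Source and Channel classes, the host names and the weights are hypothetical, and the real selection is done by MySQL itself. The only assumption taken from the MySQL documentation is that a higher weight means higher priority among the available sources.

```python
from dataclasses import dataclass


@dataclass
class Source:
    host: str
    weight: int  # higher weight = preferred source
    online: bool = True


class Channel:
    """A single replication channel with a list of candidate sources."""

    def __init__(self, sources):
        self.sources = sources
        self.current = None
        self.reconnect()

    def reconnect(self):
        """Point the channel at the highest-weight source that is online."""
        candidates = [s for s in self.sources if s.online]
        self.current = max(candidates, key=lambda s: s.weight) if candidates else None
        return self.current


# A1's one channel, with the three B nodes registered as possible sources.
b1, b2, b3 = Source("B1", 80), Source("B2", 60), Source("B3", 40)
channel = Channel([b1, b2, b3])
first = channel.current.host

# B1 goes away: the same channel moves to B2, no second channel is created.
b1.online = False
second = channel.reconnect().host
```

Here first is "B1" and second is "B2". The point is that the source list grows, not the number of channels.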