Async replication breaking after another node joins the cluster

acampoh · August 10, 2021, 8:04am

Hello,

I’m struggling to set up async replication between two XtraDB Cluster 5.7.31 clusters, each with 3 nodes.

If I start only one node of the secondary cluster and then start replication, it works fine. But as soon as I start another node and it joins the cluster after performing an SST, the async replication breaks with a duplicate-key error.

At first I thought that, because the SST also copies the replication (slave) information, the new node was starting its own slave threads. So I added the skip-slave-start option to my.cnf, but it keeps failing.

I have also tried starting the cluster in read-only (RO) mode to make sure nobody else writes to it, but I still have the same problem.

The last thing I tried was setting IGNORE_SERVER_IDS in the CHANGE MASTER statement, but it still fails.

This is the CHANGE MASTER statement:

CHANGE MASTER TO
MASTER_HOST = 'MY_SERVER_IP',
MASTER_PORT = 3306,
MASTER_USER = 'replic',
MASTER_PASSWORD = 'what_a_passwd',
MASTER_AUTO_POSITION = 1, # to auto-position using GTID on slave
IGNORE_SERVER_IDS = (1,2,3);

For the record, both clusters have GTID enabled.
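For readers following along: on a 5.7 server, "GTID enabled" usually means at least the two settings below in the [mysqld] section of my.cnf on every node of both clusters. This is a minimal sketch; the rest of the replication setup (unique server_id values, binary logging on the source, and so on) is assumed and not shown.

```ini
[mysqld]
# Turn on GTID-based replication; MASTER_AUTO_POSITION = 1 requires it
gtid_mode = ON
# Reject statements that cannot be logged safely when GTIDs are in use
enforce_gtid_consistency = ON
```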

I guess I’m missing something, but I can’t see what. Any ideas or suggestions are more than welcome.

Thanks in advance,
Adrián

Michael_Coburn · August 10, 2021, 3:52pm

Hi @acampoh, thanks for posting to the Forums.

I have used @yves.trudeau’s wonderful Replication Manager between two PXC 5.7 clusters.

In fact, there is a section on ensuring SSTs are done, as well as steps to ensure there is only one GTID sequence in the cluster.

Best of luck,

matthewb · August 10, 2021, 4:15pm

Just to make sure I understand: when the 2nd node joins the secondary cluster, does SHOW SLAVE STATUS on that node actually report replication information?

acampoh · August 10, 2021, 4:28pm

Yes, it reports all the replication information on that node.

matthewb · August 10, 2021, 8:34pm

Ok. That is most likely because the SST process uses xtrabackup, which clones the entire $DATADIR, including the master.info file. It is surprising that skip-slave-start is not having any effect; that would seem to be a bug in base MySQL. Can you try setting server_id=0 on the joining node along with skip-slave-start? 0 means “cannot be slave nor master”.
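A sketch of how that suggestion could look in the joining node's my.cnf, assuming a 5.7 server. According to the MySQL manual, a replica whose server_id is 0 refuses to connect to a source, so even if the cloned replication metadata were picked up, the node could not replicate. Treat server_id = 0 as a temporary diagnostic value only; the node needs a unique non-zero ID again before it can act as a replica.

```ini
[mysqld]
# Do not start the replication threads automatically at server startup,
# even if replication metadata was copied over by the SST
skip-slave-start
# Temporary diagnostic value from this thread: a server with
# server_id = 0 refuses to connect to a source as a replica
server_id = 0
```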

acampoh · August 12, 2021, 3:42am

Hello,

I just found the issue. It wasn’t really related to replication; it had more to do with the startup process of the 2nd and 3rd nodes of the cluster.

My mistake was assuming that, with the first node set to RO, the other nodes would inherit the read_only setting when they performed their SST and joined, but they didn’t. Since we have Sentry running there, it was writing to those nodes and causing a duplicate-key error every time one of them started.

I fixed it by setting the read_only option in my.cnf.
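For anyone hitting the same problem: a value set at runtime with SET GLOBAL read_only = ON is not persisted in 5.7, and an SST copies the data directory rather than server variables, so each node needs the setting in its own my.cnf. A minimal sketch for the nodes of the secondary (replica) cluster, assuming no local application should write to them. The replication SQL thread is not affected by read_only, but accounts with the SUPER privilege can still write, so it is worth checking which account an application such as Sentry connects with.

```ini
[mysqld]
# Reject writes from ordinary client connections on every node of the
# secondary cluster; the replication SQL thread is not affected
read_only = ON
# Keep replication from starting automatically on nodes cloned via SST
skip-slave-start
```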

Thanks a lot!
