Replication between PXC clusters is broken with a duplicate key error

Hi

We have set up async CR replication between two PXC clusters.

server-id is different on each node of the cluster.
gtid_mode and enforce_gtid_consistency are ON.
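The settings can be confirmed on each node with a quick check like this:

-- quick check on each node: expect a unique server_id per node, gtid_mode = ON, enforce_gtid_consistency = ON
SELECT @@server_id, @@gtid_mode, @@enforce_gtid_consistency;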

Writes happen only on one cluster; the other cluster is passive.
Replication from the passive cluster to the active cluster was set up yesterday with auto_position=1. After some time, replication from the passive cluster to the active cluster broke with a duplicate key error.
Below are the messages:
2025-01-16T07:36:17.656087+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:17.824175+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:17.979463+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:18.039976+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

2025-01-16T07:36:18.569761+05:30 1981768 [ERROR] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'xxxxxxxxxxxxx:778962' at source log mysql-bin.xxxxxx11, end_log_pos 387794877; Could not execute Write_rows event on table dictat0r_shard_16.table_name; Duplicate entry '1623621' for key 'table_name.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's source log mysql-bin.xxxxxx11, end_log_pos 387794877, Error_code: MY-001062

Do we have to set the same server-id on all nodes of a cluster and ignore those server-ids?

As far as I know, this should not be required for GTID-based replication.
Please suggest if we have to enable any other parameters.

Hi @anjaneyarajus,

Do we have to set the same server-id on all nodes of a cluster and ignore those server-ids?

No.
You might want to investigate the reason for your duplicate key error…

Hi Kedar,

Initially, replication was from the PXC cluster in DC1 (active cluster) to the PXC cluster in DC2 (passive cluster). Since we want CR, we have enabled replication from DC2 (passive) to DC1 (active) as well.
All the nodes in DC1 and DC2 have unique server-ids.
There is only one GTID set, and we ran CHANGE MASTER with auto-position on node1 of DC1.
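Roughly, the statement looked like this (modern syntax; hostname and credentials are placeholders):

-- Run on DC1 node1, replicating from DC2 node3 (hostname/user/password are placeholders)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'dc2-node3',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '***',
  SOURCE_AUTO_POSITION = 1;
START REPLICA;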

DC1 node1 replicates from DC2 node3, and DC2 node1 replicates from DC1 node3.

Writes always happen on DC1 and replicate to DC2.
There were no writes on DC2, yet replication from DC2 to DC1 still broke with a duplicate key error, which shouldn't happen.
We were continuously seeing the message below in the error log before the duplicate key error:
2025-01-16T07:36:17.656087+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

Is there any known bug in PXC 8.0.34 where we have to set the same server-ids to ensure there is no loop in applying transactions, or ignore server-ids?

Aah!!! Multi-master replication in PXC… there is this bug: Percona cluster with async gtid replication

Hi Kedar,

But the affected version there is 5.7.28-31.41, and we are using 8.0.34.
Does that bug still exist in 8.x versions as well?
We would also like to know whether ignoring server-ids will fix the issue.

What is ‘CR’? I’ve never seen this term used. Is that circular replication? That is very much not recommended for PXC. Marcelo describes the issue in the JIRA above. In summary, commit on PXC1 generates GTID1, which replicates to PXC2, is applied on PXC2 and generates a new GTID2, which replicates back to PXC1, which is applied (since it is a new/different GTID), and causes the duplicate key issue.

PXC’s internal GTID and MySQL’s async GTID are not the same, and are not shared. Any attempt to create source-source replication between 2 PXCs will not work because of this.

You could set IGNORE_SERVER_IDS on the PXC1 nodes to the server_ids of all 3 PXC2 nodes, but this is a hack/workaround and is not tested/guaranteed to do what you want.
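If you do go that route, it would look roughly like this on the PXC1 node acting as the replica (201/202/203 are placeholder server_ids for the three PXC2 nodes; as far as I recall, IGNORE_SERVER_IDS is deprecated when GTIDs are in use):

-- On the PXC1 replica node: skip binlog events originating from the PXC2 nodes
-- (201, 202, 203 are placeholder server_ids for the three PXC2 nodes)
STOP REPLICA;
CHANGE REPLICATION SOURCE TO IGNORE_SERVER_IDS = (201, 202, 203);
START REPLICA;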

Hi Matthew,

CR is circular replication. We have ignored the server_ids of the PXC1 nodes, which stopped the warning messages below:
[Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

Could you please explain why circular replication is not advised in PXC when it is perfectly acceptable in MariaDB Galera clusters?

In the event of DR, circular replication between Galera clusters is very beneficial.

Since only two sources are involved, this isn’t considered “circular”. Circular would require 3 servers (minimum) replicating in a circle (A->B->C->A). When only two sources are concerned, this is known as Source-Source replication (previously Master/Master).

According to MariaDB’s docs, they indicate the workaround/hack of setting the same server_id on each MySQL server within a cluster. This violates the definition of server_id as described in the MySQL docs, which state that each MySQL server must have a unique server_id when part of any replication topology.

If you are ok with the workaround, then S/S replication is acceptable in PXC as well.

Hi Matthew,

Replication between MariaDB Galera clusters differs from PXC in that it employs MariaDB GTIDs rather than the cluster UUID.

What is the recommended way to set up master-master replication between two PXC clusters?

  1. Which is better: setting the same server ID on every node of a cluster and ignoring that single ID, or keeping a unique server ID on each cluster node and ignoring all three?
  2. Given that only one PXC cluster actively accepts writes and the other cluster is passive, is it advisable to enable super_read_only to prevent all writes, including those from SUPER users?
  3. Do any known problems arise in PXC if we enable super_read_only on every cluster node?

Yes, this is a feature added by MariaDB.

I would configure all nodes in PXC1 to have the same server_id (Ex: 44), and all nodes in PXC2 to have the same server_id, but different from PXC1 (Ex: 55). This way, any write to any node in PXC1 will be binlog-tagged with server_id 44. PXC2 will replicate it and rewrite it to its own binlog using the same server_id 44. PXC1 receives this, but ignores it because it matches its own server_id.
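A rough sketch of that layout, using the example values above:

-- On every node of PXC1 (example server_id from above)
SET PERSIST server_id = 44;

-- On every node of PXC2
SET PERSIST server_id = 55;

With this, each cluster’s replica channel discards incoming events tagged with its own server_id, which is what breaks the loop.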

On the passive PXC2 cluster, absolutely recommended.

Only replication is allowed to write when super_read_only is enabled. There should be no issues.
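For example, on each PXC2 node, something like:

-- On each node of the passive PXC2 cluster: block all client writes, including from SUPER users
-- (replication applier threads are not affected by super_read_only)
SET PERSIST super_read_only = ON;   -- implicitly sets read_only = ON as well
SELECT @@read_only, @@super_read_only;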

Hi Matthew,

Thank you for the quick response.

When PXC nodes are running and writes are occurring, is it possible to dynamically change the server-id?
Does changing the server-id dynamically on a cluster that accepts writes cause any known problems or bugs?

You can change server_id at any time, though personally, I like to restart just to be safe. Since you have PXC, you should be able to perform a rolling restart without any downtime of the application.
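A rolling approach could look like this, one node at a time (44 is the placeholder value for that cluster):

-- Change and persist the new value, then verify before moving on to the next node
SET PERSIST server_id = 44;
SELECT @@server_id;
-- Optionally restart the node and let it rejoin the cluster before touching the next one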

Hi Matthew,

Thank you for your assistance.