Replication between PXC clusters is broken with a duplicate key error

Hi

We have set up async CR replication between two PXC clusters.

server-id is different on each node of the cluster.
gtid_mode and enforce_gtid_consistency are ON.
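The settings can be confirmed on each node with a quick check like this:

-- quick check on each node: expect a unique server_id per node, gtid_mode = ON, enforce_gtid_consistency = ON
SELECT @@server_id, @@gtid_mode, @@enforce_gtid_consistency;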

Writes happen only on one cluster; the other cluster is passive.
Replication from the passive cluster to the active cluster was set up yesterday with auto_position=1. After some time, replication from the passive cluster to the active cluster broke with a duplicate key error.
Below are the messages:
2025-01-16T07:36:17.656087+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:17.824175+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:17.979463+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.
2025-01-16T07:36:18.039976+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

2025-01-16T07:36:18.569761+05:30 1981768 [ERROR] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'xxxxxxxxxxxxx:778962' at source log mysql-bin.xxxxxx11, end_log_pos 387794877; Could not execute Write_rows event on table dictat0r_shard_16.table_name; Duplicate entry '1623621' for key 'table_name.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's source log mysql-bin.xxxxxx11, end_log_pos 387794877, Error_code: MY-001062

Do we have to set the same server-id on all nodes of a cluster and ignore those server-ids?

As far as I know, this should not be required for GTID-based replication.
Please suggest if we have to enable any other parameters.

Hi @anjaneyarajus,

Do we have to set the same server-id on all nodes of a cluster and ignore those server-ids?

No.
You might want to investigate the reason for your duplicate key error…

Hi Kedar,

Initially, replication was from the PXC cluster in DC1 (active cluster) to the PXC cluster in DC2 (passive cluster). Since we want CR, we have enabled replication from DC2 (passive) to DC1 (active) as well.
All the nodes in DC1 and DC2 have unique server-ids.
There is only one GTID set, and we ran CHANGE MASTER with auto-position on node1 of DC1.
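Roughly, the statement looked like this (modern syntax; hostname and credentials are placeholders):

-- Run on DC1 node1, replicating from DC2 node3 (hostname/user/password are placeholders)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'dc2-node3',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '***',
  SOURCE_AUTO_POSITION = 1;
START REPLICA;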

DC1 node1 replicates from DC2 node3, and DC2 node1 replicates from DC1 node3.

Writes always happen on DC1 and replicate to DC2.
There were no writes on DC2, yet replication from DC2 to DC1 still broke with a duplicate key error, which shouldn't happen.
We were continuously seeing the message below in the error log before the duplicate key error:
2025-01-16T07:36:17.656087+05:30 1981768 [Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

Is there any known bug in PXC 8.0.34 where we have to set the same server-ids to ensure there is no loop in applying transactions, or ignore server-ids?

Aah!!! Multi-master replication in PXC… there is this bug: Percona cluster with async gtid replication

Hi Kedar,

But the affected version there is 5.7.28-31.41, and we are using 8.0.34.
Does that bug still exist in 8.x versions as well?
We would also like to know whether ignoring server-ids will fix the issue.

What is ‘CR’? I’ve never seen this term used. Is that circular replication? That is very much not recommended for PXC. Marcelo describes the issue in the JIRA above. In summary, commit on PXC1 generates GTID1, which replicates to PXC2, is applied on PXC2 and generates a new GTID2, which replicates back to PXC1, which is applied (since it is a new/different GTID), and causes the duplicate key issue.

PXC’s internal GTID and MySQL’s async GTID are not the same, and are not shared. Any attempt to create source-source replication between 2 PXCs will not work because of this.

You could set IGNORE_SERVER_IDS on the PXC1 nodes to the server_ids of all 3 PXC2 nodes, but this is a hack/workaround and is not tested/guaranteed to do what you want.
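If you do go that route, it would look roughly like this on the PXC1 node acting as the replica (201/202/203 are placeholder server_ids for the three PXC2 nodes; as far as I recall, IGNORE_SERVER_IDS is deprecated when GTIDs are in use):

-- On the PXC1 replica node: skip binlog events originating from the PXC2 nodes
-- (201, 202, 203 are placeholder server_ids for the three PXC2 nodes)
STOP REPLICA;
CHANGE REPLICATION SOURCE TO IGNORE_SERVER_IDS = (201, 202, 203);
START REPLICA;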

Hi Matthew,

CR is circular replication. We have ignored the server_ids of the PXC1 nodes, which stopped the warning messages below:
[Warning] [MY-000000] [WSREP] Pending to replicate MySQL GTID event (probably a stale event). Discarding it now.

Could you please explain why circular replication is not advised in PXC when it is perfectly acceptable in MariaDB Galera clusters?

In the event of DR, circular replication between Galera clusters is very beneficial.

Since only two sources are involved, this isn’t considered “circular”. Circular would require 3 servers (minimum) replicating in a circle (A->B->C->A). When only two sources are concerned, this is known as Source-Source replication (previously Master/Master).

According to MariaDB’s docs, they indicate the workaround/hack of setting the same server_id on each MySQL server within a cluster. This violates the definition of server_id as described in the MySQL docs, which state that each MySQL server must have a unique server_id when part of any replication topology.

If you are ok with the workaround, then S/S replication is acceptable in PXC as well.

Hi Matthew,

Replication between MariaDB Galera clusters differs from PXC in that it employs MariaDB GTIDs rather than the cluster UUID.

What is the recommended way to set up master-master replication between two PXC clusters?

  1. Which is better: setting the same server ID on every node of a cluster and ignoring that single ID, or keeping a unique server ID on each cluster node and ignoring all three?
  2. Given that only one PXC cluster actively accepts writes and the other cluster is passive, is it advisable to enable super_read_only to prevent all writes, including those from SUPER users?
  3. Do any known problems arise in PXC if we enable super_read_only on every cluster node?

Yes, this is a feature added by MariaDB.

I would configure all nodes in PXC1 to have the same server_id (Ex: 44), and all nodes in PXC2 to have the same server_id, but different from PXC1 (Ex: 55). This way, any write to any node in PXC1 will be binlog-tagged with server_id 44. PXC2 will replicate it and rewrite it to its own binlog using the same server_id 44. PXC1 receives this, but ignores it because it matches its own server_id.
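A rough sketch of that layout, using the example values above:

-- On every node of PXC1 (example server_id from above)
SET PERSIST server_id = 44;

-- On every node of PXC2
SET PERSIST server_id = 55;

With this, each cluster’s replica channel discards incoming events tagged with its own server_id, which is what breaks the loop.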

On the passive PXC2 cluster, absolutely recommended.

Only replication is allowed to write when super_read_only is enabled. There should be no issues.
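For example, on each PXC2 node, something like:

-- On each node of the passive PXC2 cluster: block all client writes, including from SUPER users
-- (replication applier threads are not affected by super_read_only)
SET PERSIST super_read_only = ON;   -- implicitly sets read_only = ON as well
SELECT @@read_only, @@super_read_only;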

Hi Matthew,

Thank you for the quick response.

When PXC nodes are running and writes are occurring, is it possible to dynamically change the server-id?
Does changing the server-id dynamically on a cluster that accepts writes cause any known problems or bugs?

You can change server_id at any time, though personally, I like to restart just to be safe. Since you have PXC, you should be able to perform a rolling restart without any downtime of the application.
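A rolling approach could look like this, one node at a time (44 is the placeholder value for that cluster):

-- Change and persist the new value, then verify before moving on to the next node
SET PERSIST server_id = 44;
SELECT @@server_id;
-- Optionally restart the node and let it rejoin the cluster before touching the next one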

Hi Matthew,

Thank you for your assistance.