XtraDB Cluster - Cross Site Circular Replication

Hi everyone! I have two Kubernetes clusters. I would like to install Percona XtraDB Cluster on both, have the two instances stay synchronized, and have each work as a master (Cross-site Circular Replication).

I tried to do this (in a test environment) through Helm charts (and my own custom configuration via Terraform) with the following values:

Cluster A:
a) (Percona Operator for MySQL installation)
b) XtraDB Cluster:

crVersion: ${operator_tag}
allowUnsafeConfigurations: true # TEST CONFIG

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'

  size: 1 # TEST CONFIG
  imagePullPolicy: IfNotPresent
  expose:
    enabled: true
    type: ClusterIP
  replicationChannels:
    - name: cluster_b_to_cluster_a
      isSource: false
      sourcesList:
        - host: cluster-b.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local # SUBMARINER NETWORK
          port: 3306
          weight: 100

  resources:
    limits:
      cpu: ${xtradb_cpuLimit}
      memory: ${xtradb_memoryLimit}
    requests:
      cpu: ${xtradb_cpuRequests}
      memory: ${xtradb_memoryRequests}
  persistence:
    enabled: true
    size: ${xtradb_persistenceSize}

haproxy:
  enabled: true
  size: 1 # TEST CONFIG

proxysql:
  enabled: false

logcollector:
  enabled: false

pmm:
  enabled: false

backup:
  enabled: false

secrets:
  passwords:
    root: ${xtradb_rootpwd}
    clustercheck: ${xtradb_clustercheckpwd}
    operator: ${xtradb_operatorpwd}
    replication: ${xtradb_replicationpwd}
    proxyadmin: ${xtradb_proxyadminpwd}

Cluster B:
a) (Percona Operator for MySQL installation)
b) XtraDB Cluster

crVersion: ${operator_tag}
allowUnsafeConfigurations: true # TEST CONFIG

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'

  size: 1 # TEST CONFIG
  imagePullPolicy: IfNotPresent
  expose:
    enabled: true
    type: ClusterIP
  replicationChannels:
    - name: cluster_a_to_cluster_b
      isSource: false
      sourcesList:
        - host: cluster-a.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local # SUBMARINER NETWORK
          port: 3306
          weight: 100

  resources:
    limits:
      cpu: ${xtradb_cpuLimit}
      memory: ${xtradb_memoryLimit}
    requests:
      cpu: ${xtradb_cpuRequests}
      memory: ${xtradb_memoryRequests}
  persistence:
    enabled: true
    size: ${xtradb_persistenceSize}

haproxy:
  enabled: true
  size: 1 # TEST CONFIG

proxysql:
  enabled: false

logcollector:
  enabled: false

pmm:
  enabled: false

backup:
  enabled: false

secrets:
  passwords:
    root: ${xtradb_rootpwd}
    clustercheck: ${xtradb_clustercheckpwd}
    operator: ${xtradb_operatorpwd}
    replication: ${xtradb_replicationpwd}
    proxyadmin: ${xtradb_proxyadminpwd}

In a nutshell, I have defined two replication channels, each of which replicates one cluster into the other. Result: it works, but I very often get synchronization errors like:

Slave I/O for channel 'cluster_a_to_cluster_b': Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. The GTID set sent by the slave is '2ee2eb01-fa3b-11ed-a2da-fee5830aa796:1-6, 66a6388b-fa3b-11ed-bd3d-0a580aff2091:1, 6d765e15-fa3b-11ed-93bb-83955a661146:1-6', and the missing transactions are '2aa7bd91-fa3b-11ed-bb93-0a580aff3c4b:1'', Error_code: MY-013114
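The error's own hint ("Consider increasing the master's binary log expiration period") can be acted on directly in the pxc configuration block. If the purges are retention-driven, a sketch like the following might reduce how often the channel breaks; binlog_expire_logs_seconds is MySQL 8.0's replacement for expire_logs_days, and 604800 seconds (7 days, an assumed value) should be sized to the disk available on the PVC:

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'
    # Keep binary logs long enough for a lagging or briefly
    # disconnected replica channel to catch up before purging.
    binlog_expire_logs_seconds=604800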

When the channel does break, I can successfully restore synchronization by following these suggestions: https://www.percona.com/blog/replication-issues-and-binlog-compressor/
In more detail, the workaround I apply is:

STOP SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';
-- Failover must be disabled first: SOURCE_CONNECTION_AUTO_FAILOVER = 1
-- requires GTID auto-positioning, which is turned off next.
CHANGE MASTER TO SOURCE_CONNECTION_AUTO_FAILOVER = 0 FOR CHANNEL 'cluster_a_to_cluster_b';
-- With auto-positioning off, the channel resumes from the stored binlog
-- file/position instead of requesting the purged GTIDs.
CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL 'cluster_a_to_cluster_b';
START SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';
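To confirm the channel actually resumed, check the replica status for the channel (SHOW REPLICA STATUS is available from MySQL 8.0.22; older releases use SHOW SLAVE STATUS) and verify that both replication threads report Yes:

-- Inspect the incoming channel on the replica side; look for
-- Replica_IO_Running: Yes and Replica_SQL_Running: Yes.
SHOW REPLICA STATUS FOR CHANNEL 'cluster_a_to_cluster_b'\G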

So my questions are:

  • is my configuration correct?
  • is there anything else I should specify to make this a robust solution?

Interesting post.
There isn't much material around this topic, so I'm hoping someone can help with a follow-up.

I do not believe this is an officially supported configuration.

However… I would note that I would expect to see a replicationChannels entry with isSource set to true on each cluster. I believe that is what prompts the operator to establish distinct replication channels, which yield unique GTIDs and so on.

Your isSource: false replication channels look fine, but I'd test what happens if you add a second replicationChannel to each setup with isSource set to true, as in the sketch below.
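A minimal sketch of what that could look like on Cluster A, reusing the channel names from the original post (mirror it on Cluster B with the names swapped). The added isSource: true entry declares this cluster as the source side of the channel the other cluster consumes; channel names must match on both sides:

pxc:
  replicationChannels:
    # New: act as source for the channel Cluster B replicates from.
    - name: cluster_a_to_cluster_b
      isSource: true
    # Existing replica channel, unchanged.
    - name: cluster_b_to_cluster_a
      isSource: false
      sourcesList:
        - host: cluster-b.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local # SUBMARINER NETWORK
          port: 3306
          weight: 100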