XtraDB Cluster - Cross-Site Circular Replication

Hi everyone! I have two Kubernetes clusters. I would like to install Percona XtraDB Cluster on both, and I would like the two instances to stay synchronized, with each one working as a master (cross-site circular replication).

I tried to set this up (in a test environment) through the Helm charts, with my own custom values applied via Terraform, using the following configuration:

Cluster A:
a) (Percona Operator for MySQL installation)
b) XtraDB Cluster:

crVersion: ${operator_tag}
allowUnsafeConfigurations: true # TEST CONFIG

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'

  size: 1 # TEST CONFIG
  imagePullPolicy: IfNotPresent
  expose:
    enabled: true
    type: ClusterIP
  replicationChannels:
    - name: cluster_b_to_cluster_a
      isSource: false
      sourcesList:
        - host: cluster-b.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local # SUBMARINER NETWORK
          port: 3306
          weight: 100

  resources:
    limits:
      cpu: ${xtradb_cpuLimit}
      memory: ${xtradb_memoryLimit}
    requests:
      cpu: ${xtradb_cpuRequests}
      memory: ${xtradb_memoryRequests}
  persistence:
    enabled: true
    size: ${xtradb_persistenceSize}

haproxy:
  enabled: true
  size: 1 # TEST CONFIG

proxysql:
  enabled: false

logcollector:
  enabled: false

pmm:
  enabled: false

backup:
  enabled: false

secrets:
  passwords:
    root: ${xtradb_rootpwd}
    clustercheck: ${xtradb_clustercheckpwd}
    operator: ${xtradb_operatorpwd}
    replication: ${xtradb_replicationpwd}
    proxyadmin: ${xtradb_proxyadminpwd}

Cluster B:
a) (Percona Operator for MySQL installation)
b) XtraDB Cluster:

crVersion: ${operator_tag}
allowUnsafeConfigurations: true # TEST CONFIG

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'

  size: 1 # TEST CONFIG
  imagePullPolicy: IfNotPresent
  expose:
    enabled: true
    type: ClusterIP
  replicationChannels:
    - name: cluster_a_to_cluster_b
      isSource: false
      sourcesList:
        - host: cluster-a.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local # SUBMARINER NETWORK
          port: 3306
          weight: 100

  resources:
    limits:
      cpu: ${xtradb_cpuLimit}
      memory: ${xtradb_memoryLimit}
    requests:
      cpu: ${xtradb_cpuRequests}
      memory: ${xtradb_memoryRequests}
  persistence:
    enabled: true
    size: ${xtradb_persistenceSize}

haproxy:
  enabled: true
  size: 1 # TEST CONFIG

proxysql:
  enabled: false

logcollector:
  enabled: false

pmm:
  enabled: false

backup:
  enabled: false

secrets:
  passwords:
    root: ${xtradb_rootpwd}
    clustercheck: ${xtradb_clustercheckpwd}
    operator: ${xtradb_operatorpwd}
    replication: ${xtradb_replicationpwd}
    proxyadmin: ${xtradb_proxyadminpwd}
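
One thing to note: my values only define the replica side (isSource: false) of each channel. The operator's cross-site replication examples also declare the source side of a channel explicitly with isSource: true; a sketch of what the channel list on Cluster A might then look like (assuming the chart passes replicationChannels through to the custom resource unchanged):

pxc:
  replicationChannels:
    # source side of the channel that replicates A -> B
    - name: cluster_a_to_cluster_b
      isSource: true
    # replica side of the channel that replicates B -> A (as above)
    - name: cluster_b_to_cluster_a
      isSource: false
      sourcesList:
        - host: cluster-b.${release}-pxc-db-pxc-0.${namespace}.svc.clusterset.local
          port: 3306
          weight: 100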

In a nutshell, I have defined two replication channels, each of which replicates changes from one cluster to the other. The result: it works, but I very often hit replication errors like this one:

Slave I/O for channel 'cluster_a_to_cluster_b': Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. The GTID set sent by the slave is '2ee2eb01-fa3b-11ed-a2da-fee5830aa796:1-6, 66a6388b-fa3b-11ed-bd3d-0a580aff2091:1, 6d765e15-fa3b-11ed-93bb-83955a661146:1-6', and the missing transactions are '2aa7bd91-fa3b-11ed-bb93-0a580aff3c4b:1'', Error_code: MY-013114
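
Error 1236 here means the source has already purged the binary log files containing GTIDs that this replica still needs, so auto-positioning has no starting point left. Since the error itself suggests increasing the source's binary log expiration period, one preventive option is to extend binlog retention in the pxc configuration. A minimal sketch (the 7-day value is an assumption; size it to cover the longest replication outage you expect):

pxc:
  configuration: |
    [mysqld]
    pxc_strict_mode='PERMISSIVE'
    # keep binlogs for 7 days so a lagging replica can still find its GTIDs
    binlog_expire_logs_seconds=604800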

When the error occurs, I can successfully restore replication by following these suggestions: https://www.percona.com/blog/replication-issues-and-binlog-compressor/
In more detail, the workaround I apply is:

-- Stop the failing channel
STOP SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';
-- Disable asynchronous connection failover for the channel
CHANGE MASTER TO SOURCE_CONNECTION_AUTO_FAILOVER = 0 FOR CHANNEL 'cluster_a_to_cluster_b';
-- Switch from GTID auto-positioning to binlog file/position-based replication
CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL 'cluster_a_to_cluster_b';
-- Restart the channel
START SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';
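
Once the channel is running again, its state can be checked per channel, and GTID auto-positioning can be restored the same way. A sketch (caveat: if the purged transactions were never actually applied on the replica, re-enabling auto-positioning can trigger the same 1236 error again; also, the operator reconciles replicationChannels and may reapply its own channel settings on top of manual changes):

-- Check the channel state (Slave_IO_Running / Slave_SQL_Running should both be Yes)
SHOW SLAVE STATUS FOR CHANNEL 'cluster_a_to_cluster_b'\G

-- Optionally switch back to GTID auto-positioning once the data is consistent
STOP SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';
CHANGE MASTER TO MASTER_AUTO_POSITION = 1 FOR CHANNEL 'cluster_a_to_cluster_b';
CHANGE MASTER TO SOURCE_CONNECTION_AUTO_FAILOVER = 1 FOR CHANNEL 'cluster_a_to_cluster_b';
START SLAVE FOR CHANNEL 'cluster_a_to_cluster_b';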

So my questions are:

  • Is my configuration correct?
  • Is there anything else I should specify to make this a robust setup?