Cross-DC/region replication of Percona MongoDB

Description:

I have deployed 3 replicas of the MongoDB operator in a specific namespace, and I have deployed a Percona MongoDB replica set of size 3.
My requirement is this: I have different clusters in different regions, let's say Chennai and Hyderabad. I want to deploy only slaves in Hyderabad and the master plus slaves in Chennai. This way I will have cross-DC replication.

How can I achieve this?
I used the Helm charts to deploy the operator and the database.

Steps to Reproduce:


Version:

1.15.0

Logs:

Here is my values file for psmdb-operator.

# Default values for psmdb-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 3

image:
  repository: personal-repo/percona-mongo/percona-server-mongodb-operator
  tag: 1.15.0
  pullPolicy: IfNotPresent

# disableTelemetry: according to
# https://docs.percona.com/percona-operator-for-mongodb/telemetry.html
# this is how you can disable telemetry collection
# default is false which means telemetry will be collected
disableTelemetry: false

# set if you want to specify a namespace to watch
# defaults to `.Release.namespace` if left blank
# watchNamespace:

# set if operator should be deployed in cluster wide mode. defaults to false
# watchAllNamespaces: false
watchAllNamespaces: true

# rbac: settings for deployer RBAC creation
rbac:
  # rbac.create: if false RBAC resources should be in place
  create: true

# serviceAccount: settings for Service Accounts used by the deployer
serviceAccount:
  # serviceAccount.create: Whether to create the Service Accounts or not
  create: true

podAnnotations: {}
  # prometheus.io/scrape: "true"
  # prometheus.io/port: "8080"

podSecurityContext: {}
  # runAsNonRoot: true
  # runAsUser: 2
  # runAsGroup: 2
  # fsGroup: 2
  # fsGroupChangePolicy: "OnRootMismatch"

securityContext: {}
  # allowPrivilegeEscalation: false
  # capabilities:
  #   drop:
  #   - ALL
  # seccompProfile:
  #   type: RuntimeDefault

# set if you want to use a different operator name
# defaults to `percona-server-mongodb-operator`
# operatorName:

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  resyncPeriod: 5s

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

nodeSelector: {}

tolerations: []

# affinity: {}
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname


# logStructured: false
# logLevel: "INFO"

logStructured: true
logLevel: "INFO"

Here is my values file for percona-mongodb.

finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""
crVersion: 1.15.0
pause: false
unmanaged: false
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false
image:
  repository: personal-repo/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7
imagePullPolicy: IfNotPresent
secrets: {}
pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service
replsets:
  - name: rs0
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: hdd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP

backup:
  enabled: false
  image:
    repository: personal-repo/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:

Expected Result:

I want a cluster of 5 MongoDB replicas, of which 1 master and 2 slaves are in the Chennai region and the remaining 2 slaves are in the Hyderabad region.

Actual Result:


Additional Information:


Hi Team,

Is there a way I can achieve cross-region replication using a single operator?

This is the document that I was referring to.

@Shubham_Harilal_Saro hey.

  • If you have 2 Kubernetes clusters in different regions - you need 2 operators for cross-region replication, one per k8s cluster.
  • If you have 1 Kubernetes cluster that is somehow spread across these regions - you need just one operator, and you don’t need to configure cross-region replication (through externalNodes); you can just have more replicas and set affinity (see the sketch after this list).
  • If you have 2 Kubernetes clusters in different regions, but they are somehow federated - you might not need a second operator.
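
For the second case (one stretched cluster), here is a minimal sketch of what the replset affinity could look like. It assumes the standard topology.kubernetes.io/zone node label, assumes the psmdb-db chart passes the affinity.advanced block through to the CR (otherwise it can be set on the PerconaServerMongoDB resource directly), and the pod label key in the selector is an assumption - verify it against the labels on your running mongod pods:

replsets:
  - name: rs0
    size: 5
    affinity:
      # `advanced` accepts a standard Kubernetes affinity spec and is used
      # instead of antiAffinityTopologyKey (see the CR options reference).
      advanced:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              # prefer spreading rs0 members across zones/regions
              topologyKey: topology.kubernetes.io/zone
              labelSelector:
                matchLabels:
                  app.kubernetes.io/replset: rs0   # assumption: check pod labels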

Hope this helps.

Thanks for your response.
We have 2 separate clusters. Will follow the main site and replica site setup.

Hi @Sergey_Pronin,

I am facing an issue following the document “Set up Percona Server for MongoDB cross-site replication”.
The replica-site pods keep restarting with the error below:

  Normal   Killing    29m (x2 over 32m)   kubelet            Container mongod failed liveness probe, will be restarted
  Normal   Pulled     29m (x3 over 35m)   kubelet            Container image "xyz.myrepo.com/percona-mongo/percona-server-mongodb:6.0.9-7" already present on machine
  Normal   Created    29m (x3 over 35m)   kubelet            Created container mongod
  Normal   Started    29m (x3 over 35m)   kubelet            Started container mongod
  Warning  BackOff    10m (x7 over 11m)   kubelet            Back-off restarting failed container mongod in pod mongodb-db-psmdb-db-rs0-0_mongo-db(7c0bda6c-da2e-4e74-8011-066746f99434)
  Warning  Unhealthy  16s (x40 over 33m)  kubelet            Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out

Details:

Our Cluster:

I have 2 different K8s clusters with network connectivity to each other. I want the master and some replicas in one cluster, and only replicas in the other cluster.

Below is what I did:

I downloaded the Helm charts for the Percona operator and Percona MongoDB, version 1.15.0.

I followed the document “Set up Percona Server for MongoDB cross-site replication”.

I deployed the operators in both clusters using the values.yaml file below. (They are successfully up in both clusters.)

# Default values for psmdb-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 3

image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb-operator
  tag: 1.15.0
  pullPolicy: IfNotPresent

# disableTelemetry: according to
# https://docs.percona.com/percona-operator-for-mongodb/telemetry.html
# this is how you can disable telemetry collection
# default is false which means telemetry will be collected
disableTelemetry: false

# set if you want to specify a namespace to watch
# defaults to `.Release.namespace` if left blank
# watchNamespace:

# set if operator should be deployed in cluster wide mode. defaults to false
# watchAllNamespaces: false
watchAllNamespaces: true

# rbac: settings for deployer RBAC creation
rbac:
  # rbac.create: if false RBAC resources should be in place
  create: true

# serviceAccount: settings for Service Accounts used by the deployer
serviceAccount:
  # serviceAccount.create: Whether to create the Service Accounts or not
  create: true

# podAnnotations: {}
  # prometheus.io/scrape: "true"
  # prometheus.io/port: "8080"

podAnnotations:
  mcs.xxxxxxx/enable: "true"

podSecurityContext: {}
  # runAsNonRoot: true
  # runAsUser: 2
  # runAsGroup: 2
  # fsGroup: 2
  # fsGroupChangePolicy: "OnRootMismatch"

securityContext: {}
  # allowPrivilegeEscalation: false
  # capabilities:
  #   drop:
  #   - ALL
  # seccompProfile:
  #   type: RuntimeDefault

# set if you want to use a different operator name
# defaults to `percona-server-mongodb-operator`
# operatorName:

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  resyncPeriod: 5s

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

nodeSelector: {}

tolerations: []

# affinity: {}
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname


# logStructured: false
# logLevel: "INFO"

logStructured: true
logLevel: "INFO"

Now, on the main site, I deployed MongoDB using the values.yaml file below, and it is also up successfully.

finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""
crVersion: 1.15.0
pause: false
unmanaged: false
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false
image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7
imagePullPolicy: IfNotPresent
secrets: {}
pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service
replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: hdd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP
backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:

Then I copied the secrets below and deployed them to the replica site.
I removed the annotations, creationTimestamp, resourceVersion, selfLink, and uid metadata fields from the resulting files to make them ready for the replica site (see the sketch after the list).

mongodb-db-psmdb-db-secrets                  Opaque                                10     4d20h
mongodb-db-psmdb-db-ssl                      kubernetes.io/tls                     3      4d19h
mongodb-db-psmdb-db-ssl-internal             kubernetes.io/tls                     3      4d19h
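
For reference, this is only a sketch (not an export from my cluster) of what one of the cleaned-up manifests looks like before applying it on the replica site; the data keys shown are examples, and the real keys and base64 values are copied verbatim from the main-site secret:

apiVersion: v1
kind: Secret
metadata:
  # annotations, creationTimestamp, resourceVersion, selfLink and uid removed
  name: mongodb-db-psmdb-db-secrets
  namespace: mongo-db   # namespace of the replica-site cluster
type: Opaque
data:
  # all keys and base64 values copied verbatim from the main-site secret,
  # e.g. MONGODB_USER_ADMIN_USER, MONGODB_USER_ADMIN_PASSWORD, ...
  MONGODB_USER_ADMIN_USER: <base64 copied from the main site>
  MONGODB_USER_ADMIN_PASSWORD: <base64 copied from the main site>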

Now I deployed the replica-site MongoDB with the following changes in the values file:
unmanaged: true
updateStrategy: OnDelete
backup: false

My values.yaml for replica site.

finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""
crVersion: 1.15.0
pause: false
unmanaged: true
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: OnDelete
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false
image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7
imagePullPolicy: IfNotPresent
secrets: {}
pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service
replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: ssd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP
backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:

On deploying this, the replica cluster comes up, but the pods keep restarting with the warning below.

  Normal   Killing    29m (x2 over 32m)   kubelet            Container mongod failed liveness probe, will be restarted
  Normal   Pulled     29m (x3 over 35m)   kubelet            Container image "xyz.myrepo.com/percona-mongo/percona-server-mongodb:6.0.9-7" already present on machine
  Normal   Created    29m (x3 over 35m)   kubelet            Created container mongod
  Normal   Started    29m (x3 over 35m)   kubelet            Started container mongod
  Warning  BackOff    10m (x7 over 11m)   kubelet            Back-off restarting failed container mongod in pod mongodb-db-psmdb-db-rs0-0_mongo-db(7c0bda6c-da2e-4e74-8011-066746f99434)
  Warning  Unhealthy  16s (x40 over 33m)  kubelet            Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out

I made sure that all 3 main-site secrets were copied, but I still don’t understand why it fails.
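
One diagnostic I am considering (my own assumption, not something from the document): since it is the health-check command itself that times out, relaxing the probe via the CR's replsets livenessProbe options should at least show whether the member is only slow to answer or genuinely broken. If the psmdb-db chart does not pass these fields through, they can be set on the PerconaServerMongoDB custom resource directly. Something like:

replsets:
  - name: rs0
    livenessProbe:
      # assumption: these map to spec.replsets[].livenessProbe.* in the CR
      timeoutSeconds: 30     # give the healthcheck command more time to reply
      failureThreshold: 8    # tolerate more consecutive failures before a restart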

Please help me with this. I don’t see any clear document or video that shows the Helm installation for this setup. Such a document would be really helpful, as I am pretty sure many others have faced this issue.

I also assume that once the replica site is up, I just need to add the IPs (or the ClusterIPs) of its members to the main-site values file, like below, and run a helm upgrade.

replsets:
  - name: rs0
    size: 3
    externalNodes:
      - host: 34.124.76.XX
      - host: 34.124.76.XX
        port: 27017
        votes: 0
        priority: 0
      - host: 34.124.76.XX
