Cross-DC/region replica site of Percona MongoDB crashing due to certificate issue

Hi @Sergey_Pronin,

I have two different Kubernetes clusters in two different regions that are reachable from each other.

I am facing an issue while following the document “Set up Percona Server for MongoDB cross-site replication”.

The replica-site pods keep restarting with the error below:

  Normal   Killing    29m (x2 over 32m)   kubelet            Container mongod failed liveness probe, will be restarted
  Normal   Pulled     29m (x3 over 35m)   kubelet            Container image "xyz.myrepo.com/percona-mongo/percona-server-mongodb:6.0.9-7" already present on machine
  Normal   Created    29m (x3 over 35m)   kubelet            Created container mongod
  Normal   Started    29m (x3 over 35m)   kubelet            Started container mongod
  Warning  BackOff    10m (x7 over 11m)   kubelet            Back-off restarting failed container mongod in pod mongodb-db-psmdb-db-rs0-0_mongo-db(7c0bda6c-da2e-4e74-8011-066746f99434)
  Warning  Unhealthy  16s (x40 over 33m)  kubelet            Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out

NOTE: I have made sure that the my-cluster-name-secrets, my-cluster-name-ssl, and my-cluster-name-ssl-internal secrets from the main site (named mongodb-db-psmdb-db-secrets, mongodb-db-psmdb-db-ssl, and mongodb-db-psmdb-db-ssl-internal in my deployment) are available on the replica site.

DETAILS

Our Cluster:

I have two different Kubernetes clusters with network connectivity between them. I want the primary and some replica set members in one cluster, and additional replica set members only in the other cluster.

Below is what I did:

I downloaded the Helm charts for the Percona operator and Percona Server for MongoDB, version 1.15.0.

I followed the document “Set up Percona Server for MongoDB cross-site replication”.

I deployed the operator in both clusters using the values.yaml file below, and the operators are up and running successfully in both clusters. (An example install command follows the values file.)

# Default values for psmdb-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

replicaCount: 3

image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb-operator
  tag: 1.15.0
  pullPolicy: IfNotPresent

# disableTelemetry: according to
# https://docs.percona.com/percona-operator-for-mongodb/telemetry.html
# this is how you can disable telemetry collection
# default is false which means telemetry will be collected
disableTelemetry: false

# set if you want to specify a namespace to watch
# defaults to `.Release.namespace` if left blank
# watchNamespace:

# set if operator should be deployed in cluster wide mode. defaults to false
# watchAllNamespaces: false
watchAllNamespaces: true

# rbac: settings for deployer RBAC creation
rbac:
  # rbac.create: if false RBAC resources should be in place
  create: true

# serviceAccount: settings for Service Accounts used by the deployer
serviceAccount:
  # serviceAccount.create: Whether to create the Service Accounts or not
  create: true

# podAnnotations: {}
  # prometheus.io/scrape: "true"
  # prometheus.io/port: "8080"

podAnnotations:
  mcs.xxxxxxx/enable: "true"

podSecurityContext: {}
  # runAsNonRoot: true
  # runAsUser: 2
  # runAsGroup: 2
  # fsGroup: 2
  # fsGroupChangePolicy: "OnRootMismatch"

securityContext: {}
  # allowPrivilegeEscalation: false
  # capabilities:
  #   drop:
  #   - ALL
  # seccompProfile:
  #   type: RuntimeDefault

# set if you want to use a different operator name
# defaults to `percona-server-mongodb-operator`
# operatorName:

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  resyncPeriod: 5s

resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

nodeSelector: {}

tolerations: []

# affinity: {}
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname


# logStructured: false
# logLevel: "INFO"

logStructured: true
logLevel: "INFO"
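
For completeness, this is roughly how the operator chart was installed in each cluster; the repo alias, release name, namespace, and values file name here are illustrative, not necessarily the exact ones I used:

# Add the Percona Helm repo and install the operator chart (v1.15.0)
# using the operator values.yaml shown above. Names below are examples.
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update
helm install psmdb-operator percona/psmdb-operator \
  --version 1.15.0 \
  --namespace mongo-db --create-namespace \
  -f operator-values.yaml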

Now, on the main site, I deployed MongoDB using the values.yaml file below, and it is also up successfully. (An example install command follows the file.)

finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""
crVersion: 1.15.0
pause: false
unmanaged: false
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false
image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7
imagePullPolicy: IfNotPresent
secrets: {}
pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service
replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: hdd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: hdd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP
backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:
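
For reference, the main-site database was then deployed with the psmdb-db chart roughly like this (release name, namespace, and values file name here are illustrative):

# Install the database chart on the main site with the values file above.
helm install mongodb-db percona/psmdb-db \
  --version 1.15.0 \
  --namespace mongo-db --create-namespace \
  -f main-site-values.yaml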

Then I copied the secrets below from the main site and applied them on the replica site.
I removed the annotations, creationTimestamp, resourceVersion, selfLink, and uid metadata fields from the exported files to make them ready for the replica site. (A sketch of the commands is shown after the list.)

mongodb-db-psmdb-db-secrets                  Opaque                                10     4d20h
mongodb-db-psmdb-db-ssl                      kubernetes.io/tls                     3      4d19h
mongodb-db-psmdb-db-ssl-internal             kubernetes.io/tls                     3      4d19h
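
A minimal sketch of that copy step, assuming kubectl contexts named main and replica for the two clusters and that both sites use the mongo-db namespace (yq is just one way to strip the metadata fields; editing the exported files by hand works too):

# Export each secret from the main site, strip cluster-specific metadata,
# and apply it on the replica site. Context names are examples.
for s in mongodb-db-psmdb-db-secrets mongodb-db-psmdb-db-ssl mongodb-db-psmdb-db-ssl-internal; do
  kubectl --context main -n mongo-db get secret "$s" -o yaml \
    | yq 'del(.metadata.annotations) | del(.metadata.creationTimestamp) | del(.metadata.resourceVersion) | del(.metadata.selfLink) | del(.metadata.uid)' \
    > "$s.yaml"
  kubectl --context replica -n mongo-db apply -f "$s.yaml"
done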

Now I deployed the replica-site MongoDB with the following changes in the values file (the install command I used is sketched after the file):
unmanaged: true
updateStrategy: OnDelete
backup.enabled: false

My values.yaml for the replica site:

finalizers:
  - delete-psmdb-pods-in-order
nameOverride: ""
fullnameOverride: ""
crVersion: 1.15.0
pause: false
unmanaged: true
allowUnsafeConfigurations: false
multiCluster:
  enabled: false
updateStrategy: OnDelete
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false
image:
  repository: xyz.myrepo.com/percona-mongo/percona-server-mongodb
  tag: 6.0.9-7
imagePullPolicy: IfNotPresent
secrets: {}
pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.39.0
  serverHost: monitoring-service
replsets:
  - name: rs0
    labels:
      mcs.xxxxxxx/enable: "true"
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      serviceLabels:
        mcs.xxxxxxx/enable: "true"
      enabled: true
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
    nonvoting:
      enabled: false
      size: 3
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        pvc:
          storageClassName: ssd-jbod-lvm-ext4-0
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: "kubernetes.io/hostname"
sharding:
  enabled: false
  balancer:
    enabled: false
  configrs:
    size: 3
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    volumeSpec:
      pvc:
        storageClassName: ssd-jbod-lvm-ext4-0
        resources:
          requests:
            storage: 3Gi
  mongos:
    size: 2
    affinity:
      antiAffinityTopologyKey: "kubernetes.io/hostname"
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "300m"
        memory: "0.5G"
      requests:
        cpu: "300m"
        memory: "0.5G"
    expose:
      exposeType: ClusterIP
backup:
  enabled: false
  image:
    repository: xyz.myrepo.com/percona-mongo/percona-backup-mongodb
    tag: 2.3.0
  serviceAccountName: percona-server-mongodb-operator
  storages:
  pitr:
    enabled: false
    oplogOnly: false
  tasks:
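
The replica site was deployed with the same chart, pointed at the second cluster, roughly like this (the release name mongodb-db and the mongo-db namespace match the pod name in the events; the kube-context and file name are illustrative):

# Install the same psmdb-db chart on the replica-site cluster with the
# unmanaged/OnDelete values file above.
helm --kube-context replica install mongodb-db percona/psmdb-db \
  --version 1.15.0 \
  --namespace mongo-db --create-namespace \
  -f replica-site-values.yaml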

On deploying this, the replica cluster comes up, but the pods keep restarting with the same liveness-probe warning shown at the beginning of this post.

I made sure that all three secrets from the main site were copied, but I still don’t understand why it fails.
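
For anyone trying to reproduce or help debug this, the copied certificate and the failing probe can be inspected on the replica site roughly like this (pod, container, namespace, and secret names are taken from the events and secret list above; the healthcheck command is the one from the liveness-probe event):

# Inspect the SANs on the copied TLS certificate.
kubectl -n mongo-db get secret mongodb-db-psmdb-db-ssl -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

# Run the same healthcheck the kubelet runs, directly inside the pod, to see its full output.
kubectl -n mongo-db exec -it mongodb-db-psmdb-db-rs0-0 -c mongod -- \
  /opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure \
  --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200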

I also assume that once the replica site is up, I just need to add the replica members’ IPs (or their ClusterIPs) to the main-site values file, as shown below, and then run a helm upgrade (a command sketch follows the snippet).

replsets:
  - name: rs0
    size: 3
    externalNodes:
      - host: 34.124.76.XX
      - host: 34.124.76.XX
        port: 27017
        votes: 0
        priority: 0
      - host: 34.124.76.XX
Please help me with this. I don’t see any clear document or video that shows the Helm-based installation for this setup; such a document would be really helpful, as I am pretty sure many others have faced this issue.

Hi Team,

Could you please provide an update on this? It would be very helpful.
