MongoDB Sharded Cluster Stuck Initializing Due to Denylisted Sync Source

Description:

After deploying a MongoDB sharded cluster on Kubernetes, the custom resource remains in the “initializing” state for over 29 hours. The mongod pod logs repeatedly report “Cannot select sync source which is denylisted,” which prevents the affected replica set member from syncing and the cluster from completing initialization.

Steps to Reproduce:

  1. Install the Percona Server for MongoDB Operator into a Kubernetes cluster (e.g., on AWS EKS in us-east-2).
  2. Apply a PerconaServerMongoDB custom resource with sharding enabled (e.g., psmdb-default-sharded) using kubectl apply -f sharded-cluster.yaml.
  3. Observe the status of the sharded cluster resource with:
    kubectl -n mongodb get psmdb
    
  4. Check the pod logs of one of the mongod instances:
    kubectl logs <mongod-pod-name>
    
    Notice the “Cannot select sync source which is denylisted” message.
  5. Notice that the PerconaServerMongoDB custom resource remains in “initializing” status for an extended period.

Version:

  • Percona Server for MongoDB Operator: 1.20.0 (crVersion 1.20.0, chart psmdb-db-1.20.0, per the CR below)
  • Kubernetes: not specified (AWS EKS, us-east-2)
  • MongoDB (inside the sharded cluster): percona/percona-server-mongodb:8.0.8-3

Logs:

{"t":{"$date":"2025-06-02T12:42:16.935+00:00"},"s":"I","c":"REPL","id":3873115,"svc":"S","ctx":"BackgroundSync","msg":"Cannot select sync source which is denylisted","attr":{"syncSourceCandidate":"psmdb-default-sharded-rs2-1.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017"}}

Custom resource status:

NAME                    ENDPOINT                                                                                                                                                                                                                                 STATUS         AGE
psmdb-default-sharded   k8s-mongodb-psmdbdef-20085bde5c-075266cfc8e2eba7.elb.us-east-2.amazonaws.com,k8s-mongodb-psmdbdef-79f9e52beb-752135ba4ca3a076.elb.us-east-2.amazonaws.com,k8s-mongodb-psmdbdef-a8f80c6d10-ee1dcbf568bb7727.elb.us-east-2.amazonaws.com   initializing   29h

Expected Result:

  • The sharded cluster should finish initializing (replica sets elect a primary, secondaries begin replication) and transition to “ready” status within a few minutes.

Actual Result:

  • The PerconaServerMongoDB resource remains in the “initializing” state for over 29 hours.
  • The mongod log repeatedly reports:
    Cannot select sync source which is denylisted
    
    and the lagging member (psmdb-default-sharded-rs2-2) stays in RECOVERING instead of returning to a healthy SECONDARY state.

Additional Information:

  • Kubernetes cluster is running in AWS us-east-2 region.
  • The denylisted sync source candidate is psmdb-default-sharded-rs2-1.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017, i.e. the service DNS name of another member of the same shard.
  • No other error messages are present in the logs.
  • Browser/Client: N/A (issue is at the database operator level).
  • Device: N/A.
Cluster resources:

pod/psmdb-default-sharded-cfg-0       2/2     Running   0          29m
pod/psmdb-default-sharded-cfg-1       2/2     Running   0          30m
pod/psmdb-default-sharded-cfg-2       2/2     Running   0          30m
pod/psmdb-default-sharded-mongos-0    1/1     Running   0          22m
pod/psmdb-default-sharded-mongos-1    1/1     Running   0          22m
pod/psmdb-default-sharded-mongos-2    1/1     Running   0          23m
pod/psmdb-default-sharded-rs0-0       2/2     Running   0          27m
pod/psmdb-default-sharded-rs0-1       2/2     Running   0          28m
pod/psmdb-default-sharded-rs0-2       2/2     Running   0          28m
pod/psmdb-default-sharded-rs1-0       2/2     Running   0          25m
pod/psmdb-default-sharded-rs1-1       2/2     Running   0          26m
pod/psmdb-default-sharded-rs1-2       2/2     Running   0          26m
pod/psmdb-default-sharded-rs2-0       2/2     Running   0          24m
pod/psmdb-default-sharded-rs2-1       2/2     Running   0          24m
pod/psmdb-default-sharded-rs2-2       2/2     Running   0          12m
pod/psmdb-operator-55c946ff4b-8qn4n   1/1     Running   0          36m

NAME                                     TYPE           CLUSTER-IP       EXTERNAL-IP                                                                    PORT(S)           AGE
service/psmdb-default-sharded-cfg        ClusterIP      None             <none>                                                                         27017/TCP         29h
service/psmdb-default-sharded-metrics    ClusterIP      172.20.118.42    <none>                                                                         9216/TCP          29h
service/psmdb-default-sharded-mongos-0   LoadBalancer   172.20.188.196   k8s-mongodb-psmdbdef-20085bde5c-075266cfc8e2eba7.elb.us-east-2.amazonaws.com   27017:31009/TCP   29h
service/psmdb-default-sharded-mongos-1   LoadBalancer   172.20.35.229    k8s-mongodb-psmdbdef-a8f80c6d10-ee1dcbf568bb7727.elb.us-east-2.amazonaws.com   27017:30664/TCP   29h
service/psmdb-default-sharded-mongos-2   LoadBalancer   172.20.152.244   k8s-mongodb-psmdbdef-79f9e52beb-752135ba4ca3a076.elb.us-east-2.amazonaws.com   27017:32004/TCP   29h
service/psmdb-default-sharded-rs0        ClusterIP      None             <none>                                                                         27017/TCP         29h
service/psmdb-default-sharded-rs1        ClusterIP      None             <none>                                                                         27017/TCP         29h
service/psmdb-default-sharded-rs2        ClusterIP      None             <none>                                                                         27017/TCP         29h

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/psmdb-operator   1/1     1            1           29h

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/psmdb-operator-55c946ff4b   1         1         1       36m
replicaset.apps/psmdb-operator-589dfc6d8d   0         0         0       29h

NAME                                            READY   AGE
statefulset.apps/psmdb-default-sharded-cfg      3/3     29h
statefulset.apps/psmdb-default-sharded-mongos   3/3     29h
statefulset.apps/psmdb-default-sharded-rs0      3/3     29h
statefulset.apps/psmdb-default-sharded-rs1      3/3     29h
statefulset.apps/psmdb-default-sharded-rs2      3/3     29h

Operator logs:

2025-06-02T12:29:24.179Z        INFO    SmartUpdate     apply changes to secondary pod  {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "statefulset": "psmdb-default-sharded-rs2", "replset": "rs2", "pod": "psmdb-default-sharded-rs2-1"}
2025-06-02T12:29:58.316Z        INFO    Pod started     {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "pod": "psmdb-default-sharded-rs2-1"}
2025-06-02T12:29:58.349Z        INFO    SmartUpdate     primary pod detected    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "statefulset": "psmdb-default-sharded-rs2", "replset": "rs2", "pod": "psmdb-default-sharded-rs2-0"}
2025-06-02T12:29:58.349Z        INFO    SmartUpdate     doing step down...      {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "statefulset": "psmdb-default-sharded-rs2", "replset": "rs2", "force": false}
2025-06-02T12:29:59.701Z        INFO    SmartUpdate     apply changes to primary pod    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "statefulset": "psmdb-default-sharded-rs2", "replset": "rs2", "pod": "psmdb-default-sharded-rs2-0"}
2025-06-02T12:30:34.409Z        INFO    Pod started     {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "pod": "psmdb-default-sharded-rs2-0"}
2025-06-02T12:30:34.409Z        INFO    SmartUpdate     smart update finished for statefulset   {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "statefulset": "psmdb-default-sharded-rs2", "replset": "rs2"}
2025-06-02T12:30:35.887Z        INFO    StatefulSet is changed, starting smart update   {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "name": "psmdb-default-sharded-mongos"}
2025-06-02T12:31:10.539Z        INFO    Pod started     {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "pod": "psmdb-default-sharded-mongos-2"}
2025-06-02T12:31:41.276Z        INFO    Pod started     {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "pod": "psmdb-default-sharded-mongos-1"}
2025-06-02T12:32:15.616Z        INFO    Pod started     {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412", "pod": "psmdb-default-sharded-mongos-0"}
2025-06-02T12:32:15.626Z        INFO    smart update finished for mongos statefulset    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412"}
2025-06-02T12:32:15.828Z        INFO    balancer enabled        {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "9433f0e1-a901-4ccd-b891-7dac22e77412"}
2025-06-02T12:35:32.009Z        INFO    cluster is not ready    {"controller": "psmdb-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDB", "PerconaServerMongoDB": {"name":"psmdb-default-sharded","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-default-sharded", "reconcileID": "80520563-e34e-4d6a-be1e-e03d611e23dd", "job": "telemetry/mongodb/psmdb-default-sharded"}
Output of kubectl -n mongodb describe perconaservermongodb psmdb-default-sharded:
Name:         psmdb-default-sharded
Namespace:    mongodb
Labels:       app.kubernetes.io/instance=psmdb-default-sharded
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=psmdb-default-sharded
              app.kubernetes.io/version=1.20.0
              argocd.argoproj.io/instance=psmdb-default-sharded
              helm.sh/chart=psmdb-db-1.20.0
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDB
Metadata:
  Creation Timestamp:  2025-06-01T07:22:40Z
  Finalizers:
    percona.com/delete-psmdb-pods-in-order
  Generation:        2
  Resource Version:  21472160
  UID:               2a62c723-61a1-47b2-a975-e3c0ecb35494
Spec:
  Backup:
    Enabled:  true
    Image:    percona/percona-backup-mongodb:2.9.1
    Pitr:
      Enabled:              false
  Cr Version:               1.20.0
  Enable Volume Expansion:  true
  Image:                    percona/percona-server-mongodb:8.0.8-3
  Image Pull Policy:        Always
  Multi Cluster:
    Enabled:  false
  Pause:      false
  Pmm:
    Enabled:      false
    Image:        percona/pmm-client:2.44.1
    Server Host:  monitoring-service
  Replsets:
    Affinity:
      Advanced:
        Pod Anti Affinity:
          Required During Scheduling Ignored During Execution:
            Label Selector:
              Match Labels:
                app.kubernetes.io/replset:  rs0
            Topology Key:                   kubernetes.io/hostname
      Anti Affinity Topology Key:           kubernetes.io/hostname
    Arbiter:
      Affinity:
        Anti Affinity Topology Key:  kubernetes.io/hostname
      Enabled:                       false
      Size:                          1
    Configuration:                   setParameter:
  transactionLifetimeLimitSeconds: 300

    Expose:
      Enabled:  false
      Type:     ClusterIP
    Name:       rs0
    Node Selector:
      Karpenter - Node - Pool:           mongodb-sharded
      karpenter.sh/capacity-type:        on-demand
      node.kubernetes.io/instance-type:  r8g.xlarge
    Nonvoting:
      Affinity:
        Anti Affinity Topology Key:  kubernetes.io/hostname
      Enabled:                       false
      Pod Disruption Budget:
        Max Unavailable:  1
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:        3
      Volume Spec:
        Persistent Volume Claim:
          Resources:
            Requests:
              Storage:  3Gi
    Pod Disruption Budget:
      Max Unavailable:  1
    Resources:
      Limits:
        Cpu:     8
        Memory:  8Gi
      Requests:
        Cpu:     300m
        Memory:  500M
    Size:        3
    Tolerations:
      Effect:    NoSchedule
      Key:       karpenter/mongodb-sharded
      Operator:  Exists
    Topology Spread Constraints:
      Label Selector:
        Match Labels:
          app.kubernetes.io/replset:  rs0
      Max Skew:                       1
      Topology Key:                   topology.kubernetes.io/zone
      When Unsatisfiable:             DoNotSchedule
    Volume Spec:
      Persistent Volume Claim:
        Resources:
          Requests:
            Storage:         100Gi
        Storage Class Name:  mongodb
    Affinity:
      Advanced:
        Pod Anti Affinity:
          Required During Scheduling Ignored During Execution:
            Label Selector:
              Match Labels:
                app.kubernetes.io/replset:  rs1
            Topology Key:                   kubernetes.io/hostname
    Configuration:                          setParameter:
  transactionLifetimeLimitSeconds: 300

    Expose:
      Enabled:  false
    Name:       rs1
    Node Selector:
      Karpenter - Node - Pool:           mongodb-sharded
      karpenter.sh/capacity-type:        on-demand
      node.kubernetes.io/instance-type:  r8g.xlarge
    Resources:
      Limits:
        Cpu:     8
        Memory:  8Gi
      Requests:
        Cpu:     300m
        Memory:  500M
    Size:        3
    Tolerations:
      Effect:    NoSchedule
      Key:       karpenter/mongodb-sharded
      Operator:  Exists
    Topology Spread Constraints:
      Label Selector:
        Match Labels:
          app.kubernetes.io/replset:  rs1
      Max Skew:                       1
      Topology Key:                   topology.kubernetes.io/zone
      When Unsatisfiable:             DoNotSchedule
    Volume Spec:
      Persistent Volume Claim:
        Resources:
          Requests:
            Storage:         100Gi
        Storage Class Name:  mongodb
    Affinity:
      Advanced:
        Pod Anti Affinity:
          Required During Scheduling Ignored During Execution:
            Label Selector:
              Match Labels:
                app.kubernetes.io/replset:  rs2
            Topology Key:                   kubernetes.io/hostname
    Configuration:                          setParameter:
  transactionLifetimeLimitSeconds: 300

    Expose:
      Enabled:  false
    Name:       rs2
    Node Selector:
      Karpenter - Node - Pool:           mongodb-sharded
      karpenter.sh/capacity-type:        on-demand
      node.kubernetes.io/instance-type:  r8g.xlarge
    Resources:
      Limits:
        Cpu:     8
        Memory:  8Gi
      Requests:
        Cpu:     300m
        Memory:  500M
    Size:        3
    Tolerations:
      Effect:    NoSchedule
      Key:       karpenter/mongodb-sharded
      Operator:  Exists
    Topology Spread Constraints:
      Label Selector:
        Match Labels:
          app.kubernetes.io/replset:  rs2
      Max Skew:                       1
      Topology Key:                   topology.kubernetes.io/zone
      When Unsatisfiable:             DoNotSchedule
    Volume Spec:
      Persistent Volume Claim:
        Resources:
          Requests:
            Storage:         100Gi
        Storage Class Name:  mongodb
  Secrets:
    Users:  psmdb-default-sharded-secrets
  Sharding:
    Balancer:
      Enabled:  true
    Configsvr Repl Set:
      Affinity:
        Advanced:
          Pod Anti Affinity:
            Required During Scheduling Ignored During Execution:
              Label Selector:
                Match Labels:
                  app.kubernetes.io/component:  cfg
              Topology Key:                     kubernetes.io/hostname
        Anti Affinity Topology Key:             kubernetes.io/hostname
      Expose:
        Enabled:  false
        Type:     ClusterIP
      Node Selector:
        Karpenter - Node - Pool:           mongodb-sharded
        karpenter.sh/capacity-type:        on-demand
        node.kubernetes.io/instance-type:  r8g.xlarge
      Pod Disruption Budget:
        Max Unavailable:  1
      Resources:
        Limits:
          Cpu:     1
          Memory:  2Gi
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:        3
      Tolerations:
        Effect:    NoSchedule
        Key:       karpenter/mongodb-sharded
        Operator:  Exists
      Topology Spread Constraints:
        Label Selector:
          Match Labels:
            app.kubernetes.io/component:  cfg
        Max Skew:                         1
        Topology Key:                     topology.kubernetes.io/zone
        When Unsatisfiable:               DoNotSchedule
      Volume Spec:
        Persistent Volume Claim:
          Resources:
            Requests:
              Storage:  3Gi
    Enabled:            true
    Mongos:
      Affinity:
        Advanced:
          Pod Anti Affinity:
            Required During Scheduling Ignored During Execution:
              Label Selector:
                Match Labels:
                  app.kubernetes.io/component:  mongos
              Topology Key:                     kubernetes.io/hostname
        Anti Affinity Topology Key:             kubernetes.io/hostname
      Expose:
        Annotations:
          service.beta.kubernetes.io/aws-load-balancer-ip-address-type:  ipv4
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type:  ip
          service.beta.kubernetes.io/aws-load-balancer-scheme:           internal
        Service Per Pod:                                                 true
        Type:                                                            LoadBalancer
      Node Selector:
        Karpenter - Node - Pool:           mongodb-sharded
        karpenter.sh/capacity-type:        on-demand
        node.kubernetes.io/instance-type:  r8g.xlarge
      Pod Disruption Budget:
        Max Unavailable:  1
      Resources:
        Limits:
          Cpu:     8
          Memory:  5Gi
        Requests:
          Cpu:     2
          Memory:  2Gi
      Size:        3
      Tolerations:
        Effect:    NoSchedule
        Key:       karpenter/mongodb-sharded
        Operator:  Exists
  Unmanaged:       false
  Unsafe Flags:
    Backup If Unhealthy:       false
    Mongos Size:               false
    Replset Size:              false
    Termination Grace Period:  false
    Tls:                       false
  Update Strategy:             SmartUpdate
  Upgrade Options:
    Apply:                     disabled
    Schedule:                  0 2 * * *
    Set FCV:                   false
    Version Service Endpoint:  https://check.percona.com
Status:
  Conditions:
    Last Transition Time:  2025-06-01T07:22:40Z
    Status:                True
    Type:                  sharding
    Last Transition Time:  2025-06-01T07:22:46Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-06-01T07:26:27Z
    Message:               failed to disable balancer: failed to get mongos connection: ping mongo: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: 172.20.188.196:27017, Type: Unknown, Last error: dial tcp 172.20.188.196:27017: connect: connection refused }, ] }
    Reason:                ErrorReconcile
    Status:                True
    Type:                  error
    Last Transition Time:  2025-06-01T07:26:38Z
    Message:               rs2: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
  Host:                    k8s-mongodb-psmdbdef-20085bde5c-075266cfc8e2eba7.elb.us-east-2.amazonaws.com,k8s-mongodb-psmdbdef-79f9e52beb-752135ba4ca3a076.elb.us-east-2.amazonaws.com,k8s-mongodb-psmdbdef-a8f80c6d10-ee1dcbf568bb7727.elb.us-east-2.amazonaws.com
  Mongo Image:             percona/percona-server-mongodb:8.0.8-3
  Mongo Version:           8.0.8-3
  Mongos:
    Ready:              3
    Size:               3
    Status:             ready
  Observed Generation:  2
  Ready:                15
  Replsets:
    Cfg:
      Initialized:  true
      Members:
        psmdb-default-sharded-cfg-0:
          Name:       psmdb-default-sharded-cfg-0.psmdb-default-sharded-cfg.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
        psmdb-default-sharded-cfg-1:
          Name:       psmdb-default-sharded-cfg-1.psmdb-default-sharded-cfg.mongodb.svc.cluster.local:27017
          State:      1
          State Str:  PRIMARY
        psmdb-default-sharded-cfg-2:
          Name:       psmdb-default-sharded-cfg-2.psmdb-default-sharded-cfg.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
      Ready:          3
      Size:           3
      Status:         ready
    rs0:
      added_as_shard:  true
      Initialized:     true
      Members:
        psmdb-default-sharded-rs0-0:
          Name:       psmdb-default-sharded-rs0-0.psmdb-default-sharded-rs0.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
        psmdb-default-sharded-rs0-1:
          Name:       psmdb-default-sharded-rs0-1.psmdb-default-sharded-rs0.mongodb.svc.cluster.local:27017
          State:      1
          State Str:  PRIMARY
        psmdb-default-sharded-rs0-2:
          Name:       psmdb-default-sharded-rs0-2.psmdb-default-sharded-rs0.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
      Ready:          3
      Size:           3
      Status:         ready
    rs1:
      added_as_shard:  true
      Initialized:     true
      Members:
        psmdb-default-sharded-rs1-0:
          Name:       psmdb-default-sharded-rs1-0.psmdb-default-sharded-rs1.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
        psmdb-default-sharded-rs1-1:
          Name:       psmdb-default-sharded-rs1-1.psmdb-default-sharded-rs1.mongodb.svc.cluster.local:27017
          State:      1
          State Str:  PRIMARY
        psmdb-default-sharded-rs1-2:
          Name:       psmdb-default-sharded-rs1-2.psmdb-default-sharded-rs1.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
      Ready:          3
      Size:           3
      Status:         ready
    rs2:
      added_as_shard:  true
      Initialized:     true
      Members:
        psmdb-default-sharded-rs2-0:
          Name:       psmdb-default-sharded-rs2-0.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
        psmdb-default-sharded-rs2-1:
          Name:       psmdb-default-sharded-rs2-1.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      1
          State Str:  PRIMARY
        psmdb-default-sharded-rs2-2:
          Name:       psmdb-default-sharded-rs2-2.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      3
          State Str:  RECOVERING
      Ready:          3
      Size:           3
      Status:         ready
  Size:               15
  State:              initializing
Events:               <none>

This occurred while I was provisioning a fresh cluster and running mongosync from a different cluster to this new one.

The shard’s secondary node fell so far behind during the bulk data sync that the primary’s oplog entries rolled off before the secondary could apply them, forcing it into a full initial sync (RECOVERING) instead of a simple catch-up. Because it could not become a healthy SECONDARY, the Percona operator left the CR in “initializing”, so Argo CD kept the application in Progressing and never marked it Healthy.
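To confirm this, the oplog window and the member’s lag can be checked with the mongosh replication helpers from inside one of the rs2 pods. This is only a sketch: the container name (mongod) and the clusterAdmin user are the operator defaults in my setup, <password> is a placeholder for the value in psmdb-default-sharded-secrets, and depending on your TLS settings extra --tls options may be needed.

    # Oplog size and the time window it currently covers (run on the rs2 primary)
    kubectl -n mongodb exec -it psmdb-default-sharded-rs2-1 -c mongod -- \
      mongosh "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
      --eval 'rs.printReplicationInfo()'

    # Per-member replication lag behind the primary
    kubectl -n mongodb exec -it psmdb-default-sharded-rs2-1 -c mongod -- \
      mongosh "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
      --eval 'rs.printSecondaryReplicationInfo()'

If the oplog window reported by rs.printReplicationInfo() is shorter than the lag of the stuck member, it can no longer catch up incrementally and needs a full initial sync.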

The fix is to force a full initial sync by deleting the underlying data of the stuck member. As this is a secondary replica, it is relatively safe to do, although MongoDB does not offer a built-in command to trigger it.

An LLM recommended deleting the PVC → PV → Pod of the affected shard member, and I also had to do it in that order so that the deletions hang (the PVC and PV stay Terminating while still in use); otherwise the pod is simply restarted and rebinds to the same PVC.
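A rough sketch of that order, assuming the affected member is rs2-2 and that the PVC follows the operator’s usual mongod-data-<pod> naming; verify the actual PVC/PV names with kubectl get pvc,pv before deleting anything.

    # 1. Delete the PVC first; it stays in Terminating while the pod still uses it
    kubectl -n mongodb delete pvc mongod-data-psmdb-default-sharded-rs2-2 --wait=false

    # 2. Delete the bound PV the same way (take the name from the PVC's VOLUME column)
    kubectl delete pv <pv-name> --wait=false

    # 3. Delete the pod last; the StatefulSet recreates it and a fresh PVC/PV is provisioned
    kubectl -n mongodb delete pod psmdb-default-sharded-rs2-2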

Once the new PV/PVC/pod are created, the member starts an initial sync and should eventually catch up with the primary of that shard. I was able to monitor the progress using kubectl -n mongodb describe perconaservermongodb psmdb-default-sharded and the Prometheus metrics of the persistent volume.
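Besides kubectl describe and the volume metrics, the initial sync can also be watched from the member itself (a sketch with the same placeholder credentials and caveats as above):

    # initialSyncStatus reports the current phase and how many documents/bytes have been copied
    kubectl -n mongodb exec -it psmdb-default-sharded-rs2-2 -c mongod -- \
      mongosh "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
      --eval 'db.adminCommand({ replSetGetStatus: 1, initialSync: 1 }).initialSyncStatus'

Once the initial sync finished, rs2 reached the state below: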

    rs2:
      added_as_shard:  true
      Initialized:     true
      Members:
        psmdb-default-sharded-rs2-0:
          Name:       psmdb-default-sharded-rs2-0.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      1
          State Str:  PRIMARY
        psmdb-default-sharded-rs2-1:
          Name:       psmdb-default-sharded-rs2-1.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY
        psmdb-default-sharded-rs2-2:
          Name:       psmdb-default-sharded-rs2-2.psmdb-default-sharded-rs2.mongodb.svc.cluster.local:27017
          State:      2
          State Str:  SECONDARY

I could not find any documentation from Percona around this issue, so I don’t know if this is the most suitable solution, but it is one that works and brings the cluster back to a ready/healthy state.

There are a few options I came across to prevent this from happening in the first place:

  1. Increase the size of your replicas during the mongosync (seems to work; currently trying this).
     • Update: the problem still occurs despite using a larger instance.
  2. Set rs2.size = 1 during the mongosync and increase it back to 3 once the sync is done.
  3. Increase the oplog size (see the sketch below).
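For option 3, the oplog can be resized at runtime without restarting the members. This is only a sketch with the same placeholder credentials as above; 16384 MB is an arbitrary example value, and the command has to be run on every member of the shard. Alternatively, replication.oplogSizeMB can be set in the replset’s configuration block of the CR so that new members come up with the larger oplog.

    # Resize this member's oplog to ~16 GB (repeat on each rs2 member)
    kubectl -n mongodb exec -it psmdb-default-sharded-rs2-0 -c mongod -- \
      mongosh "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
      --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 16384 })'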