Could not find host matching read preference { mode: "nearest" } for set rs0

Description:

Unable to connect to mongodb after changing the load balancer configuration scheme from internet-facing to internal. Is there any way to recover mongodb from this error? I tried switching the configuration back and redeploying, but the problem remains.

kubectl run -i --rm --tty percona-client --image=percona/percona-server-mongodb:5.0 --restart=Never \
  -- mongo "mongodb+srv://${ADMIN_USER}:${ADMIN_PASSWORD}@psmdb-db-rs0.mongodb.svc.cluster.local/admin?replicaSet=rs0&ssl=false"

{"t":{"$date":"2023-09-20T14:10:39.819Z"},"s":"I", "c":"NETWORK", "id":4333208, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM host selection timeout","attr":{"replicaSet":"rs0","error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"nearest\" } for set rs0"}}
Error: Could not find host matching read preference { mode: "nearest" } for set rs0, rs0/psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017 :
connect@src/mongo/shell/mongo.js:372:17
@(connect):2:6
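
For reference, the hosts the replica set is actually advertising (and whether they still point at the old load balancer address) can be checked from inside the pod with the hello command, which does not require authentication. A rough sketch, assuming mongosh is available in the server image and the main container is named mongod:

kubectl exec -n mongodb psmdb-db-rs0-0 -c mongod -- \
  mongosh --quiet --eval 'db.hello().hosts'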

Steps to Reproduce:

    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
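
The annotation change is applied by redeploying the chart, roughly as follows (a sketch; the percona/psmdb-db chart name is an assumption, while the release name and namespace match the helm ls output below):

helm upgrade psmdb-db percona/psmdb-db -n mongodb -f values.yaml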

Version:

psmdb-db        mongodb         3               2023-09-20 15:01:26.76669879 +0100 BST  deployed        psmdb-db-1.14.3         1.14.0
psmdb-operator  mongodb         1               2023-09-20 14:12:14.052409889 +0100 BST deployed        psmdb-operator-1.14.2   1.14.0

Logs:

2023-09-20T14:17:35.989Z ERROR Reconciler error {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "1dcb0931-3c79-4865-aede-6035aca1cb8d", "error": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017, Type: RSGhost, Average RTT: 749120 }, ] }", "errorVerbose": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017, Type: RSGhost, Average RTT: 749120 }, ] }\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:412\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
2023-09-20T14:17:36.031Z INFO StatefulSet is changed, starting smart update {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "3b5be707-93b3-4aea-922d-6a8dc71e67ab", "name": "psmdb-db-rs0"}
2023-09-20T14:18:07.050Z ERROR Reconciler error {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "3b5be707-93b3-4aea-922d-6a8dc71e67ab", "error": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017, Type: RSGhost, Average RTT: 768844 }, ] }", "errorVerbose": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017: create mongo connection: mongo ping: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: psmdb-db-rs0-0.psmdb-db-rs0.mongodb.svc.cluster.local:27017, Type: RSGhost, Average RTT: 768844 }, ] }\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:412\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
 /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
2023-09-20T14:18:07.132Z INFO StatefulSet is changed, starting smart update {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "2fee95ad-48f3-4665-8e8b-e3d2e3396d52", "name": "psmdb-db-rs0"}

Expected Result:

Expected to be able to connect to the mongodb instance.

Actual Result:

Unable to connect; the mongo shell fails with the read preference error shown above.

Additional Information:

kubectl get all -n mongodb

NAME                                 READY   STATUS    RESTARTS        AGE
pod/psmdb-db-rs0-0                   1/1     Running   6 (2m49s ago)   21m
pod/psmdb-operator-869b9b99d-fm66v   1/1     Running   0               21m

NAME                     TYPE           CLUSTER-IP      EXTERNAL-IP                                                                    PORT(S)           AGE
service/psmdb-db-rs0     ClusterIP      None            <none>                                                                         27017/TCP         61m
service/psmdb-db-rs0-0   LoadBalancer   172.20.23.165   k8s-mongodb-psmdbdbr-03fa115926-<redacted>.elb.eu-west-2.amazonaws.com   27017:30564/TCP   61m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/psmdb-operator   1/1     1            1           62m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/psmdb-operator-869b9b99d   1         1         1       62m

NAME                            READY   AGE
statefulset.apps/psmdb-db-rs0   1/1     61m

Uninstalling and reinstalling the mongodb server still throws the same error, but I noticed this in the operator logs.

2023-09-20T14:58:22.415Z        INFO    initiating replset      {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "dda62130-0f5f-4a90-99de-f5ead14b119d", "replset": "rs0", "pod": "psmdb-db-rs0-0"}
2023-09-20T14:58:31.553Z        ERROR   failed to reconcile cluster     {"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "dda62130-0f5f-4a90-99de-f5ead14b119d", "replset": "rs0", "error": "handleReplsetInit: exec add admin user: command terminated with exit code 1 / Current Mongosh Log ID:\t650b089716c0f0dcef831f23\nConnecting to:\t\tmongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.6.2\nUsing MongoDB:\t\t6.0.4-3\nUsing Mongosh:\t\t1.6.2\n\nFor mongosh info see: https://docs.mongodb.com/mongodb-shell/\n\n / MongoServerError: command createUser requires authentication\n", "errorVerbose": "exec add admin user: command terminated with exit code 1 / Current Mongosh Log ID:\t650b089716c0f0dcef831f23\nConnecting to:\t\tmongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.6.2\nUsing MongoDB:\t\t6.0.4-3\nUsing Mongosh:\t\t1.6.2\n\nFor mongosh info see: https://docs.mongodb.com/mongodb-shell/\n\n / MongoServerError: command createUser requires authentication\n\nhandleReplsetInit\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:99\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:487\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
        /go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:489
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235
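
Since the delete-psmdb-pvc finalizer is commented out in the values.yaml below, the data volumes survive an uninstall, so the reinstalled pod may simply be reusing the previous data. A quick, hedged way to check is to compare the PVC age against the pod age:

kubectl get pvc -n mongodb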

@Kay_Khan I would suspect it has something to do with certificates and the way the replica set is configured.

Could you please share your values.yaml? I’m mostly interested in how clusterServiceDNSMode is configured. Or you can just share your CR manifest: kubectl get psmdb <CLUSTER> -o yaml
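
For reference, that single field can also be pulled directly; a sketch, assuming the cluster is named psmdb-db in the mongodb namespace:

kubectl get psmdb psmdb-db -n mongodb -o jsonpath='{.spec.clusterServiceDNSMode}'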

@Sergey_Pronin Please see below the entire values.yaml file as well as the CR manifest. Just FYI, I've left the mongodb instance running for a week now, as I had to move onto something else. However, if there are any more tests you need me to perform or information to provide, please let me know, as this problem will be blocking me very soon.

values.yaml

# Default values for psmdb-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Platform type: kubernetes, openshift
# platform: kubernetes

# Cluster DNS Suffix
# clusterServiceDNSSuffix: svc.cluster.local
clusterServiceDNSMode: "External"

finalizers:
## Set this if you want that operator deletes the primary pod last
  - delete-psmdb-pods-in-order
## Set this if you want to delete database persistent volumes on cluster deletion
#  - delete-psmdb-pvc

nameOverride: ""
fullnameOverride: ""

crVersion: 1.14.0
pause: false
unmanaged: false
allowUnsafeConfigurations: true
# ignoreAnnotations:
#   - service.beta.kubernetes.io/aws-load-balancer-backend-protocol
# ignoreLabels:
#   - rack
multiCluster:
  enabled: false
  # DNSSuffix: svc.clusterset.local
updateStrategy: SmartUpdate
upgradeOptions:
  versionServiceEndpoint: https://check.percona.com
  apply: disabled
  schedule: "0 2 * * *"
  setFCV: false

image:
  repository: percona/percona-server-mongodb
  tag: 6.0.8-6

imagePullPolicy: Always
# imagePullSecrets: []
# initImage:
#   repository: percona/percona-server-mongodb-operator
#   tag: 1.14.0
# initContainerSecurityContext: {}
# tls:
#   # 90 days in hours
#   certValidityDuration: 2160h
secrets: {}
  # If you set users secret here the operator will use existing one or generate random values
  # If not set the operator generates the default secret with name <cluster_name>-secrets
  # users: my-cluster-name-secrets
  # encryptionKey: my-cluster-name-mongodb-encryption-key

pmm:
  enabled: false
  image:
    repository: percona/pmm-client
    tag: 2.35.0
  serverHost: monitoring-service

replsets:
  - name: rs0
    size: 1
    # externalNodes:
    # - host: 34.124.76.90
    # - host: 34.124.76.91
    #   port: 27017
    #   votes: 0
    #   priority: 0
    # - host: 34.124.76.92
    # configuration: |
    #   operationProfiling:
    #     mode: slowOp
    #   systemLog:
    #     verbosity: 1
    antiAffinityTopologyKey: "kubernetes.io/hostname"
    # tolerations: []
    # priorityClass: ""
    # annotations: {}
    # labels: {}
    nodeSelector:
      acme/node-type: "ops"
    # livenessProbe:
    #   failureThreshold: 4
    #   initialDelaySeconds: 60
    #   periodSeconds: 30
    #   timeoutSeconds: 10
    #   startupDelaySeconds: 7200
    # readinessProbe:
    #   failureThreshold: 8
    #   initialDelaySeconds: 10
    #   periodSeconds: 3
    #   successThreshold: 1
    #   timeoutSeconds: 2
    # runtimeClassName: image-rc
    # storage:
    #   engine: wiredTiger
    #   wiredTiger:
    #     engineConfig:
    #       cacheSizeRatio: 0.5
    #       directoryForIndexes: false
    #       journalCompressor: snappy
    #     collectionConfig:
    #       blockCompressor: snappy
    #     indexConfig:
    #       prefixCompression: true
    #   inMemory:
    #     engineConfig:
    #        inMemorySizeRatio: 0.5
    sidecars:
    - image: percona/mongodb_exporter:0.36
      env:
      - name: EXPORTER_USER
        valueFrom:
          secretKeyRef:
            name: psmdb-db-secrets
            key: MONGODB_CLUSTER_MONITOR_USER
      - name: EXPORTER_PASS
        valueFrom:
          secretKeyRef:
            name: psmdb-db-secrets
            key: MONGODB_CLUSTER_MONITOR_PASSWORD
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
      - name: MONGODB_URI
        value: "mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017"
      args: ["--discovering-mode", "--compatible-mode", "--collect-all", "--mongodb.uri=$(MONGODB_URI)"]
      name: metrics
    #   volumeMounts:
    #     - mountPath: /volume1
    #       name: sidecar-volume-claim
    #     - mountPath: /secret
    #       name: sidecar-secret
    #     - mountPath: /configmap
    #       name: sidecar-config
    # sidecarVolumes:
    # - name: sidecar-secret
    #   secret:
    #     secretName: mysecret
    # - name: sidecar-config
    #   configMap:
    #     name: myconfigmap
    # sidecarPVCs:
    # - apiVersion: v1
    #   kind: PersistentVolumeClaim
    #   metadata:
    #     name: sidecar-volume-claim
    #   spec:
    #     resources:
    #       requests:
    #         storage: 1Gi
    #     volumeMode: Filesystem
    #     accessModes:
    #       - ReadWriteOnce
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: true
      exposeType: LoadBalancer
      # loadBalancerSourceRanges:
      #   - 10.0.0.0/8
      serviceAnnotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          # service.beta.kubernetes.io/aws-load-balancer-security-groups: sg-0beb09a596e969cfe
          # service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "true"
        # service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
      # serviceLabels: 
      #   some-label: some-key
    nonvoting:
      enabled: false
      # podSecurityContext: {}
      # containerSecurityContext: {}
      size: 3
      # configuration: |
      #   operationProfiling:
      #     mode: slowOp
      #   systemLog:
      #     verbosity: 1
      antiAffinityTopologyKey: "kubernetes.io/hostname"
      # tolerations: []
      # priorityClass: ""
      # annotations: {}
      # labels: {}
      # nodeSelector: {}
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "300m"
          memory: "0.5G"
        requests:
          cpu: "300m"
          memory: "0.5G"
      volumeSpec:
        # emptyDir: {}
        # hostPath:
        #   path: /data
        pvc:
          # annotations:
          #   volume.beta.kubernetes.io/storage-class: example-hostpath
          # labels:
          #   rack: rack-22
          # storageClassName: standard
          # accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 3Gi
    arbiter:
      enabled: false
      size: 1
      antiAffinityTopologyKey: "kubernetes.io/hostname"
      # tolerations: []
      # priorityClass: ""
      # annotations: {}
      # labels: {}
      # nodeSelector: {}
    # schedulerName: ""
    # resources:
    #   limits:
    #     cpu: "300m"
    #     memory: "0.5G"
    #   requests:
    #     cpu: "300m"
    #     memory: "0.5G"
    volumeSpec:
      # emptyDir: {}
      # hostPath:
      #   path: /data
      pvc:
        # annotations:
        #   volume.beta.kubernetes.io/storage-class: example-hostpath
        # labels:
        #   rack: rack-22
        storageClassName: mongodb
        # accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 250Gi

sharding:
  enabled: false 

backup:
  enabled: true
  image:
    repository: percona/percona-backup-mongodb
    tag: 2.0.5
  serviceAccountName: percona-server-mongodb-operator
  #  annotations:
  #  iam.amazonaws.com/role: arn:aws:iam::700849607999:role/acme-test-default-eks-mongodb
  # resources:
  #   limits:
  #     cpu: "300m"
  #     memory: "0.5G"
  #   requests:
  #     cpu: "300m"
  #     memory: "0.5G"
  storages:
    s3-us-east:
      type: s3
      s3:
        bucket: acme-prod-mongodb-backup
        credentialsSecret: prod-aws-mongodb
        region: us-east-2
        prefix: ""
        uploadPartSize: 10485760
        maxUploadParts: 10000
        storageClass: STANDARD
        insecureSkipTLSVerify: false
    # minio:
    #   type: s3
    #   s3:
    #     bucket: MINIO-BACKUP-BUCKET-NAME-HERE
    #     region: us-east-1
    #     credentialsSecret: my-cluster-name-backup-minio
    #     endpointUrl: http://minio.psmdb.svc.cluster.local:9000/minio/
    #     prefix: ""
    #   azure-blob:
    #     type: azure
    #     azure:
    #       container: CONTAINER-NAME
    #       prefix: PREFIX-NAME
    #       credentialsSecret: SECRET-NAME
  pitr:
    enabled: false
    # oplogSpanMin: 10
    # compressionType: gzip
    # compressionLevel: 6
  tasks:
   - name: "daily-s3-backup"
     enabled: true
     schedule: "0 1 * * *"
     keep: 3
     type: logical
     storageName: s3-us-east

  # - name: daily-s3-us-west
  #   enabled: true
  #   schedule: "0 0 * * *"
  #   keep: 3
  #   storageName: s3-us-west
  #   compressionType: gzip
  # - name: weekly-s3-us-west
  #   enabled: false
  #   schedule: "0 0 * * 0"
  #   keep: 5
  #   storageName: s3-us-west
  #   compressionType: gzip
  # - name: weekly-s3-us-west-physical
  #   enabled: false
  #   schedule: "0 5 * * 0"
  #   keep: 5
  #   type: physical
  #   storageName: s3-us-west
  #   compressionType: gzip
  #   compressionLevel: 6

# If you set users here the secret will be constructed by helm with these values
# users:
#   MONGODB_BACKUP_USER: backup
#   MONGODB_BACKUP_PASSWORD: backup123456
#   MONGODB_DATABASE_ADMIN_USER: databaseAdmin
#   MONGODB_DATABASE_ADMIN_PASSWORD: databaseAdmin123456
#   MONGODB_CLUSTER_ADMIN_USER: clusterAdmin
#   MONGODB_CLUSTER_ADMIN_PASSWORD: clusterAdmin123456
#   MONGODB_CLUSTER_MONITOR_USER: clusterMonitor
#   MONGODB_CLUSTER_MONITOR_PASSWORD: clusterMonitor123456
#   MONGODB_USER_ADMIN_USER: userAdmin
#   MONGODB_USER_ADMIN_PASSWORD: userAdmin123456
#   PMM_SERVER_API_KEY: apikey
#   # PMM_SERVER_USER: admin
#   # PMM_SERVER_PASSWORD: admin

kubectl get psmdb psmdb-db -o yaml -n mongodb

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
    meta.helm.sh/release-name: psmdb-db
    meta.helm.sh/release-namespace: mongodb
  creationTimestamp: "2023-09-25T10:22:17Z"
  finalizers:
  - delete-psmdb-pods-in-order
  generation: 1
  labels:
    app.kubernetes.io/instance: psmdb-db
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: psmdb-db
    app.kubernetes.io/version: 1.14.0
    helm.sh/chart: psmdb-db-1.14.4
  name: psmdb-db
  namespace: mongodb
  resourceVersion: "5398779"
  uid: eaa25b03-4842-4001-b64b-c528510bd5ff
spec:
  allowUnsafeConfigurations: true
  backup:
    enabled: false
    image: percona/percona-backup-mongodb:2.0.5
    pitr:
      enabled: false
    serviceAccountName: percona-server-mongodb-operator
    storages:
      s3-us-east:
        s3:
          bucket: acme-prod-mongodb-backup
          credentialsSecret: prod-aws-mongodb
          insecureSkipTLSVerify: false
          maxUploadParts: 10000
          prefix: ""
          region: us-east-2
          storageClass: STANDARD
          uploadPartSize: 10485760
        type: s3
    tasks:
    - enabled: false
      keep: 3
      name: daily-s3-backup
      schedule: 0 1 * * *
      storageName: s3-us-east
      type: logical
  clusterServiceDNSMode: External
  crVersion: 1.14.0
  image: percona/percona-server-mongodb:6.0.4-3
  imagePullPolicy: Always
  multiCluster:
    enabled: false
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.35.0
    serverHost: monitoring-service
  replsets:
  - arbiter:
      enabled: false
      size: 1
    expose:
      enabled: true
      exposeType: LoadBalancer
      serviceAnnotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internal
    name: rs0
    nodeSelector:
      acme/node-type: ops
    nonvoting:
      enabled: false
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    podDisruptionBudget:
      maxUnavailable: 1
    sidecars:
    - args:
      - --discovering-mode
      - --compatible-mode
      - --collect-all
      - --mongodb.uri=$(MONGODB_URI)
      env:
      - name: EXPORTER_USER
        valueFrom:
          secretKeyRef:
            key: MONGODB_CLUSTER_MONITOR_USER
            name: psmdb-db-secrets
      - name: EXPORTER_PASS
        valueFrom:
          secretKeyRef:
            key: MONGODB_CLUSTER_MONITOR_PASSWORD
            name: psmdb-db-secrets
      - name: POD_IP
        valueFrom:
          fieldRef:
            fieldPath: status.podIP
      - name: MONGODB_URI
        value: mongodb://$(EXPORTER_USER):$(EXPORTER_PASS)@$(POD_IP):27017
      image: percona/mongodb_exporter:0.36
      name: metrics
    size: 1
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 250Gi
        storageClassName: mongodb
  secrets:
    users: psmdb-db-secrets
  sharding:
    configsvrReplSet:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        enabled: false
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    enabled: false
    mongos:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 2
  unmanaged: false
  updateStrategy: SmartUpdate
  upgradeOptions:
    apply: disabled
    schedule: 0 2 * * *
    setFCV: false
    versionServiceEndpoint: https://check.percona.com
status:
  conditions:
  - lastTransitionTime: "2023-10-02T08:27:46Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:27:53Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:28:57Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:28:59Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:29:40Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:29:42Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:29:53Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:30:02Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:30:16Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:30:24Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:31:32Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:31:35Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:31:46Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:31:59Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:32:27Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:32:34Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:32:46Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:32:48Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2023-10-02T08:33:15Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2023-10-02T08:33:17Z"
    status: "True"
    type: ready
  host: k8s-mongodb-psmdbdbr-9ca3fb6f1f-<redacted>.elb.eu-west-2.amazonaws.com:27017
  mongoImage: percona/percona-server-mongodb:6.0.4-3
  mongoVersion: 6.0.4-3
  observedGeneration: 1
  ready: 1
  replsets:
    rs0:
      initialized: true
      ready: 1
      size: 1
      status: ready
  size: 1
  state: ready

Hi,

I remember you mentioned something about certificates. We saw the error below when attempting to provision a new instance on our production cluster. Any help is appreciated; we are not sure what to do.

Liveness probe failed: command "/opt/percona/mongodb-healthcheck k8s liveness --ssl --sslInsecure --sslCAFile /etc/mongodb-ssl/ca.crt --sslPEMKeyFile /tmp/tls.pem --startupDelaySeconds 7200" timed out

Also note we do have cert manager installed

helm ls -n cert-manager

cert-manager    cert-manager    1               2023-05-03 15:07:42.725175923 +0100 BST deployed        cert-manager-v1.11.1    v1.11.1
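
The certificates the operator requests through cert-manager can be inspected to confirm they were issued and which hostnames they cover. A sketch; the psmdb-db-internal-ssl secret name is an assumption based on the <cluster>-ssl naming convention:

kubectl get certificate -n mongodb
kubectl get secret psmdb-db-internal-ssl -n mongodb -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'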

And we've had a mongodb instance running for a few months now which is working fine, but this new one we deployed, psmdb-db-internal, is failing.

helm ls -n mongodb

psmdb-db - Original
psmdb-db-internal - New one which is failing

psmdb-db                mongodb         2               2023-08-03 09:31:58.57105479 +0100 BST  deployed        psmdb-db-1.14.3         1.14.0
psmdb-db-internal       mongodb         2               2023-10-04 15:58:00.373192349 +0100 BST deployed        psmdb-db-1.14.4         1.14.0
psmdb-operator          mongodb         1               2023-05-06 10:35:46.776038271 +0100 BST deployed        psmdb-operator-1.14.2   1.14.0

I found the issue.

My original instance was externally facing, and I had to use clusterServiceDNSMode: "External" to get that to work. I had forgotten about that change, so when I wanted to create a new instance that was internal, I had to remember to set clusterServiceDNSMode back to the default, Internal.
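
For anyone hitting the same thing, the fix in the new instance's values.yaml is just (a sketch; Internal is the operator default, so dropping the key entirely has the same effect):

# clusterServiceDNSMode: "External"   # only needed for the internet-facing cluster
clusterServiceDNSMode: "Internal"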

Thanks for pointing that out.