PSMDB stuck initializing

Our MongoDB clusters failed to initialize and the operator is failing to authenticate. The psmdb resource is stuck in the initializing state.

Kubernetes 1.22
Percona Operator 1.13

I'm not sure what caused the problem, but we did switch the users to an external secret and removed the user data from the psmdb resource (that is, from the Helm chart overrides).

psmdb resource:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
  name: mongodb-psmdb-db
  labels:
    app.kubernetes.io/name: psmdb-db
    helm.sh/chart: psmdb-db-1.13.0
    app.kubernetes.io/instance: mongodb
    app.kubernetes.io/version: "1.13.0"
    app.kubernetes.io/managed-by: Helm
  finalizers:
    - delete-psmdb-pods-in-order
    - delete-psmdb-pvc
spec:
  crVersion: 1.13.0
  pause: false
  unmanaged: false
  image: "percona/percona-server-mongodb:5.0.11-10"
  imagePullPolicy: "IfNotPresent"
  multiCluster:
    enabled: false
  secrets:
    users: mongodb-psmdb-db-secrets
    encryptionKey: mongodb-psmdb-db-mongodb-encryption-key
  updateStrategy: SmartUpdate
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: 0 2 * * *
    setFCV: false
  pmm:
    enabled: false
    image: "percona/pmm-client:2.30.0"
    serverHost: monitoring-service
  replsets:
  - name: rs0
    size: 3
    configuration: |
      systemLog:
        verbosity: 0
      auditLog:
        destination: console
        filter: '{ "param.ns" : { $ne: "local.replset.oplogTruncateAfterPoint" }}'
    affinity:
      antiAffinityTopologyKey:
    nodeSelector:
      ffServerType: utility
      ffUse: databases
      ffUseType: worker
    podDisruptionBudget:
      maxUnavailable: 1
    expose:
      enabled: false
      exposeType: ClusterIP
    resources:
      limits:
        memory: 5G
      requests:
        memory: 2G
    volumeSpec:
      persistentVolumeClaim:
        resources:
          requests:
            storage: 5Gi
  sharding:
    enabled: false
    configsvrReplSet:
      size: 3
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    mongos:
      size: 2
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      expose:
        exposeType: ClusterIP
  backup:
    enabled: false
    image: "percona/percona-backup-mongodb:1.7.0"
    serviceAccountName: percona-server-mongodb-operator
    storages:
      s3-us-east:
        s3:
          bucket: ff-mongo-backup-recovery
          insecureSkipTLSVerify: false
          maxUploadParts: 10000
          prefix: dev
          region: us-east-1
          storageClass: STANDARD
          uploadPartSize: 10485760
        type: s3
    pitr:
      enabled: false
    tasks:
      - compressionType: gzip
        enabled: true
        keep: 3
        name: daily-s3-us-east
        schedule: 0 0 * * *
        storageName: s3-us-east
      - compressionType: gzip
        enabled: false
        keep: 5
        name: weekly-s3-us-east
        schedule: 0 0 * * 0
        storageName: s3-us-east

PSMDB:
mongodb-psmdb-db mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local initializing 16d

PSMDB Status:

    - lastTransitionTime: "2023-01-05T17:48:56Z"
      message: 'create pbm object: create PBM connection to mongodb-psmdb-db-rs0-0.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-1.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-2.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017:
        create mongo connection: mongo ping: connection() error occured during connection
        handshake: auth error: unable to authenticate using mechanism "SCRAM-SHA-256":
        (AuthenticationFailed) Authentication failed.'
      reason: ErrorReconcile
      status: "True"
      type: error
    - lastTransitionTime: "2023-01-05T17:48:56Z"
      status: "True"
      type: ready
    - lastTransitionTime: "2023-01-05T17:48:56Z"
      status: "True"
      type: initializing
    host: mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local
    mongoImage: percona/percona-server-mongodb:5.0.11-10
    mongoVersion: 5.0.11-10
    observedGeneration: 7
    ready: 3
    replsets:
      rs0:
        initialized: true
        ready: 3
        size: 3
        status: ready
    size: 3
    state: initializing
kind: List
metadata:
  resourceVersion: ""

Any idea how to recover the cluster? The pods are running fine, but I can’t upgrade.

Hello @Tim ,

Could you please share the steps you performed again?

I guess you deployed the operator through a Helm chart and then decided to move the secrets from values.yaml to a separate file (wise).

But then something went wrong. Is that right?
Did you change the SSL secrets as well?

I'm only guessing, as a number of changes occurred, but we certainly changed the secrets from those created by the psmdb Helm chart to a similar set managed in an external secret. In the overrides, I emptied the users field to “users: {}” and added:

secrets:
  users: my-external-mongodb-admin

I’ll try changing the secrets back to see if that helps.

The operator can’t connect to the cluster. It's failing with this error:

2023-01-09T14:41:09.972Z        ERROR   controller.psmdb-controller     Reconciler error        {"name": "mongodb-psmdb-db", "namespace": "flexibleflyer", "error": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting pbm object: create PBM connection to mongodb-psmdb-db-rs0-2.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-0.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-1.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017: create mongo connection: mongo ping: connection() error occured during connection handshake: auth error: unable to authenticate using mechanism \"SCRAM-SHA-256\": (AuthenticationFailed) Authentication failed.", "errorVerbose": "reconcile StatefulSet for rs0: failed to run smartUpdate: failed to check active jobs: getting pbm object: create PBM connection to mongodb-psmdb-db-rs0-2.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-0.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017,mongodb-psmdb-db-rs0-1.mongodb-psmdb-db-rs0.flexibleflyer.svc.cluster.local:27017: create mongo connection: mongo ping: connection() error occured during connection handshake: auth error: unable to authenticate using mechanism \"SCRAM-SHA-256\": (AuthenticationFailed) Authentication failed.\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:415\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}

Can someone tell me which account and secret the operator uses to connect? I can try to create or update the secret so that it can connect.

I think my problem started when I created my users secret. I put the plain-text values under “data” instead of “stringData” (values under “data” have to be base64-encoded). The operator found the secret but could not read the values correctly, and therefore couldn’t authenticate.
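
For anyone comparing the two forms, here is a minimal sketch of the same users secret written both ways. The secret name is the one from my overrides and the namespace is the one in the logs; the user name and password are placeholders, and I'm assuming the key names follow the operator's sample users secret. Only the backup user pair is shown, on the assumption that it is the account behind the "create PBM connection" error; a real secret would also need the other system users.

```yaml
# Option A: stringData takes plain-text values; Kubernetes encodes them on write.
apiVersion: v1
kind: Secret
metadata:
  name: my-external-mongodb-admin
  namespace: flexibleflyer
type: Opaque
stringData:
  MONGODB_BACKUP_USER: backup              # placeholder value
  MONGODB_BACKUP_PASSWORD: backup123456    # placeholder value
  # remaining system user keys omitted for brevity
---
# Option B: data takes base64-encoded values (use instead of option A, not in addition).
apiVersion: v1
kind: Secret
metadata:
  name: my-external-mongodb-admin
  namespace: flexibleflyer
type: Opaque
data:
  MONGODB_BACKUP_USER: YmFja3Vw                    # base64 of "backup"
  MONGODB_BACKUP_PASSWORD: YmFja3VwMTIzNDU2        # base64 of "backup123456"
  # remaining system user keys omitted for brevity
```

Plain text placed under data is either rejected as invalid base64 or decoded into different bytes than intended, which would be consistent with the authentication failure above.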

I’m still not sure how to recover from this error. I recreated the problem, but I still have a cluster stuck in the initialization phase.
