Psmdb stuck in "Initializing"

The psmdb resource is stuck in initializing.

I tried pausing and unpausing the cluster, but the operator logs the following error and goes back into initializing.

2023-01-12T10:22:32.954Z	ERROR	controller_psmdb	failed to reconcile cluster	{"Request.Namespace": "xcloud-psmdb", "Request.Name": "xcloud-psmdb-db", "replset": "rs", "error": "create system users: failed to get mongo client: ping mongo: connection() error occured during connection handshake: dial tcp: lookup xcloud-psmdb-db-rs-1.xcloud-psmdb-db-rs.xcloud-psmdb.svc.cluster.local on 10.0.0.10:53: no such host", "errorVerbose": "connection() error occured during connection handshake: dial tcp: lookup xcloud-psmdb-db-rs-1.xcloud-psmdb-db-rs.xcloud-psmdb.svc.cluster.local on 10.0.0.10:53: no such host\nping mongo\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.Dial\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:64\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb.MongoClient\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/client.go:47\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).mongoClientWithRole\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/connections.go:21\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).createOrUpdateSystemUsers\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:671\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:134\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:499\nsigs.k8s.io/controller-runtime/pkg/internal/cont
roller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nfailed to get mongo client\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).createOrUpdateSystemUsers\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:673\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:134\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:499\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).re
concileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ncreate system users\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:136\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:499\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/percona/perco
na-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
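The stack trace above buries the actual failure, which is a DNS lookup for one replica set pod. As a rough sketch (not an operator tool), the following Python pulls the structured payload out of such a log line and extracts the host that failed to resolve. The function names are hypothetical, and `SAMPLE` is an abbreviated copy of the log line above with the long `errorVerbose` field left out.

```python
import json
import re

# Abbreviated copy of the operator log line above; the "errorVerbose"
# stack trace is omitted here to keep the sample readable.
SAMPLE = (
    "2023-01-12T10:22:32.954Z\tERROR\tcontroller_psmdb\tfailed to reconcile cluster\t"
    '{"Request.Namespace": "xcloud-psmdb", "Request.Name": "xcloud-psmdb-db", '
    '"replset": "rs", "error": "create system users: failed to get mongo client: '
    "ping mongo: connection() error occured during connection handshake: dial tcp: "
    "lookup xcloud-psmdb-db-rs-1.xcloud-psmdb-db-rs.xcloud-psmdb.svc.cluster.local "
    'on 10.0.0.10:53: no such host"}'
)

# Matches the Go resolver message "lookup <host> on <resolver>: no such host".
LOOKUP_RE = re.compile(r"lookup (\S+) on (\S+): no such host")


def parse_operator_error(line):
    """Split a log line into its header fields and its JSON payload."""
    brace = line.index("{")  # the JSON payload starts at the first brace
    timestamp, level, logger = line[:brace].split(None, 3)[:3]
    payload = json.loads(line[brace:])
    return {"timestamp": timestamp, "level": level, "logger": logger, **payload}


def failed_dns_lookup(error_text):
    """Return (host, resolver) for a 'no such host' error, or None."""
    match = LOOKUP_RE.search(error_text)
    return (match.group(1), match.group(2)) if match else None


if __name__ == "__main__":
    record = parse_operator_error(SAMPLE)
    print(record["replset"], failed_dns_lookup(record["error"]))
```

Here the failing name is the per-pod hostname of `rs-1`, looked up against the cluster DNS at `10.0.0.10:53`, which suggests checking that pod and its headless service record before anything else.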

Hey @Shreyas_Pandya,

Please share more details about your deployment: k8s version, operator version, custom resource manifest, what led to this situation, etc.

Hi @Sergey_Pronin,

We are facing this issue in multiple psmdb deployments. The custom resource runs fine for a few days and then its status suddenly changes to initializing. Unfortunately, we can't provide any operator logs for this, since the issue happened long ago.

k8s version: v1.22.11

These are our Helm chart values:

allowUnsafeConfigurations: false
backup:
  enabled: true
  image:
    repository: percona/percona-backup-mongodb
    tag: 1.8.1
  pitr:
    compressionLevel: 6
    compressionType: gzip
    enabled: true
    oplogSpanMin: 4
  serviceAccountName: percona-server-mongodb-operator
  storages:
    azure-blob:
      azure:
        container: percona
        credentialsSecret: percona-backup-sa-creds
        prefix: scheduled
      type: azure
  tasks:
  - compressionType: gzip
    enabled: true
    keep: 30
    name: daily-backup
    schedule: 30 0 * * *
    storageName: azure-blob
  - compressionType: gzip
    enabled: true
    keep: 13
    name: weekly-backup
    schedule: 30 0 * * 0
    storageName: azure-blob
finalizers:
- delete-psmdb-pods-in-order
image:
  repository: percona/percona-server-mongodb
  tag: 4.4.16-16
pause: false
replsets:
- antiAffinityTopologyKey: kubernetes.io/hostname
  name: rs
  nodeSelector:
    app: xcmongo
  resources:
    limits:
      cpu: 4000m
      memory: 16G
    requests:
      cpu: 2000m
      memory: 12G
  size: 3
  volumeSpec:
    pvc:
      resources:
        requests:
          storage: 30Gi
secrets:
  users: redacted-psmdb-db-secrets
sharding:
  enabled: false
updateStrategy: SmartUpdate
upgradeOptions:
  apply: disabled
  schedule: 0 2 * * *
  setFCV: false
  versionServiceEndpoint: https://check.percona.com
users:
  MONGODB_BACKUP_PASSWORD: redacted
  MONGODB_BACKUP_USER: redacted
  MONGODB_CLUSTER_ADMIN_PASSWORD: redacted
  MONGODB_CLUSTER_ADMIN_USER: redacted
  MONGODB_CLUSTER_MONITOR_PASSWORD: redacted
  MONGODB_CLUSTER_MONITOR_USER: redacted
  MONGODB_DATABASE_ADMIN_PASSWORD: redacted
  MONGODB_DATABASE_ADMIN_USER: redacted
  MONGODB_USER_ADMIN_PASSWORD: redacted
  MONGODB_USER_ADMIN_USER: redacted
  PMM_SERVER_API_KEY: redacted

kubectl describe output for the psmdb custom resource:

Name:         redacted-psmdb-db
Namespace:    redacted-psmdb
Labels:       app.kubernetes.io/instance=redacted
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=psmdb-db
              app.kubernetes.io/version=1.13.0
              helm.sh/chart=psmdb-db-1.13.0
Annotations:  meta.helm.sh/release-name: redacted
              meta.helm.sh/release-namespace: redacted-psmdb
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDB
Metadata:
  Creation Timestamp:  2023-07-12T09:20:10Z
  Finalizers:
    delete-psmdb-pods-in-order
  Generation:  1
  Managed Fields:
    API Version:  psmdb.percona.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
          f:meta.helm.sh/release-name:
          f:meta.helm.sh/release-namespace:
        f:finalizers:
          .:
          v:"delete-psmdb-pods-in-order":
        f:labels:
          .:
          f:app.kubernetes.io/instance:
          f:app.kubernetes.io/managed-by:
          f:app.kubernetes.io/name:
          f:app.kubernetes.io/version:
          f:helm.sh/chart:
      f:spec:
        .:
        f:backup:
          .:
          f:enabled:
          f:image:
          f:pitr:
            .:
            f:compressionLevel:
            f:compressionType:
            f:enabled:
            f:oplogSpanMin:
          f:serviceAccountName:
          f:storages:
            .:
            f:azure-blob:
              .:
              f:azure:
                .:
                f:container:
                f:credentialsSecret:
                f:prefix:
              f:type:
          f:tasks:
        f:crVersion:
        f:image:
        f:imagePullPolicy:
        f:multiCluster:
          .:
          f:enabled:
        f:pause:
        f:pmm:
          .:
          f:enabled:
          f:image:
          f:serverHost:
        f:replsets:
        f:secrets:
          .:
          f:users:
        f:sharding:
          .:
          f:configsvrReplSet:
            .:
            f:affinity:
              .:
              f:antiAffinityTopologyKey:
            f:expose:
              .:
              f:enabled:
              f:exposeType:
            f:podDisruptionBudget:
              .:
              f:maxUnavailable:
            f:resources:
              .:
              f:limits:
                .:
                f:cpu:
                f:memory:
              f:requests:
                .:
                f:cpu:
                f:memory:
            f:size:
            f:volumeSpec:
              .:
              f:persistentVolumeClaim:
                .:
                f:resources:
                  .:
                  f:requests:
                    .:
                    f:storage:
          f:enabled:
          f:mongos:
            .:
            f:affinity:
              .:
              f:antiAffinityTopologyKey:
            f:expose:
              .:
              f:exposeType:
            f:podDisruptionBudget:
              .:
              f:maxUnavailable:
            f:resources:
              .:
              f:limits:
                .:
                f:cpu:
                f:memory:
              f:requests:
                .:
                f:cpu:
                f:memory:
            f:size:
        f:unmanaged:
        f:updateStrategy:
        f:upgradeOptions:
          .:
          f:apply:
          f:schedule:
          f:setFCV:
          f:versionServiceEndpoint:
    Manager:      terraform-provider-helm_v2.8.0_x5
    Operation:    Update
    Time:         2023-07-12T09:20:10Z
    API Version:  psmdb.percona.com/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:host:
        f:mongoImage:
        f:mongoVersion:
        f:observedGeneration:
        f:ready:
        f:replsets:
          .:
          f:rs:
            .:
            f:initialized:
            f:ready:
            f:size:
            f:status:
        f:size:
        f:state:
    Manager:         percona-server-mongodb-operator
    Operation:       Update
    Subresource:     status
    Time:            2023-07-18T15:50:35Z
  Resource Version:  172633694
  UID:               3b8f5c92-e5f5-4b4d-b84a-e7e6bfea7224
Spec:
  Backup:
    Enabled:  true
    Image:    percona/percona-backup-mongodb:1.8.1
    Pitr:
      Compression Level:   6
      Compression Type:    gzip
      Enabled:             true
      Oplog Span Min:      4
    Service Account Name:  percona-server-mongodb-operator
    Storages:
      Azure - Blob:
        Azure:
          Container:           percona
          Credentials Secret:  percona-backup-sa-creds
          Prefix:              scheduled
        Type:                  azure
    Tasks:
      Compression Type:  gzip
      Enabled:           true
      Keep:              30
      Name:              daily-backup
      Schedule:          30 0 * * *
      Storage Name:      azure-blob
      Compression Type:  gzip
      Enabled:           true
      Keep:              13
      Name:              weekly-backup
      Schedule:          30 0 * * 0
      Storage Name:      azure-blob
  Cr Version:            1.13.0
  Image:                 percona/percona-server-mongodb:4.4.16-16
  Image Pull Policy:     Always
  Multi Cluster:
    Enabled:  false
  Pause:      false
  Pmm:
    Enabled:      false
    Image:        percona/pmm-client:2.30.0
    Server Host:  monitoring-service
  Replsets:
    Affinity:
      Anti Affinity Topology Key:  kubernetes.io/hostname
    Name:                          rs
    Node Selector:
      App:  xcmongo
    Resources:
      Limits:
        Cpu:     4000m
        Memory:  16G
      Requests:
        Cpu:     2000m
        Memory:  12G
    Size:        3
    Volume Spec:
      Persistent Volume Claim:
        Resources:
          Requests:
            Storage:  30Gi
  Secrets:
    Users:  redacted-psmdb-db-secrets
  Sharding:
    Configsvr Repl Set:
      Affinity:
        Anti Affinity Topology Key:  kubernetes.io/hostname
      Expose:
        Enabled:      false
        Expose Type:  ClusterIP
      Pod Disruption Budget:
        Max Unavailable:  1
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:        3
      Volume Spec:
        Persistent Volume Claim:
          Resources:
            Requests:
              Storage:  3Gi
    Enabled:            false
    Mongos:
      Affinity:
        Anti Affinity Topology Key:  kubernetes.io/hostname
      Expose:
        Expose Type:  ClusterIP
      Pod Disruption Budget:
        Max Unavailable:  1
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:      300m
          Memory:   0.5G
      Size:         2
  Unmanaged:        false
  Update Strategy:  SmartUpdate
  Upgrade Options:
    Apply:                     disabled
    Schedule:                  0 2 * * *
    Set FCV:                   false
    Version Service Endpoint:  https://check.percona.com
Status:
  Conditions:
    Last Transition Time:  2023-07-27T18:51:16Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-27T18:51:16Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-28T09:50:44Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-28T09:50:44Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-28T21:53:07Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-28T21:53:07Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-29T12:53:26Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-29T12:53:26Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-30T09:52:41Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-30T09:52:41Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-30T21:53:07Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-30T21:53:07Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-07-31T10:08:43Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-07-31T10:08:43Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-08-01T00:52:06Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-08-01T00:52:06Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-08-01T12:52:36Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-08-01T12:52:36Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2023-08-02T03:52:26Z
    Message:               rs: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2023-08-02T03:52:26Z
    Status:                True
    Type:                  initializing
  Host:                    redacted-psmdb-db-rs.redacted-psmdb.svc.cluster.local
  Mongo Image:             percona/percona-server-mongodb:4.4.16-16
  Mongo Version:           4.4.16-16
  Observed Generation:     1
  Ready:                   3
  Replsets:
    Rs:
      Initialized:  true
      Ready:        3
      Size:         3
      Status:       ready
  Size:             3
  State:            initializing
Events:             <none>
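The output above shows a mismatch: the `rs` replica set reports 3 of 3 ready with status `ready`, yet the top-level state is `initializing`. A minimal sketch of a check for that mismatch follows, assuming the status has been loaded as a dict (for example from the resource's JSON form). The key names mirror the `f:status` fields listed above; the condition keys (`type`, `status`, `lastTransitionTime`) are assumed to be the camelCase forms of what `describe` prints, and the function names are hypothetical.

```python
# Status values copied from the describe output above; only the most
# recent pair of conditions is kept.
STATUS = {
    "state": "initializing",
    "ready": 3,
    "size": 3,
    "replsets": {
        "rs": {"initialized": True, "ready": 3, "size": 3, "status": "ready"},
    },
    "conditions": [
        {"type": "ready", "status": "True",
         "lastTransitionTime": "2023-08-02T03:52:26Z"},
        {"type": "initializing", "status": "True",
         "lastTransitionTime": "2023-08-02T03:52:26Z"},
    ],
}


def replsets_ready(status):
    """True when every replica set has status 'ready' and ready == size."""
    replsets = status.get("replsets", {})
    return bool(replsets) and all(
        rs.get("status") == "ready" and rs.get("ready") == rs.get("size")
        for rs in replsets.values()
    )


def looks_stuck(status):
    """Replica sets are ready but the cluster-level state is not 'ready'."""
    return replsets_ready(status) and status.get("state") != "ready"


def last_transition(status):
    """Most recent lastTransitionTime among the conditions, or None."""
    times = [c["lastTransitionTime"] for c in status.get("conditions", [])
             if "lastTransitionTime" in c]
    # Timestamps in this fixed UTC format sort chronologically as strings.
    return max(times) if times else None


if __name__ == "__main__":
    print(looks_stuck(STATUS), last_transition(STATUS))
```

This only flags the inconsistency. It does not explain why the operator keeps the state at `initializing`; the operator logs from around the last transition time are still needed for that.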

The initializing state usually indicates that some of the components are not ready. For example, if you restart even one node in a replica set, the cluster goes into the initializing state.

Does it go back to the ready state after some time, or does it stay stuck in initializing?

It has been stuck in the initializing state for 21 days, even though the StatefulSet reports all three replicas as ready.
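Since the only operator error in this thread was a failed DNS lookup for a replica set pod, one thing worth ruling out is whether every per-pod hostname resolves from inside the cluster. The sketch below is an assumption-laden helper, not part of the operator: the hostname pattern is inferred from the single name in the log line in the first post, port 27017 is simply MongoDB's default, and the function names are hypothetical.

```python
import socket


def replset_hosts(cluster, replset, namespace, size,
                  domain="svc.cluster.local"):
    """Build per-pod hostnames following the pattern seen in the log:
    <cluster>-<replset>-<n>.<cluster>-<replset>.<namespace>.<domain>
    """
    service = f"{cluster}-{replset}"
    return [f"{service}-{i}.{service}.{namespace}.{domain}"
            for i in range(size)]


def unresolved(hosts, port=27017, resolve=socket.getaddrinfo):
    """Return the hosts for which the DNS lookup fails."""
    failed = []
    for host in hosts:
        try:
            resolve(host, port)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Run from a pod in the same cluster, `unresolved(replset_hosts("xcloud-psmdb-db", "rs", "xcloud-psmdb", 3))` would list any replica names the cluster DNS cannot resolve. An empty list means the lookups succeed and the cause lies elsewhere.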