Issue with upgrade

Hi,

While manually upgrading from 1.15.3 to 1.16.0, I ran into an issue where the second replica isn’t connecting to the first one (I believe).

This is the error I’m getting:

(combined from similar events): Readiness probe failed: 2024-05-27T15:39:51.387Z INFO Running Kubernetes readiness check for component {"component": "mongod"} 2024-05-27T15:39:51.388Z ERROR Member failed Kubernetes readiness check {"error": "dial: dial tcp [::1]:27017: connect: connection refused", "errorVerbose": "dial tcp [::1]:27017: connect: connection refused\ndial\ngithub.com/percona/percona-server-mongodb-operator/healthcheck.MongodReadinessCheck\n\t/go/src/github.com/percona/percona-server-mongodb-operator/healthcheck/readiness.go:32\nmain.main\n\t/go/src/github.com/percona/percona-server-mongodb-operator/cmd/mongodb-healthcheck/main.go:121\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"} main.main /go/src/github.com/percona/percona-server-mongodb-operator/cmd/mongodb-healthcheck/main.go:123 runtime.main /usr/local/go/src/runtime/proc.go:250
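For anyone hitting the same readiness failure, a rough way to narrow it down (namespace and pod names here are taken from this thread; the availability of `mongosh` in the 6.0 images is an assumption) is to check whether mongod is actually up and listening inside the failing pod:

```shell
# Hedged troubleshooting sketch; namespace/pod names follow this thread.
# Is the failing member's mongod container running, and what is it logging?
kubectl -n mongodb get pods -o wide
kubectl -n mongodb logs psmdb-db-rs0-1 -c mongod --tail=100

# From inside the pod: what does the replset think of its members?
# (credentials omitted; rs.status() may require the clusterAdmin user
# from the psmdb-db-secrets Secret)
kubectl -n mongodb exec -it psmdb-db-rs0-1 -c mongod -- \
  mongosh --quiet --eval 'rs.status().members.map(m => m.name + " " + m.stateStr)'
```

The "connection refused" on `[::1]:27017` means the healthcheck could not reach mongod on localhost at all, so the mongod container log is usually the more informative place to look than the probe output.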

Steps to Reproduce:

Version:

Latest version (1.16.0)

Logs:

Let me know what you might need. The mongod logs aren’t showing much AFAIK.

Expected Result:

Hoping to get a successful update going.

Actual Result:

Second replica fails to connect.

Any help to troubleshoot this would be greatly appreciated.

Scott

Hi @scott_molinari, could you please provide your CRs? As far as I understand, you have cross-site replication. Am I right?

Hi Slava,

No. No cross-site replication. Just a three node (pod) replica set.

By CRs, do you mean the PerconaServerMongoDB CR?

I’ve added it below (taken out of Lens):

I’m using Rancher with the Helm Chart to create the StatefulSet of Replica Set pods though.

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
    meta.helm.sh/release-name: psmdb-db
    meta.helm.sh/release-namespace: mongodb
  creationTimestamp: '2023-02-27T08:30:12Z'
  finalizers:
    - delete-psmdb-pods-in-order
  generation: 21
  labels:
    app.kubernetes.io/instance: psmdb-db
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: psmdb-db
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: psmdb-db-1.16.0
  managedFields:
    - apiVersion: psmdb.percona.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:updateStrategy: {}
          f:upgradeOptions:
            f:apply: {}
            f:schedule: {}
      manager: node-fetch
      operation: Update
      time: '2023-09-14T14:21:16Z'
    - apiVersion: psmdb.percona.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
          f:host: {}
          f:message: {}
          f:mongoImage: {}
          f:mongoVersion: {}
          f:observedGeneration: {}
          f:ready: {}
          f:replsets:
            .: {}
            f:rs0:
              .: {}
              f:initialized: {}
              f:ready: {}
              f:size: {}
              f:status: {}
          f:size: {}
          f:state: {}
      manager: percona-server-mongodb-operator
      operation: Update
      subresource: status
      time: '2024-05-27T13:32:46Z'
    - apiVersion: psmdb.percona.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:finalizers:
            .: {}
            v:"delete-psmdb-pods-in-order": {}
          f:labels:
            .: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/version: {}
            f:helm.sh/chart: {}
        f:spec:
          .: {}
          f:backup:
            .: {}
            f:enabled: {}
            f:image: {}
            f:pitr:
              .: {}
              f:enabled: {}
          f:crVersion: {}
          f:image: {}
          f:imagePullPolicy: {}
          f:multiCluster:
            .: {}
            f:enabled: {}
          f:pause: {}
          f:pmm:
            .: {}
            f:enabled: {}
            f:image: {}
            f:serverHost: {}
          f:replsets: {}
          f:secrets:
            .: {}
            f:users: {}
          f:sharding:
            .: {}
            f:configsvrReplSet:
              .: {}
              f:affinity:
                .: {}
                f:antiAffinityTopologyKey: {}
              f:expose:
                .: {}
                f:enabled: {}
                f:exposeType: {}
              f:podDisruptionBudget:
                .: {}
                f:maxUnavailable: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:size: {}
              f:volumeSpec:
                .: {}
                f:persistentVolumeClaim:
                  .: {}
                  f:resources:
                    .: {}
                    f:requests:
                      .: {}
                      f:storage: {}
            f:enabled: {}
            f:mongos:
              .: {}
              f:affinity:
                .: {}
                f:antiAffinityTopologyKey: {}
              f:expose:
                .: {}
                f:exposeType: {}
              f:podDisruptionBudget:
                .: {}
                f:maxUnavailable: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:size: {}
          f:unmanaged: {}
          f:upgradeOptions:
            .: {}
            f:setFCV: {}
            f:versionServiceEndpoint: {}
      manager: helm
      operation: Update
      time: '2024-05-27T15:37:07Z'
  name: psmdb-db
  namespace: mongodb
  resourceVersion: '703827822'
  uid: 403523f2-7107-4a6d-adaf-30abb50012c9
  selfLink: /apis/psmdb.percona.com/v1/namespaces/mongodb/perconaservermongodbs/psmdb-db
status:
  conditions:
    - lastTransitionTime: '2024-02-22T08:30:21Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-03-31T21:47:20Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-03-31T21:47:20Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-03-31T22:10:08Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-03-31T22:10:08Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-04-24T03:54:36Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-04-24T03:54:36Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-04-24T11:52:55Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-04-24T11:52:55Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-06T08:33:13Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-05-06T08:33:13Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-25T09:57:34Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-05-25T09:57:34Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-27T07:35:17Z'
      message: >-
        reconcile StatefulSet for rs0: reconcile PVCs for psmdb-db-rs0: resize
        volumes if needed: requested storage (3Gi) is less than actual storage
        (10Gi)
      reason: ErrorReconcile
      status: 'True'
      type: error
    - lastTransitionTime: '2024-05-27T07:55:51Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-27T07:55:52Z'
      message: >-
        reconcile StatefulSet for rs0: get StatefulSet for replset rs0: failed
        to get ssl annotations: waiting for TLS secret
      reason: ErrorReconcile
      status: 'True'
      type: error
    - lastTransitionTime: '2024-05-27T07:55:54Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-27T08:38:23Z'
      message: >-
        wrong psmdb options: replset rs0 VolumeSpec: volume.resources.storage
        can't be empty
      reason: ErrorReconcile
      status: 'True'
      type: error
    - lastTransitionTime: '2024-05-27T08:40:26Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-05-27T13:32:46Z'
      message: >-
        reconcile StatefulSet for rs0: update StatefulSet psmdb-db-rs0:
        StatefulSet.apps "psmdb-db-rs0" is invalid: spec: Forbidden: updates to
        statefulset spec for fields other than 'replicas', 'template',
        'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and
        'minReadySeconds' are forbidden
      reason: ErrorReconcile
      status: 'True'
      type: error
  host: psmdb-db-rs0.mongodb.svc.cluster.local
  message: >-
    Error: reconcile StatefulSet for rs0: update StatefulSet psmdb-db-rs0:
    StatefulSet.apps "psmdb-db-rs0" is invalid: spec: Forbidden: updates to
    statefulset spec for fields other than 'replicas', 'template',
    'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and
    'minReadySeconds' are forbidden
  mongoImage: percona/percona-server-mongodb:6.0.9-7
  mongoVersion: 6.0.9-7
  observedGeneration: 18
  ready: 1
  replsets:
    rs0:
      initialized: true
      ready: 1
      size: 3
      status: initializing
  size: 3
  state: error
spec:
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.4.1
    pitr:
      enabled: false
  crVersion: 1.16.0
  image: percona/percona-server-mongodb:6.0.15
  imagePullPolicy: Always
  multiCluster:
    enabled: false
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.41.2
    serverHost: monitoring-service
  replsets:
    - affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      arbiter:
        affinity:
          antiAffinityTopologyKey: kubernetes.io/hostname
        enabled: false
        size: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      name: rs0
      nonvoting:
        affinity:
          antiAffinityTopologyKey: kubernetes.io/hostname
        enabled: false
        podDisruptionBudget:
          maxUnavailable: 1
        resources:
          limits:
            cpu: 800m
            memory: 1.5G
          requests:
            cpu: 800m
            memory: 1.5G
        size: 3
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 3Gi
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 800m
          memory: 1.5G
        requests:
          cpu: 800m
          memory: 1.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
          storageClassName: local-path
  secrets:
    users: psmdb-db-secrets
  sharding:
    configsvrReplSet:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        enabled: false
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    enabled: false
    mongos:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 2
  unmanaged: false
  updateStrategy: OnDelete
  upgradeOptions:
    apply: 6.0.8-6
    schedule: 19 14 * * *
    setFCV: false
    versionServiceEndpoint: https://check.percona.com

Let me know if you need anything else.

Scott

@Slava_Sarzhan - Pinging, as I forgot to in my post above.

Scott

@scott_molinari how did you update your cluster? I see a lot of errors like:

    - lastTransitionTime: '2024-05-27T07:35:17Z'
      message: >-
        reconcile StatefulSet for rs0: reconcile PVCs for psmdb-db-rs0: resize
        volumes if needed: requested storage (3Gi) is less than actual storage
        (10Gi)

or

  message: >-
    Error: reconcile StatefulSet for rs0: update StatefulSet psmdb-db-rs0:
    StatefulSet.apps "psmdb-db-rs0" is invalid: spec: Forbidden: updates to
    statefulset spec for fields other than 'replicas', 'template',
    'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and
    'minReadySeconds' are forbidden
    - lastTransitionTime: '2024-05-27T07:55:52Z'
      message: >-
        reconcile StatefulSet for rs0: get StatefulSet for replset rs0: failed
        to get ssl annotations: waiting for TLS secret
      reason: ErrorReconcile
      status: 'True'
      type: error

I need to understand how to reproduce it. P.S. Please make sure that you do not change Kubernetes objects directly; you should work only with the CR.
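As a sketch of what "work only with the CR" means in practice: change fields such as the image through the `PerconaServerMongoDB` resource (short name `psmdb`) or through Helm values, and let the operator reconcile the StatefulSet itself. The namespace and image tag below are taken from this thread:

```shell
# Hedged example: update the image via the CR, never by editing the
# StatefulSet psmdb-db-rs0 directly (the API server forbids most
# StatefulSet spec changes anyway, as the error above shows).
kubectl -n mongodb patch psmdb psmdb-db --type=merge \
  -p '{"spec":{"image":"percona/percona-server-mongodb:6.0.15"}}'
```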

Hi Slava @Slava_Sarzhan,

I was doing the manual upgrade process.

I upgraded the Helm charts: first the operator, then the database chart. Then I started deleting the pods, beginning with the 02 pod. That seemed to work; I waited a bit for the cluster to stabilize and then deleted the 01 pod. That is when the errors stated above started.
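The sequence described above can be sketched as follows (release names and namespace match this thread; the `percona/` chart repository alias and chart versions are assumptions):

```shell
# Manual upgrade path with updateStrategy: OnDelete.
helm upgrade psmdb-operator percona/psmdb-operator -n mongodb --version 1.16.0
helm upgrade psmdb-db percona/psmdb-db -n mongodb --version 1.16.0

# With OnDelete, pods only pick up the new pod template when deleted,
# highest ordinal first:
kubectl -n mongodb delete pod psmdb-db-rs0-2
# wait until psmdb-db-rs0-2 is Ready and the replset reports healthy, then:
kubectl -n mongodb delete pod psmdb-db-rs0-1
kubectl -n mongodb delete pod psmdb-db-rs0-0
```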

Then I started experimenting to try and fix the situation. So, the errors you are seeing might (also) be coming from my experimentation.

The good thing is, this is a POC system and nothing mission-critical. But I’d still really appreciate any kind of support in fixing this, because we do need to continue our work. :slight_smile:

Scott

@Slava_Sarzhan

So, I have gotten a step closer, I believe. It seems an orphaned PVC was the cause of the issue above. I’m now upgraded to 1.16.0.
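In case it helps others, a rough way to spot an orphaned claim like this (the label selector and the `mongod-data-` PVC name prefix are assumptions based on how the operator typically names things; verify against your own cluster first):

```shell
# List PVCs and pods for the release and compare: an orphaned claim is one
# with no matching psmdb-db-rs0-N pod.
kubectl -n mongodb get pvc -l app.kubernetes.io/instance=psmdb-db
kubectl -n mongodb get pods -l app.kubernetes.io/instance=psmdb-db

# Delete only a claim you have confirmed is orphaned (destroys its data):
# kubectl -n mongodb delete pvc mongod-data-psmdb-db-rs0-<N>
```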

However, I’m now stuck in an initialization loop in the operator.

The operator log keeps showing this every 20 seconds or so:

	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222
2024-06-02T10:13:08.485Z	INFO	initiating replset	{"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "e768cca6-bf03-4ad2-897f-dd66bd7c81b0", "replset": "rs0", "pod": "psmdb-db-rs0-0"}
2024-06-02T10:13:17.677Z	ERROR	failed to reconcile cluster	{"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "e768cca6-bf03-4ad2-897f-dd66bd7c81b0", "replset": "rs0", "error": "handleReplsetInit: exec add admin user: command terminated with exit code 137 /  / ", "errorVerbose": "exec add admin user: command terminated with exit code 137 /  / \nhandleReplsetInit\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:100\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileReplsets\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:551\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:402\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileReplsets
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:553
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:402
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222
2024-06-02T10:13:17.936Z	ERROR	failed to send telemetry to https://check.percona.com	{"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "40b2ee2a-7678-48ce-a05a-2dc304c0d3fe", "error": "[GET /versions/v1/{product}/{operatorVersion}/{apply}][500] VersionService_Apply default  &{Code:13 Details:[] Message:failed to parse version: Disabled}"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).ensureVersion
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/version.go:356
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:362
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222

And the pods all restart about every 3 minutes and 15 seconds.

I also seem to have that reconcile error with TLS. Not sure why.

This is now the PerconaServerMongoDB CR:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"psmdb.percona.com/v1","kind":"PerconaServerMongoDB"}
    meta.helm.sh/release-name: psmdb-db
    meta.helm.sh/release-namespace: mongodb
  creationTimestamp: '2024-06-02T10:26:31Z'
  finalizers:
    - delete-psmdb-pods-in-order
  generation: 1
  labels:
    app.kubernetes.io/instance: psmdb-db
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: psmdb-db
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: psmdb-db-1.16.0
  managedFields:
    - apiVersion: psmdb.percona.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:finalizers:
            .: {}
            v:"delete-psmdb-pods-in-order": {}
          f:labels:
            .: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/version: {}
            f:helm.sh/chart: {}
        f:spec:
          .: {}
          f:backup:
            .: {}
            f:enabled: {}
            f:image: {}
            f:pitr:
              .: {}
              f:enabled: {}
          f:crVersion: {}
          f:image: {}
          f:imagePullPolicy: {}
          f:multiCluster:
            .: {}
            f:enabled: {}
          f:pause: {}
          f:pmm:
            .: {}
            f:enabled: {}
            f:image: {}
            f:serverHost: {}
          f:replsets: {}
          f:secrets:
            .: {}
            f:users: {}
          f:sharding:
            .: {}
            f:balancer:
              .: {}
              f:enabled: {}
            f:configsvrReplSet:
              .: {}
              f:affinity:
                .: {}
                f:antiAffinityTopologyKey: {}
              f:expose:
                .: {}
                f:enabled: {}
                f:exposeType: {}
              f:podDisruptionBudget:
                .: {}
                f:maxUnavailable: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:size: {}
              f:volumeSpec:
                .: {}
                f:persistentVolumeClaim:
                  .: {}
                  f:resources:
                    .: {}
                    f:requests:
                      .: {}
                      f:storage: {}
            f:enabled: {}
            f:mongos:
              .: {}
              f:affinity:
                .: {}
                f:antiAffinityTopologyKey: {}
              f:expose:
                .: {}
                f:exposeType: {}
              f:podDisruptionBudget:
                .: {}
                f:maxUnavailable: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:size: {}
          f:unmanaged: {}
          f:updateStrategy: {}
          f:upgradeOptions:
            .: {}
            f:apply: {}
            f:schedule: {}
            f:setFCV: {}
            f:versionServiceEndpoint: {}
      manager: helm
      operation: Update
      time: '2024-06-02T10:26:31Z'
    - apiVersion: psmdb.percona.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:conditions: {}
          f:host: {}
          f:observedGeneration: {}
          f:ready: {}
          f:replsets:
            .: {}
            f:rs0:
              .: {}
              f:ready: {}
              f:size: {}
              f:status: {}
          f:size: {}
          f:state: {}
      manager: percona-server-mongodb-operator
      operation: Update
      subresource: status
      time: '2024-06-02T10:30:25Z'
  name: psmdb-db
  namespace: mongodb
  resourceVersion: '709924067'
  uid: cff9bcf2-14fc-4d8e-9f44-755f652bf871
  selfLink: /apis/psmdb.percona.com/v1/namespaces/mongodb/perconaservermongodbs/psmdb-db
status:
  conditions:
    - lastTransitionTime: '2024-06-02T10:26:34Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-06-02T10:26:35Z'
      message: >-
        reconcile StatefulSet for rs0: get StatefulSet for replset rs0: failed
        to get ssl annotations: waiting for TLS secret
      reason: ErrorReconcile
      status: 'True'
      type: error
    - lastTransitionTime: '2024-06-02T10:26:36Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-06-02T10:27:30Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-06-02T10:27:30Z'
      status: 'True'
      type: initializing
    - lastTransitionTime: '2024-06-02T10:30:25Z'
      message: 'rs0: ready'
      reason: RSReady
      status: 'True'
      type: ready
    - lastTransitionTime: '2024-06-02T10:30:25Z'
      status: 'True'
      type: initializing
  host: psmdb-db-rs0.mongodb.svc.cluster.local
  observedGeneration: 1
  ready: 3
  replsets:
    rs0:
      ready: 3
      size: 3
      status: ready
  size: 3
  state: initializing
spec:
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.4.1
    pitr:
      enabled: false
  crVersion: 1.16.0
  image: percona/percona-server-mongodb:6.0.15
  imagePullPolicy: Always
  multiCluster:
    enabled: false
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.41.2
    serverHost: monitoring-service
  replsets:
    - affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      arbiter:
        affinity:
          antiAffinityTopologyKey: kubernetes.io/hostname
        enabled: false
        size: 1
      expose:
        enabled: false
        exposeType: ClusterIP
      name: rs0
      nonvoting:
        affinity:
          antiAffinityTopologyKey: kubernetes.io/hostname
        enabled: false
        podDisruptionBudget:
          maxUnavailable: 1
        resources:
          limits:
            cpu: 800m
            memory: 1G
          requests:
            cpu: 800m
            memory: 1G
        size: 3
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 3Gi
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 10Gi
  secrets:
    users: psmdb-db-secrets
  sharding:
    balancer:
      enabled: false
    configsvrReplSet:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        enabled: false
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 3
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
    enabled: false
    mongos:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        exposeType: ClusterIP
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      size: 1
  unmanaged: false
  updateStrategy: OnDelete
  upgradeOptions:
    apply: Disabled
    schedule: 34 9 * * *
    setFCV: false
    versionServiceEndpoint: https://check.percona.com

Any tips to troubleshoot or repair would be greatly appreciated.

Scott

About 45 hours later, it seems the replica set has gotten into a stable state. However, the CR is still in the initializing state. :thinking:

I’m restarting the pods to see what happens.

Scott

Do you have any errors in the operator’s log? You can use it to understand why the cluster is stuck in the initializing state.
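For reference, a quick way to pull recent errors out of the operator (the deployment name `psmdb-operator` and the namespace are assumptions; adjust to your install):

```shell
# Tail the operator's recent log and keep only error lines.
kubectl -n mongodb logs deploy/psmdb-operator --since=1h | grep -i error
```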

@Slava_Sarzhan - This is repeating in the operator logs:

2024-06-04T17:44:58.218Z	ERROR	failed to send telemetry to https://check.percona.com	{"controller": "psmdb-controller", "object": {"name":"psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "psmdb-db", "reconcileID": "1f0f7e03-10d9-4404-a504-7100bf1188a3", "error": "[GET /versions/v1/{product}/{operatorVersion}/{apply}][500] VersionService_Apply default  &{Code:13 Details:[] Message:failed to parse version: Disabled}"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).ensureVersion
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/version.go:356
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:362
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222

Not sure what to make of it myself.

Scott

Hi Slava,

Welp. Good news and bad news.

To get things working again, I basically had to start with a new setup. So, the cluster is working again.

At the same time, I zapped my database. Not a big deal, as everything was POC.

I can’t put a finger on why the update of the charts didn’t work. I learned a lot from this “excursion”, though. I guess the saying is true: one learns best by making mistakes.

Scott