MongoDB node loops on restart with an OOM

Description:

I have three MongoDB clusters with the same configuration deployed on AKS, and the issue described here occurs on all three of them. Initially the clusters work fine, but after some time things go wrong: suddenly, without any specific action on our part, one of the nodes in the MongoDB replica set starts restarting due to OOM (Out Of Memory) kills. It keeps restarting for a while and then stabilizes. This worries me for the future. Looking at the cluster monitoring, the node in question does not exceed the memory allocated in my CR.yaml file, and there are no particular errors in the Percona operator or in the logs of my nodes.

Could you please help me locate the problem or tell me if my configuration has errors that could cause this type of behavior?

Version:

cr version 1.16.0

Configuration deployed:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb
  finalizers:
    - delete-psmdb-pods-in-order
    #- delete-psmdb-pvc
spec:
#  platform: openshift
#  clusterServiceDNSSuffix: svc.cluster.local
  clusterServiceDNSMode: "External"
#  pause: true
#  unmanaged: false
  crVersion: 1.16.0
  image: percona/percona-server-mongodb:7.0.8-5
  imagePullPolicy: Always
  tls:
    mode: disabled
#    # 90 days in hours
#    certValidityDuration: 2160h
#    allowInvalidCertificates: true
#    issuerConf:
#      name: special-selfsigned-issuer
#      kind: ClusterIssuer
#      group: cert-manager.io
#  imagePullSecrets:
#    - name: private-registry-credentials
#  initImage: percona/percona-server-mongodb-operator:1.16.0
#  initContainerSecurityContext: {}
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
    terminationGracePeriod: true
    backupIfUnhealthy: true
  updateStrategy: SmartUpdate
#  ignoreAnnotations:
#    - service.beta.kubernetes.io/aws-load-balancer-backend-protocol
#  ignoreLabels:
#    - rack
#  multiCluster:
#    enabled: true
#    DNSSuffix: svc.clusterset.local
  upgradeOptions:
    versionServiceEndpoint: https://check.percona.com
    apply: disabled
    schedule: "0 2 * * *"
    setFCV: false
  secrets:
    users: rs-cluster-name-secrets
    encryptionKey: my-cluster-name-mongodb-encryption-key
#    vault: my-cluster-name-vault
#    ldapSecret: my-ldap-secret
#    sse: my-cluster-name-sse
  pmm:
    enabled: false
    image: percona/pmm-client:2.41.2
    serverHost: monitoring-service
#    mongodParams: --environment=ENVIRONMENT
#    mongosParams: --environment=ENVIRONMENT
  replsets:
  - name: rs0
    size: 3
#    terminationGracePeriodSeconds: 300
#    serviceAccountName: default
#    topologySpreadConstraints:
#      - labelSelector:
#          matchLabels:
#            app.kubernetes.io/name: percona-server-mongodb
#        maxSkew: 1
#        topologyKey: kubernetes.io/hostname
#        whenUnsatisfiable: DoNotSchedule
#    externalNodes:
#    - host: 34.124.76.90
#    - host: 34.124.76.91
#      port: 27017
#      votes: 0
#      priority: 0
#    - host: 34.124.76.92
#    # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#    configuration: |
#      operationProfiling:
#        mode: slowOp
#      systemLog:
#        verbosity: 1
#      storage:
#        engine: wiredTiger
#        wiredTiger:
#          engineConfig:
#            directoryForIndexes: false
#            journalCompressor: snappy
#          collectionConfig:
#            blockCompressor: snappy
#          indexConfig:
#            prefixCompression: true
    affinity:
      antiAffinityTopologyKey: none
#      advanced:
#        nodeAffinity:
#          requiredDuringSchedulingIgnoredDuringExecution:
#            nodeSelectorTerms:
#            - matchExpressions:
#              - key: kubernetes.io/e2e-az-name
#                operator: In
#                values:
#                - e2e-az1
#                - e2e-az2
#    tolerations:
#    - key: "node.alpha.kubernetes.io/unreachable"
#      operator: "Exists"
#      effect: "NoExecute"
#      tolerationSeconds: 6000
#    priorityClassName: high-priority
#    annotations:
#      iam.amazonaws.com/role: role-arn
#    labels:
#      rack: rack-22
#    nodeSelector:
#      disktype: ssd
#    storage:
#      engine: wiredTiger
#      wiredTiger:
#        engineConfig:
#          cacheSizeRatio: 0.5
#          directoryForIndexes: false
#          journalCompressor: snappy
#        collectionConfig:
#          blockCompressor: snappy
#        indexConfig:
#          prefixCompression: true
#      inMemory:
#        engineConfig:
#           inMemorySizeRatio: 0.5
#    livenessProbe:
#      failureThreshold: 4
#      initialDelaySeconds: 60
#      periodSeconds: 30
#      timeoutSeconds: 10
#      startupDelaySeconds: 7200
#    readinessProbe:
#      failureThreshold: 8
#      initialDelaySeconds: 10
#      periodSeconds: 3
#      successThreshold: 1
#      timeoutSeconds: 2
#    containerSecurityContext:
#      privileged: false
#    podSecurityContext:
#      runAsUser: 1001
#      runAsGroup: 1001
#      supplementalGroups: [1001]
#    runtimeClassName: image-rc
    sidecars:
    - image: percona/mongodb_exporter:2.35.0
      args: ["--mongodb.uri=mongodb://clusterMonitor:$MONGODB_CLUSTER_MONITOR_PASSWORD@localhost:27017", "--collect-all"]
      name: rs-sidecar-1
      envFrom:
      - secretRef:
          name: rs-cluster-name-secrets
      ports:
      - containerPort: 9216
        name: sidecar-port
#      volumeMounts:
#        - mountPath: /volume1
#          name: sidecar-volume-claim
#        - mountPath: /secret
#          name: sidecar-secret
#        - mountPath: /configmap
#          name: sidecar-config
#    sidecarVolumes:
#    - name: sidecar-secret
#      secret:
#        secretName: mysecret
#    - name: sidecar-config
#      configMap:
#        name: myconfigmap
#    sidecarPVCs:
#    - apiVersion: v1
#      kind: PersistentVolumeClaim
#      metadata:
#        name: sidecar-volume-claim
#      spec:
#        resources:
#          requests:
#            storage: 1Gi
#        volumeMode: Filesystem
#        accessModes:
#          - ReadWriteOnce
    podDisruptionBudget:
      maxUnavailable: 1
#      minAvailable: 0
#    splitHorizons:
#      my-cluster-name-rs0-0:
#        external: rs0-0.mycluster.xyz
#        external-2: rs0-0.mycluster2.xyz
#      my-cluster-name-rs0-1:
#        external: rs0-1.mycluster.xyz
#        external-2: rs0-1.mycluster2.xyz
#      my-cluster-name-rs0-2:
#        external: rs0-2.mycluster.xyz
#        external-2: rs0-2.mycluster2.xyz
    expose:
      enabled: true
      exposeType: NodePort
#      loadBalancerSourceRanges:
#        - 10.0.0.0/8
#      serviceAnnotations:
#        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#      serviceLabels:
#        rack: rack-22
    resources:
      limits:
        memory: "2Gi"
      requests:
        memory: "2Gi"
    volumeSpec:
#      emptyDir: {}
#      hostPath:
#        path: /data
#        type: Directory
      persistentVolumeClaim:
#        annotations:
#          volume.beta.kubernetes.io/storage-class: example-hostpath
        labels:
        storageClassName: managed-csi-xfs
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 30Gi

    nonvoting:
      enabled: false
#      podSecurityContext: {}
#      containerSecurityContext: {}
      size: 3
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        operationProfiling:
#          mode: slowOp
#        systemLog:
#          verbosity: 1
      affinity:
        antiAffinityTopologyKey: none
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
      podDisruptionBudget:
        maxUnavailable: 1
#        minAvailable: 0
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "2Gi"
      volumeSpec:
#        emptyDir: {}
#        hostPath:
#          path: /data
#          type: Directory
        persistentVolumeClaim:
          storageClassName: managed-csi-xfs
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 30Gi
    arbiter:
      enabled: false
      size: 1
      affinity:
        antiAffinityTopologyKey: none
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
#    schedulerName: "default"
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "2Gi"
#    hostAliases:
#    - ip: "10.10.0.2"
#      hostnames:
#      - "host1"
#      - "host2"

  sharding:
    enabled: false
#    balancer:
#      enabled: true
    configsvrReplSet:
      size: 3
#      terminationGracePeriodSeconds: 300
#      serviceAccountName: default
#      topologySpreadConstraints:
#        - labelSelector:
#            matchLabels:
#              app.kubernetes.io/name: percona-server-mongodb
#          maxSkew: 1
#          topologyKey: kubernetes.io/hostname
#          whenUnsatisfiable: DoNotSchedule
#      externalNodes:
#      - host: 34.124.76.93
#      - host: 34.124.76.94
#        port: 27017
#        votes: 0
#        priority: 0
#      - host: 34.124.76.95
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        operationProfiling:
#          mode: slowOp
#        systemLog:
#           verbosity: 1
      affinity:
        antiAffinityTopologyKey: none
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
#      livenessProbe:
#        failureThreshold: 4
#        initialDelaySeconds: 60
#        periodSeconds: 30
#        timeoutSeconds: 10
#        startupDelaySeconds: 7200
#      readinessProbe:
#        failureThreshold: 3
#        initialDelaySeconds: 10
#        periodSeconds: 3
#        successThreshold: 1
#        timeoutSeconds: 2
#      containerSecurityContext:
#        privileged: false
#      podSecurityContext:
#        runAsUser: 1001
#        runAsGroup: 1001
#        supplementalGroups: [1001]
#      runtimeClassName: image-rc
#      sidecars:
#      - image: busybox
#        command: ["/bin/sh"]
#        args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
#        name: rs-sidecar-1
      podDisruptionBudget:
        maxUnavailable: 1
      expose:
        enabled: false
        exposeType: ClusterIP
#        loadBalancerSourceRanges:
#          - 10.0.0.0/8
#        serviceAnnotations:
#          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#        serviceLabels:
#          rack: rack-22
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "2Gi"
      volumeSpec:
#       emptyDir: {}
#       hostPath:
#         path: /data
#         type: Directory
        persistentVolumeClaim:
          storageClassName: managed-csi-xfs
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 30Gi
#      hostAliases:
#      - ip: "10.10.0.2"
#        hostnames:
#        - "host1"
#        - "host2"

    mongos:
      size: 3
#      terminationGracePeriodSeconds: 300
#      serviceAccountName: default
#      topologySpreadConstraints:
#        - labelSelector:
#            matchLabels:
#              app.kubernetes.io/name: percona-server-mongodb
#          maxSkew: 1
#          topologyKey: kubernetes.io/hostname
#          whenUnsatisfiable: DoNotSchedule
#      # for more configuration fields refer to https://docs.mongodb.com/manual/reference/configuration-options/
#      configuration: |
#        systemLog:
#           verbosity: 1
      affinity:
        antiAffinityTopologyKey: none
#        advanced:
#          nodeAffinity:
#            requiredDuringSchedulingIgnoredDuringExecution:
#              nodeSelectorTerms:
#              - matchExpressions:
#                - key: kubernetes.io/e2e-az-name
#                  operator: In
#                  values:
#                  - e2e-az1
#                  - e2e-az2
#      tolerations:
#      - key: "node.alpha.kubernetes.io/unreachable"
#        operator: "Exists"
#        effect: "NoExecute"
#        tolerationSeconds: 6000
#      priorityClassName: high-priority
#      annotations:
#        iam.amazonaws.com/role: role-arn
#      labels:
#        rack: rack-22
#      nodeSelector:
#        disktype: ssd
#      livenessProbe:
#        failureThreshold: 4
#        initialDelaySeconds: 60
#        periodSeconds: 30
#        timeoutSeconds: 10
#        startupDelaySeconds: 7200
#      readinessProbe:
#        failureThreshold: 3
#        initialDelaySeconds: 10
#        periodSeconds: 3
#        successThreshold: 1
#        timeoutSeconds: 2
#      containerSecurityContext:
#        privileged: false
#      podSecurityContext:
#        runAsUser: 1001
#        runAsGroup: 1001
#        supplementalGroups: [1001]
#      runtimeClassName: image-rc
#      sidecars:
#      - image: busybox
#        command: ["/bin/sh"]
#        args: ["-c", "while true; do echo echo $(date -u) 'test' >> /dev/null; sleep 5;done"]
#        name: rs-sidecar-1
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "2Gi"
      expose:
        exposeType: ClusterIP
#        servicePerPod: true
#        loadBalancerSourceRanges:
#          - 10.0.0.0/8
#        serviceAnnotations:
#          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
#        serviceLabels:
#          rack: rack-22
#        nodePort: 32017
#      hostAliases:
#      - ip: "10.10.0.2"
#        hostnames:
#        - "host1"
#        - "host2"

  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.4.1
#    annotations:
#      iam.amazonaws.com/role: role-arn
    resources:
      limits:
        memory: "2Gi"
      requests:
        memory: "2Gi"
    storages:
#      s3-us-west:
#        type: s3
#        s3:
#          bucket: S3-BACKUP-BUCKET-NAME-HERE
#          credentialsSecret: my-cluster-name-backup-s3
#          serverSideEncryption:
#            kmsKeyID: 1234abcd-12ab-34cd-56ef-1234567890ab
#            sseAlgorithm: aws:kms
#            sseCustomerAlgorithm: AES256
#            sseCustomerKey: Y3VzdG9tZXIta2V5
#          retryer:
#            numMaxRetries: 3
#            minRetryDelay: 30ms
#            maxRetryDelay: 5m
#          region: us-west-2
#          prefix: ""
#          uploadPartSize: 10485760
#          maxUploadParts: 10000
#          storageClass: STANDARD
#          insecureSkipTLSVerify: false
#      minio:
#        type: s3
#        s3:
#          bucket: MINIO-BACKUP-BUCKET-NAME-HERE
#          region: us-east-1
#          credentialsSecret: my-cluster-name-backup-minio
#          endpointUrl: http://minio.psmdb.svc.cluster.local:9000/minio/
#          insecureSkipTLSVerify: false
#          prefix: ""
      azure-blob:
        type: azure
        azure:
          container: mongo-backup
          prefix: ""
          endpointUrl: https://$AZURE_STORAGE_ACCOUNT_NAME.blob.core.windows.net
          credentialsSecret: my-cluster-name-azure-secret
#    pitr:
#      enabled: false
#      oplogOnly: false
#      oplogSpanMin: 10
#      compressionType: gzip
#      compressionLevel: 6
#    configuration:
#      backupOptions:
#        priority:
#          "localhost:28019": 2.5
#          "localhost:27018": 2.5
#        timeouts:
#          startingStatus: 33
#        oplogSpanMin: 10
#      restoreOptions:
#        batchSize: 500
#        numInsertionWorkers: 10
#        numDownloadWorkers: 4
#        maxDownloadBufferMb: 0
#        downloadChunkMb: 32
#        mongodLocation: /usr/bin/mongo
#        mongodLocationMap:
#          "node01:2017": /usr/bin/mongo
#          "node03:27017": /usr/bin/mongo
    tasks:
      - name: daily-azure
        enabled: true
        schedule: "0 0 * * *"
        keep: 180
        storageName: azure-blob
        compressionType: gzip
        compressionLevel: 6
#      - name: weekly-azure
#        enabled: false
#        schedule: "0 3 * * 0"
#        keep: 24
#        storageName: azure-blob
#        compressionType: gzip
#       compressionLevel: 6
#      - name: weekly-s3-us-west-physical
#        enabled: false
#        schedule: "0 5 * * 0"
#        keep: 5
#        type: physical
#        storageName: s3-us-west
#        compressionType: gzip
#        compressionLevel: 6

Current pod status:

pod/mongodb-rs0-rs-0                            3/3     Running   457 (2d5h ago)   6d16h
pod/mongodb-rs0-rs-1                            3/3     Running   0                6d16h
pod/mongodb-rs0-rs-2                            3/3     Running   0                6d16h
pod/percona-server-mongodb-operator-85779f9cd7-qqhwt   1/1     Running   2 (29h ago)      6d16h
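
For reference, when the kubelet/kernel OOM-kills a container, the pod records it in its container status; this is a sketch of what that typically looks like for the mongod container (standard Kubernetes Pod status fields; the container name and values here are illustrative, only the restart count is taken from the listing above):

```
status:
  containerStatuses:
  - name: mongod
    restartCount: 457
    lastState:
      terminated:
        reason: OOMKilled   # per-container cgroup memory limit was hit
        exitCode: 137
```

If `lastState.terminated.reason` shows `OOMKilled`, the container exceeded its own memory limit, which monitoring sampled at typical scrape intervals can easily miss when the spike is short.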

Logs:

**Log sequence from 2024-06-22T04:40:03.518Z, when the node restarted:**
```
2024-06-22T00:00:10.052Z	INFO	Starting backup	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-20240622000000-g7dqd","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "cron-mongodb-20240622000000-g7dqd", "reconcileID": "c0e50a28-8c39-48ad-b40f-8e9bdbc77a94", "backup": "cron-mongodb-20240622000000-g7dqd", "storage": "azure-blob"}
02:00:10.052
2024-06-22T00:00:10.052Z	INFO	Setting PBM config	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-20240622000000-g7dqd","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "cron-mongodb-20240622000000-g7dqd", "reconcileID": "c0e50a28-8c39-48ad-b40f-8e9bdbc77a94", "backup": "mongodb"}
02:00:21.060
2024-06-22T00:00:21.060Z	INFO	Sending backup command	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-20240622000000-g7dqd","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "cron-mongodb-20240622000000-g7dqd", "reconcileID": "c0e50a28-8c39-48ad-b40f-8e9bdbc77a94", "backupCmd": "backup [name: 2024-06-22T00:00:21Z, compression: gzip (level: 6)] <ts: 0>"}
06:40:03.518
2024-06-22T04:40:03.518Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "93adbd30-367c-45b4-8254-c892de39d0b9", "previous": "ready", "current": "initializing"}
06:40:09.052
2024-06-22T04:40:09.052Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "11f3f103-0129-47ba-a1a7-d1ad34f03539", "previous": "initializing", "current": "ready"}
07:23:24.826
2024-06-22T05:23:24.826Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "d47e59c1-e661-4611-a1a0-5f8493063dbf", "previous": "ready", "current": "initializing"}
07:23:35.839
2024-06-22T05:23:35.839Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "fbfa57a6-24a2-4070-84e0-cb9c6c6d1765", "previous": "initializing", "current": "ready"}
07:34:19.686
2024-06-22T05:34:19.686Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "0daa1629-110d-46bd-8c11-b6c1e9768593", "previous": "ready", "current": "initializing"}
07:34:30.662
2024-06-22T05:34:30.662Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "03a8e13c-27c2-47f7-9451-cf6526df13fc", "previous": "initializing", "current": "ready"}
07:45:24.771
2024-06-22T05:45:24.771Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "f02df10c-3618-4e05-8127-02a79f0c166c", "previous": "ready", "current": "initializing"}
07:45:41.190
2024-06-22T05:45:41.190Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "a8fa50b7-d6da-4aac-ad08-bfc95839a3a3", "previous": "initializing", "current": "ready"}
07:56:19.692
2024-06-22T05:56:19.692Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "9dd201f4-d066-4bec-9d7a-1773c65ca630", "previous": "ready", "current": "initializing"}
07:56:30.617
2024-06-22T05:56:30.617Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "50883511-1381-4388-9017-bda3448239fa", "previous": "initializing", "current": "ready"}
08:07:09.405
2024-06-22T06:07:09.405Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "bcb2d846-ef1d-4ca0-8056-e9a73292983f", "previous": "ready", "current": "initializing"}
08:07:25.860
2024-06-22T06:07:25.860Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "cc242b11-a0bb-4465-a767-9650e6ace335", "previous": "initializing", "current": "ready"}
08:18:05.579
2024-06-22T06:18:05.579Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "3de4a93a-c9c0-4b6d-bfc8-4c2fa1a384b5", "previous": "ready", "current": "initializing"}
08:18:16.501
2024-06-22T06:18:16.500Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "bed4d14c-f7b0-4953-a554-1d142ba1778b", "previous": "initializing", "current": "ready"}
09:01:02.303
2024-06-22T07:01:02.303Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "87a3ea4d-1dc3-4b21-a702-e75d73c206ba", "previous": "ready", "current": "initializing"}
09:01:13.858
2024-06-22T07:01:13.858Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "63d07c75-a514-40e5-999c-0faae46e5ef2", "previous": "initializing", "current": "ready"}
09:11:53.450
2024-06-22T07:11:53.449Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "79bb7de2-d921-46d9-983d-855268e0fa8f", "previous": "ready", "current": "initializing"}
09:12:04.362
2024-06-22T07:12:04.362Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "6d93cfcb-50a7-43e1-bd45-c44a18617e23", "previous": "initializing", "current": "ready"}
09:44:05.825
2024-06-22T07:44:05.825Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "978ecf82-282d-4436-a244-248a93330f3e", "previous": "ready", "current": "initializing"}
09:44:22.397
2024-06-22T07:44:22.397Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "4a1f8d60-73f8-449c-9208-4c38ac2c2ccd", "previous": "initializing", "current": "ready"}
09:55:00.365
2024-06-22T07:55:00.365Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "d1df9122-7b7d-4831-938f-843d8bc9f459", "previous": "ready", "current": "initializing"}
09:55:11.434
2024-06-22T07:55:11.434Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "ee086fe9-ad02-4a82-8158-6b2daac646f6", "previous": "initializing", "current": "ready"}
10:05:50.352
2024-06-22T08:05:50.352Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "06843007-fe0d-4b60-a183-7c0913654948", "previous": "ready", "current": "initializing"}
10:06:01.391
2024-06-22T08:06:01.391Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "05a96d54-5c8a-410c-8f46-12a01d4ed9c8", "previous": "initializing", "current": "ready"}
10:15:54.876
2024-06-22T08:15:54.876Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "5e8d1378-b4d4-44e4-911f-da7bd7650003", "previous": "ready", "current": "initializing"}
10:16:11.398
2024-06-22T08:16:11.398Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "910c497a-4f2b-4a6b-b458-dacc23fc3b9b", "previous": "initializing", "current": "ready"}
10:48:15.995
2024-06-22T08:48:15.994Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "dde1129e-67ee-40ab-8ffa-0ff66b00c76d", "previous": "ready", "current": "initializing"}
10:48:38.005
2024-06-22T08:48:38.005Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "4f775858-0461-4781-ad31-953f5088a69e", "previous": "initializing", "current": "ready"}
10:59:20.997
2024-06-22T08:59:20.997Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "fb787e13-b78c-4b5c-994b-1e9fc1ac912a", "previous": "ready", "current": "initializing"}
10:59:37.497
2024-06-22T08:59:37.497Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "340ec906-e740-4be7-a657-cca039e3fc26", "previous": "initializing", "current": "ready"}
11:31:36.118
2024-06-22T09:31:36.118Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "22956a59-4dd9-4cfc-8283-0002d5d7a778", "previous": "ready", "current": "initializing"}
11:31:47.083
2024-06-22T09:31:47.083Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "03d8ce91-4b8d-446e-bf47-2c521a410511", "previous": "initializing", "current": "ready"}
12:02:53.621
2024-06-22T10:02:53.621Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "4301a707-419a-4cf2-84ae-ac6f646cb69e", "previous": "ready", "current": "initializing"}
12:03:10.094
2024-06-22T10:03:10.093Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "28ba0b00-4a22-41d2-8816-4f034f0f15cf", "previous": "initializing", "current": "ready"}
12:25:23.696
2024-06-22T10:25:23.696Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "abbedeab-be81-4996-a2a3-213ceb05038b", "previous": "ready", "current": "initializing"}
12:25:40.133

[... log lines truncated ...]

2024-06-22T12:34:39.674Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "b5cf64e5-3afd-474c-b74c-9bfc894dbe1a", "previous": "initializing", "current": "ready"}
14:37:26.062
2024-06-22T12:37:26.062Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "db852c57-303b-4335-90dc-11fdf17581e8", "previous": "ready", "current": "initializing"}
14:38:26.233
2024-06-22T12:38:26.233Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "1b86cd23-579f-40f8-afc7-23360604152c", "previous": "initializing", "current": "ready"}
14:41:26.734
2024-06-22T12:41:26.734Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "491f2a0c-ad74-40be-94cf-f4d63c543eff", "previous": "ready", "current": "initializing"}
14:43:01.646
2024-06-22T12:43:01.646Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "0c103aad-fd1e-4cf0-9ef3-0a2336d38891", "previous": "initializing", "current": "ready"}
14:45:55.739
2024-06-22T12:45:55.739Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "1ed14134-94da-4f07-a75e-9004d8478a13", "previous": "ready", "current": "initializing"}
14:48:54.173
2024-06-22T12:48:54.173Z	INFO	Cluster state changed	{"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "fa43d32d-75e1-4b9a-ad07-fe2ed3dc910c", "previous": "initializing", "current": "ready"}
```

My cluster node is currently up, with no restarts since 2024-06-24T08:24:09.072Z.

Here is the log sequence from the Percona operator controller:

10:18:07.494

2024-06-24T08:18:07.494Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "2cb79a10-fbb4-4704-b414-eed1ad1eba9a", "previous": "ready", "current": "initializing"}

10:23:35.316

2024-06-24T08:23:35.316Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "323beb55-2be4-475d-9f3d-ab8de8c33b49", "previous": "initializing", "current": "ready"}

10:24:09.072

2024-06-24T08:24:09.072Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "3b932a6b-82a1-4379-bbf0-e88a778145d6", "previous": "ready", "current": "initializing"}

10:29:40.427

2024-06-24T08:29:40.426Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "93fe7232-7394-40f3-b6b2-d0b6d584a61c", "previous": "initializing", "current": "ready"}

10:30:33.287

2024-06-24T08:30:33.287Z INFO Stopping and waiting for non leader election runnables

10:30:33.287

2024-06-24T08:30:33.287Z INFO Stopping and waiting for leader election runnables

10:30:33.349

2024-06-24T08:30:33.337Z ERROR failed to update cluster status {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "a03caae5-2492-470b-b92d-6eca3f45387d", "replset": "rs0", "error": "write status: client rate limiter Wait returned an error: context canceled", "errorVerbose": "client rate limiter Wait returned an error: context canceled\nwrite status\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).writeStatus\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/status.go:259\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).updateStatus\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/status.go:52\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile.func1\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:269\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:388\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}

10:30:33.349

github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile.func1

10:30:33.349

/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:271

10:30:33.349

github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile

10:30:33.349

/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:388

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222

10:30:33.349

2024-06-24T08:30:33.345Z ERROR Reconciler error {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "a03caae5-2492-470b-b92d-6eca3f45387d", "error": "ensure mongo Key my-cluster-name-mongodb-encryption-key: get key: Get \"https://10.0.0.1:443/api/v1/namespaces/production-mongo/secrets/my-cluster-name-mongodb-encryption-key\": context canceled", "errorVerbose": "Get \"https://10.0.0.1:443/api/v1/namespaces/production-mongo/secrets/my-cluster-name-mongodb-encryption-key\": context canceled\nget key\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).ensureSecurityKey\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:791\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:385\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695\nensure mongo Key my-cluster-name-mongodb-encryption-key\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:387\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:324

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:261

10:30:33.349

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

10:30:33.349

/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.1/pkg/internal/controller/controller.go:222

10:30:33.349

2024-06-24T08:30:33.345Z INFO Shutdown signal received, waiting for all workers to finish {"controller": "psmdbrestore-controller"}

10:30:33.349

2024-06-24T08:30:33.345Z INFO Shutdown signal received, waiting for all workers to finish {"controller": "psmdbbackup-controller"}

10:30:33.349

2024-06-24T08:30:33.345Z INFO Shutdown signal received, waiting for all workers to finish {"controller": "psmdb-controller"}

10:30:33.349

2024-06-24T08:30:33.345Z INFO All workers finished {"controller": "psmdb-controller"}

10:30:33.349

2024-06-24T08:30:33.346Z INFO All workers finished {"controller": "psmdbbackup-controller"}

10:30:33.349

2024-06-24T08:30:33.346Z INFO All workers finished {"controller": "psmdbrestore-controller"}

10:30:33.349

2024-06-24T08:30:33.346Z INFO Stopping and waiting for caches

10:30:33.349

E0624 08:30:33.346403 1 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: Failed to watch *v1.StatefulSet: Get "https://10.0.0.1:443/apis/apps/v1/namespaces/production-mongo/statefulsets?allowWatchBookmarks=true&resourceVersion=2458955&timeoutSeconds=527&watch=true": context canceled

10:30:33.349

W0624 08:30:33.346478 1 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: watch of *v1.PerconaServerMongoDB ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

10:30:33.349

W0624 08:30:33.346523 1 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: watch of *v1.Pod ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

10:30:33.349

W0624 08:30:33.346586 1 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: watch of *v1.PerconaServerMongoDBRestore ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

10:30:33.349

W0624 08:30:33.346630 1 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

10:30:33.349

W0624 08:30:33.346681 1 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: watch of *v1.PerconaServerMongoDBBackup ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding

10:30:33.349

2024-06-24T08:30:33.346Z INFO Stopping and waiting for webhooks

10:30:33.349

2024-06-24T08:30:33.346Z INFO Stopping and waiting for HTTP servers

10:30:33.349

2024-06-24T08:30:33.346Z INFO shutting down server {"name": "health probe", "addr": "[::]:8081"}

10:30:33.359

2024-06-24T08:30:33.346Z INFO controller-runtime.metrics Shutting down metrics server with timeout of 1 minute

10:30:36.524

2024-06-24T08:30:36.514Z INFO Wait completed, proceeding to shutdown the manager

10:30:36.524

E0624 08:30:36.524640 1 leaderelection.go:340] Failed to update lock optimitically: Put "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/production-mongo/leases/08db0feb.percona.com": context canceled, falling back to slow path

10:30:36.524

E0624 08:30:36.524742 1 leaderelection.go:347] error retrieving resource lock production-mongo/08db0feb.percona.com: client rate limiter Wait returned an error: context canceled

10:30:36.524

I0624 08:30:36.524758 1 leaderelection.go:285] failed to renew lease production-mongo/08db0feb.percona.com: timed out waiting for the condition

10:31:12.850

2024-06-24T08:31:12.850Z INFO setup Manager starting up {"gitCommit": "54e1b18dd9dac8e0ed5929bb2c91318cd6829a48", "gitBranch": "release-1-16-0", "goVersion": "go1.22.3", "os": "linux", "arch": "amd64"}

10:31:13.950

2024-06-24T08:31:13.950Z INFO server version {"platform": "kubernetes", "version": "v1.28.9"}

10:31:14.181

2024-06-24T08:31:14.181Z INFO controller-runtime.metrics Starting metrics server

10:31:14.181

2024-06-24T08:31:14.181Z INFO controller-runtime.metrics Serving metrics server {"bindAddress": ":8080", "secure": false}

10:31:14.181

2024-06-24T08:31:14.181Z INFO starting server {"name": "health probe", "addr": "[::]:8081"}

10:31:14.182

I0624 08:31:14.181982 1 leaderelection.go:250] attempting to acquire leader lease production-mongo/08db0feb.percona.com...

10:31:29.764

I0624 08:31:29.764851 1 leaderelection.go:260] successfully acquired lease production-mongo/08db0feb.percona.com

10:31:29.765

2024-06-24T08:31:29.765Z INFO Starting EventSource {"controller": "psmdb-controller", "source": "kind source: *v1.PerconaServerMongoDB"}

10:31:29.766

2024-06-24T08:31:29.766Z INFO Starting Controller {"controller": "psmdb-controller"}

10:31:29.784

2024-06-24T08:31:29.775Z INFO Starting EventSource {"controller": "psmdbrestore-controller", "source": "kind source: *v1.Pod"}

10:31:29.784

2024-06-24T08:31:29.776Z INFO Starting EventSource {"controller": "psmdbbackup-controller", "source": "kind source: *v1.PerconaServerMongoDBBackup"}

10:31:29.784

2024-06-24T08:31:29.776Z INFO Starting Controller {"controller": "psmdbbackup-controller"}

10:31:29.784

2024-06-24T08:31:29.775Z INFO Starting EventSource {"controller": "psmdbrestore-controller", "source": "kind source: *v1.PerconaServerMongoDBRestore"}

10:31:29.784

2024-06-24T08:31:29.775Z INFO Starting Controller {"controller": "psmdbrestore-controller"}

10:31:29.784

2024-06-24T08:31:29.776Z INFO Starting EventSource {"controller": "psmdbbackup-controller", "source": "kind source: *v1.Pod"}

10:31:31.445

2024-06-24T08:31:31.444Z INFO Starting workers {"controller": "psmdbbackup-controller", "worker count": 1}

10:31:31.545

2024-06-24T08:31:31.545Z INFO Starting workers {"controller": "psmdb-controller", "worker count": 1}

10:31:31.606

2024-06-24T08:31:31.606Z INFO Starting workers {"controller": "psmdbrestore-controller", "worker count": 1}

10:31:32.475

2024-06-24T08:31:32.475Z INFO Creating or updating backup job {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "359131cc-356e-46c6-810c-98159d0986eb", "name": "daily-azure", "namespace": "production-mongo", "schedule": "0 0 * * *"}

10:31:39.196

2024-06-24T08:31:39.177Z INFO add new job {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "359131cc-356e-46c6-810c-98159d0986eb", "name": "ensure-version/production-mongo/mongodb", "schedule": "0 2 * * *"}

10:31:41.557

2024-06-24T08:31:41.556Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "359131cc-356e-46c6-810c-98159d0986eb", "previous": "ready", "current": "initializing"}

10:35:50.221

2024-06-24T08:35:50.220Z INFO Cluster state changed {"controller": "psmdb-controller", "object": {"name":"mongodb","namespace":"production-mongo"}, "namespace": "production-mongo", "name": "mongodb", "reconcileID": "17192c14-6f2c-4665-9fb8-d2d8f0c68a8b", "previous": "initializing", "current": "ready"}

@ra2 sorry it took so long.

  1. I see you have the mongodb_exporter sidecar, so there is some level of monitoring in place. Do you have any graphs that track RAM utilization? Is it slowly growing until you hit the OOM, or is it a sudden spike in memory consumption?
  2. The nodes are quite small - 2 GB of RAM - and I don't see any WiredTiger cache size tuning.

Here is our usual take on it:

By default, the internal cache of the WiredTiger engine is set to the larger of 256 MB or (total memory of the mongod container - 1 GB) * cacheSizeRatio. The default value of cacheSizeRatio is 0.5.

In the case of a container, the total memory is the memory limit of the mongod container.

If no limits are set on the mongod container, all the memory available on the Kubernetes node is treated as the total memory, and the internal cache is calculated from it using the formula above. That value does not reflect the memory actually available to the container and can get it killed with an OOM (out of memory) error, so it is important to set limits on the mongod container.
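
As a worked example with the limits from the CR above (my arithmetic, not a measurement): with a 2Gi limit on the mongod container and the default cacheSizeRatio of 0.5, the WiredTiger cache is max(256 MB, (2 GB - 1 GB) * 0.5) = ~512 MB, leaving roughly 1.5 GB for connections, in-memory sorts and aggregations, and the rest of the mongod process. A spike in any of those can push the container over its 2Gi limit and trigger the OOM kill even though the steady-state graphs look fine.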

Set the cache size ratio explicitly in the replica set's storage options.

For example:

replsets:
- storage:
    engine: wiredTiger
    wiredTiger:
      engineConfig:
        cacheSizeRatio: 0.5

For smaller systems (up to 2,000 simultaneous connections) with less than 8 GB of memory, use a cacheSizeRatio of 0.25.

For medium to large systems, use the 0.5 ratio.
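
A minimal sketch of how this could look in the rs0 replset of the CR posted above, assuming the 2Gi limits are kept (the 0.25 ratio follows the small-system guidance and should be tuned to the actual workload):

```
  replsets:
  - name: rs0
    size: 3
    storage:
      engine: wiredTiger
      wiredTiger:
        engineConfig:
          cacheSizeRatio: 0.25   # max(256 MB, (2 GiB - 1 GiB) * 0.25) = ~256 MiB WiredTiger cache
    resources:
      limits:
        memory: "2Gi"
      requests:
        memory: "2Gi"
```

The same storage block is already present, commented out, under rs0 in the posted CR.yaml, so it only needs to be uncommented and the ratio adjusted.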