Older backups are not getting deleted even though the keep argument is specified

Description:

I’ve set up two MongoDB clusters in different environments using the psmdb-db and psmdb-operator Helm charts. For both clusters, I’ve set up the following backup configuration:

backup:
  enabled: true
  storages:
    s3-eu-west-3:
      type: s3
      s3:
        bucket: xxxxxxxxxxx
        region: eu-west-3
        credentialsSecret: s3-credentials
  tasks:
    - name: hourly
      enabled: true
      schedule: "0 * * * *"
      keep: 24
      storageName: s3-eu-west-3
      compressionType: gzip
    - name: daily
      enabled: true
      schedule: "0 0 * * *"
      keep: 7
      storageName: s3-eu-west-3
      compressionType: gzip
    - name: weekly
      enabled: true
      schedule: "0 0 * * 0"
      keep: 5
      storageName: s3-eu-west-3
      compressionType: gzip
    - name: monthly
      enabled: true
      schedule: "0 0 1 * *"
      keep: 6
      storageName: s3-eu-west-3
      compressionType: gzip

My backups are working fine, but they are never deleted, which causes a lot of usage on my AWS S3 bucket. Also, I’m not sure whether all tasks are being executed or only the hourly one, since the resulting backups are not named per task. For instance, my S3 bucket contains hourly backups dating back almost a year! I checked my logs (attached below), and they never state that anything will be deleted or removed, which is weird. On one of my environments I ran the pbm cleanup command manually and it worked, so this doesn’t seem to be a permissions issue.
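For reference, here is roughly how I inspect the situation and clean up manually (a sketch: the mongodb namespace matches the logs below, while the pod name and the 30d cutoff are just examples, not values from my setup):

# List the backup objects the scheduler created; every task produces names like
# cron-<cluster>-<timestamp>-<suffix>, so the task name itself is not visible here.
kubectl get psmdb-backup -n mongodb

# Manual workaround from a backup-agent container: remove backups older than
# 30 days, with --dry-run first to preview what would be deleted.
kubectl exec -it mongodb-psmdb-db-rs0-0 -c backup-agent -n mongodb -- \
  pbm cleanup --older-than=30d --dry-run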

Steps to Reproduce:

Create a MongoDB cluster using the backup configuration stated above

Version:

Percona psmdb-operator and psmdb-db charts 1.17.0

Logs:

operator:

2024-10-17T10:00:27.567Z	INFO	balancer enabled	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "14bfbd08-82ce-4172-bad4-d1a07a2db384"}
2024-10-17T10:00:21.267Z	INFO	Sending backup command	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-psmdb-db-20241017100000-gtfr7","namespace":"mongodb"}, "namespace": "mongodb", "name": "cron-mongodb-psmdb-db-20241017100000-gtfr7", "reconcileID": "3fb2453d-75f5-4c45-8042-1ae4b84bcc43", "backupCmd": "backup [name: 2024-10-17T10:00:21Z, compression: gzip (level: default)] <ts: 0>"}
2024-10-17T10:00:10.244Z	INFO	Setting PBM config	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-psmdb-db-20241017100000-gtfr7","namespace":"mongodb"}, "namespace": "mongodb", "name": "cron-mongodb-psmdb-db-20241017100000-gtfr7", "reconcileID": "3fb2453d-75f5-4c45-8042-1ae4b84bcc43", "backup": "mongodb-psmdb-db"}
2024-10-17T10:00:10.244Z	INFO	Starting backup	{"controller": "psmdbbackup-controller", "object": {"name":"cron-mongodb-psmdb-db-20241017100000-gtfr7","namespace":"mongodb"}, "namespace": "mongodb", "name": "cron-mongodb-psmdb-db-20241017100000-gtfr7", "reconcileID": "3fb2453d-75f5-4c45-8042-1ae4b84bcc43", "backup": "cron-mongodb-psmdb-db-20241017100000-gtfr7", "storage": "s3-eu-west-3"}
2024-10-17T09:05:58.727Z	INFO	add new job	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "f0d7662f-310b-4ee7-bb9b-fc435e90ff79", "name": "ensure-version/mongodb/mongodb-psmdb-db", "schedule": "0 2 * * *"}
2024-10-17T09:05:54.381Z	INFO	Creating or updating backup job	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "f0d7662f-310b-4ee7-bb9b-fc435e90ff79", "name": "monthly", "namespace": "mongodb", "schedule": "0 0 1 * *"}
2024-10-17T09:05:54.381Z	INFO	Creating or updating backup job	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "f0d7662f-310b-4ee7-bb9b-fc435e90ff79", "name": "weekly", "namespace": "mongodb", "schedule": "0 0 * * 0"}
2024-10-17T09:05:54.381Z	INFO	Creating or updating backup job	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "f0d7662f-310b-4ee7-bb9b-fc435e90ff79", "name": "daily", "namespace": "mongodb", "schedule": "0 0 * * *"}
2024-10-17T09:05:54.381Z	INFO	Creating or updating backup job	{"controller": "psmdb-controller", "object": {"name":"mongodb-psmdb-db","namespace":"mongodb"}, "namespace": "mongodb", "name": "mongodb-psmdb-db", "reconcileID": "f0d7662f-310b-4ee7-bb9b-fc435e90ff79", "name": "hourly", "namespace": "mongodb", "schedule": "0 * * * *"}
2024-10-17T09:05:54.261Z	INFO	Starting workers	{"controller": "psmdbrestore-controller", "worker count": 1}
2024-10-17T09:05:54.261Z	INFO	Starting workers	{"controller": "psmdbbackup-controller", "worker count": 1}
2024-10-17T09:05:54.251Z	INFO	Starting workers	{"controller": "psmdb-controller", "worker count": 1}
2024-10-17T09:05:53.794Z	INFO	Starting Controller	{"controller": "psmdbbackup-controller"}
2024-10-17T09:05:53.794Z	INFO	Starting EventSource	{"controller": "psmdbbackup-controller", "source": "kind source: *v1.Pod"}
2024-10-17T09:05:53.794Z	INFO	Starting EventSource	{"controller": "psmdbbackup-controller", "source": "kind source: *v1.PerconaServerMongoDBBackup"}
2024-10-17T09:05:53.794Z	INFO	Starting Controller	{"controller": "psmdb-controller"}
2024-10-17T09:05:53.793Z	INFO	Starting Controller	{"controller": "psmdbrestore-controller"}
2024-10-17T09:05:53.793Z	INFO	Starting EventSource	{"controller": "psmdbrestore-controller", "source": "kind source: *v1.Pod"}
2024-10-17T09:05:53.793Z	INFO	Starting EventSource	{"controller": "psmdbrestore-controller", "source": "kind source: *v1.PerconaServerMongoDBRestore"}
2024-10-17T09:05:53.793Z	INFO	Starting EventSource	{"controller": "psmdb-controller", "source": "kind source: *v1.PerconaServerMongoDB"}
I1017 09:05:53.793075       1 leaderelection.go:260] successfully acquired lease mongodb/08db0feb.percona.com
I1017 09:05:38.594356       1 leaderelection.go:250] attempting to acquire leader lease mongodb/08db0feb.percona.com...
2024-10-17T09:05:38.594Z	INFO	starting server	{"name": "health probe", "addr": "[::]:8081"}
2024-10-17T09:05:38.593Z	INFO	controller-runtime.metrics	Serving metrics server	{"bindAddress": ":8080", "secure": false}
2024-10-17T09:05:38.593Z	INFO	controller-runtime.metrics	Starting metrics server
2024-10-17T09:05:38.581Z	INFO	server version	{"platform": "kubernetes", "version": "v1.30.4-eks-a737599"}
2024-10-17T09:05:38.551Z	INFO	setup	Manager starting up	{"gitCommit": "5019408f1fe40483fc5effaf61ab3f672765b189", "gitBranch": "release-1-17-0", "goVersion": "go1.22.6", "os": "linux", "arch": "amd64"}

Expected Result:

Backups exceeding each task’s keep count are deleted automatically

Actual Result:

Older backups are still there

Additional Information:

If you need any additional info, please feel free to ask; I will provide everything needed.

It seems to be linked to this issue.

Hey @Mike_Devresse ,

thanks! This is indeed related to Backup files are not deleted from Azure blob storage - #8 by Sergey_Pronin

We have this JIRA issue to track it: Issue navigator - Percona JIRA

We see the issue in the latest (1.17) version of the operator, but not in 1.16.

Hi, I’m not sure the issue is absent in 1.16; I will try to test it. But my hourly task has been in place for almost a year, and it has never cleaned anything.
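For what it’s worth, here is how I searched the operator logs for any deletion attempt (assuming the deployment is named psmdb-operator, as the chart names it by default):

# Grep the operator logs for anything about pruning old backups; in my case
# this returns nothing at all.
kubectl logs deployment/psmdb-operator -n mongodb --since=720h | grep -iE 'delete|keep|oldest'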

Even though the psmdb-operator has been upgraded to 1.19.1, older backups are still not getting deleted for me.

Maybe it’s related to the fact that I’m performing physical backups?
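To check that, this is how I list what the operator records for each backup; if only logical backups were being pruned, the TYPE column should show it (a sketch: the custom-columns paths are my reading of the psmdb-backup CRD, so adjust them if they differ):

# Show name, type (logical/physical), state and completion time per backup.
kubectl get psmdb-backup -n kanban \
  -o custom-columns=NAME:.metadata.name,TYPE:.spec.type,STATE:.status.state,COMPLETED:.status.completed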

Here’s my manifest:

apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  finalizers:
  - percona.com/delete-psmdb-pods-in-order
  - percona.com/delete-psmdb-pvc
  name: wekan
  namespace: kanban
spec:
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      compressionType: snappy
      enabled: true
      oplogOnly: true
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: 150m
        memory: 2Gi
    storages:
      s3-backup:
        s3:
          bucket: lwsa-idp-testbed-dev-backup-cs-02-44hqhw45
          credentialsSecret: kanban-blueprint-backup-creds
          endpointUrl: https://poc.object-storage.sh:9000
          prefix: psmdb/kanban-kanban-blueprint-wekan/backup
          region: us-east-1
          retryer:
            maxRetryDelay: 5m
            minRetryDelay: 10s
            numMaxRetries: 10
        type: s3
    tasks:
    - compressionType: snappy
      enabled: true
      keep: 1
      name: daily
      schedule: 21 8 * * 1-6
      storageName: s3-backup
      type: physical
    - compressionType: snappy
      enabled: true
      keep: 1
      name: weekly
      schedule: 21 8 * * 0
      storageName: s3-backup
      type: physical
    - compressionType: snappy
      enabled: true
      keep: 1
      name: monthly
      schedule: 51 8 6 * *
      storageName: s3-backup
      type: physical
    - compressionType: snappy
      enabled: true
      keep: 5
      name: yearly
      schedule: 51 8 20 10 *
      storageName: s3-backup
      type: physical
  crVersion: 1.19.0
  enableVolumeExpansion: true
  image: percona/percona-server-mongodb:7.0.15-9-multi
  pause: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.44.0
  replsets:
  - affinity:
      antiAffinityTopologyKey: kubernetes.io/hostname
    annotations:
      k8up.io/backup: "false"
    arbiter:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: false
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 40m
          memory: 512Mi
      size: 1
    expose:
      enabled: false
      type: ClusterIP
    labels:
      lwsa.cloud/psmdb-instance: wekan
    name: rs0
    nonvoting:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      enabled: false
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 500m
          memory: 512Mi
        requests:
          cpu: 40m
          memory: 512Mi
      size: 1
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 10Gi
    podDisruptionBudget:
      maxUnavailable: 1
    resources:
      limits:
        cpu: "1"
        memory: 2Gi
      requests:
        cpu: 150m
        memory: 2Gi
    size: 1
    terminationGracePeriodSeconds: 90
    volumeSpec:
      persistentVolumeClaim:
        accessModes:
        - ReadWriteOnce
        annotations:
          k8up.io/backup: "false"
        resources:
          requests:
            storage: 100Gi
        storageClassName: cloudstack-volumes
  secrets:
    encryptionKey: wekan-psmdb-secrets-encryption-key
    users: wekan-psmdb-secrets-system
  sharding:
    configsvrReplSet:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        enabled: false
        type: ClusterIP
      name: cf0
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: 150m
          memory: 2Gi
      size: 1
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 10Gi
    enabled: true
    mongos:
      affinity:
        antiAffinityTopologyKey: kubernetes.io/hostname
      expose:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: lwsa/env=dev,lwsa/project=lwsa-idp-testbed
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        loadBalancerSourceRanges:
        - 189.0.0.0/8
        - 186.0.0.0/8
        - 187.0.0.0/8
        - 185.0.0.0/8
        - 177.0.0.0/8
        - 179.0.0.0/8
        - 191.0.0.0/8
        - 70.0.0.0/8
        type: LoadBalancer
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
        requests:
          cpu: 300m
          memory: 4Gi
      size: 1
  unmanaged: false
  unsafeFlags:
    mongosSize: true
    replsetSize: true
  updateStrategy: SmartUpdate
  upgradeOptions:
    apply: Disabled
    schedule: 0 8 * * *
    setFCV: false
    versionServiceEndpoint: https://check.percona.com
  users:
  - db: app_data
    name: app_owner
    passwordSecretRef:
      key: password
      name: wekan-psmdb-secrets-app
    roles:
    - db: app_data
      name: dbOwner
status:
  conditions:
  - lastTransitionTime: "2025-01-17T06:36:29Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-17T22:13:34Z"
    message: 'failed to get mongos connection: ping mongo: server selection error:
      server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr:
      10.43.213.53:27017, Type: Unknown, Last error: dial tcp 10.43.213.53:27017:
      connect: connection refused }, ] }'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-17T22:16:37Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-18T05:43:19Z"
    message: 'dial: ping mongo: server selection error: context deadline exceeded,
      current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: wekan-rs0-0.wekan-rs0.kanban.svc.cluster.local:27017,
      Type: Unknown, Last error: dial tcp: lookup wekan-rs0-0.wekan-rs0.kanban.svc.cluster.local
      on 10.43.0.10:53: no such host }, ] }'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-18T05:44:36Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-19T05:26:27Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2025-01-19T05:26:59Z"
    message: |-
      handle ReplicaSetNoPrimary: get standalone mongo client: ping mongo: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local on 10.43.0.10:53: read udp 10.0.8.217:58962->10.43.0.10:53: read: connection refused }, ] }
      dial: ping mongo: server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: wekan-rs0-0.wekan-rs0.kanban.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup wekan-rs0-0.wekan-rs0.kanban.svc.cluster.local on 10.43.0.10:53: read udp 10.0.8.217:42459->10.43.0.10:53: read: connection refused }, ] }
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-19T05:27:24Z"
    reason: MongosReady
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-28T06:22:00Z"
    message: 'handle ReplicaSetNoPrimary: no mongod containers in running state'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-28T09:02:23Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-28T15:07:58Z"
    message: |-
      handle ReplicaSetNoPrimary: get standalone mongo client: ping mongo: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local:27017, Type: Unknown, Last error: dial tcp: lookup wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local on 10.43.0.10:53: no such host }, ] }
      handle ReplicaSetNoPrimary: no mongod containers in running state
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-28T15:08:24Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-28T15:20:12Z"
    status: "True"
    type: initializing
  - lastTransitionTime: "2025-01-28T15:20:44Z"
    reason: MongosReady
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-28T15:22:40Z"
    message: 'handle ReplicaSetNoPrimary: get standalone mongo client: ping mongo:
      server selection error: server selection timeout, current topology: { Type:
      Single, Servers: [{ Addr: wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local:27017,
      Type: Unknown, Last error: dial tcp: lookup wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local
      on 10.43.0.10:53: read udp 10.0.8.217:57467->10.43.0.10:53: read: connection
      refused }, ] }'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-28T15:22:45Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-29T05:12:13Z"
    message: 'failed to get mongos connection: ping mongo: server selection error:
      server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr:
      10.43.213.53:27017, Type: Unknown, Last error: dial tcp 10.43.213.53:27017:
      connect: connection refused }, ] }'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-29T05:12:37Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-01-29T06:27:24Z"
    message: 'TLS secrets handler: "check cert-manager: Post "https://10.43.0.1:443/apis/cert-manager.io/v1/namespaces/kanban/certificaterequests?dryRun=All":
      unexpected EOF". Please create your TLS secret wekan-ssl manually or setup cert-manager
      correctly'
    reason: ErrorReconcile
    status: "True"
    type: error
  - lastTransitionTime: "2025-01-29T06:29:43Z"
    status: "True"
    type: ready
  - lastTransitionTime: "2025-02-02T02:58:47Z"
    status: "True"
    type: sharding
  host: 191.252.224.96
  mongoImage: percona/percona-server-mongodb:7.0.15-9-multi
  mongoVersion: 7.0.15-9
  mongos:
    ready: 1
    size: 1
    status: ready
  observedGeneration: 4
  ready: 3
  replsets:
    cfg:
      initialized: true
      members:
        wekan-cfg-0:
          name: wekan-cfg-0.wekan-cfg.kanban.svc.cluster.local:27017
          state: 1
          stateStr: PRIMARY
      ready: 1
      size: 1
      status: ready
    rs0:
      added_as_shard: true
      initialized: true
      members:
        wekan-rs0-0:
          name: wekan-rs0-0.wekan-rs0.kanban.svc.cluster.local:27017
          state: 1
          stateStr: PRIMARY
      ready: 1
      size: 1
      status: ready
  size: 3
  state: ready