Description:
In an OKD SNO environment, the operator installs and runs correctly. Backups to a local MinIO complete fine, but the configured retention is never applied, so old backups accumulate in large numbers.
Steps to Reproduce:
- Install an OKD SNO cluster
- Through the OperatorHub section of the web console, install “Percona Distribution for MongoDB Operator” 1.17.0 provided by Percona
- Follow the installation steps and use the following custom resource (CR) when required:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: prod
  finalizers:
    - percona.com/delete-psmdb-pods-in-order
spec:
  replsets:
    - expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      affinity:
        antiAffinityTopologyKey: none
      arbiter:
        affinity:
          antiAffinityTopologyKey: none
        enabled: false
        size: 0
      nonvoting:
        resources:
          limits:
            cpu: 300m
            memory: 0.5G
          requests:
            cpu: 300m
            memory: 0.5G
        affinity:
          antiAffinityTopologyKey: none
        volumeSpec:
          persistentVolumeClaim:
            resources:
              requests:
                storage: 3Gi
        podDisruptionBudget: {}
        enabled: false
        size: 1
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 5Gi
      podDisruptionBudget: {}
      name: rs0
      size: 3
  multiCluster:
    enabled: false
    DNSSuffix: svc.clusterset.local
  upgradeOptions:
    versionServiceEndpoint: 'https://check.percona.com'
    apply: disabled
    schedule: 0 2 * * *
    setFCV: false
  backup:
    pitr:
      enabled: false
      compressionType: gzip
      compressionLevel: 6
    enabled: true
    image: percona/percona-backup-mongodb:2.5.0
    serviceAccountName: percona-server-mongodb-operator
    storages:
      minio:
        type: s3
        s3:
          bucket: pmdb-prod-backup
          region: us-east-1
          credentialsSecret: prod-backup-minio-s3
          endpointUrl: http://minio-api.minio.svc.cluster.local:9000
          insecureSkipTLSVerify: false
    tasks:
      - compressionType: gzip
        enabled: true
        keep: 3
        name: daily-backup
        schedule: 0 0 * * *
        storageName: minio
  secrets:
    users: pmdb-mongodb-prod
    encryptionKey: my-cluster-name-mongodb-encryption-key
  sharding:
    configsvrReplSet:
      expose:
        enabled: false
        exposeType: ClusterIP
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      affinity:
        antiAffinityTopologyKey: none
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 3Gi
      podDisruptionBudget: {}
      size: 1
    mongos:
      expose:
        exposeType: ClusterIP
      resources:
        limits:
          cpu: 300m
          memory: 0.5G
        requests:
          cpu: 300m
          memory: 0.5G
      affinity:
        antiAffinityTopologyKey: none
      podDisruptionBudget: {}
      size: 1
    enabled: false
  pmm:
    enabled: false
    image: percona/pmm-client:2.42.0
    serverHost: monitoring-service
  crVersion: 1.17.0
  image: percona/percona-server-mongodb:7.0.12-7
  imagePullPolicy: Always
  allowUnsafeConfigurations: false
  updateStrategy: SmartUpdate
- Everything works correctly: backups are created on a local MinIO installed in the same cluster with a basic configuration
- Backup retention is never applied: backups beyond the 3 most recent are not deleted (.spec.backup.tasks[0].keep: 3); see the listing command below
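For reference, per the operator docs `keep` is the number of completed backups to retain for a task, so with a daily schedule `keep: 3` should amount to roughly the last 3 days. Listing the psmdb-backup resources sorted oldest first makes it easy to see everything the operator should already have pruned:
# oc -n prod-pmdb get psmdb-backup --sort-by=.metadata.creationTimestamp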
Version:
Operator version: percona-server-mongodb-operator.v1.17.0 (installed through OperatorHub)
OKD version: 4.15.0-0.okd-2024-03-10-010116 with Kubernetes v1.28.2-3598+6e2789bbd58938-dirty
Logs:
The operator pod has many restarts:
# oc -n prod-pmdb get pods
NAME READY STATUS RESTARTS AGE
percona-server-mongodb-operator-65cd6cfc48-ktwds 1/1 Running 373 (14h ago) 258d
prod-rs0-0 2/2 Running 3 (65d ago) 258d
prod-rs0-1 2/2 Running 2 258d
prod-rs0-2 2/2 Running 3 (84d ago) 258d
The current logs keep repeating the same messages:
<I'll post the logs in a separate message or otherwise the forum won't let me post>
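(In case it's useful, the repeating messages can be tailed directly from the operator deployment:)
# oc -n prod-pmdb logs deploy/percona-server-mongodb-operator --tail=100 -f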
Each restart corresponds to the operator container terminating with exit code 2:
# oc -n prod-pmdb describe pod percona-server-mongodb-operator-65cd6cfc48-ktwds
...
    Command:
      percona-server-mongodb-operator
    State:          Running
      Started:      Wed, 30 Jul 2025 18:07:47 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 29 Jul 2025 10:59:52 -0400
      Finished:     Wed, 30 Jul 2025 18:07:45 -0400
    Ready:          True
    Restart Count:  373
...
These are the last logs for when it was restarted:
<I'll post the logs in a separate message or otherwise the forum won't let me post>
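(These can be recovered with the --previous flag, which returns the logs of the last terminated instance of the container:)
# oc -n prod-pmdb logs percona-server-mongodb-operator-65cd6cfc48-ktwds --previous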
Expected Result:
The retention should be applied: with keep: 3 on a daily task, only the 3 most recent backups (i.e. the last 3 days) should be kept.
Actual Result:
The retention is not applied and the backups are not deleted.
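As a stop-gap the old backups can be pruned by hand; a minimal sketch that deletes everything except the 3 newest psmdb-backup resources (it assumes GNU head for the negative line count, and relies on the percona.com/delete-backup finalizer to clean up the data in MinIO, as the manual delete shown further down suggests):
# oc -n prod-pmdb get psmdb-backup --sort-by=.metadata.creationTimestamp --no-headers \
    | head -n -3 | awk '{print $1}' \
    | xargs -r oc -n prod-pmdb delete psmdb-backup   # keeps only the 3 newest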
Additional Information:
The cluster is running on a single node (OKD SNO), but that shouldn't matter from the operator's perspective.
The CR has a status of “Ready”:
# oc -n prod-pmdb get PerconaServerMongoDB prod
NAME ENDPOINT STATUS AGE
prod prod-rs0.prod-pmdb.svc.cluster.local ready 258d
Full CR with describe:
# oc -n prod-pmdb describe PerconaServerMongoDB prod
Name:         prod
Namespace:    prod-pmdb
Labels:       <none>
Annotations:  <none>
API Version:  psmdb.percona.com/v1
Kind:         PerconaServerMongoDB
Metadata:
  Creation Timestamp:  2024-11-15T09:06:09Z
  Finalizers:
    percona.com/delete-psmdb-pods-in-order
  Generation:        1
  Resource Version:  95156491
  UID:               ab68cd17-20b3-4e3e-88af-7c12077eb0f3
Spec:
  Allow Unsafe Configurations:  false
  Backup:
    Enabled:  true
    Image:    percona/percona-backup-mongodb:2.5.0
    Pitr:
      Compression Level:  6
      Compression Type:   gzip
      Enabled:            false
    Service Account Name:  percona-server-mongodb-operator
    Storages:
      Minio:
        s3:
          Bucket:                    pmdb-prod-backup
          Credentials Secret:        prod-backup-minio-s3
          Endpoint URL:              http://minio-api.minio.svc.cluster.local:9000
          Insecure Skip TLS Verify:  false
          Region:                    us-east-1
        Type:  s3
    Tasks:
      Compression Type:  gzip
      Enabled:           true
      Keep:              3
      Name:              daily-backup
      Schedule:          0 0 * * *
      Storage Name:      minio
  Cr Version:         1.17.0
  Image:              percona/percona-server-mongodb:7.0.12-7
  Image Pull Policy:  Always
  Multi Cluster:
    DNS Suffix:  svc.clusterset.local
    Enabled:     false
  Pmm:
    Enabled:      false
    Image:        percona/pmm-client:2.42.0
    Server Host:  monitoring-service
  Replsets:
    Affinity:
      Anti Affinity Topology Key:  none
    Arbiter:
      Affinity:
        Anti Affinity Topology Key:  none
      Enabled:  false
      Size:     0
    Expose:
      Enabled:      false
      Expose Type:  ClusterIP
    Name:  rs0
    Nonvoting:
      Affinity:
        Anti Affinity Topology Key:  none
      Enabled:  false
      Pod Disruption Budget:
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:  1
      Volume Spec:
        Persistent Volume Claim:
          Resources:
            Requests:
              Storage:  3Gi
    Pod Disruption Budget:
    Resources:
      Limits:
        Cpu:     300m
        Memory:  0.5G
      Requests:
        Cpu:     300m
        Memory:  0.5G
    Size:  3
    Volume Spec:
      Persistent Volume Claim:
        Resources:
          Requests:
            Storage:  5Gi
  Secrets:
    Encryption Key:  my-cluster-name-mongodb-encryption-key
    Users:           pmdb-mongodb-prod
  Sharding:
    Configsvr Repl Set:
      Affinity:
        Anti Affinity Topology Key:  none
      Expose:
        Enabled:      false
        Expose Type:  ClusterIP
      Pod Disruption Budget:
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:  1
      Volume Spec:
        Persistent Volume Claim:
          Resources:
            Requests:
              Storage:  3Gi
    Enabled:  false
    Mongos:
      Affinity:
        Anti Affinity Topology Key:  none
      Expose:
        Expose Type:  ClusterIP
      Pod Disruption Budget:
      Resources:
        Limits:
          Cpu:     300m
          Memory:  0.5G
        Requests:
          Cpu:     300m
          Memory:  0.5G
      Size:  1
  Update Strategy:  SmartUpdate
  Upgrade Options:
    Apply:                     disabled
    Schedule:                  0 2 * * *
    Set FCV:                   false
    Version Service Endpoint:  https://check.percona.com
Status:
  Conditions:
    Last Transition Time:  2024-11-15T09:06:11Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2024-11-15T09:07:12Z
    Message:               rs0: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2024-11-15T09:07:12Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2024-11-15T09:07:23Z
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-05-08T00:00:52Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-05-08T00:01:23Z
    Message:               rs0: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-05-08T00:02:36Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-05-08T00:02:47Z
    Message:               rs0: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-05-08T00:02:47Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-05-08T00:02:47Z
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-05-27T00:07:34Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-05-27T00:07:50Z
    Message:               rs0: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-07-31T13:15:50Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-07-31T13:16:06Z
    Message:               rs0: ready
    Reason:                RSReady
    Status:                True
    Type:                  ready
    Last Transition Time:  2025-07-31T13:16:06Z
    Status:                True
    Type:                  initializing
    Last Transition Time:  2025-07-31T13:16:11Z
    Status:                True
    Type:                  ready
  Host:                 prod-rs0.prod-pmdb.svc.cluster.local
  Mongo Image:          percona/percona-server-mongodb:7.0.12-7
  Mongo Version:        7.0.12-7
  Observed Generation:  1
  Ready:                3
  Replsets:
    rs0:
      Initialized:  true
      Ready:        3
      Size:         3
      Status:       ready
  Size:   3
  State:  ready
Events:  <none>
The backups are really small, as the total database size is under 1 GB both on disk and in MongoDB.
As of now there are more than 200 backups in "ready" state that have not been deleted:
<I'll post the output in a separate message or otherwise the forum won't let me post>
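(The count comes from something like the following; STATUS is the sixth column of the psmdb-backup listing, as in the summary line below:)
# oc -n prod-pmdb get psmdb-backup --no-headers | awk '$6 == "ready"' | wc -l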
Here is the oldest backup, first as a summary line and then as full YAML:
# oc -n prod-pmdb get psmdb-backup cron-prod-20241116000000-wcj8f
NAME CLUSTER STORAGE DESTINATION TYPE STATUS COMPLETED AGE
cron-prod-20241116000000-wcj8f prod minio s3://pmdb-prod-backup/2024-11-16T00:00:21Z logical ready 257d 257d
# oc -n prod-pmdb get psmdb-backup cron-prod-20241116000000-wcj8f -o yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  creationTimestamp: "2024-11-16T00:00:00Z"
  finalizers:
    - percona.com/delete-backup
  generateName: cron-prod-20241116000000-
  generation: 1
  labels:
    app.kubernetes.io/instance: prod
    app.kubernetes.io/managed-by: percona-server-mongodb-operator
    app.kubernetes.io/name: percona-server-mongodb
    app.kubernetes.io/part-of: percona-server-mongodb
    percona.com/backup-ancestor: daily-backup
    percona.com/backup-type: cron
    percona.com/cluster: prod
  name: cron-prod-20241116000000-wcj8f
  namespace: prod-pmdb
  resourceVersion: "2570408"
  uid: 6f2f9b3b-8a55-43a4-b39b-f5fc6f7b2b2e
spec:
  clusterName: prod
  compressionType: gzip
  storageName: minio
  type: logical
status:
  completed: "2024-11-16T00:00:31Z"
  destination: s3://pmdb-prod-backup/2024-11-16T00:00:21Z
  lastTransition: "2024-11-16T00:00:31Z"
  pbmName: "2024-11-16T00:00:21Z"
  pbmPod: prod-rs0-0
  pbmPods:
    rs0: prod-rs0-0
  replsetNames:
    - rs0
  s3:
    bucket: pmdb-prod-backup
    credentialsSecret: prod-backup-minio-s3
    endpointUrl: http://minio-api.minio.svc.cluster.local:9000
    region: us-east-1
    serverSideEncryption: {}
  start: "2024-11-16T00:00:21Z"
  state: ready
  storageName: minio
  type: logical
Deleting it manually does work, although the first attempt hung and I had to interrupt it; retrying shows the resource is already gone:
# oc -n prod-pmdb delete psmdb-backup cron-prod-20241116000000-wcj8f
^C
# oc -n prod-pmdb delete psmdb-backup cron-prod-20241116000000-wcj8f
Error from server (NotFound): perconaservermongodbbackups.psmdb.percona.com "cron-prod-20241116000000-wcj8f" not found
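To double-check that the percona.com/delete-backup finalizer also removed the data from the bucket, the MinIO client can be used (myminio is a hypothetical mc alias pointing at the endpointUrl above):
# mc ls myminio/pmdb-prod-backup/ | grep 2024-11-16   # expect no entry for the deleted backup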
As soon as I can, I'd like to reproduce this behaviour in a dedicated lab, but I'm opening this issue in the meantime in case it helps, and in case anybody can suggest more specific troubleshooting steps.