Description:
I’ve been trying to get the Percona Operator for MongoDB running in our production environment. So far, I’ve migrated all data from Atlas to our self-hosted DB using the Percona operator. I encountered a few issues setting up backups to S3, but that’s also working perfectly now; the main one was this one, which I resolved by disabling TLS.
The final step before pushing this to production is to ensure that restoring backups to another cluster also works correctly. This is where I’ve run into issue after issue.
The first issue I encountered was {"Error": "Backup '2025-01-23T22:00:21Z' not found"}. I resolved it by manually inserting the <backup-name>.pbm.json (with the appropriate S3 credentials and the timestamps fixed) into the admin.pbmBackups collection.
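(For reference: instead of editing admin.pbmBackups by hand, PBM can also be told to re-read the backup metadata from the configured storage. A minimal sketch, assuming the operator’s default naming, where the PBM agent runs as the backup-agent sidecar in pod mongodb-2-rs0-0:)

# ask PBM to resync backup metadata from the configured S3 storage
kubectl exec -n mongodb mongodb-2-rs0-0 -c backup-agent -- pbm config --force-resync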
After this, the restore ran (in the running state) for a few minutes (almost always 6m30s), after which it went into the error state with this error:
prepare data: connect to mongo: mongo failed with [F] Fatal assertion / 2025-01-24T15:12:16.944+00:00, connect err: ping: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:28030, Type: Unknown, Last error: dial tcp [::1]:28030: connect: connection refused }, ] }
All this while, the rs0 pod sits at 100% memory usage and its PV’s used space keeps growing. I let this run for a few hours; eventually the usage reached 1.1TB/1.5TB, after which it suddenly dropped to 10GB. It stayed like that for another 2 hours, after which I shut the pod down.
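(To observe this while reproducing, the pod can be watched as below; the pod and container names assume the operator’s defaults and my cluster name, so adjust as needed.)

# memory usage of the replset pod (needs metrics-server)
kubectl top pod -n mongodb mongodb-2-rs0-0

# used space on the data PV, inside the mongod container
kubectl exec -n mongodb mongodb-2-rs0-0 -c mongod -- df -h /data/db

# restore progress from the PBM agent sidecar
kubectl logs -n mongodb mongodb-2-rs0-0 -c backup-agent -f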
Steps to Reproduce:
The config of the cluster being restored to:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-2
  namespace: mongodb
spec:
  crVersion: 1.19.0
  image: percona/percona-server-mongodb:8.0.4-1-multi
  tls:
    mode: disabled
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
  upgradeOptions:
    apply: disabled
    schedule: "0 2 * * *"
  secrets:
    users: mongodb-2
  replsets:
    - name: rs0
      size: 1
      affinity:
        antiAffinityTopologyKey: 'none'
      # if using an even number of nodes, set arbiter to true
      arbiter:
        enabled: false
        size: 1
      configuration: |
        replication:
          # 25GB
          oplogSizeMB: 25800
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 4
          memory: 16Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 1500Gi
          storageClassName: gp2-xfs
  sharding:
    enabled: true
    configsvrReplSet:
      size: 1
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 2Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 25Gi
          storageClassName: gp2-xfs
    mongos:
      size: 1
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      enabled: false
      oplogOnly: false
    resources:
      limits:
        cpu: 2
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 2Gi
    storages:
      s3-bkp:
        type: s3
        s3:
          bucket: mongodb-backup
          region: ap-east-1
          credentialsSecret: mongodb-backups
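(Before applying the restore, the target cluster’s readiness can be checked with the command below; psmdb is the short name of the PerconaServerMongoDB resource.)

kubectl get psmdb -n mongodb mongodb-2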
The restore:
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: mongodb-2-restore
  namespace: mongodb
spec:
  clusterName: mongodb-2
  storageName: s3-bkp
  # specify S3 bucket name & source
  backupSource:
    type: physical
    # the file path is easy to find, either from the S3 bucket
    # or via the `describe psmdb-backup` command
    destination: s3://mongodb-backup/2025-01-23T22:00:21Z
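(Assuming the manifest above is saved as restore.yaml, it can be applied and watched like this; psmdb-restore is the short name of the PerconaServerMongoDBRestore resource.)

kubectl apply -f restore.yaml
kubectl get psmdb-restore -n mongodb mongodb-2-restore -w
kubectl describe psmdb-restore -n mongodb mongodb-2-restore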
Here’s the config of the cluster that produced the backup (it was a scheduled backup). That cluster was also upgraded from MongoDB 7 to 8, and the backup was taken after the upgrade, following the instructions here. This cluster works perfectly.
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb
  namespace: mongodb
spec:
  crVersion: 1.19.0
  image: percona/percona-server-mongodb:8.0.4-1-multi
  tls:
    mode: disabled
  unsafeFlags:
    tls: true
    replsetSize: true
    mongosSize: true
  upgradeOptions:
    apply: disabled
    schedule: "0 2 * * *"
  secrets:
    users: mongodb
  replsets:
    - name: rs0
      size: 1
      affinity:
        antiAffinityTopologyKey: 'none'
      # if using an even number of nodes, set arbiter to true
      arbiter:
        enabled: false
        size: 1
      configuration: |
        replication:
          # 25GB
          oplogSizeMB: 25800
      podDisruptionBudget:
        maxUnavailable: 1
      resources:
        limits:
          cpu: 8
          memory: 16Gi
        requests:
          cpu: 8
          memory: 16Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 2000Gi
          storageClassName: gp2-xfs
  sharding:
    enabled: true
    configsvrReplSet:
      size: 1
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 2Gi
      volumeSpec:
        persistentVolumeClaim:
          resources:
            requests:
              storage: 25Gi
          storageClassName: gp2-xfs
    mongos:
      size: 1
  users:
    - name: service-im
      db: instant-messaging
      passwordSecretRef:
        name: service-im-password
        key: password
      roles:
        - name: dbOwner
          db: instant-messaging
        - name: clusterMonitor
          db: admin
  backup:
    enabled: true
    image: percona/percona-backup-mongodb:2.8.0-multi
    pitr:
      enabled: true
      oplogOnly: true
    storages:
      s3-bkp:
        type: s3
        s3:
          bucket: mongodb-backup
          region: ap-east-1
          credentialsSecret: mongodb-backups
    resources:
      limits:
        cpu: 2
        memory: 8Gi
      requests:
        cpu: 500m
        memory: 2Gi
    tasks:
      - name: "mongodb-backup"
        enabled: true
        # At 0 minutes past the hour, every 12 hours, starting at 10:00 AM
        # https://crontab.cronhub.io
        schedule: "0 10/12 * * *"
        keep: 8
        type: physical
        storageName: s3-bkp
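(The destination used in the restore manifest above comes from the backup objects this cluster creates; they can be listed and inspected as below, where <backup-name> is a placeholder.)

kubectl get psmdb-backup -n mongodb
kubectl describe psmdb-backup -n mongodb <backup-name>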
Version:
Operator: 1.19.0
MongoDB: 8.0.4
Logs:
Full logs of the restore cluster, from startup to the error message.
mongodb-2-rs0.log (154.3 KB)
Expected Result:
I expect the restore to complete successfully.
Actual Result:
The errors keep occurring, as outlined in the description.
Additional Information:
Do note that the backup is sizeable: 1.1TB.