Dear Percona Team,
I am facing an issue with backups on a cluster that holds a small amount of data (less than 1 GB).
Long story short: the backup fails with exit code -1 and stays in the running
state. The lock on MongoDB persists. Deleting a backup that is in the running
state does nothing.
Steps to Reproduce:
Spin up a fresh PSMDB cluster. Back up the fresh cluster: OK (did it 5 times). Then run mongorestore
from another cluster to restore data:
mongorestore --db invite --host test-rs0.test.svc.cluster.local:27017 --authenticationDatabase invite --username 'mongodb-user' --password 'password' --gzip --archive=/bitnami/mongodb/backups/full-dev-dump-02-03-2023.archive.gz --drop --noIndexRestore
Wait 10-15 minutes, then initiate a new backup. From this point the behavior is inconsistent: the backup occasionally succeeds, but most often it fails with exit code -1
and no additional useful information.
Version:
PSMDB Operator 1.14.0
PSMDB 5.0.15-13, 6.0.4-3, 6.0.5-4
PBM 2.0.3, 2.0.5, 2.2.0
Logs:
2023-07-18T08:37:03.060+0000 Mux close namespace abc.Ski
2023/07/18 08:37:03 [entrypoint] `pbm-agent` exited with code -1
2023/07/18 08:37:03 [entrypoint] restart in 5 sec
pbm status
sh-4.4$ pbm status
Cluster:
========
rs0:
- rs0/test-rs0-0.test-rs0.test.svc.cluster.local:27017 [P]: pbm-agent v2.0.5 OK
- rs0/test-rs0-1.test-rs0.test.svc.cluster.local:27017 [S]: pbm-agent v2.0.5 OK
- rs0/test-rs0-2.test-rs0.test.svc.cluster.local:27017 [S]: pbm-agent v2.0.5 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 eu-central-1 s3://backup-abc-test-com/scheduled
Snapshots:
2023-07-18T08:36:33Z 0.00B <logical> [ERROR: Backup stuck at `running` stage, last beat ts: 1689669420] [2023-07-18T08:36:37Z]
2023-07-18T08:33:10Z 160.53MB <logical> [restore_to_time: 2023-07-18T08:34:04Z]
2023-07-18T08:14:11Z 27.18KB <logical> [restore_to_time: 2023-07-18T08:14:16Z]
sh-4.4$
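The "last beat ts" value in the stuck snapshot line is a Unix epoch timestamp, so it can be lined up against the agent log. A minimal sketch for converting it, assuming GNU date is available (the function name beat_to_utc is only illustrative and is not part of PBM):

```shell
#!/bin/sh
# Convert a Unix epoch value (seconds) to an ISO-8601 UTC timestamp.
# Assumes GNU date, which accepts the "-d @<epoch>" form.
beat_to_utc() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Value taken from the "last beat ts" field in the pbm status output above.
beat_to_utc 1689669420
```

For the value above this gives 2023-07-18T08:37:00Z, which is about three seconds before the `pbm-agent` exit recorded in the log (08:37:03).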
test-rs0-2-backup-agent.log (5.9 KB)
The .yaml files below were renamed to .log:
test-percona.yaml.log (8.0 KB)
test-backup3.yaml.log (1.6 KB)
Please help to investigate.