Description:
I deployed a MongoDB cluster using the operator.
The backup configuration looks like this:
backup:
  enabled: true
  image: perconalab/percona-server-mongodb-operator:main-backup
  pitr:
    enabled: false
  tasks:
    - name: "daily-night-backup"
      enabled: true
      schedule: "0 16 * * *"
      keep: 14
      type: logical
      storageName: minio
      compressionType: none
Whether the backup starts on schedule or I run it manually, it fails with this error while the backup is being created:
check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691656133
Version:
percona-server-mongodb-operator: 1.12.0
percona-server-mongodb: 5.0.7-6
backup-agent: 2.2.1
Logs:
pbm status:
Cluster:
========
rs0:
- rs0/mongodb-rs0-1.mongodb-rs0.infra.svc.k8s.us:27017 [P]: pbm-agent v2.2.1 OK
- rs0/mongodb-rs0-2.mongodb-rs0.infra.svc.k8s.us:27017 [S]: pbm-agent v2.2.1 OK
- rs0/mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017 [S]: pbm-agent v2.2.1 OK
cfg:
- cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017 [P]: pbm-agent v2.2.1 OK
- cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017 [S]: pbm-agent v2.2.1 OK
- cfg/mongodb-cfg-2.mongodb-cfg.infra.svc.k8s.us:27017 [S]: pbm-agent v2.2.1 OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
(none)
Backups:
========
S3 us-east-1 s3://https://s3.us-west-004.backblazeb2.com/mongo-data-test
Snapshots:
2023-08-10T08:28:47Z 10.93KB <logical> [ERROR: check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691656133] [2023-08-10T08:29:24Z]
2023-08-09T16:00:33Z 10.93KB <logical> [ERROR: check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691596839] [2023-08-09T16:01:10Z]
2023-08-08T16:00:24Z 10.93KB <logical> [ERROR: check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691510430] [2023-08-08T16:01:01Z]
Every time a backup runs, the node performing it crashes with the error above.
pbm logs -t 1000 -s D --event=backup/2023-08-10T08:28:47Z
2023-08-10T08:28:47Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] init backup meta
2023-08-10T08:28:47Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] nomination list for rs0: [[mongodb-rs0-2.mongodb-rs0.infra.svc.k8s.us:27017 mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017] [mongodb-rs0-1.mongodb-rs0.infra.svc.k8s.us:27017]]
2023-08-10T08:28:47Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] nomination list for cfg: [[mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017 mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017 mongodb-cfg-2.mongodb-cfg.infra.svc.k8s.us:27017]]
2023-08-10T08:28:47Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] nomination cfg, set candidates [mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017 mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017 mongodb-cfg-2.mongodb-cfg.infra.svc.k8s.us:27017]
2023-08-10T08:28:47Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] nomination rs0, set candidates [mongodb-rs0-2.mongodb-rs0.infra.svc.k8s.us:27017 mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017]
2023-08-10T08:28:48Z I [rs0/mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] backup started
2023-08-10T08:28:48Z D [rs0/mongodb-rs0-1.mongodb-rs0.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] skip after nomination, probably started by another node
2023-08-10T08:28:48Z I [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] backup started
2023-08-10T08:28:48Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] skip after nomination, probably started by another node
2023-08-10T08:28:48Z D [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] waiting for balancer off
2023-08-10T08:28:48Z D [cfg/mongodb-cfg-2.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] skip after nomination, probably started by another node
2023-08-10T08:28:48Z D [rs0/mongodb-rs0-2.mongodb-rs0.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] skip after nomination, probably started by another node
2023-08-10T08:28:48Z D [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] balancer status: off
2023-08-10T08:28:51Z D [rs0/mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] wait for tmp users {1691656131 10}
2023-08-10T08:28:52Z D [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] wait for tmp users {1691656132 8}
2023-08-10T08:28:52Z D [cfg/mongodb-cfg-1.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] bcp nomination: rs0 won by mongodb-rs0-0.mongodb-rs0.infra.svc.k8s.us:27017
2023-08-10T08:28:57Z I [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] mongodump finished, waiting for the oplog
2023-08-10T08:29:24Z I [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] dropping tmp collections
2023-08-10T08:29:24Z I [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] mark RS as error `check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691656133`: <nil>
2023-08-10T08:29:24Z I [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] mark backup as error `check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691656133`: <nil>
2023-08-10T08:29:24Z D [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] set balancer on
2023-08-10T08:29:24Z E [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] backup: check cluster for dump done: convergeCluster: lost shard rs0, last beat ts: 1691656133
2023-08-10T08:29:24Z D [cfg/mongodb-cfg-0.mongodb-cfg.infra.svc.k8s.us:27017] [backup/2023-08-10T08:28:47Z] releasing lock
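
To line the error up with the log, the `last beat ts` value can be converted to UTC and compared with the log timestamps. This is a minimal sketch, assuming `last beat ts` is a Unix epoch time in seconds (the `wait for tmp users {1691656131 10}` entry logged at 08:28:51Z suggests it is). The variable names are only illustrative; the values are copied from the log above.

```python
from datetime import datetime, timezone

# Values copied from the log excerpt above.
last_beat_ts = 1691656133                # "last beat ts" in the error message
backup_started = "2023-08-10T08:28:47Z"  # backup name / start time
error_logged = "2023-08-10T08:29:24Z"    # time the cfg node marked the backup as failed


def parse_utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp as a UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Assumption: the heartbeat value is epoch seconds, not milliseconds.
last_beat = datetime.fromtimestamp(last_beat_ts, tz=timezone.utc)
since_start = (last_beat - parse_utc(backup_started)).total_seconds()
silence = (parse_utc(error_logged) - last_beat).total_seconds()

print("last heartbeat from rs0:", last_beat.strftime("%Y-%m-%dT%H:%M:%SZ"))
print("seconds after backup start:", int(since_start))
print("seconds of silence before the error:", int(silence))
```

With these values the last heartbeat from rs0 falls at 2023-08-10T08:28:53Z, 6 seconds after the backup started and 31 seconds before the cfg node reported the shard as lost. That matches the log, where mongodb-rs0-0 writes nothing after its 08:28:51Z entry.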