Hey all,
Back in March, our nightly logical backup started failing. The initial errors pointed to the default startingStatus timeout of 33 seconds, so I have now changed backup.timeouts.startingStatus to 300 seconds. The backups kick off and appear to run well for 2 of the 3 replica sets, but the backup as a whole fails because one replica set keeps failing. Below is what I believe to be all of the pertinent information, but I am happy to provide further details if needed.
The environment is Kubernetes in an AWS EKS cluster, deployed with Helm.
There are 3 replica sets running MongoDB 7.0.12-7 with PBM 2.9.1. PBM was previously on 2.5.0; I upgraded to 2.9.1 as part of the troubleshooting process.
The failure can be reproduced simply by running
pbm backup --type=logical --compression=zstd --compression-level=2
on any of the config servers. It is always rs2 that fails. Note that rs2 was added recently, but the YAML config used to add it is the same as for the other two replica sets.
Another odd thing: a gap appears in the node stats panel of the MongoDB ReplSet Summary dashboard in PMM within a few minutes of the backup command being issued, and it seems to correlate with the timing of the rs2 failure.
This also seems to affect only logical backups; physical backups run just fine.
cfg container log
2025-04-21T17:42:18.000+0000 I got command backup [name: 2025-04-21T17:42:17Z, compression: zstd (level: 2)] <ts: 1745257337>, opid: 680683790dd8b6d4e9d0ea1f
2025-04-21T17:42:18.000+0000 I got epoch {1745193610 67}
2025-04-21T17:42:18.000+0000 I [backup/2025-04-21T17:42:17Z] backup started
2025-04-21T17:42:18.000+0000 D [backup/2025-04-21T17:42:17Z] waiting for balancer off
2025-04-21T17:42:19.000+0000 D [backup/2025-04-21T17:42:17Z] balancer is disabled
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] wait for tmp users {1745257417 413}
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] dumping up to 4 collections in parallel
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.chunks.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.collections.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.0
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.settings.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.databases.zst" [size hint: 53248 (52.00KB); part size: 10485760 (10.00M
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.databases" done (size: 24601)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.collections" done (size: 193)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.settings" done (size: 273)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.shards.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.chunks" done (size: 969)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.tags" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.version.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.00MB)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRestores" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmRRoles.zst" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmAgents.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.shards" done (size: 958)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.version" done (size: 37)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.version.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmPITR" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmConfig.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRRoles" done (size: 648)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.users.zst" [size hint: 45056 (44.00KB); part size: 10485760 (10.0
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.version" done (size: 510)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmAgents" done (size: 4018)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLockOp" done (size: 0)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmConfig" done (size: 1040)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmCmd.zst" [size hint: 53248 (52.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmRUsers.zst" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.users" done (size: 13207)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmOpLog.zst" [size hint: 90112 (88.00KB); part size: 10485760 (10.00MB)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmBackups.zst" [size hint: 282624 (276.00KB); part size: 10485760 (10.0
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmCmd" done (size: 53343)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmPITRChunks" done (size: 0)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmLog.zst" [size hint: 9117696 (8.70MB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRUsers" done (size: 13207)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.roles.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.0
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmBackups" done (size: 139806)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmLock.zst" [size hint: 32768 (32.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmOpLog" done (size: 191108)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.roles" done (size: 648)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLock" done (size: 850)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLog" done (size: 52433970)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/meta.pbm" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/metadata.json" [size hint: 13443 (13.13KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump finished, waiting for the oplog
2025-04-21T17:44:41.000+0000 I [backup/2025-04-21T17:42:17Z] dropping tmp collections
2025-04-21T17:44:41.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/oplog/20250421174218-232.20250421174441-354.zst" [size hint: -1 (unknown); par
2025-04-21T17:44:41.000+0000 I [backup/2025-04-21T17:42:17Z] created chunk 2025-04-21T17:42:18 - 2025-04-21T17:44:41
2025-04-21T17:44:42.000+0000 I [backup/2025-04-21T17:42:17Z] mark RS as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil
2025-04-21T17:44:42.000+0000 I [backup/2025-04-21T17:42:17Z] mark backup as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`:
2025-04-21T17:44:42.000+0000 D [backup/2025-04-21T17:42:17Z] set balancer on
2025-04-21T17:44:42.000+0000 E [backup/2025-04-21T17:42:17Z] backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)
2025-04-21T17:44:42.000+0000 D [backup/2025-04-21T17:42:17Z] releasing lock
pbm status
Cluster:
========
rs2:
- psmdb-psmdb-db-rs2-0.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs2-2.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
rs1:
- psmdb-psmdb-db-rs1-0.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs1-1.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs1-2.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
cfg:
- psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-cfg-1.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-cfg-2.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
rs0:
- psmdb-psmdb-db-rs0-0.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs0-1.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs0-2.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
Snapshot backup "2025-04-21T17:42:17Z", started at 2025-04-21T17:42:18Z. Status: error. [op id: 680683790dd8b6d4e9d0ea1f]
Backups:
========
S3 us-east-2 s3:///<my_bucket>/logical
Snapshots:
2025-04-21T17:42:17Z 0.00B <logical> [ERROR: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)] [2025-04-21T17:44:42Z]
pbm describe-backup 2025-04-21T17:42:17Z
Error: get snapshot size: missed file
pbm logs -e backup/2025-04-21T17:42:17Z -t 0 | grep rs2
2025-04-21T17:42:19Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup started
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.roles" done (size: 648)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRUsers" done (size: 3210)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.users" done (size: 3210)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRRoles" done (size: 648)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiProfileTransactions15m" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.EmitterState" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.version" done (size: 577)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiAnalyzerEvents" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Blocklist" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiUsageStatistics1d" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ConfigTemplate" done (size: 17610)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.AuditLog2" done (size: 272539)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.VulnerabilitiesByEndpoint" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Ignorelist" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiProfileTransactions" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.NotificationSubscription" done (size: 375)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.RuntimeEvents" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.APIKeys" done (size: 96)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Site" done (size: 6274)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Rules" done (size: 2831)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Vulnerabilities" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.CustomerConfig" done (size: 290)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Mutelist" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Actor" done (size: 14360233)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.CorrelationEvents" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Whitelist" done (size: 35245)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.EtlLock" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.WLRules" done (size: 6306)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.VulnerabilitiesBySite" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.HMLock" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiUsageStatistics" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.SysInfo" done (size: 4939812)
2025-04-21T17:43:41Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dropping tmp collections
2025-04-21T17:44:11Z W [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] drop tmp users and roles: connect to primary: ping: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }
2025-04-21T17:44:41Z E [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] failed to get last write: get NodeInfo data: cmd: isMaster: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }
2025-04-21T17:44:41Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark RS as error `dump: dump namespaces: customer_data.EntityDenormalized: cursor: connection(localhost:27017[-319]) socket was unexpectedly closed: EOF`: <nil>
2025-04-21T17:44:41Z E [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup: dump: dump namespaces: customer_data.EntityDenormalized: cursor: connection(localhost:27017[-319]) socket was unexpectedly closed: EOF
2025-04-21T17:44:42Z I [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark RS as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil>
2025-04-21T17:44:42Z I [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark backup as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil>
2025-04-21T17:44:42Z E [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)
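In case it helps anyone skim the agent log, here is a minimal sketch of a filter that keeps only the warning and error lines for a single replica set. It assumes the `pbm logs` line layout shown above (timestamp, one-letter severity, then `[replset/host:port]`); `pbm_problems` is a made-up helper name, not part of PBM.

```shell
# Hypothetical helper (not part of PBM): read `pbm logs` output on stdin and
# keep only warning (W) and error (E) lines reported by one replica set.
# Assumes each line looks like: <timestamp> <severity> [<replset>/<host:port>] ...
pbm_problems() {
  rs="$1"
  grep -E "^[0-9T:Z-]+ [WE] \[${rs}/"
}
```

Used as `pbm logs -e backup/2025-04-21T17:42:17Z -t 0 | pbm_problems rs2`, this reduces the output above to the one W line and the two E lines from the rs2 agent.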
Any thoughts or ideas would be greatly appreciated. This is a new environment that we are still building out and it has not gone live yet, but this is definitely a blocker for going live.
Thanks