Hey all,
I recently started having issues with PSMDB backups running in K8s. Physical backups seem to work without issue, but logical backups keep failing with errors. The initial problem was running up against the 33-second default for backup.timeouts.startingStatus. I raised this to 300 seconds, but now every time I run a backup, either through the cron config or manually from the config server, I keep getting errors on rs2 specifically. Various outputs are below; I'd love some advice if anyone has any ideas.
Log from the backup agent on cfg0:
2025-04-21T17:42:18.000+0000 I got command backup [name: 2025-04-21T17:42:17Z, compression: zstd (level: 2)] <ts: 1745257337>, opid: 680683790dd8b6d4e9d0ea1f
2025-04-21T17:42:18.000+0000 I got epoch {1745193610 67}
2025-04-21T17:42:18.000+0000 I [backup/2025-04-21T17:42:17Z] backup started
2025-04-21T17:42:18.000+0000 D [backup/2025-04-21T17:42:17Z] waiting for balancer off
2025-04-21T17:42:19.000+0000 D [backup/2025-04-21T17:42:17Z] balancer is disabled
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] wait for tmp users {1745257417 413}
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] dumping up to 4 collections in parallel
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.chunks.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.collections.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.0
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.settings.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.databases.zst" [size hint: 53248 (52.00KB); part size: 10485760 (10.00M
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.databases" done (size: 24601)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.collections" done (size: 193)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.settings" done (size: 273)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.shards.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.chunks" done (size: 969)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.tags" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/config.version.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.00MB)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRestores" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmRRoles.zst" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmAgents.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.shards" done (size: 958)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "config.version" done (size: 37)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.version.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmPITR" done (size: 0)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmConfig.zst" [size hint: 36864 (36.00KB); part size: 10485760 (10.00MB
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRRoles" done (size: 648)
2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.users.zst" [size hint: 45056 (44.00KB); part size: 10485760 (10.0
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.version" done (size: 510)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmAgents" done (size: 4018)
2025-04-21T17:43:37.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLockOp" done (size: 0)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmConfig" done (size: 1040)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmCmd.zst" [size hint: 53248 (52.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmRUsers.zst" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.users" done (size: 13207)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmOpLog.zst" [size hint: 90112 (88.00KB); part size: 10485760 (10.00MB)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmBackups.zst" [size hint: 282624 (276.00KB); part size: 10485760 (10.0
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmCmd" done (size: 53343)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmPITRChunks" done (size: 0)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmLog.zst" [size hint: 9117696 (8.70MB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRUsers" done (size: 13207)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.system.roles.zst" [size hint: 20480 (20.00KB); part size: 10485760 (10.0
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmBackups" done (size: 139806)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/admin.pbmLock.zst" [size hint: 32768 (32.00KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmOpLog" done (size: 191108)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.system.roles" done (size: 648)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLock" done (size: 850)
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmLog" done (size: 52433970)
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/meta.pbm" [size hint: 0 (0.00B); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/metadata.json" [size hint: 13443 (13.13KB); part size: 10485760 (10.00MB)]
2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump finished, waiting for the oplog
2025-04-21T17:44:41.000+0000 I [backup/2025-04-21T17:42:17Z] dropping tmp collections
2025-04-21T17:44:41.000+0000 D [backup/2025-04-21T17:42:17Z] uploading "2025-04-21T17:42:17Z/cfg/oplog/20250421174218-232.20250421174441-354.zst" [size hint: -1 (unknown); par
2025-04-21T17:44:41.000+0000 I [backup/2025-04-21T17:42:17Z] created chunk 2025-04-21T17:42:18 - 2025-04-21T17:44:41
2025-04-21T17:44:42.000+0000 I [backup/2025-04-21T17:42:17Z] mark RS as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil
2025-04-21T17:44:42.000+0000 I [backup/2025-04-21T17:42:17Z] mark backup as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`:
2025-04-21T17:44:42.000+0000 D [backup/2025-04-21T17:42:17Z] set balancer on
2025-04-21T17:44:42.000+0000 E [backup/2025-04-21T17:42:17Z] backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)
2025-04-21T17:44:42.000+0000 D [backup/2025-04-21T17:42:17Z] releasing lock
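If it helps anyone reading the cfg0 log, here is a rough Python sketch that flags long pauses between consecutive agent log lines. It assumes the line layout shown above (timestamp with milliseconds and a +0000 offset, a one-letter level, an optional [backup/...] tag, then the message). The regex, the function names and the 30 s threshold are my own choices, not anything from PBM. On the excerpt lines below it reports a 78 s pause before "wait for tmp users" and a 63 s pause between "dump finished, waiting for the oplog" and "dropping tmp collections"; the second pause overlaps the window in which the rs2 agent starts logging connection errors further down.

```python
import re
from datetime import datetime

# Hypothetical pattern for pbm-agent log lines such as:
# 2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump finished, waiting for the oplog
# The [backup/...] tag is optional because lines like "got command backup ..." lack it.
AGENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{4})\s+(?P<level>[A-Z])\s+"
    r"(?:\[(?P<event>[^\]]+)\]\s+)?(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None for lines that don't match."""
    m = AGENT_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
    return ts, m.group("level"), m.group("msg")


def long_gaps(lines, threshold_s=30):
    """List (seconds, previous message, next message) for every pause longer than threshold_s."""
    gaps = []
    prev = None
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip headers, blank lines, other output
        if prev is not None:
            delta = (rec[0] - prev[0]).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev[2], rec[2]))
        prev = rec
    return gaps


# Excerpts copied from the cfg0 log above.
sample = [
    "2025-04-21T17:42:18.000+0000 I got command backup [name: 2025-04-21T17:42:17Z, compression: zstd (level: 2)] <ts: 1745257337>, opid: 680683790dd8b6d4e9d0ea1f",
    "2025-04-21T17:42:19.000+0000 D [backup/2025-04-21T17:42:17Z] balancer is disabled",
    "2025-04-21T17:43:37.000+0000 D [backup/2025-04-21T17:42:17Z] wait for tmp users {1745257417 413}",
    "2025-04-21T17:43:38.000+0000 I [backup/2025-04-21T17:42:17Z] dump finished, waiting for the oplog",
    "2025-04-21T17:44:41.000+0000 I [backup/2025-04-21T17:42:17Z] dropping tmp collections",
    "2025-04-21T17:44:42.000+0000 E [backup/2025-04-21T17:42:17Z] backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)",
]

for seconds, before, after in long_gaps(sample):
    print(f"{seconds:.0f}s pause: {before!r} -> {after!r}")
```

The same function can be fed the whole agent log (for example the lines of a saved file); lines that don't match the pattern are simply skipped.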
pbm status
Cluster:
========
rs2:
- psmdb-psmdb-db-rs2-0.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs2-2.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
rs1:
- psmdb-psmdb-db-rs1-0.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs1-1.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs1-2.psmdb-psmdb-db-rs1.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
cfg:
- psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-cfg-1.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-cfg-2.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
rs0:
- psmdb-psmdb-db-rs0-0.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs0-1.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
- psmdb-psmdb-db-rs0-2.psmdb-psmdb-db-rs0.percona-psmdb-prod.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
PITR incremental backup:
========================
Status [OFF]
Currently running:
==================
Snapshot backup "2025-04-21T17:42:17Z", started at 2025-04-21T17:42:18Z. Status: error. [op id: 680683790dd8b6d4e9d0ea1f]
Backups:
========
S3 us-east-2 s3:///<my_bucket>/logical
Snapshots:
2025-04-21T17:42:17Z 0.00B <logical> [ERROR: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)] [2025-04-21T17:44:42Z]
pbm describe-backup 2025-04-21T17:42:17Z
Error: get snapshot size: missed file
pbm logs -e backup/2025-04-21T17:42:17Z -t 0 | grep rs2
2025-04-21T17:42:19Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup started
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.roles" done (size: 648)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRUsers" done (size: 3210)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.users" done (size: 3210)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.pbmRRoles" done (size: 648)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiProfileTransactions15m" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.EmitterState" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "admin.system.version" done (size: 577)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiAnalyzerEvents" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Blocklist" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiUsageStatistics1d" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ConfigTemplate" done (size: 17610)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.AuditLog2" done (size: 272539)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.VulnerabilitiesByEndpoint" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Ignorelist" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiProfileTransactions" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.NotificationSubscription" done (size: 375)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.RuntimeEvents" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.APIKeys" done (size: 96)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Site" done (size: 6274)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Rules" done (size: 2831)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Vulnerabilities" done (size: 0)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.CustomerConfig" done (size: 290)
2025-04-21T17:43:38Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Mutelist" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Actor" done (size: 14360233)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.CorrelationEvents" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.Whitelist" done (size: 35245)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.EtlLock" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.WLRules" done (size: 6306)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.VulnerabilitiesBySite" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.HMLock" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.ApiUsageStatistics" done (size: 0)
2025-04-21T17:43:39Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dump collection "customer_data.SysInfo" done (size: 4939812)
2025-04-21T17:43:41Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] dropping tmp collections
2025-04-21T17:44:11Z W [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] drop tmp users and roles: connect to primary: ping: server selection error: server selection timeout, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }
2025-04-21T17:44:41Z E [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] failed to get last write: get NodeInfo data: cmd: isMaster: server selection error: server selection timeout, current topology: { Type: Single, Servers: [{ Addr: localhost:27017, Type: Unknown, Last error: dial tcp [::1]:27017: connect: connection refused }, ] }
2025-04-21T17:44:41Z I [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark RS as error `dump: dump namespaces: customer_data.EntityDenormalized: cursor: connection(localhost:27017[-319]) socket was unexpectedly closed: EOF`: <nil>
2025-04-21T17:44:41Z E [rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup: dump: dump namespaces: customer_data.EntityDenormalized: cursor: connection(localhost:27017[-319]) socket was unexpectedly closed: EOF
2025-04-21T17:44:42Z I [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark RS as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil>
2025-04-21T17:44:42Z I [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] mark backup as error `check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)`: <nil>
2025-04-21T17:44:42Z E [cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017] [backup/2025-04-21T17:42:17Z] backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed with: %!s(<nil>)
I'm also seeing a gap in PMM for that replica set when the backup kicks off, indicating that one member of the replica set goes down briefly and then comes back after the error.
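Similarly, here is a minimal sketch for grouping the warning (W) and error (E) lines from the pbm logs output by replica set and pod, assuming the line layout shown above (timestamp, level, [rs/host:port], [backup/name], message). The regex and helper name are hypothetical, not part of PBM, and the sample lines are trimmed copies of the ones above. Run over the full output, it should show that on the shard side only rs2-1 reports problems, all of them about its connection to the local mongod on localhost:27017 (connection refused, or socket unexpectedly closed), before cfg-0 marks the backup as failed.

```python
import re
from collections import defaultdict

# Hypothetical pattern for `pbm logs` lines such as:
# 2025-04-21T17:44:11Z W [rs2/<host>:27017] [backup/2025-04-21T17:42:17Z] <message>
PBM_LOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z])\s+"
    r"\[(?P<rs>[^/\]]+)/(?P<node>[^\]]+)\]\s+"
    r"\[(?P<event>[^\]]+)\]\s+(?P<msg>.*)$"
)


def problems_by_node(lines):
    """Group W and E lines by (replica set, short pod name), keeping log order."""
    grouped = defaultdict(list)
    for line in lines:
        m = PBM_LOG_RE.match(line.strip())
        if m is None or m.group("level") not in ("W", "E"):
            continue  # ignore info/debug lines and anything that isn't a log line
        pod = m.group("node").split(".")[0]  # e.g. psmdb-psmdb-db-rs2-1
        grouped[(m.group("rs"), pod)].append((m.group("ts"), m.group("level"), m.group("msg")))
    return dict(grouped)


RS2_1 = "rs2/psmdb-psmdb-db-rs2-1.psmdb-psmdb-db-rs2.percona-psmdb-prod.svc.cluster.local:27017"
CFG_0 = "cfg/psmdb-psmdb-db-cfg-0.psmdb-psmdb-db-cfg.percona-psmdb-prod.svc.cluster.local:27017"
BK = "[backup/2025-04-21T17:42:17Z]"

# Trimmed excerpts of the pbm logs lines above (long messages cut short with "...").
sample = [
    f"2025-04-21T17:43:41Z I [{RS2_1}] {BK} dropping tmp collections",
    f"2025-04-21T17:44:11Z W [{RS2_1}] {BK} drop tmp users and roles: connect to primary: ping: server selection error ...",
    f"2025-04-21T17:44:41Z E [{RS2_1}] {BK} failed to get last write: get NodeInfo data: cmd: isMaster: server selection error ...",
    f"2025-04-21T17:44:41Z E [{RS2_1}] {BK} backup: dump: dump namespaces: customer_data.EntityDenormalized: cursor: ...",
    f"2025-04-21T17:44:42Z E [{CFG_0}] {BK} backup: check cluster for dump done: convergeCluster: backup on shard rs2 failed ...",
]

for (rs, pod), events in problems_by_node(sample).items():
    print(f"{rs} {pod}:")
    for ts, level, msg in events:
        print(f"  {ts} {level} {msg}")
```

Grouping per pod rather than per replica set keeps it obvious which member's agent lost its local mongod, which can then be lined up against the PMM gap.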