Backup issue with Physical vs Logical and config changes using Helm

We have a 1 TB database with some collections around 300 GB in size.

When using a logical backup, `pbm-agent` simply exits with code -1 after counting the number of objects:

```text
2023-07-04T00:00:26.268+0000 archive prelude admin.system.roles
2023-07-04T00:00:26.268+0000 archive prelude admin.system.version
2023-07-04T00:00:26.280+0000 writing admin.system.users to archive on stdout
2023-07-04T00:00:26.369+0000 Mux open namespace admin.system.users
2023-07-04T00:00:26.369+0000 counted 5 documents in admin.system.users
2023-07-04T00:00:26.372+0000 done dumping admin.system.users (5 documents)
2023-07-04T00:00:26.372+0000 Mux close namespace admin.system.users
2023-07-04T00:00:26.373+0000 writing admin.system.roles to archive on stdout
2023-07-04T00:00:26.470+0000 Mux open namespace admin.system.roles
2023-07-04T00:00:26.470+0000 counted 2 documents in admin.system.roles
2023-07-04T00:00:26.472+0000 done dumping admin.system.roles (2 documents)
2023-07-04T00:00:26.472+0000 Mux close namespace admin.system.roles
2023-07-04T00:00:26.472+0000 writing admin.system.version to archive on stdout
2023-07-04T00:00:26.480+0000 Mux open namespace admin.system.version
2023-07-04T00:00:26.481+0000 counted 3 documents in admin.system.version
2023-07-04T00:00:26.489+0000 done dumping admin.system.version (3 documents)
2023-07-04T00:00:26.489+0000 dumping up to 4 collections in parallel
2023-07-04T00:00:26.489+0000 Mux close namespace admin.system.version
2023-07-04T00:00:26.489+0000 writing RaceRunner-prod.UserStep to archive on stdout
2023-07-04T00:00:26.568+0000 writing RaceRunner-prod.HistoryTemp to archive on stdout
2023-07-04T00:00:26.570+0000 writing RaceRunner-prod.Feed to archive on stdout
2023-07-04T00:00:26.571+0000 writing RaceRunner-prod.RunHistory to archive on stdout
2023-07-04T00:00:26.574+0000 Mux open namespace RaceRunner-prod.HistoryTemp
2023-07-04T00:00:26.574+0000 counted 249706000 documents in RaceRunner-prod.HistoryTemp
2023-07-04T00:00:26.975+0000 Mux open namespace RaceRunner-prod.Feed
2023-07-04T00:00:26.976+0000 counted 108302000 documents in RaceRunner-prod.Feed
2023-07-04T00:00:27.070+0000 Mux open namespace RaceRunner-prod.UserStep
2023-07-04T00:00:27.070+0000 counted 116714000 documents in RaceRunner-prod.UserStep
2023-07-04T00:00:27.075+0000 Mux open namespace RaceRunner-prod.RunHistory
2023-07-04T00:00:27.075+0000 counted 62410000 documents in RaceRunner-prod.RunHistory
2023/07/04 00:01:24 [entrypoint] pbm-agent exited with code -1
2023/07/04 00:01:24 [entrypoint] restart in 5 sec
2023/07/04 00:01:29 [entrypoint] starting pbm-agent
2023-07-04T00:01:29.000+0000 I pbm-agent:
Version: 2.0.4
Platform: linux/amd64
GitCommit: 785ee592ade9eb86be656af0af4da73b2f6055e1
GitBranch: release-2.0.4
BuildTime: 2023-02-15_17:02_UTC
GoVersion: go1.19
```
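Could this be the agent container getting OOM-killed mid-dump? The agent dies about a minute after it starts streaming the big collections, so I'm trying to rule that out on my side. Pod name below is a placeholder for one of the replica-set pods; adjust to your cluster:

```shell
# Did the backup-agent container exit with OOMKilled? (pod name is a placeholder)
kubectl describe pod justmove-db-psmdb-db-rs0-0 | grep -A 5 "Last State"

# Any OOM-related events around the crash time
kubectl get events --sort-by=.lastTimestamp | grep -i oom
```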

Then I changed the tasks to the following:

```yaml
- name: daily-s3-us-east
  enabled: true
  schedule: "0 0 * * *"
  keep: 3
  type: physical
  storageName: s3-us-east
  compressionType: zstd
- name: weekly-s3-us-east
  enabled: true
  schedule: "0 0 * * 0"
  keep: 5
  type: physical
  storageName: s3-us-east
  compressionType: zstd
```

Then I executed the `helm upgrade` …

The pods didn’t roll over. Is there something else I need to do for them to pick up the new config?
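In case it matters, this is how I would force the pods to restart manually. The statefulset name `justmove-db-psmdb-db-rs0` is an assumption derived from my cluster name; adjust to yours:

```shell
# Force the replica-set pods to restart so the agents reread the config
# (statefulset name is a guess based on the clusterName)
kubectl rollout restart statefulset/justmove-db-psmdb-db-rs0

# Wait for the restart to complete
kubectl rollout status statefulset/justmove-db-psmdb-db-rs0
```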

If I start a manual backup with:

```yaml
kind: PerconaServerMongoDBBackup
metadata:
  finalizers:
    - delete-backup
  name: backup2
spec:
  clusterName: justmove-db-psmdb-db
  storageName: s3-us-east
  type: physical
  compressionType: zstd
```
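For reference, this is how I apply and watch it (assuming the manifest above is saved as `backup2.yaml`):

```shell
# Create the on-demand backup object (filename is an assumption)
kubectl apply -f backup2.yaml

# Watch the backup until it reaches the "ready" state
kubectl get psmdb-backup backup2 -w
```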

The backup works perfectly.

Thanks for your help!

I’m hitting this issue too. Logical backups don’t seem to work with lots of databases and collections, regardless of data size; my cluster is only about 5 GB. The same setup works fine on smaller workloads. Any help is appreciated.

pbm: 2.3.1
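If it helps, here's how I pull the agent-side errors around the failure. Pod name is a placeholder; `backup-agent` is the container name the operator uses by default in my deployment:

```shell
# Error-level pbm log entries from the cluster (pod name is a placeholder)
kubectl exec -it my-cluster-rs0-0 -c backup-agent -- pbm logs -s E -t 50

# Overall agent/backup state, to spot a stuck or lost agent
kubectl exec -it my-cluster-rs0-0 -c backup-agent -- pbm status
```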

```text
2024-02-05T13:38:33.000+0000 I [backup/2024-02-05T13:38:32Z] backup started
2024/02/05 13:38:49 [entrypoint] `pbm-agent` exited with code -1
2024/02/05 13:38:49 [entrypoint] restart in 5 sec
2024/02/05 13:38:54 [entrypoint] starting `pbm-agent`
```