`pbm delete-backup` stuck due to locks in MongoDB

Deleting external backups gets stuck

pbm.log (528.5 KB)

~/xiaobao/mongodb/mongodb-operator/percona-server-mongodb-operator/deploy (deeproute/dev ✗) kubectl exec dev-rs-mdb-rs0-0 -it -n default -c backup-agent -- bash
[mongodb@dev-rs-mdb-rs0-0 /]$ mongo --username databaseAdmin --password VvkaPhy6VKNO6lMN admin^C
[mongodb@dev-rs-mdb-rs0-0 /]$ pbm status
Cluster:
rs0:
dev-rs-mdb-rs0-0.dev-rs-mdb-rs0.default.svc.cluster.local:27017 [P]: pbm-agent [v2.9.1] OK
dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK
dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017 [S]: pbm-agent [v2.9.1] OK

PITR incremental backup:
Status [ON]
! ERROR while running PITR backup: 2025-10-28T03:20:23.000+0000 E [rs0/dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] init: catchup: oplog has insufficient range, some records since the last saved ts {1761611284 718} are missing. Run pbm backup to create a valid starting point for the PITR

Currently running:
(none)

Backups:
S3 us-east-1 http://10.3.11.84:80/test-backup/test-rs-mdb-pbm
Snapshots:
2025-10-27T12:10:17Z 0.00B  [restore_to_time: 2025-10-27T12:10:20Z]
2025-10-27T11:56:22Z 0.00B  [restore_to_time: 2025-10-27T11:56:24Z]
2025-10-27T11:46:11Z 0.00B  [restore_to_time: 2025-10-27T11:46:13Z]
2025-10-27T11:39:28Z 0.00B  [restore_to_time: 2025-10-27T11:39:30Z]
2025-10-27T10:09:21Z 0.00B  [restore_to_time: 2025-10-27T10:09:23Z]
2025-10-27T09:54:40Z 0.00B  [restore_to_time: 2025-10-27T09:54:42Z]
2025-10-27T09:10:57Z 0.00B  [restore_to_time: 2025-10-27T09:11:00Z]
PITR chunks [48.44GB]:
2025-10-27T03:38:51Z - 2025-10-28T00:28:04Z (no base snapshot)
[mongodb@dev-rs-mdb-rs0-0 /]$ pbm delete-backup 2025-10-27T12:10:17Z
Snapshots:
"2025-10-27T12:10:17Z" [size: 0.00B type: , restore time: 2025-10-27T12:10:20Z]
Are you sure you want to delete backup? [y/N] y
Waiting for delete to be done …^C

Event:

[mongodb@dev-rs-mdb-rs0-0 /]$ pbm logs --event=delete
2025-10-28T02:58:51Z I [rs0/dev-rs-mdb-rs0-0.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [delete/2025-10-28T02:46:51Z] deleting backup
2025-10-28T02:58:51Z I [rs0/dev-rs-mdb-rs0-0.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [delete/2025-10-28T02:46:51Z] done
2025-10-28T03:00:28Z I [rs0/dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [delete/2025-10-27T12:16:14Z] deleting backup
2025-10-28T03:00:29Z I [rs0/dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [delete/2025-10-27T12:16:14Z] done
2025-10-28T03:21:12Z I [rs0/dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [delete/2025-10-27T12:10:17Z] deleting backup

[mongodb@dev-rs-mdb-rs0-0 /]$ pbm logs --event=pitr
2025-10-28T03:44:01Z I [rs0/dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:14Z I [rs0/dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:15Z I [rs0/dev-rs-mdb-rs0-0.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:16Z I [rs0/dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:29Z I [rs0/dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:30Z I [rs0/dev-rs-mdb-rs0-0.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:31Z I [rs0/dev-rs-mdb-rs0-2.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]
2025-10-28T03:44:44Z I [rs0/dev-rs-mdb-rs0-1.dev-rs-mdb-rs0.default.svc.cluster.local:27017] [pitr] oplog slicer is paused for lock [Delete, opid: 690036a7c676f5c56d02ceb8]

Can you check the contents of the admin.pbmLock* collections?
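Something along these lines in the mongo shell will dump them (a minimal sketch; it assumes the admin database and the pbmLock / pbmLockOp collection names used by recent PBM versions):

    // list whatever lock documents PBM is currently holding (or has left behind)
    db.getSiblingDB("admin").pbmLock.find().forEach(printjson);
    db.getSiblingDB("admin").pbmLockOp.find().forEach(printjson);

If a Delete lock with opid 690036a7c676f5c56d02ceb8 is still there after the delete has stopped making progress, that is the lock the oplog slicer keeps waiting on.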

This is a MongoDB cluster used for some testing and verification.
After the problem occurred, I found that there was something wrong with the S3 service. I deleted the cluster and rebuilt it.
I will spend some time trying to reproduce this problem later.

I encountered a similar situation while running the pbm oplog-replay command.

  • pbm cmd
    [mongodb@db-recover-rs0-0 /]$ pbm oplog-replay --start 2025-10-30T07:23:50 --end 2025-10-30T07:56:51 --wait
    Starting oplog replay '2025-10-30T07:23:50 - 2025-10-30T07:56:51'...Started.
    Waiting to finish…
  • pbmLock collection (see the heartbeat check sketch after the logs below)
    rs0:PRIMARY> db.pbmLock.find().forEach(printjson);
    {
    "_id" : ObjectId("69031bcfc2b4816f1ed6b798"),
    "type" : "replay",
    "replset" : "rs0",
    "node" : "db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017",
    "opid" : "69031bce1f14ed1452c86fc4",
    "epoch" : Timestamp(1761810799, 42),
    "hb" : Timestamp(1761811882, 67)
    }
  • pbm logs
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-0.db-recover-rs0.default.svc.cluster.local:27017] got command replay <ts: 1761811406>, opid: 69031bce1f14ed1452c86fc4
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-0.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] time: 2025-10-30T07:23:50Z-2025-10-30T07:56:51Z
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-0.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] node in not suitable for restore
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] got command replay <ts: 1761811406>, opid: 69031bce1f14ed1452c86fc4
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] time: 2025-10-30T07:23:50Z-2025-10-30T07:56:51Z
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] oplog replay started
    2025-10-30T08:03:27Z D [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] waiting for 'starting' status
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-2.db-recover-rs0.default.svc.cluster.local:27017] got command replay <ts: 1761811406>, opid: 69031bce1f14ed1452c86fc4
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-2.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] time: 2025-10-30T07:23:50Z-2025-10-30T07:56:51Z
    2025-10-30T08:03:27Z I [rs0/db-recover-rs0-2.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] node in not suitable for restore
    2025-10-30T08:03:28Z I [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] moving to state running
    2025-10-30T08:03:30Z I [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] starting oplog replay
    2025-10-30T08:03:30Z D [rs0/db-recover-rs0-1.db-recover-rs0.default.svc.cluster.local:27017] [replay/2025-10-30T08:03:26.72270159Z/69031bce1f14ed1452c86fc4] + applying {rs0 pbmPitr/rs0/20251030/20251030071645-1.20251030072645-1567.oplog.gz gzip {1761808605 1} {1761809205 1567} 12736479}
    2025-10-30T08:03:36Z I [rs0/db-recover-rs0-2.db-recover-rs0.default.svc.cluster.local:27017] [pitr/] oplog slicer is paused for lock [Oplog replay, opid: 69031bce1f14ed1452c86fc4]
    2025-10-30T08:03:37Z I [rs0/db-recover-rs0-0.db-recover-rs0.default.svc.cluster.local:27017] [pitr/] oplog slicer is paused for lock [Oplog replay, opid: 69031bce1f14ed1452c86fc4]
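For reference, the hb (heartbeat) of the lock above can be compared against the wall clock to see whether the holder is still alive or the lock has gone stale. A rough sketch, assuming the legacy mongo shell and the admin.pbmLock collection shown above:

    // Compare the lock's heartbeat (hb) with the wall clock. If hb keeps
    // advancing on repeated runs, the holder (db-recover-rs0-1 here) is still
    // working and the replay is simply in progress; a heartbeat that stops
    // moving while the command hangs points to a stale lock.
    var lock = db.getSiblingDB("admin").pbmLock.findOne({ type: "replay" });
    if (lock) {
        var lagSec = Math.floor(Date.now() / 1000) - lock.hb.t; // .t = seconds part of the BSON Timestamp
        print("opid: " + lock.opid + ", node: " + lock.node + ", hb lag (s): " + lagSec);
    }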