Percona Operator cannot release backup lock – all subsequent backups stuck in “waiting”

Description:

The Percona Server for MongoDB Operator becomes unable to perform new backups; every new backup fails with the message

“Another backup is holding the lock.”

There appears to be no way to clear this backup lock, even after deleting related Kubernetes resources and restarting the operator.

Steps to Reproduce:

    1. Create manual backup

       apiVersion: psmdb.percona.com/v1
       kind: PerconaServerMongoDBBackup
       metadata:
         name: manual-BACKUPVERSION
       spec:
         psmdbCluster: x-y-mongodb-cluster
         storageName: x-y-backup

    2. Restore manual backup

       apiVersion: psmdb.percona.com/v1
       kind: PerconaServerMongoDBRestore
       metadata:
         name: manual-restore
       spec:
         clusterName: x-y-mongodb-cluster
         backupName: manual-2025-10-16-10-45-44

    3. Attempt to run additional backups
       Manual backup or backup by cron job (see the sketch below).
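For step 3, one more manual backup can be applied the same way; a minimal sketch, assuming the namespace is x-y (the backup name manual-retry is just a placeholder):

# create one more manual backup and watch its status
kubectl apply -n x-y -f - <<EOF
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: manual-retry
spec:
  psmdbCluster: x-y-mongodb-cluster
  storageName: x-y-backup
EOF

kubectl get perconaservermongodbbackups.psmdb.percona.com -n x-y -w

A scheduled (cron) backup is configured through the backup.tasks section of the cluster CR instead.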

Version:

  • Percona Operator for MongoDB: 1.20.1
  • Platform: Red Hat OpenShift on AWS (ROSA)
  • OpenShift Version: 4.19.14

Logs:

INFO    Acquiring the backup lock       {"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"manual-2025-10-20-15-11-13","namespace":"x-y"}, "namespace": "x-y", "name": "manual-2025-10-20-15-11-13", "reconcileID": "xxx"}

INFO    Another backup is holding the lock      {"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"manual-2025-10-20-15-11-13","namespace":"x-y"}, "namespace": "x-y", "name": "manual-2025-10-20-15-11-13", "reconcileID": "xxx", "holder": "manual-2025-10-16-10-45-44-776ab071-f45d-4d11-a45c-8f04d0e2f20b"}

Expected Result:

  • New backups should start and complete successfully after the previous backup/restore has finished.

Actual Result:

  • All subsequent backups enter the waiting or error state, and the operator logs show:

    kubectl get perconaservermongodbbackups.psmdb.percona.com
    
    | **NAME**                         | **CLUSTER**         | **STORAGE** | **DESTINATION** | **TYPE** | **STATUS** | **COMPLETED** | **AGE** |
    | -------------------------------- | ------------------- | ----------- | --------------- | -------- | ---------- | ------------- | ------- |
    | cron-mongod-20251020130000-5hl7x | x-y-mongodb-cluster | x-y-backup  | —               | —        | error    | —             | 85 m    |
    | cron-mongod-20251020140000-zg9lb | x-y-mongodb-cluster | x-y-backup  | —               | —        | error    | —             | 25 m    |
    | manual-2025-10-20-15-11-13       | —                   | x-y-backup  | —               | —        | error    | —             | 73 m    |
    | manual-2025-10-20-16-24-33       | —                   | x-y-backup  | —               | —        | waiting  | —             | 30 s    |
    
    
    kubectl logs percona-server-mongodb-operator-x-y
    
    INFO    Acquiring the backup lock
    INFO    Another backup is holding the lock {"holder": "manual-2025-10-16-10-45-44-776ab071-f45d-4d11-a45c-8f04d0e2f20b"}
    

    Even after deleting all backup and restore resources, the lock remains.

Additional Information:

  • Deleted the perconaservermongodbbackups.psmdb.percona.com backup CR

  • Deleted the perconaservermongodbrestores.psmdb.percona.com restore CR

  • Deleted all related PVCs

  • Deleted the MongoDB StatefulSet

  • Restarted the percona-server-mongodb-operator deployment and pod

  • Verified no entries exist in (see the sketch below for one way to check):

    • db.pbmLock

    • db.pbmLockOp

Despite these steps, new backups continue to report “Another backup is holding the lock”.
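
One way to run that check, a sketch with placeholder pod name, container, and credentials (PBM stores its lock documents in the admin database of the cluster):

# exec into a mongod pod and dump PBM's lock collections; all names and credentials below are placeholders
kubectl exec -n x-y x-y-mongodb-cluster-rs0-0 -c mongod -- \
  mongosh -u clusterAdmin -p '<password>' --authenticationDatabase admin --quiet --eval \
  'printjson(db.getSiblingDB("admin").pbmLock.find().toArray()); printjson(db.getSiblingDB("admin").pbmLockOp.find().toArray())'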

Hi,

That lock is a Kubernetes Lease. You should see psmdb-clusterName-backup-lock if you run kubectl get lease. Could you try deleting it, and see if that fixes the issue?
I haven’t been able to reproduce this problem on my side, unfortunately.
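For this setup, the check and cleanup might look like the following (namespace and cluster name taken from the report above):

# list the Leases in the cluster namespace and look for the backup lock
kubectl get lease -n x-y

# remove the stale lock; the lease name assumes the cluster is x-y-mongodb-cluster
kubectl delete lease psmdb-x-y-mongodb-cluster-backup-lock -n x-y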

Hi @bofh, could you please check your operator’s log and “grep” the ‘delete lease’ error?
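For example, something along these lines (deployment name and namespace assumed):

# grep the operator log for lease-related errors
kubectl logs deploy/percona-server-mongodb-operator -n x-y | grep -i lease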

Thanks @Sami_Ahlroos

I found the Kubernetes Lease with kubectl get lease and was able to delete it with kubectl delete lease.

Queued backup jobs were able to finish successfully and new jobs are running fine.

@Slava_Sarzhan
I have been able to reproduce the behavior again with the manual backup / restore procedure.

Here are the logs with the “release lease” error:

2025-10-21T15:58:21.220Z ERROR failed to release the lock {"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"manual-2025-10-21-17-58-10","namespace":"x-y"}, "namespace": "x-y", "name": "manual-2025-10-21-17-58-10", "reconcileID": "049d05ee-08d6-4b34-a9be-d743bf66cac6", "error": "get lease: Lease.coordination.k8s.io \"psmdb–backup-lock\" not found", "errorVerbose": "Lease.coordination.k8s.io \"psmdb–backup-lock\" not found\nget lease\ngithub.com/percona/percona-server-mongodb-operator/pkg/k8s.ReleaseLease\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/k8s/lease.go:52\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).Reconcile.func1\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:159\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup.(*ReconcilePerconaServerMongoDBBackup).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodbbackup/perconaservermongodbbackup_controller.go:221\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[…]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[…]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:334\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[…]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[…]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700"}

Afterwards deleted the lease manually:
kubectl delete lease psmdb-x-y-mongodb-cluster-backup-lock

2025-10-21T16:03:58.338Z INFO Releasing backup lock {"controller": "psmdbbackup-controller", "controllerGroup": "psmdb.percona.com", "controllerKind": "PerconaServerMongoDBBackup", "PerconaServerMongoDBBackup": {"name":"cron-x-y-mongod-20251021160000-qtl7k","namespace":"x-y"}, "namespace": "x-y", "name": "cron-x-y-mongod-20251021160000-qtl7k", "reconcileID": "9672d4fa-621f-4c0c-bd25-797b6f2c4fad", "lease": "psmdb-x-y-mongodb-cluster-backup-lock"}

@bofh, please check your RBAC and make sure that you have:

- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
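
One quick way to verify this, a sketch assuming a namespace-scoped deployment in namespace x-y with the default operator service account name:

# should print "yes" if the operator's service account is allowed to delete Leases in its namespace
kubectl auth can-i delete leases.coordination.k8s.io \
  --as=system:serviceaccount:x-y:percona-server-mongodb-operator -n x-y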

I can’t reproduce it using GKE. We will test it on Red Hat OpenShift on AWS (ROSA) in the next few days. We will update you.

P.S. Do you have a cluster-wide deployment or a namespace-scoped one?

On ROSA, there is no cluster-wide deployment available; therefore, we use a namespace-scoped deployment.
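
If it helps, the scope can also be confirmed from the operator Deployment itself; a sketch (deployment name and namespace assumed):

# show the WATCH_NAMESPACE env entry; in the default manifests it points to the operator's own
# namespace for a namespace-scoped deployment and is left empty for a cluster-wide one
kubectl get deploy percona-server-mongodb-operator -n x-y \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="WATCH_NAMESPACE")]}'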

Regarding the RBAC permissions: should these rights be granted to the operator account or to the service account used by the MongoDB service?

Currently, the operator account has these permissions, while the MongoDB service account (default) does not.

| Verb   | percona-server-mongodb-operator | default |
| ------ | ------------------------------- | ------- |
| get    | yes                             | no      |
| list   | yes                             | no      |
| watch  | yes                             | no      |
| create | yes                             | no      |
| update | yes                             | no      |
| patch  | yes                             | no      |
| delete | yes                             | no      |
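
For reference, a comparison like the one above can be produced with kubectl auth can-i; a sketch using the namespace and service-account names from this thread:

# check every Lease verb for both service accounts (namespace assumed to be x-y)
for sa in percona-server-mongodb-operator default; do
  echo "== $sa =="
  for verb in get list watch create update patch delete; do
    printf '%s: ' "$verb"
    kubectl auth can-i "$verb" leases.coordination.k8s.io \
      --as="system:serviceaccount:x-y:$sa" -n x-y
  done
done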

Hi @bofh

I wasn’t able to reproduce this issue on ROSA with OpenShift 4.19.14.
Could you share a bit more about how you deployed the operator? For example, did you install it from the repo, using Helm charts, or through OperatorHub (Community or Certified bundle)?

Just a heads-up — PSMDB Operator v1.20.1 hasn’t been tested with OpenShift 4.19.x. It’s only been verified on 4.14–4.18, so while that might not be the exact cause, it could lead to some unexpected issues.