Percona PostgreSQL operator frequently crashing due to leader election failure

Description:

The Percona PostgreSQL operator is crashing due to leader election failure: it is unable to communicate with the Kubernetes API server to maintain its leadership lease. There is no way to change the operator's lease duration, renewal deadline, or retry period. I am also running a single replica of the operator, so I do not need leader election at all, but there is no way to disable it either. Crunchy Data PGO supports this by setting PGO_CONTROLLER_LEASE_NAME to an empty value.

Steps to Reproduce:

During cluster node autoscaling, when the Kubernetes API server throttles requests, the operator cannot renew its lease. It then panics with the error below and restarts continuously.

Version:

Percona PostgreSQL Operator 2.8.0

Logs:

E1122 19:58:05.112391 Failed to update lock optimistically: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/default/leases/08db3feb.percona.com?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

E1122 19:58:10.112084 error retrieving resource lock default/08db3feb.percona.com: Get "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/default/leases/08db3feb.percona.com?timeout=5s": context deadline exceeded

panic: leader election lost.

Expected Result:

The operator should not restart; it should keep retrying, or offer a way to change the lease retry logic and renewal durations. It would be even more helpful to have a way to turn off leader election and the lease entirely, as Crunchy Data PGO does.

Actual Result:

Frequent crashing

Additional Information:

Kubernetes version: 1.34.1-k3s1

Hi @vasanthnataraj, I agree that we need a more flexible configuration for the leader election process in our operators. We should make these options configurable by users:

LeaseDuration: 60 * time.Second,
RenewDeadline: 40 * time.Second,
RetryPeriod: 10 * time.Second,
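For context on why these three values must be tuned together: to the best of my understanding, the client-go leader election code (which the operator relies on) requires LeaseDuration to exceed RenewDeadline, and RenewDeadline to exceed the jittered retry period (a jitter factor of 1.2 is an assumption here). The sketch below checks those invariants against the defaults quoted above and shows how many renewal attempts fit inside the deadline.

```go
package main

import (
	"fmt"
	"time"
)

// Current defaults, as quoted in the comment above.
const (
	leaseDuration = 60 * time.Second
	renewDeadline = 40 * time.Second
	retryPeriod   = 10 * time.Second
)

// jitterFactor is assumed to match the constant client-go applies when
// validating RenewDeadline against RetryPeriod.
const jitterFactor = 1.2

func main() {
	// Any user-supplied values would have to satisfy the same ordering,
	// or the leader elector refuses to start.
	fmt.Println(leaseDuration > renewDeadline)                              // LeaseDuration > RenewDeadline
	fmt.Println(float64(renewDeadline) > jitterFactor*float64(retryPeriod)) // RenewDeadline > jitter * RetryPeriod

	// Roughly how many renewal attempts the operator gets before it gives
	// up the lease and panics ("leader election lost").
	fmt.Println(int(renewDeadline / retryPeriod))
}
```

With the defaults above the operator gets about four attempts; raising RenewDeadline or lowering RetryPeriod buys more retries during API server throttling, which is exactly what the reporter is asking for.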

I will create a task to add the improvement. Thanks.

The task number is Jira