Description:
Hi All,
I deployed the Percona Operator for MongoDB using the Helm chart, and the operator pod restarts intermittently and goes into CrashLoopBackOff with the leader election errors shown in the logs below.
Steps to Reproduce:
Here are the installation steps I follow:
helm install psmdb-db psmdb-db-1.16.3.tgz -n psmdb --create-namespace || true
Values.yaml:
replicaCount: 1
image:
  repository: percona/percona-server-mongodb-operator
  tag: 1.16.2
  pullPolicy: IfNotPresent
disableTelemetry: true
watchAllNamespaces: true
rbac:
  create: true
serviceAccount:
  create: true
imagePullSecrets:
nameOverride: ""
fullnameOverride: ""
env:
  resyncPeriod: 5s
tolerations:
logStructured: true
logLevel: "DEBUG"
Version:
psmdb-operator - helm chart version 1.16.3
psmdb-operator version 1.16.2
psmdb-db - helm chart version 1.16.3
psmdb-db version 1.16.2
kubernetes version: 1.29.5
kubectl version: v1.31.0
Logs:
{"level":"info","ts":1723890615.2566066,"logger":"setup","msg":"Manager starting up","gitCommit":"13627d423321257e18b77d270af922c6cd17c8f0","gitBranch":"release-1-16-2","goVersion":"go1.22.5","os":"linux","arch":"amd64"}
{"level":"info","ts":1723890615.2936742,"msg":"server version","platform":"kubernetes","version":"v1.29.5"}
{"level":"info","ts":1723890615.3044705,"logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":1723890615.3045084,"msg":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"info","ts":1723890615.3045788,"logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
I0817 10:30:15.305233 1 leaderelection.go:250] attempting to acquire leader lease psmdb-operator/08db0feb.percona.com...
I0817 10:30:31.245952 1 leaderelection.go:260] successfully acquired lease psmdb-operator/08db0feb.percona.com
{"level":"debug","ts":1723890631.2460227,"logger":"events","msg":"psmdb-operator-7dcdd78d99-f7n86_aca193f9-5ee0-484d-b50e-3ee0d499819b became leader","type":"Normal","object":{"kind":"Lease","namespace":"psmdb-operator","name":"08db0feb.percona.com","uid":"ae92da5a-afd1-4749-a980-258e93120b61","apiVersion":"coordination.k8s.io/v1","resourceVersion":"470766"},"reason":"LeaderElection"}
{"level":"info","ts":1723890631.2465332,"msg":"Starting EventSource","controller":"psmdbbackup-controller","source":"kind source: *v1.PerconaServerMongoDBBackup"}
{"level":"info","ts":1723890631.2465675,"msg":"Starting EventSource","controller":"psmdbbackup-controller","source":"kind source: *v1.Pod"}
{"level":"info","ts":1723890631.2465749,"msg":"Starting Controller","controller":"psmdbbackup-controller"}
{"level":"info","ts":1723890631.246808,"msg":"Starting EventSource","controller":"psmdbrestore-controller","source":"kind source: *v1.PerconaServerMongoDBRestore"}
{"level":"info","ts":1723890631.2468803,"msg":"Starting EventSource","controller":"psmdbrestore-controller","source":"kind source: *v1.Pod"}
{"level":"info","ts":1723890631.2468925,"msg":"Starting Controller","controller":"psmdbrestore-controller"}
{"level":"info","ts":1723890631.2470984,"msg":"Starting EventSource","controller":"psmdb-controller","source":"kind source: *v1.PerconaServerMongoDB"}
{"level":"info","ts":1723890631.2471287,"msg":"Starting Controller","controller":"psmdb-controller"}
{"level":"info","ts":1723890631.3539019,"msg":"Starting workers","controller":"psmdb-controller","worker count":1}
{"level":"info","ts":1723890631.35402,"msg":"Starting workers","controller":"psmdbrestore-controller","worker count":1}
{"level":"info","ts":1723890631.3540661,"msg":"Starting workers","controller":"psmdbbackup-controller","worker count":1}
E0817 10:42:41.055838 1 leaderelection.go:340] Failed to update lock optimitically: Put "https://10.0.0.1:443/apis/coordination.k8s.io/v1/namespaces/psmdb-operator/leases/08db0feb.percona.com": context deadline exceeded, falling back to slow path
E0817 10:42:41.055947 1 leaderelection.go:347] error retrieving resource lock psmdb-operator/08db0feb.percona.com: client rate limiter Wait returned an error: context deadline exceeded
I0817 10:42:41.055963 1 leaderelection.go:285] failed to renew lease psmdb-operator/08db0feb.percona.com: timed out waiting for the condition
{"level":"debug","ts":1723891361.056032,"logger":"events","msg":"psmdb-operator-7dcdd78d99-f7n86_aca193f9-5ee0-484d-b50e-3ee0d499819b stopped leading","type":"Normal","object":{"kind":"Lease","namespace":"psmdb-operator","name":"08db0feb.percona.com","uid":"ae92da5a-afd1-4749-a980-258e93120b61","apiVersion":"coordination.k8s.io/v1","resourceVersion":"474711"},"reason":"LeaderElection"}
{"level":"error","ts":1723891361.0560184,"logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"main.main\n\t/go/src/github.com/percona/percona-server-mongodb-operator/cmd/manager/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271"}
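For context on the failure mode: the lease messages above come from controller-runtime, on which the operator's manager is built (see the main.go stack trace). Leadership is lost when the Lease object cannot be renewed within the renew deadline, which is what the "context deadline exceeded" errors against the API server indicate here; once the manager loses the lease it exits, and the pod restarts. Below is a minimal sketch of the controller-runtime manager options that govern this timing. The field names are real controller-runtime options, but the concrete durations are the library defaults (15s lease, 10s renew deadline, 2s retry) and only an assumption about what the operator actually uses, not taken from its source:

package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	// The three knobs that decide how long a renewal may take before
	// leadership is considered lost. Values are the controller-runtime
	// defaults and are assumed, not confirmed from the operator's code.
	leaseDuration := 15 * time.Second // lease validity without a successful renewal
	renewDeadline := 10 * time.Second // a renew (Lease update) must finish within this
	retryPeriod := 2 * time.Second    // interval between renewal attempts

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), manager.Options{
		LeaderElection:          true,
		LeaderElectionID:        "08db0feb.percona.com", // lease name seen in the logs
		LeaderElectionNamespace: "psmdb-operator",
		LeaseDuration:           &leaseDuration,
		RenewDeadline:           &renewDeadline,
		RetryPeriod:             &retryPeriod,
	})
	if err != nil {
		panic(err)
	}

	// Controllers would be registered here. When lease renewal times out,
	// mgr.Start returns "leader election lost" and the process exits,
	// which is what the restart loop in this report reflects.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}

If API server round-trips regularly exceed the renew deadline (API server overload, network latency, or operator CPU throttling), the manager gives up leadership by design, so the restarts track the "failed to renew lease" errors above.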
Actual Result:
The operator pod goes into CrashLoopBackOff.
Additional Information:
Any idea how to fix this issue?
Is this operator version "1.16.3" production-ready?