Controller.psmdb-controller Reconciler error

Hi guys.

Recently I installed the Percona Operator for MongoDB version 1.13 on OpenShift, and almost every month the mongos instances lose their connection to the database.
Reading the Percona Operator logs, I saw this:

2023-04-24T16:58:04.386Z ERROR controller.psmdb-controller Reconciler error {"name": "mongo-cluster", "namespace": "mongodb-ebagagem", "error": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to stop balancer: failed to get mongos connection: ping mongo: server selection error: context deadline exceeded, current topology: { Type: Unknown, Servers: [{ Addr: mongo-cluster-mongos.mongodb-ebagagem.svc.cluster.local:27017, Type: Unknown }, ] }", "errorVerbose": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to stop balancer: failed to get mongos connection: ping mongo: server selection error: context deadline exceeded, current topology: { Type: Unknown, Servers: [{ Addr: mongo-cluster-mongos.mongodb-ebagagem.svc.cluster.local:27017, Type: Unknown }, ] }\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermong…

sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2

This seems to be causing a series of cascading errors, ending at the mongos pod, where the database goes down. mongos log:

"id":20883, "ctx":"conn18227628","msg":"Interrupted operation as its client disconnected","attr":{"opId":51384861}}

{"t":{"$date":"2023-04-24T20:09:20.010+00:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn18227629","msg":"Connection ended","attr":{"remote":"127.0.0.1:52016","uuid":"53938424-f855-4643-b99a-faa1f69e5e55","connectionId":18227629,"connectionCount":2}}

mongos event:

(combined from similar events): Readiness probe failed: {"level":"info","msg":"Running Kubernetes readiness check for mongos","time":"2023-04-24T20:07:31Z"} {"level":"error","msg":"Member failed Kubernetes readiness check: run listDatabases: (Unauthorized) command listDatabases requires authentication","time":"2023-04-24T20:07:31Z"}

Can anybody help me?

Regards.

Well, even though no one answered me, I would like to share the solution for this case.

I figured out that the error was occurring because, during the cluster installation, I made a mistake and created an extra secret. As a result, the MongoDB namespace had two or more secrets for the environment's users. Every so often, for some reason, the operator got confused about which secret to use as the reference, and then the database service stopped.
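In case it helps someone else confirm the same situation, this is roughly how the duplicates can be spotted; a minimal sketch, assuming the cluster name and namespace from the logs above (mongo-cluster in mongodb-ebagagem):

```bash
# List all secrets in the cluster's namespace; with this problem present you
# will see more than one users secret (the default one plus the extra one
# created by mistake).
kubectl get secrets -n mongodb-ebagagem

# Check which users secret the Custom Resource currently points at, if any.
kubectl get psmdb mongo-cluster -n mongodb-ebagagem \
  -o jsonpath='{.spec.secrets.users}'
```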

The solution was to delete all the secrets, leaving only the one created by default during installation.
I also had to change the MongoDB instance YAML (the operator's Custom Resource), pointing it at the secret it should use.
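For reference, a sketch of the cleanup and of the Custom Resource change. The same change can also be made by editing the CR YAML directly; the kubectl patch below is just an equivalent one-liner, and the secret names are placeholders for whatever exists in your environment:

```bash
# Delete the extra secret(s), keeping only the one created during installation
# ("extra-users-secret" is a hypothetical name here).
kubectl delete secret extra-users-secret -n mongodb-ebagagem

# Point the Custom Resource explicitly at the remaining users secret so the
# operator no longer has to guess ("mongo-cluster-secrets" is a placeholder).
kubectl patch psmdb mongo-cluster -n mongodb-ebagagem --type merge \
  -p '{"spec":{"secrets":{"users":"mongo-cluster-secrets"}}}'
```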

Thank you for sharing the solution @fjareta! Real community spirit!

I'm not sure, though, how the operator could pick up some random secret. It seems you used the name that the operator uses by default, or specified one in the Custom Resource.