Hey guys.
I have the Percona MongoDB operator, version 1.12.1, installed in my OpenShift environment. Almost every month an error occurs in which only the mongos pods fail with a strange error that seems to be related to some mongo user not being able to list or read the databases inside the pod.
I searched for this error in the forum and in the releases as well, but didn’t find anything very specific. Does anyone know what this error could be?
On another cluster I run the Percona MongoDB operator version 1.11.0, and I don’t have this kind of problem there.
I also installed version 1.13.1 on another, newer cluster, and so far I haven’t had the same issue.
Regards.
Updating the topic: I realized that the error probably starts in the Percona operator.
Reading the operator log, I saw that the mongo-cfg pods changed for some reason (the operator reports that the mongo-cluster-cfg StatefulSet is changed):
/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-04-24T16:58:04.388Z INFO controller_psmdb StatefulSet is changed, starting smart update {"name": "mongo-cluster-cfg"}
2023-04-24T16:58:14.461Z ERROR controller.psmdb-controller Reconciler error {"name": "mongo-cluster", "namespace": "mongodb-ebagagem", "error": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to stop balancer: failed to get mongos connection: ping mongo: server selection error: context deadline exceeded, current topology: { Type: Unknown, Servers: [{ Addr: myservice:port, Type: Unknown }, ] }", "errorVerbose": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to stop balancer: failed to get mongos connection: ping mongo: server selection error: context deadline exceeded, current topology: { Type: Unknown, Servers: [{ Addr: myservice:port, Type: Unknown }, ] }\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermong…
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/src/github.com/percona/percona-server-mongodb-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
Overall, the error persists in version 1.13.
Well, even though no one answered me, I would like to share the solution in this case.
I figured out that the error was occurring because I made a mistake during the cluster installation and created an extra secret. As a result, the mongo namespace had two or more secrets for the users. Sometimes, for some reason, the operator got confused about which secret to use as its reference, and then the database service stopped.
The solution was to delete the extra secrets, leaving only the secret created by default during installation.
I also had to change the mongo instance YAML from the operator so that it points explicitly to the secret it should use.
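For anyone who lands here later, this is roughly what that change looks like in the cluster custom resource. It is only a sketch: the cluster name and namespace are taken from the log above, and the secret name mongo-cluster-secrets is an assumption, so replace it with the name of the users secret that was actually created during your installation.

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongo-cluster            # cluster name as it appears in the log above
  namespace: mongodb-ebagagem    # namespace as it appears in the log above
spec:
  secrets:
    # Assumed name: use the single users secret that remains in the namespace
    # after deleting the extra ones.
    users: mongo-cluster-secrets
```

Before applying it, it is probably worth listing the secrets in the namespace (for example with oc get secrets -n mongodb-ebagagem) to confirm that only one users secret remains.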