Reconciler error when upgrading operator from 1.12.0 to 1.14.0

Hi, I’m trying to upgrade our Percona Operator for MongoDB from 1.12.0 to 1.14.0. The operator is deployed via Helm, going through chart versions 1.12.1 → 1.13.3 → 1.14.2. The PSMDB cluster the operator manages has 2 shards, each with 3 mongods, running MongoDB 4.4.15.
This is running on an EKS Kubernetes cluster, v1.21.14.
I’ve done the following operations to upgrade (a rough sketch of the equivalent commands follows the list):

  • Update the operator Helm chart version from 1.12.1 to 1.13.3.
  • Apply the 1.13.3 CRDs server-side.
  • Change crVersion of the PSMDB to 1.13.0.
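Roughly, the commands for those steps looked like this (release name, namespace, and the `percona` Helm repo alias are placeholders for our actual setup; the 1.14 steps further down followed the same pattern with the versions swapped):

```bash
# 1. Upgrade the operator chart (assumed release name "mongodb-operator").
helm upgrade mongodb-operator percona/psmdb-operator \
  --version 1.13.3 -n mongodb-raw-cluster

# 2. Apply the CRDs server-side from the matching operator tag.
kubectl apply --server-side --force-conflicts \
  -f https://raw.githubusercontent.com/percona/percona-server-mongodb-operator/v1.13.0/deploy/crd.yaml

# 3. Bump crVersion on the PSMDB custom resource.
kubectl patch psmdb mongodb-raw-qa -n mongodb-raw-cluster \
  --type=merge -p '{"spec":{"crVersion":"1.13.0"}}'
```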

This upgrade was successful. Afterward, I tried to upgrade to 1.14:

  • Update the operator Helm chart version from 1.13.3 to 1.14.2.
  • Apply the 1.14.2 CRDs server-side.

At that point, even though the cluster itself seems fine and responsive, the operator enters an endless loop of errors like the following:
2023-05-02T11:50:10.179Z	ERROR	failed to reconcile cluster	{"controller": "psmdb-controller", "object": {"name":"mongodb-raw-qa","namespace":"mongodb-raw-cluster"}, "namespace": "mongodb-raw-cluster", "name": "mongodb-raw-qa", "reconcileID": "aefd3f93-f3f7-4cb1-9d94-338c0d1751e0", "replset": "rs1", "error": "failed to get mongos connection: ping mongo: timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 1", "errorVerbose": "timed out while checking out a connection from connection pool: context deadline exceeded; maxPoolSize: 100, connections in use by cursors: 0, connections in use by transactions: 0, connections in use by other operations: 1\nping mongo\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo.Dial\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/mongo/mongo.go:64\ngithub.com/percona/percona-server-mongodb-operator/pkg/psmdb.MongosClient\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/psmdb/client.go:70\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).mongosClientWithRole\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/connections.go:30\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:172\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:487\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\nfailed to get mongos 
connection\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).reconcileCluster\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/mgo.go:174\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:487\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"}
github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile
	/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:489
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.4/pkg/internal/controller/controller.go:235

The error shows up for both replsets, and also when starting the balancer:

failed to start balancer: failed to get mongos connection: ping mongo: timed out while checking out...

I’m able to connect to both the mongods and the mongos and query data, so I’m not sure what the problem might be. Would love your help, thanks!
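For example, just to sanity-check connectivity, something like the following works and returns data (service name and credentials are placeholders; adjust to however your mongos is exposed):

```bash
# Forward a mongos port locally (assumed default service name "<cluster>-mongos").
kubectl -n mongodb-raw-cluster port-forward svc/mongodb-raw-qa-mongos 27017:27017 &

# Ping through mongos with mongosh (user/password are placeholders).
mongosh "mongodb://clusterAdmin:<password>@localhost:27017/admin" \
  --eval 'db.adminCommand({ ping: 1 })'
```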

Update: I’ve discovered that the same problem occurs even without upgrading from previous versions. Creating the PSMDB from scratch with database chart version 1.14.1 and operator chart version 1.14.2 still produces the same kind of errors as above, so it’s not related to the upgrade from previous versions but to the operator version itself.
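For reference, the from-scratch setup that reproduces it was roughly this (release names and namespace are placeholders; sharding enabled as before):

```bash
# Operator chart 1.14.2 plus database chart 1.14.1 from the Percona Helm repo.
helm install mongodb-operator percona/psmdb-operator \
  --version 1.14.2 -n mongodb-raw-cluster --create-namespace

helm install mongodb-raw-qa percona/psmdb-db \
  --version 1.14.1 -n mongodb-raw-cluster \
  --set sharding.enabled=true
```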
I would really appreciate any leads on this, thank you!

What status does the perconaservermongodbs custom resource show?
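You can check it with something like:

```bash
kubectl get psmdb -n mongodb-raw-cluster
# or, using the full resource name:
kubectl get perconaservermongodbs.psmdb.percona.com -n mongodb-raw-cluster
```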

@sohahm The status keeps alternating between ready and initializing.
Since my last update we’ve isolated the problem to the field sharding.mongos.expose.exposeType. When we change the PSMDB’s field to ClusterIP instead of LoadBalancer, the error messages stop.
We suspect it might be a bug in the new operator version, so we’ve opened a Jira issue with more details on the problem here: [K8SPSMDB-925] Operator Errors in the Operator Log when using Mongos with servicePerPod and LoadBalancer services - Percona JIRA
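For reference, the relevant fragment of the custom resource looks roughly like this (values as described above; everything else omitted):

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: mongodb-raw-qa
  namespace: mongodb-raw-cluster
spec:
  crVersion: 1.14.0
  sharding:
    enabled: true
    mongos:
      expose:
        exposeType: ClusterIP   # was LoadBalancer; with ClusterIP the errors stop
        servicePerPod: true
```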