Also, we use cert-manager/step-issuer with our own CA to create TLS certificates.
As we are using GitOps, we have no influence over which objects are created first and which later. So I think we run into a kind of “race condition” when deploying our cluster:
Sometimes the operator creates certificates/secrets with its own CA - and then starts to deploy the replicaset.
“Later”, cert-manager/step-issuer picks up my certificates and overwrites the secrets. (The resulting secrets are similar to those described in Transport encryption (TLS/SSL) - Percona Operator for MongoDB.)
When this happens, the creation of the cluster by the operator gets stuck, because the secrets change while the operator is still deploying the rest of the cluster. The mongos pods are then never created.
When we deploy the certificates first (making sure cert-manager has created the secrets) and the custom resource afterwards, there is no problem.
But, as just said, this is not possible when using GitOps, as we have no control over the order.
What’s the way to go here?
Is there a possibility to disable creation of the secrets by the operator?
We reference the TLS secrets in cr.yaml under the secrets: key.
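A minimal sketch of what such a reference can look like. The `ssl` and `sslInternal` keys are the ones from Percona's example cr.yaml; the secret names are placeholders built from the cluster name that appears in the logs further down, not necessarily the names used in this setup:

```yaml
# Fragment of cr.yaml, illustration only (secret names are placeholders)
spec:
  secrets:
    # Secret holding the certificate for external (client) traffic
    ssl: my-cluster-name-ssl
    # Secret holding the certificate for internal (cluster member) traffic
    sslInternal: my-cluster-name-ssl-internal
```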
What is the recommended way for the naming of these tls-secrets? Do the names have to have a “reference” to the name of the CR?
Steps to Reproduce:
Deploy a custom resource for psmdb. Then create TLS certificates/secrets with the same names used by the operator (overwriting them), or use cert-manager with a third-party issuer to do that.
b. mongos deployment is not even created
c. Logs in operator pod showing the following error:
2023-11-20T09:37:19.182Z ERROR Reconciler error {"controller": "psmdb-controller", "object": {"name":"my-cluster-name","namespace":"default"}, "namespace": "default", "name": "my-cluster-name", "reconcileID": "fe6adb10-d63c-4d4f-9d73-3ef69f6ddd86", "error": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to my-cluster-name-rs0-0.my-cluster-name-rs0.default.svc.cluster.local:27017,my-cluster-name-rs0-1.my-cluster-name-rs0.default.svc.cluster.local:27017,my-cluster-name-rs0-2.my-cluster-name-rs0.default.svc.cluster.local:27017: get config server connection URI: mongo: no documents in result", "errorVerbose": "reconcile StatefulSet for cfg: failed to run smartUpdate: failed to check active jobs: getting PBM object: create PBM connection to my-cluster-name-rs0-0.my-cluster-name-rs0.default.svc.cluster.local:27017,my-cluster-name-rs0-1.my-cluster-name-rs0.default.svc.cluster.local:27017,my-cluster-name-rs0-2.my-cluster-name-rs0.default.svc.cluster.local:27017: get config server connection URI: mongo: no documents in 
result\ngithub.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb.(*ReconcilePerconaServerMongoDB).Reconcile\n\t/go/src/github.com/percona/percona-server-mongodb-operator/pkg/controller/perconaservermongodb/psmdb_controller.go:423\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.1/pkg/internal/controller/controller.go:227
This problem is only reproducible when you change the certificate in flight, while the cluster is still initializing. If you change the cert after the cluster is ready, it is fine.
@rdxmbr which GitOps solution do you use?
AFAIK Argo CD has sync waves that allow you to define the order of resource creation.
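If it is Argo CD, the ordering could look roughly like the sketch below. This is untested against your setup and everything in it is an assumption: the resource names, the DNS entry and the issuer name are placeholders, and the CR is only a fragment. The idea is that the Certificate gets a lower sync wave than the custom resource, so Argo CD applies it first and, as far as I know, waits for it to become healthy (that is, for cert-manager to issue the secret) before it moves on to the next wave.

```yaml
# Sketch only: names, DNS entries and the issuer are placeholders.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-cluster-name-ssl
  namespace: default
  annotations:
    # Lower wave numbers are synced first.
    argocd.argoproj.io/sync-wave: "-1"
spec:
  # Must match spec.secrets.ssl in the custom resource below.
  secretName: my-cluster-name-ssl
  commonName: my-cluster-name
  dnsNames:
    # Incomplete on purpose; take the full SAN list from the operator's TLS docs.
    - "*.my-cluster-name-rs0.default.svc.cluster.local"
  issuerRef:
    group: certmanager.step.sm
    kind: StepIssuer
    name: step-issuer   # hypothetical issuer name
---
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
  namespace: default
  annotations:
    # Default wave is 0; stated explicitly to show the ordering.
    argocd.argoproj.io/sync-wave: "0"
spec:
  secrets:
    ssl: my-cluster-name-ssl
    sslInternal: my-cluster-name-ssl-internal
  # ... rest of the spec unchanged
```

A second Certificate in the same wave would be needed for the `sslInternal` secret; it is left out here to keep the example short.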
Meanwhile we will look into it and see what can be done.