MongoDB operator creates/overwrites "external" secret by itself

Environment:
Kubernetes 1.20.5
Mongodb operator 1.11
Mongodb 4.4.10
Deployment of op and db via Helm chart

Config:
simple replica set with 3 nodes (no shards)
user-secret provided by “external secrets” (values from Vault) under db deployment

Problem:
The operator deployment works without any issues.
Deployment of the database works as well.
BUT:
In most cases the operator creates the external secret itself, with the wrong values (hardcoded users and random passwords; see percona-server-mongodb-operator/secrets.go at main in percona/percona-server-mongodb-operator on GitHub, line 79 etc.)

This leads to a running cluster with “wrong passwords and users”, and we ran into login problems after a restore (of course).

Question:
Why is the secret created at all when an external secret is defined?
It is not clear to us, and from our point of view this should never happen.

It seems to be a timing problem: the secret is not yet in place when the operator starts deploying the database. When we deploy the secret manually in advance, the behaviour is correct (as far as we could test).
It takes a few seconds for external secrets to provide the secret. Sometimes the operator states “secret not found” in the logs, which is correct, but mostly the operator creates the secret itself.

This behaviour also prevents us from auto-deploying via pipelines, because the operator is not cluster-wide-aware yet and has to be deployed per db instance (each of which runs in its own namespace).
Any suggestions are very welcome.
Thx in advance
/Frank


We had a separate conversation with @frank2b about it.
I will try to explain the problem here with examples.

If secrets.users is defined:

  secrets:
    users: my-cluster-name-secrets

The Operator checks whether the secret my-cluster-name-secrets is in place; if not, it creates it with random user credentials.

The problem appears when you use GitOps or any IaC tooling where you do not control the order of object creation. You want to create a secret and deploy the database, but for some reason the secret is created 2 seconds after the Custom Resource. This is a race condition.
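
To make the race concrete, here is a minimal Python sketch of the check-or-create behavior described above. It is not the Operator's actual code: the dict stands in for the Kubernetes API, and the function name, the keys, and the credential values are all made up for illustration.

```python
import secrets

SECRET_NAME = "my-cluster-name-secrets"

# Hypothetical credentials that the external tooling would deliver from Vault.
VAULT_SECRET = {"user": "app-admin", "password": "from-vault", "source": "vault"}


def reconcile(store, secret_name):
    """Model of check-or-create: use the secret if present, else generate one.

    `store` is a plain dict standing in for the Kubernetes API (an assumption
    of this sketch, not how the Operator talks to the cluster).
    """
    if secret_name not in store:
        store[secret_name] = {
            "user": "generated-user",  # placeholder for a hardcoded default user
            "password": secrets.token_urlsafe(16),  # random password
            "source": "operator",
        }
    return store[secret_name]


# Case 1: the secret exists before the Custom Resource is reconciled.
store = {SECRET_NAME: dict(VAULT_SECRET)}
in_order = reconcile(store, SECRET_NAME)

# Case 2: the Custom Resource is reconciled first, the secret arrives later.
store = {}
raced = reconcile(store, SECRET_NAME)

print(in_order["source"], raced["source"])
```

In the first case the database is set up with the Vault credentials; in the second it is set up with generated ones, which matches the "wrong passwords and users" symptom reported above.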

I assume the desire here is the following behavior for the situation when secret is specified in cr.yaml:

  • Operator checks if secret is in k8s
  • if not - it waits for it (checks every reconcile loop)

We can make this controllable with a flag, so that we do not introduce breaking changes to the current behavior.

P.S. This applies not only to the users secret, but to other secret objects as well.
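
A rough Python sketch of that proposal, under stated assumptions: the flag name `wait_for_secret` is hypothetical (no such option is confirmed here), the dict again stands in for the Kubernetes API, and the loop stands in for repeated reconcile passes.

```python
import secrets

SECRET_NAME = "my-cluster-name-secrets"


def reconcile(store, secret_name, wait_for_secret=False):
    """Return (action, credentials) for one reconcile pass.

    wait_for_secret is a hypothetical opt-in flag; with the default (False)
    the function keeps the check-or-create behavior, so nothing breaks.
    """
    if secret_name in store:
        return "use-existing", store[secret_name]
    if wait_for_secret:
        # Do not create anything; check again on the next reconcile pass.
        return "requeue", None
    store[secret_name] = {
        "user": "generated-user",  # placeholder for a default user
        "password": secrets.token_urlsafe(16),
    }
    return "created", store[secret_name]


# Simulate reconcile passes where the secret shows up on the third pass.
store = {}
actions = []
creds = None
for tick in range(4):
    if tick == 2:
        store[SECRET_NAME] = {"user": "app-admin", "password": "from-vault"}
    action, creds = reconcile(store, SECRET_NAME, wait_for_secret=True)
    actions.append(action)
    if action != "requeue":
        break

print(actions)
```

With the flag set, the passes that run before the secret exists only requeue, and the database is created with the externally provided credentials once the secret appears.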

Would be great to hear community thoughts about this problem.


I have run into the same problem with our deployment. Our administrators deploy our system in order using Helm, which deploys a set of secrets used across the platform. When we deploy mongo (via the operator), we reference the existing secret. Everything works fine until they delete the mongo deployment. After that, updating our secrets (via Helm) fails because mongo has deleted our labels, so Helm won’t change the existing secret.

I would love to know of any workarounds other than deleting the secret after the deletion of the mongodb deployment.

It seems we have a similar problem with creating the TLS certs: Is it possible to rotate MongoDB user/password without interrupting the operator?

I assume the desire here is the following behavior for the situation when secret is specified in cr.yaml:

Operator checks if secret is in k8s
if not - it waits for it (checks every reconcile loop)

:+1: