MongoDB operator creates/overwrites "external" secret by itself

Kubernetes 1.20.5
MongoDB operator 1.11
MongoDB 4.4.10
Operator and database deployed via Helm chart

Simple replica set with 3 nodes (no shards)
User secret provided by “external secrets” (values from Vault) as part of the database deployment

The operator deployment works without any issues.
Deployment of the database works as well.
However, most of the time the operator creates the secret itself with the wrong values (hardcoded user names and random passwords; see percona-server-mongodb-operator/secrets.go on the main branch of percona/percona-server-mongodb-operator on GitHub, line 79 onward).

This leads to a running cluster with “wrong passwords and users”, and, predictably, we ran into login problems after a restore.

Why is the secret created at all when an external secret is defined?
It is not clear to us, and from our point of view this should never happen.

It seems to be a timing problem: the secret is not yet in place when the operator starts deploying the database. When we deploy the secret manually in advance, the behaviour is correct (as far as we could test).
It takes a few seconds for external secrets to provide the secret. Sometimes the operator logs “secret not found”, which is correct, but most of the time the operator creates the secret itself.
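As a possible pipeline-side workaround for the timing problem described above, a deployment step could block until the secret exists before it applies the database resources. The following is only a minimal sketch: `wait_for_secret` and `kubectl_secret_exists` are hypothetical helper names, and the timeout and interval values are arbitrary assumptions.

```python
import subprocess
import time
from typing import Callable


def kubectl_secret_exists(name: str, namespace: str) -> bool:
    """Return True if `kubectl get secret` finds the secret (exit code 0)."""
    result = subprocess.run(
        ["kubectl", "get", "secret", name, "-n", namespace],
        capture_output=True,
    )
    return result.returncode == 0


def wait_for_secret(
    secret_exists: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll secret_exists() until it returns True or timeout_s has elapsed.

    Returns True if the secret appeared in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if secret_exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a pipeline this could be called as `wait_for_secret(lambda: kubectl_secret_exists("my-cluster-name-secrets", "my-namespace"))`, where the namespace is a placeholder, and the database deployment would only proceed if it returns True. This narrows the window but does not change the operator's own behaviour.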

This behaviour also prevents us from auto-deploying via pipelines, because the operator is not cluster-wide yet and has to be deployed per database instance (each of which runs in its own namespace).
Any suggestions are very welcome.
Thanks in advance

We had a separate conversation with @frank2b about it.
I will try to explain the problem here with examples.

If secrets.users is defined:

    users: my-cluster-name-secrets

The Operator checks whether the secret my-cluster-name-secrets exists; if it does not, the Operator creates it with random user credentials.

The problem appears when you use GitOps or any IaC tooling where you do not control the order in which objects are created. You want to create a secret and deploy the database, but for some reason the secret is created 2 seconds after the Custom Resource. This is a race condition: the Operator sees the Custom Resource first, finds no secret, and generates its own.

I assume the desired behavior when a secret is specified in cr.yaml is the following:

  • The Operator checks whether the secret exists in Kubernetes
  • If it does not, the Operator waits for it (re-checking on every reconcile loop)

We can make this controllable with a flag so that we do not introduce breaking changes to the current behavior.
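To make the difference between the current and the proposed behavior concrete, here is a small in-memory simulation. It is not the Operator's code: `FakeCluster`, `reconcile_users_secret`, the `wait_for_external_secret` flag, and the `user`/`password` keys are all hypothetical names chosen for illustration.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeCluster:
    """In-memory stand-in for the Kubernetes API: secret name -> secret data."""
    secret_store: Dict[str, Dict[str, str]] = field(default_factory=dict)


def reconcile_users_secret(
    cluster: FakeCluster,
    secret_name: str,
    wait_for_external_secret: bool = False,
) -> str:
    """Run one reconcile pass and return "ready", "created", or "waiting"."""
    if secret_name in cluster.secret_store:
        # The secret is already there (for example, provided from Vault).
        return "ready"
    if wait_for_external_secret:
        # Proposed behavior: create nothing, check again on the next loop.
        return "waiting"
    # Current behavior: generate a secret with random credentials.
    cluster.secret_store[secret_name] = {
        "user": "generated-user",  # placeholder, not the real hardcoded name
        "password": secrets.token_urlsafe(16),
    }
    return "created"
```

With the flag off, a Custom Resource that arrives before the external secret ends up with generated credentials. With the flag on, the pass returns "waiting" until the external secret shows up, and the externally provided values are left untouched.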

P.S. This applies not only to users but to the other secret objects as well.

It would be great to hear the community's thoughts on this problem.
