How to force a rolling update / restart on the PerconaMongoDB cluster

Currently we have the issue with Operator 1.7 that we have a lot of collection and index files open. After 100’000 s (27.7 h) the file_manager in MongoDB starts to act and releases the files. So if 200’000 files are open and, according to the file_manager config (hardcoded by Percona), it releases down to 250 files, MongoDB tries to release them basically all at once, which causes the liveness probe to fail (I’ve already tuned it to the max). That means all 3 pods of the replica set get killed, and the startup takes about 1 h to read all the files again.
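As far as I can tell, those numbers correspond to the WiredTiger file_manager settings (close_idle_time=100000 s is the ~27.7 h, close_handle_minimum=250 is the floor it releases down to). Just to illustrate which knobs are involved, not as something we can currently change through the operator, a mongod config would express them roughly like this:

storage:
  wiredTiger:
    engineConfig:
      # illustration only: close_idle_time=100000 s is the ~27.7 h mentioned above,
      # close_handle_minimum=250 is the floor the file_manager releases down to
      configString: "file_manager=(close_idle_time=100000,close_handle_minimum=250,close_scan_interval=10)"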

Our current workaround is to change a value in cr.yaml and apply it (kubectl apply -f my_changed_cr.yaml). The mongod pods restart one after the other, so there is no interruption. The file_manager counter also gets reset, and the replica set runs stable for the next 24 h.
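In other words, something like this (just a sketch; the exact field does not matter, any harmless change to the spec does the trick for us):

# edit my_changed_cr.yaml, e.g. bump spec.replsets[0].resources.requests.cpu slightly
kubectl apply -f my_changed_cr.yaml
# the operator then restarts the rs0 pods one after the other
kubectl get pods -w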

Since this is a little bit painful: is there a way to apply a value to the PerconaServerMongoDB resource specifically, with a kubectl patch or similar? How can we set a specific value in “PerconaServerMongoDB”? I mean, I just don’t get how to access the PerconaServerMongoDB object in order to set, for example, the CPU request.

Or does a command exist where I can say “please safely restart all my mongod pods without service interruption”?

Just to get an idea …

kubectl patch statefulset xxx-mongodb-cluster-rs0 -p '{"spec":{....}}'

but the issue is that the change apparently has to go through the PerconaServerMongoDB resource: if I patch the StatefulSet directly, there is no PRIMARY/SECONDARY handling going on, and the cluster therefore becomes unavailable at some point.
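So what I am imagining is something along these lines against the custom resource itself (just a sketch, I have not verified the exact JSON path):

kubectl patch psmdb xxx-mongodb-cluster --type=json \
  -p='[{"op": "replace", "path": "/spec/replsets/0/resources/requests/cpu", "value": "500m"}]'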

replicaset.apps/xxx-mongodb-cluster-mongos-664db85886 3 3 3 16d
replicaset.apps/percona-server-mongodb-operator-586b769b44 1 1 1 63d
statefulset.apps/xxx-mongodb-cluster-cfg 3/3 16d
statefulset.apps/xxx-mongodb-cluster-rs0 3/3 16d


Maybe this example helps:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: restart-mongodb01-cluster-cronjob
  namespace: ns-mongodb-01
spec:
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 2
      template:
        spec:
          containers:
          - command:
            - kubectl
            - rollout
            - restart
            - psmdb/xxx-mongodb01-cluster
            image: bitnami/kubectl
            name: kubectl
          restartPolicy: Never
          serviceAccountName: percona-server-mongodb-operator
  schedule: "0 */2 * * *"   # every 2 hours

So when this runs, the CronJob logs the following error:

error: no kind "PerconaServerMongoDB" is registered for version "psmdb.percona.com/v1" in scheme "k8s.io/kubectl/pkg/scheme/scheme.go:28"

Since the CRDs are there, I don’t understand what the problem is?


For example, if you need to “restart” via a cron job: kubectl rollout restart only works on the built-in workload kinds (Deployments, StatefulSets, DaemonSets), so the trick is to patch a value on the psmdb resource instead and let the operator do the rolling restart.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: restart-mongodb-cronjob
  namespace: dev-env
spec:
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 2
      template:
        spec:
          containers:
          # patch a harmless spec value (the CPU request) to a slightly random number;
          # the operator picks up the change and rolls the replset pods one by one
          - args:
            - 'MY_CPU=$(expr 400 + $RANDOM % 100)m;echo $MY_CPU;kubectl -n dev-env patch psmdb luz-mongodb-cluster --type=''json'' -p=''[{"op": "replace", "path": "/spec/replsets/0/resources/requests/cpu", "value":"''$MY_CPU''"}]'''
            command:
            - /bin/sh
            - -c
            image: bitnami/kubectl
            name: kubectl
          restartPolicy: Never
          serviceAccountName: percona-server-mongodb-operator
  schedule: '0 0 * * *'
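If you want to check it without waiting for the schedule, you can trigger the job once by hand (the job name restart-now is arbitrary) and watch the pods cycle:

kubectl -n dev-env create job --from=cronjob/restart-mongodb-cronjob restart-now
kubectl -n dev-env get pods -w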


@jamoser thank you for sharing the solution!

But I have a feeling that this is more of a workaround than a fix. The root cause is still there.
I’ll talk to our MongoDB team to see what can be done here.

What do you think?


I’ll talk to our MongoDB team to see what can be done here.

Thanks … the above solution is really a hack (but works very nicely on PROD).