GKE / MongoDB cluster under "stress" not accepting changes from cr.yaml

Hello

Just recently, a MongoDB cluster did not have enough memory and crashed right after startup, as soon as the first request hit the cluster. The pods then restarted “endlessly”.

I adjusted the memory settings in cr.yaml and applied them. But the memory changes were not reflected, and the pods kept restarting due to OOM.
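
For context, this is the kind of change I mean (the cluster name and values here are placeholders, not my exact setup):

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDB
metadata:
  name: my-cluster-name
spec:
  replsets:
    - name: rs0
      size: 3
      resources:
        limits:
          memory: 4Gi   # raised from a value that was too low
        requests:
          memory: 4Gi
```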

If you do this with a plain Kubernetes Deployment/StatefulSet/etc., the changes are accepted immediately, and that is how pods under stress (e.g. OOM) can be rescued.
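
For comparison, on a plain StatefulSet something like this takes effect right away, even while the pods are crash-looping (my-app is just a placeholder name):

```sh
# Picked up immediately and triggers a rolling restart:
kubectl set resources statefulset/my-app \
  --limits=memory=4Gi --requests=memory=4Gi
```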

But it seems this is NOT the case with the Percona MongoDB cluster.

I first noticed this issue a couple of years ago, and it is still there, in both the sharded and non-sharded variants.

The only way to get this resolved is to set replicas = 0 (at the Kubernetes level!) and then set replicas = 3 again. But a) you get a service interruption, and b) there is currently still the problem of the unclean shutdown, which causes a recovery when the pod / MongoDB engine starts up, and this can take “hours”.
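
Roughly like this, assuming the default operator naming scheme of `<cluster-name>-<replset-name>` for the StatefulSet (names are placeholders, and the operator may try to reconcile the scale back):

```sh
# Scale the replica set StatefulSet down at the Kubernetes level...
kubectl scale statefulset/my-cluster-name-rs0 --replicas=0
# ...wait for the pods to terminate, then scale back up:
kubectl scale statefulset/my-cluster-name-rs0 --replicas=3
```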

Regards
John

Hi @jamoser, how can I reproduce it? Please provide STR (steps to reproduce).

Hello

You can simulate it very simply: take a cluster with a meaningful amount of data and set the memory limit too low. The pods will then crash due to OOM, and in Kubernetes you will see the restart count going up.
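
Something like this shows it, assuming the standard operator labels and pod names (my-cluster-name is a placeholder):

```sh
# Watch the restart counter climb:
kubectl get pods -l app.kubernetes.io/instance=my-cluster-name -w
# The container's last termination reason shows OOMKilled:
kubectl describe pod my-cluster-name-rs0-0 | grep -A2 "Last State"
```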

After roughly 20-30 restarts (and with queries still hitting the cluster), try to apply new memory settings (increase the memory back to the previously stable value).
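
A quick way to see whether the change actually reached the pods, with the same placeholder names as above:

```sh
# Re-apply the CR with the higher memory limit...
kubectl apply -f cr.yaml
# ...then check whether the operator pushed it down to the StatefulSet:
kubectl get sts my-cluster-name-rs0 \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```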

In many cases the new memory settings are not applied. It “feels” like the Percona custom resource / Operator prefers “healing” the cluster over applying the new memory settings to the pods (which is what would actually heal the cluster).

Regards
John