Non-sharding mode: how to keep the pods "sticky" to a node to avoid a complete outage


It’s more of a Kubernetes question …

Assume you run the MongoDB cluster on multiple nodes. Now there are a lot of writes, which all go to the master/writer pod of the StatefulSet. In some cases Kubernetes might try to reschedule the “writer pod” to a different node if it creates too much load on the node. The pod gets rescheduled and the “writer” role is moved to another pod - but then the problem arises again, another rescheduling … and so forth, until ALL of the pods have been rescheduled. Since the startup of each pod/mongodb takes some time, at some point the whole cluster is NOT AVAILABLE any more.

Is there an elegant and RECOMMENDED way to make the pods sticky, so that other pods on the node get rescheduled instead?

Hey @jamoser ,

well, there are a few ways that I can think of, but it depends on the case.

  1. If the Pod is getting killed by the OutOfMemory killer (OOM), then there is not much you can do except ensure that it has enough resources.
  2. If the node comes under pressure and starts evicting Pods, the eviction does not take into account PodDisruptionBudgets or termination grace periods. But at the same time, it would first evict Burstable pods. So the recommendation here would be to use Guaranteed QoS pods (requests = limits) for mongodb.
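As a sketch of point 2, a StatefulSet spec that gets the Guaranteed QoS class could look like the following. The names, image tag, and resource values are placeholders, not your actual deployment; Kubernetes assigns Guaranteed QoS only when every container in the Pod sets `requests` equal to `limits` for both CPU and memory:

```yaml
# Sketch: Guaranteed QoS for a mongodb StatefulSet (all values are placeholders).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongod
          image: mongo:6.0
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              cpu: "2"      # equal to requests -> Guaranteed QoS
              memory: 4Gi   # equal to requests -> Guaranteed QoS
```

You can verify the assigned class with `kubectl get pod mongodb-0 -o jsonpath='{.status.qosClass}'`; under node pressure, the kubelet evicts BestEffort and Burstable pods before Guaranteed ones.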

Hope this helps. I’m open to jump into a call to discuss it. You know where to find me :wink: