Startup of replicaset takes very long

Hello

I have replicas = 3. During startup, why does one pod after the other get started?

Let's say 1 pod takes 30 min (after a “forced” shutdown); then you have to wait 3 x 30 min. Are there any other options?
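To make the arithmetic explicit, here is a minimal sketch comparing the total wait for one-at-a-time startup with starting all pods at once. The function names are hypothetical, and the model assumes every pod needs the same time and that starting in parallel is possible at all, which this thread has not established.

```python
def sequential_startup_minutes(replicas: int, minutes_per_pod: float) -> float:
    """Total wait when each pod starts only after the previous one is ready."""
    return replicas * minutes_per_pod


def parallel_startup_minutes(replicas: int, minutes_per_pod: float) -> float:
    """Total wait when all pods start together (bounded by the slowest pod)."""
    return minutes_per_pod if replicas > 0 else 0


# The case from the question: 3 replicas, 30 minutes each.
print(sequential_startup_minutes(3, 30))  # 90
print(parallel_startup_minutes(3, 30))    # 30
```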

Regards
John

Hey @jamoser.

why does one pod after the other get started

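This is the default behaviour of a StatefulSet. It uses the OrderedReady pod management policy: pods are created one at a time, and each one must be Running and Ready before the next is started.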

It is possible to change this by setting .spec.podManagementPolicy (the alternative value is Parallel). But:

  1. I’m not sure that it is safe for a replica set. I will need to consult with our experts.
  2. I don’t see it being supported in the Operator right now (you can’t set this option in the custom resource).
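For reference, this is roughly where the field sits in a plain Kubernetes StatefulSet. It is only a sketch: the name "example-rs0" is a placeholder, a real StatefulSet also needs selector, serviceName and template, and, as noted above, the Operator does not expose this option. Whether Parallel is safe for a replica set is still an open question here. As far as I know, Kubernetes also rejects changes to this field on an existing StatefulSet, so it would have to be set at creation time.

```python
import json

# Hypothetical, incomplete StatefulSet fragment for illustration only.
statefulset_fragment = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "example-rs0"},  # placeholder name
    "spec": {
        "replicas": 3,
        # Kubernetes accepts "OrderedReady" (the default) or "Parallel".
        "podManagementPolicy": "Parallel",
    },
}

# Print the fragment as JSON, which kubectl also accepts as a manifest format.
print(json.dumps(statefulset_fragment, indent=2))
```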

But I’m also curious: why does it take 30 min to start a single Pod? What is the main driver there?

Hello

There is the functionality

pause: true | false

to shut down / start the cluster. Very often it just does not work. The only way to “gracefully” shut down is to set the replicas on the Kubernetes StatefulSet to 0.
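A minimal sketch of that workaround, building the kubectl scale command in Python. The StatefulSet name "my-cluster-rs0" and the namespace "mongodb" are placeholders, and the helper name is made up for this example. Note that an operator managing the StatefulSet may set the replica count back, so this is not necessarily a stable state.

```python
import shlex


def scale_statefulset_command(name: str, namespace: str, replicas: int = 0) -> list[str]:
    """Build the kubectl argv that scales a StatefulSet to the given replica count."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return [
        "kubectl", "scale", "statefulset", name,
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


# Placeholder names; replace with the real StatefulSet and namespace.
cmd = scale_statefulset_command("my-cluster-rs0", "mongodb")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```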

I would assume that the newly introduced terminationGracePeriod (on Percona level) would be active and would therefore wait until MongoDB is able to shut down, but it seems that is not the case.

Also it seems you do not do a flush before the MongoDB shutdown.

As a result, MongoDB in the new pod first has to do a crash recovery. In our case, for example on a balanced disk with 100’000 collections, this can take a very long time.

So IMO something is not clean regarding the shutdown of the replica pods.

Regards
John

Hi @jamoser, do you use a custom value for terminationGracePeriod or not?

I would assume that there is a default value (hopefully the same value as in Kubernetes = 30 sec), and that it is sufficient for all pods and their MongoDB instances to shut down properly. I can see that in the logs.
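For context, a simplified model of how the grace period matters: on pod termination Kubernetes sends SIGTERM, waits up to terminationGracePeriodSeconds (30 by default in Kubernetes), and then sends SIGKILL. The sketch below only illustrates that rule; the function name is made up, and the default the Operator actually applies is an assumption that is not confirmed in this thread.

```python
# Kubernetes default for terminationGracePeriodSeconds; the Operator's own
# default may differ (not confirmed in this thread).
DEFAULT_GRACE_PERIOD_SECONDS = 30


def shutdown_outcome(shutdown_seconds: float,
                     grace_period_seconds: float = DEFAULT_GRACE_PERIOD_SECONDS) -> str:
    """Return 'clean' if the process exits within the grace period, else 'killed'.

    'killed' means the container receives SIGKILL, so the database would
    need crash recovery on the next start.
    """
    if shutdown_seconds <= grace_period_seconds:
        return "clean"
    return "killed"


print(shutdown_outcome(10))       # shutdown finishes within 30 s
print(shutdown_outcome(120))      # shutdown needs longer than 30 s
print(shutdown_outcome(120, 300))  # same shutdown with a longer grace period
```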

Hi, on one hand, you can see that MongoDB is shutting down properly, but at the same time, the database is going through crash recovery. Am I right?

Yes - here is an example of a cluster which was idle for more than 1 week. Shutdown sequence (rs 0):

[I will provide a better example]