Hello,
I am testing the MongoDB operator with a replica set of 3 members. I created a new, empty cluster with 3 replica set members. Then, to simulate disk corruption, I intentionally exec'd into one of the replica set pods and deleted some files inside /data/db. The pod goes into "CrashLoopBackOff" status and never recovers.
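Roughly, the simulation looked like this (the container name and the exact files I removed are just examples; I deleted a few of the WiredTiger storage files):

kubectl exec -it my-cluster-name-rs0-2 -c mongod -- bash
# inside the pod: remove some storage files to corrupt the data directory
rm /data/db/WiredTiger.wt /data/db/WiredTiger.turtle
exit

The cluster state afterwards: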
NAME                                                   READY   STATUS             RESTARTS       AGE
pod/my-cluster-name-rs0-0                              1/1     Running            0              55m
pod/my-cluster-name-rs0-1                              1/1     Running            0              53m
pod/my-cluster-name-rs0-2                              0/1     CrashLoopBackOff   12 (62s ago)   52m
pod/percona-client                                     1/1     Running            0              130m
pod/percona-server-mongodb-operator-5dd88ff7f7-pxrs8   1/1     Running            0              142m

NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)     AGE
service/kubernetes            ClusterIP   10.96.0.1    <none>        443/TCP     3h10m
service/my-cluster-name-rs0   ClusterIP   None         <none>        27017/TCP   142m

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/percona-server-mongodb-operator   1/1     1            1           142m

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/percona-server-mongodb-operator-5dd88ff7f7   1         1         1       142m

NAME                                   READY   AGE
statefulset.apps/my-cluster-name-rs0   2/3     142m
I assumed that when the operator detects a pod is crashing, it would reinitialize it and replicate the data from the healthy members. But the pod stays in the CrashLoopBackOff state and never recovers.
Is there any way to configure the operator so that, if a replica set pod keeps crashing, it reinitializes the pod automatically? In this case, if I delete the pod manually, it is recreated and comes back in the Ready state.
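For reference, the manual workaround that works for me (pod name as in the output above):

# delete the broken pod; the statefulset controller recreates it,
# and the new pod comes back Ready
kubectl delete pod my-cluster-name-rs0-2

I would like the operator to do something like this on its own after a pod has restarted several times.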